
What is feature engineering?

Feature engineering is the process of selecting, creating, and transforming the input variables used by a machine-learning model, with the aim of representing the data in a way that makes patterns easier for the model to learn.

Why features matter

A machine-learning model can only learn from the features it is given. How the data is represented therefore strongly shapes what the model can find: a pattern that is hard to learn from raw inputs may become obvious once the right derived variable is added. Feature engineering is the craft of constructing this representation: choosing which variables to include, deriving new ones, and transforming existing ones. Practitioners often say that better features beat better algorithms, because no model can recover information that the chosen features fail to express.
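As a small illustration of a derived variable making a pattern obvious, the sketch below uses made-up points labelled by whether they fall inside the unit circle. The data, the helper `best_threshold_accuracy`, and the choice of a single-threshold rule as the "model" are all assumptions for illustration. No threshold on x or y alone separates the classes, but a threshold on the derived squared distance does.

```python
# Hypothetical points, labelled 1 if inside the unit circle and 0 otherwise.
points = [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.5), (1.5, 0.1), (-1.2, -0.9), (0.2, 1.8)]
labels = [1, 1, 1, 0, 0, 0]

def best_threshold_accuracy(values, labels):
    """Best accuracy of any rule 'value <= t' or 'value > t' on one feature."""
    best = 0.0
    for t in sorted(set(values)):
        preds = [1 if v <= t else 0 for v in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        # 1 - acc is the accuracy of the flipped rule 'value > t'.
        best = max(best, acc, 1 - acc)
    return best

xs = [x for x, _ in points]
ys = [y for _, y in points]
# Derived feature: squared distance from the origin.
r2 = [x * x + y * y for x, y in points]

acc_x = best_threshold_accuracy(xs, labels)
acc_y = best_threshold_accuracy(ys, labels)
acc_r2 = best_threshold_accuracy(r2, labels)
print(acc_x, acc_y, acc_r2)
```

The raw coordinates contain the same information, but a simple model cannot express the circular boundary until the feature states it directly.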

Common techniques

Typical operations include creating new variables (for example, deriving "day of week" from a date), encoding categorical values into numbers a model can use, scaling or normalising numerical features so they are comparable, and handling missing values sensibly.
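The four operations above can be sketched in plain Python. The records, field names, and the choices of median imputation and standardisation are assumptions made for illustration; real projects typically use a dataframe or preprocessing library, and would compute these statistics from training data only.

```python
from datetime import date
from statistics import mean, median, pstdev

# Hypothetical toy records; None marks a missing value.
rows = [
    {"order_date": date(2024, 1, 1), "channel": "web", "amount": 20.0},
    {"order_date": date(2024, 1, 6), "channel": "store", "amount": None},
    {"order_date": date(2024, 1, 7), "channel": "web", "amount": 40.0},
    {"order_date": date(2024, 1, 9), "channel": "phone", "amount": 60.0},
]

# Handle missing values: impute with the median of the observed amounts.
observed = [r["amount"] for r in rows if r["amount"] is not None]
fill_value = median(observed)
amounts = [r["amount"] if r["amount"] is not None else fill_value for r in rows]

# Scale: standardise to zero mean and unit (population) standard deviation.
mu, sigma = mean(amounts), pstdev(amounts)
scaled = [(a - mu) / sigma for a in amounts]

# Encode: one-hot encode the categorical column using a fixed category order.
categories = sorted({r["channel"] for r in rows})

features = []
for r, s in zip(rows, scaled):
    feat = {
        # Derive: day of week from the date (Monday = 0, Sunday = 6).
        "day_of_week": r["order_date"].weekday(),
        "amount_scaled": s,
    }
    for c in categories:
        feat[f"channel_{c}"] = 1 if r["channel"] == c else 0
    features.append(feat)
```

Each resulting row is a dictionary of numeric values, which is the form most models expect.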

Other steps include combining or decomposing variables, binning continuous values, and reducing dimensionality. The right transformations depend on the data and the model: some models need carefully scaled inputs, while others are insensitive to scaling.
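Binning, combining, and decomposing can each be expressed as a small function. The sketch below is illustrative only: the bin edges, labels, and the function names `bin_age`, `debt_to_income`, and `decompose_timestamp` are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical bin edges and labels for an age feature.
AGE_EDGES = [18, 35, 65]
AGE_LABELS = ["under_18", "18_to_34", "35_to_64", "65_plus"]

def bin_age(age):
    """Binning: map a continuous age to an ordered bucket label."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

def debt_to_income(debt, income):
    """Combining: merge two variables into a ratio; None if income is zero or missing."""
    if not income:
        return None
    return debt / income

def decompose_timestamp(ts):
    """Decomposing: split a timestamp into parts a model can use directly."""
    return {"hour": ts.hour, "is_weekend": ts.weekday() >= 5}

print(bin_age(42))
print(debt_to_income(500, 2000))
print(decompose_timestamp(datetime(2024, 1, 6, 14, 30)))
```

Whether such features help depends on the model: a ratio or bucket can matter a great deal to a linear model, while a tree-based model may find equivalent splits on the raw values.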

Domain knowledge and automation

Effective feature engineering draws heavily on domain knowledge: understanding what the variables mean and which combinations are likely to matter. This human insight is one reason feature engineering can improve results more than model tuning does. Deep learning shifts some of this burden by learning features automatically from raw data (representation learning), which has reduced manual feature engineering for images, audio, and text. For structured, tabular data, however, hand-crafted features often remain decisive.

Feature engineering in research

In research, feature engineering is a key methodological step that must be reported for results to be reproducible, since the features chosen are part of the model. A critical pitfall is data leakage, which takes two common forms: deriving features from information that would not be available at prediction time, and fitting feature transformations on the whole dataset, including the test set. Both make measured performance look better than it would be on genuinely new data. Feature transformations should be fitted only on training data and then applied unchanged to validation and test data, ideally refitted within each cross-validation fold.
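A minimal sketch of the fit-on-training-only rule, using standardisation as the example transformation. The data and the helper names `fit_scaler`, `transform`, and `contiguous_folds` are hypothetical, and the fold scheme is deliberately simple (contiguous, unshuffled).

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardisation parameters (mean, std) from the given values only."""
    return mean(values), pstdev(values)

def transform(values, params):
    """Apply previously fitted parameters without re-estimating them."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values, already split into training and test sets.
train = [10.0, 12.0, 14.0, 16.0]
test = [40.0, 44.0]

# Correct: fit on the training split, then reuse those parameters on the test split.
params = fit_scaler(train)
train_scaled = transform(train, params)
test_scaled = transform(test, params)

# Leaky: parameters estimated from train and test together, so information
# about the held-out rows reaches the training features.
leaky_params = fit_scaler(train + test)

def contiguous_folds(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    size = n // k
    for i in range(k):
        stop = (i + 1) * size if i < k - 1 else n
        val = list(range(i * size, stop))
        trn = [j for j in range(n) if j < i * size or j >= stop]
        yield trn, val

# Within cross-validation, the scaler is refitted on each fold's training rows.
data = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
fold_means = []
for trn, val in contiguous_folds(len(data), 3):
    fold_params = fit_scaler([data[j] for j in trn])
    val_scaled = transform([data[j] for j in val], fold_params)
    fold_means.append(fold_params[0])
```

The fitted mean differs from fold to fold, which is the point: each validation fold is scaled with parameters that never saw its own rows.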

Key facts

Definition: selecting, creating and transforming model inputs
Goal: represent data so patterns are easier to learn
Techniques: deriving, encoding, scaling, handling missing values
Relies on: domain knowledge
Automation: deep learning handles some of it (representation learning)
Key pitfall: data leakage from improper feature creation

Common questions

Why is feature engineering important?

A model can only learn from the features it is given, so how the data is represented strongly affects performance. Well-chosen and well-transformed features can improve results more than changing the algorithm, which is why feature engineering is often decisive.

What are common feature-engineering techniques?

Common steps include creating new variables from existing ones, encoding categorical values numerically, scaling or normalising numbers, handling missing data, binning continuous values, and reducing dimensionality. The right choices depend on the data and the model used.

Does deep learning remove the need for feature engineering?

Partly. Deep learning learns features automatically from raw data, reducing manual feature engineering for images, audio, and text. For structured, tabular data, however, hand-crafted features often still make a substantial difference.
