
What is a decision tree?

A decision tree is a machine-learning model that makes predictions by following a branching sequence of simple tests on the input features, splitting the data into ever more specific groups until it reaches a decision.

How a decision tree makes a prediction

A decision tree is read from the top down. The root node tests one feature (for example, "is the value above a threshold?"), and the answer sends the example down one branch or another. Each internal node below applies another test, partitioning the data into smaller groups, until the example reaches a leaf, which gives the prediction. For classification the leaf holds a class, typically the majority class of the training examples that reached it; for regression it holds a value, typically their mean. Because the prediction is just the chain of tests followed, the model is easy to trace and explain.
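Reading a tree from the root to a leaf can be sketched in a few lines of Python. This is a minimal illustration, not any library's representation: the tree is hand-built, and its feature names (petal_length, petal_width), thresholds and class labels are hypothetical. Internal nodes are dicts holding a feature, a threshold and two children; leaves are plain prediction values.

```python
# A hypothetical hand-built tree: internal nodes are dicts with a feature,
# a threshold and two children; leaves are plain prediction values.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": "setosa",  # taken when petal_length <= 2.5
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": "versicolor",
        "right": "virginica",
    },
}

def predict(node, example):
    """Follow the chain of tests from the root until a leaf is reached."""
    path = []
    while isinstance(node, dict):
        value = example[node["feature"]]
        go_left = value <= node["threshold"]
        path.append(f"{node['feature']} <= {node['threshold']}: {go_left}")
        node = node["left"] if go_left else node["right"]
    return node, path

label, path = predict(tree, {"petal_length": 4.8, "petal_width": 1.9})
print(label)  # virginica
for step in path:
    print(step)
```

The returned path is the model's explanation for this prediction: both tests failed, so the example went right twice and landed in the "virginica" leaf.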

How a tree is built

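A tree is grown by repeatedly choosing the feature and split point that best separate the data. "Best" means the split that most increases the purity of the resulting groups: for classification, the largest reduction in Gini impurity or in entropy (the latter reduction is called information gain); for regression, the largest reduction in variance.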
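To make the criteria concrete, here is a minimal Python sketch of the three impurity measures and of an exhaustive search for the best threshold on a single numeric feature. The function names (gini, entropy, variance, impurity_decrease, best_threshold) are illustrative, not a library API, and only binary threshold splits are considered. The quality of a split is the parent's impurity minus the size-weighted impurity of its two children.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: chance that two random draws (with replacement) disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits; information gain is its reduction."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def variance(values):
    """Impurity for regression: variance of the target values."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

def best_threshold(xs, ys, impurity):
    """Try each midpoint between distinct sorted x values; keep the best."""
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = impurity_decrease([y for _, y in pairs], left, right, impurity)
        if gain > best[1]:
            best = (t, gain)
    return best

xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(xs, ys, gini))     # (3.5, 0.5)
print(best_threshold(xs, ys, entropy))  # (3.5, 1.0)
```

Both criteria agree here because the threshold 3.5 yields two pure groups. Passing variance instead, with numeric targets, gives the variance-reduction criterion for regression trees.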

The process is greedy: at each node it picks the locally best split and never revisits earlier ones. Splitting continues until a stopping rule is met, such as reaching a maximum depth, having too few examples left in a node, or reaching a node that is already pure. Left unchecked, a tree can grow until it fits the training data perfectly, which leads to overfitting.
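The greedy procedure and its stopping rules can be sketched as a short recursive function. This is a minimal illustration assuming numeric features, binary threshold splits and Gini impurity; the names grow, best_split and predict and the parameters max_depth and min_samples are hypothetical, not taken from any particular library.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Greedy step: the (feature, threshold) with the largest Gini decrease."""
    best, best_gain = None, 0.0
    parent, n = gini(labels), len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:
                best, best_gain = (f, t), gain
    return best

def grow(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Split recursively; earlier splits are never reconsidered."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: depth limit, too few examples, or an already pure node.
    if depth >= max_depth or len(rows) < min_samples or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no split reduces impurity
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f, "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_samples),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_samples),
    }

def predict(node, row):
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

rows = [(x,) for x in range(1, 7)]  # one numeric feature
labels = [0, 0, 1, 0, 1, 1]
deep = grow(rows, labels, max_depth=10)
shallow = grow(rows, labels, max_depth=1)
for name, tree in [("deep", deep), ("shallow", shallow)]:
    correct = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    print(name, correct, "of", len(rows))  # deep 6 of 6, then shallow 5 of 6
```

The deep tree memorises all six training labels, including the isolated one that may just be noise, while the depth-limited tree does not. The same sketch also shows the cost of greediness: on XOR-shaped data (rows (0,0), (0,1), (1,0), (1,1) with labels 0, 1, 1, 0) no single split reduces impurity, so it stops at the root, even though a two-level tree would classify those points perfectly.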

Strengths and weaknesses

Decision trees are prized for interpretability: their logic can be inspected and explained, which matters where decisions must be justified. They can handle both numerical and categorical features, need little data preparation (for example, no feature scaling), and make no assumptions about the data's distribution. Their main weakness is instability: a single deep tree easily overfits, and small changes in the training data can produce a markedly different tree. This is why trees are often combined into ensembles such as random forests, which trade some interpretability for greater accuracy and robustness.
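The ensemble idea can be sketched with the standard library alone. This is a simplified, hypothetical illustration rather than a full random forest: each member is a depth-1 tree (a stump) trained on a bootstrap sample of the data using a randomly chosen feature, and the ensemble predicts by majority vote. Real random forests grow deep trees and draw a fresh feature subset at every split; for a stump, which has a single split, drawing per tree amounts to the same thing. All function and parameter names are illustrative.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, features):
    """Depth-1 tree: the best single Gini split among the given features."""
    majority = Counter(labels).most_common(1)[0][0]
    best, best_gain = (None, None, majority, majority), 0.0
    parent, n = gini(labels), len(labels)
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:
                best_gain = gain
                best = (f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def stump_predict(stump, row):
    f, t, left_label, right_label = stump
    if f is None:  # no useful split was found: constant prediction
        return left_label
    return left_label if row[f] <= t else right_label

def fit_forest(rows, labels, n_trees=15, n_features=1, seed=0):
    """Bagging plus a random feature choice for each member."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(d), n_features)    # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    """Majority vote over the individual trees."""
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]

rows = [(x, x + 1) for x in range(5)] + [(x, x + 1) for x in range(100, 105)]
labels = [0] * 5 + [1] * 5
forest = fit_forest(rows, labels)
print(forest_predict(forest, (2, 3)), forest_predict(forest, (102, 103)))
```

Because each member sees a different resample of the data, no single unlucky tree decides the outcome; the price is that the vote of many trees no longer reads as one chain of tests.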

Decision trees in research

In research, decision trees are useful where an interpretable, rule-based model is needed and the reasoning behind each prediction must be transparent. Their tendency to overfit must be managed through pruning or depth limits, with settings such as the maximum depth typically chosen by cross-validation. When raw predictive accuracy matters more than interpretability, tree ensembles are usually preferred, and choosing between the two is a deliberate methodological trade-off.
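Choosing a depth limit by cross-validation can be sketched as follows. This is a minimal, hypothetical illustration for one numeric feature: a compact greedy Gini tree is grown on k − 1 folds, scored on the held-out fold, and the depth with the best mean held-out accuracy is kept. Pruning is not shown, and the function names, the candidate depths and the noisy example data are all assumptions made for illustration.

```python
from collections import Counter

def gini(ys):
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())

def grow(points, max_depth):
    """Greedy 1-D classification tree; points are (x, label) pairs."""
    labels = [y for _, y in points]
    leaf = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return leaf
    xs = sorted(set(x for x, _ in points))
    best, best_gain, n = None, 0.0, len(points)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        gain = gini(labels) - (len(left) * gini(left) + len(right) * gini(right)) / n
        if gain > best_gain:
            best, best_gain = t, gain
    if best is None:
        return leaf
    # An internal node is a tuple (threshold, left subtree, right subtree).
    return (best,
            grow([p for p in points if p[0] <= best], max_depth - 1),
            grow([p for p in points if p[0] > best], max_depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def cv_accuracy(points, max_depth, k=5):
    """Mean held-out accuracy over k folds (fold i = every k-th point)."""
    scores = []
    for i in range(k):
        test = points[i::k]
        train = [p for j, p in enumerate(points) if j % k != i]
        tree = grow(train, max_depth)
        scores.append(sum(predict(tree, x) == y for x, y in test) / len(test))
    return sum(scores) / k

def choose_depth(points, depths=(1, 2, 3, 4, 6, 8), k=5):
    """Pick the depth limit with the best cross-validated accuracy."""
    return max(depths, key=lambda d: cv_accuracy(points, d, k))

noisy = {5, 13, 27, 34}  # hypothetical positions of mislabelled points
points = [(x, 1 - int(x >= 20) if x in noisy else int(x >= 20)) for x in range(40)]
best = choose_depth(points)
print("chosen max_depth:", best, "CV accuracy:", cv_accuracy(points, best))
```

Folds are taken as every k-th point rather than shuffled so that the result is reproducible; shuffling first is the more usual practice. Because max returns the first of any tied candidates and the depths are listed in increasing order, ties go to the shallower, more interpretable tree.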

Key facts

At a glance

Definition: tree of feature tests leading to a prediction
Structure: root, internal nodes, branches, leaves
Tasks: classification and regression
Split criteria: Gini impurity, information gain, variance reduction
Key strength: interpretable, rule-based
Key weakness: prone to overfitting; unstable

Common questions

FAQ

How does a decision tree work?

It applies a sequence of tests on the input features, starting at the root and branching at each internal node, until it reaches a leaf that gives the prediction. The path of tests followed is the model's reasoning, which makes it easy to interpret.

Why do decision trees overfit?

A tree can keep splitting until each leaf matches a few training examples exactly, fitting noise rather than the general pattern. This is controlled by pruning, limiting depth, or requiring a minimum number of examples per leaf.

What is the difference between a decision tree and a random forest?

A decision tree is a single model; a random forest is an ensemble of many decision trees whose predictions are combined. The forest is usually more accurate and stable, at the cost of the single tree's interpretability.

