Overfitting and Underfitting in Machine Learning Explained

Overfitting happens when a model memorises training data instead of learning general patterns, while underfitting means it is too simple to capture them. This guide explains the bias-variance trade-off, regularisation, cross-validation and the threat to reproducible ML.

By CASRAI Editorial Board

Published 19 Jun 2026 · 4 minute read

Overfitting occurs when a machine learning model learns the noise and quirks of its training data so closely that it performs well on that data but poorly on new, unseen data. Its opposite, underfitting, occurs when a model is too simple to capture the underlying pattern and performs poorly even on the training data. Balancing these two failure modes is one of the central challenges of building reliable, reproducible models.

The bias-variance trade-off

Underfitting and overfitting are two sides of the bias-variance trade-off. Bias is error from overly simplistic assumptions; a high-bias model misses real structure and underfits. Variance is error from excessive sensitivity to the training sample; a high-variance model chases noise and overfits. As you make a model more flexible, bias typically falls but variance rises. The art is to find the sweet spot where total error, which combines both (along with noise that no model can remove), is lowest. A model that generalises well sits between the extremes.
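For squared-error loss the trade-off can be written out explicitly. The decomposition below is the standard textbook form rather than anything specific to this guide. The notation is introduced here for illustration: it assumes the data follow y = f(x) + ε with zero-mean noise of variance σ², that \hat{f} is the fitted model, and that expectations are taken over training samples and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why added flexibility trades one against the other.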

Aspect	Underfitting	Good fit	Overfitting
Model complexity	Too low	Appropriate	Too high
Bias	High	Balanced	Low
Variance	Low	Balanced	High
Training accuracy	Poor	Good	Excellent
Test accuracy	Poor	Good	Poor

The tell-tale sign of overfitting is a large gap between strong training performance and weak test performance. Underfitting shows up as poor performance on both.
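The gap described above can be reproduced on synthetic data. The sketch below uses k-nearest-neighbour regression purely as a convenient illustration (the guide does not prescribe a model); the data generator, noise level and choices of k are all assumptions made for the example.

```python
import math
import random

def make_data(n, seed, offset=0.0, noise=0.3):
    """Noisy samples of sin(x) over one period; a synthetic stand-in for real data."""
    rng = random.Random(seed)
    xs = [2 * math.pi * (i + offset) / n for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

train_x, train_y = make_data(100, seed=0)
test_x, test_y = make_data(100, seed=1, offset=0.5)  # unseen points, fresh noise

# k=1 memorises the training set, k=100 predicts the global mean everywhere,
# and k=10 sits in between.
results = {}
for k in (1, 10, 100):
    results[k] = (
        mse(train_x, train_y, train_x, train_y, k),
        mse(train_x, train_y, test_x, test_y, k),
    )
    print(f"k={k:>3}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With k=1 the training error is exactly zero, because every training point is its own nearest neighbour, while the test error is not: that is the overfitting signature. With k=100 the model ignores x altogether and its error is high on both sets, which is underfitting.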

Train, validation and test splits

Diagnosing these problems requires holding data back. The convention is a three-way split: the training set fits the model, the validation set tunes choices such as model complexity and stopping point, and the test set is touched only once, at the end, to estimate real-world performance. Evaluating on data the model trained on always flatters it and hides overfitting. Keeping the test set genuinely untouched is fundamental to honest evaluation, a point we stress across our AI and ML research outputs coverage.
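A three-way split can be sketched in a few lines. The function name, the 70/15/15 proportions and the fixed seed are assumptions chosen for illustration, and random shuffling is only appropriate when samples are independent; time series or grouped data need a different scheme.

```python
import random

def three_way_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Return disjoint index lists for training, validation and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test_idx = indices[:n_test]                # set aside first, used only at the end
    val_idx = indices[n_test:n_test + n_val]   # used for tuning decisions
    train_idx = indices[n_test + n_val:]       # used to fit the model
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Recording the seed alongside the results lets others recreate exactly the same partition.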

Regularisation: penalising complexity

Regularisation discourages a model from becoming too complex by adding a penalty for large or numerous parameters. L1 (lasso) regularisation can shrink some weights to zero, effectively performing feature selection. L2 (ridge) regularisation shrinks weights smoothly towards zero without eliminating them. In neural networks, techniques such as dropout, which randomly disables units during training, and early stopping, which halts training before the model starts memorising, serve the same goal. Each nudges the model towards simpler, more generalisable solutions.
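The different behaviour of L1 and L2 is easiest to see in a simplified setting. The sketch below assumes an orthonormal design (features uncorrelated and scaled so that XᵀX is the identity), in which case the lasso objective ½‖y − Xw‖² + λ‖w‖₁ and the ridge objective ½‖y − Xw‖² + (λ/2)‖w‖² both have closed-form solutions per coefficient. The function names and the example weights are made up for illustration.

```python
def lasso_shrink(w, lam):
    """Soft-thresholding: the L1 solution for one coefficient (orthonormal design)."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # coefficients with |w| <= lam are set exactly to zero

def ridge_shrink(w, lam):
    """Proportional shrinkage: the L2 solution under the same assumption."""
    return w / (1.0 + lam)

ols_weights = [2.0, -0.3, 0.1, -1.5]  # hypothetical unregularised coefficients
lam = 0.5

lasso_weights = [lasso_shrink(w, lam) for w in ols_weights]
ridge_weights = [ridge_shrink(w, lam) for w in ols_weights]

print("lasso:", lasso_weights)
print("ridge:", ridge_weights)
```

The two small coefficients are removed entirely by the L1 rule, whereas the L2 rule scales every coefficient down by the same factor and leaves none at zero. Outside the orthonormal case the solutions are no longer this simple, but the qualitative contrast carries over.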

Cross-validation: a more robust check

A single train-validation split can be lucky or unlucky. Cross-validation guards against this by rotating the validation role across the data. In k-fold cross-validation, the data is divided into k parts; the model trains on k-1 parts and validates on the remaining one, repeating until every part has served as validation once. Averaging the results gives a more stable estimate of how the model will generalise, and a smaller chance of being fooled by a single fortunate split. To learn how these ideas fit into the wider discipline, see what is machine learning.
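The rotation at the heart of k-fold cross-validation can be sketched without any library. The names `k_fold_indices`, `cross_val_score` and the `fit_and_score` callback are hypothetical, introduced only for this example; the folds are contiguous, so data stored in a meaningful order should be shuffled beforehand.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def cross_val_score(n_samples, k, fit_and_score):
    """Average the score returned by fit_and_score(train_idx, val_idx) over k folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

In practice `fit_and_score` would train a model on the training indices and return its validation metric; the spread of the per-fold scores is worth reporting alongside the mean.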

Why this threatens reproducible ML

Overfitting is a leading cause of results that fail to replicate. A model tuned too tightly to one dataset, or evaluated with leakage between training and test data, can report impressive accuracy that collapses when applied elsewhere. Honest splits, regularisation, cross-validation and full reporting of hyperparameters are the defences. We discuss these safeguards in depth in reproducibility of machine learning research, and the consistent terminology to describe them lives in the CASRAI dictionary. As with classical statistics, adequate data matters: too few examples make overfitting almost inevitable, echoing the concerns in our guide to sample size and statistical power.
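One narrow form of leakage, identical records appearing in both training and test data, is cheap to check for. The helper below is a hypothetical sketch with made-up rows; it catches exact duplicates only and says nothing about subtler leaks such as near-duplicates or preprocessing fitted on the full dataset.

```python
def find_leaked_rows(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training set."""
    seen = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in seen]

# Made-up feature rows for illustration.
train_rows = [[5.1, 3.5, 0], [4.9, 3.0, 0], [6.2, 2.9, 1]]
test_rows = [[6.2, 2.9, 1], [5.8, 2.7, 1]]

leaked = find_leaked_rows(train_rows, test_rows)
print(f"{len(leaked)} of {len(test_rows)} test rows also appear in training data")
```

Any non-empty result means the reported test score is at least partly a measure of memorisation.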

Frequently asked questions

How can I tell if my model is overfitting?

Compare training and test performance. A model that scores very high on training data but noticeably worse on held-out test data is overfitting. If it performs poorly on both, it is underfitting.
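As a rough sketch, the comparison can be turned into a rule of thumb. The function name and both thresholds below are arbitrary assumptions; what counts as a "poor" score or a "noticeable" gap depends on the task and the metric.

```python
def diagnose_fit(train_score, test_score, min_score=0.8, max_gap=0.1):
    """Classify a model from its training and test scores (higher is better)."""
    if train_score < min_score:
        return "underfitting"      # weak even on the data it was fitted to
    if train_score - test_score > max_gap:
        return "overfitting"       # strong on training data, much weaker on test
    return "reasonable fit"

print(diagnose_fit(0.99, 0.70))
print(diagnose_fit(0.60, 0.58))
print(diagnose_fit(0.90, 0.88))
```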

What is the simplest way to reduce overfitting?

Gather more representative data, simplify the model, and apply regularisation or early stopping. Cross-validation helps you confirm that a fix genuinely improves generalisation rather than reflecting a lucky split.
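Early stopping is often implemented with a patience counter. The sketch below is one common variant, not the only one; the function name, the patience of 3 and the validation losses are invented for illustration, and in real training the check would run inside the training loop rather than over a precomputed list.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights would be kept under patience-based stopping."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss  # new best: remember this checkpoint
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical validation losses: improving, then drifting upwards.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.49]
stop_at = early_stopping_epoch(val_losses)
print("keep weights from epoch", stop_at)
```

Training halts at epoch 6 and the weights from epoch 3 are kept, so the later dip to 0.49 is never seen. A larger patience would have caught it at the cost of more training time, which is the design choice the parameter controls.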

What is the bias-variance trade-off in one sentence?

It is the tension between a model being too simple to capture the pattern (high bias, underfitting) and so flexible that it captures noise (high variance, overfitting), with the best model balancing the two.

Why does overfitting harm reproducibility?

An overfitted model reports performance specific to one dataset that does not carry over to new data, so its results fail to replicate. Honest data splits and transparent reporting, as described in our guidance for authors, are the remedy.
