
What is supervised learning?

Supervised learning is the branch of machine learning in which a model is trained on labelled examples, learning a mapping from inputs to known outputs so it can predict the correct label for new, unseen data.

Learning from labelled examples

In supervised learning, each training example is a pair: an input (the features) and the correct output (the label) supplied by a human or an existing source of truth. The algorithm searches for a function that reproduces these known answers and, crucially, extends them to inputs it has not seen. The presence of labels is what makes the method "supervised": the correct answers act as a teacher during training. This contrasts with unsupervised learning, which works with no labels at all.
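
As a minimal sketch of this idea, the Python snippet below stores a handful of labelled pairs and predicts the label of the closest stored example (a 1-nearest-neighbour rule, chosen here only because it is short). The feature values, the labels, and the names `training_pairs` and `predict` are invented for illustration and do not come from any particular dataset or library.

```python
from math import dist

# Hypothetical labelled training set: (features, label) pairs.
# Features are (height_cm, weight_kg); the labels are made up for illustration.
training_pairs = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training_pairs, key=lambda pair: dist(pair[0], features))
    return label

# The rule reproduces the known answers on the training inputs...
reproduces_training_labels = all(predict(x) == y for x, y in training_pairs)

# ...and extends them to an input that was never labelled.
unseen_prediction = predict((178.0, 80.0))
print(reproduces_training_labels, unseen_prediction)
```

The labels play the role of the teacher here: remove them and `predict` has nothing to return.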

Classification and regression

Supervised tasks divide into two kinds by the type of output. Classification predicts a discrete category — for instance, whether an email is spam, or which species a specimen belongs to. Regression predicts a continuous quantity, such as a temperature or a price.

Many algorithms address both, including decision trees, random forests, support vector machines, and neural networks. The choice depends on the size and nature of the data, how interpretable the model needs to be, and the metric by which it will be evaluated.
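
The difference in output type can be sketched on one small dataset. In the Python example below, the same input (hours studied) is paired once with a continuous label (an exam score) and once with a discrete label (pass or fail). The data, the pass mark of 60, and the helper names `fit_line` and `fit_threshold` are assumptions made for illustration; the two fitting rules (least squares and a single cut-off) are simple stand-ins, not the algorithms listed above.

```python
# Hypothetical data, invented for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 61.0, 68.0, 70.0]   # continuous label -> regression
passed = [s >= 60 for s in scores]        # discrete label   -> classification

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def fit_threshold(xs, labels):
    """Pick the cut-off on x that misclassifies the fewest training examples."""
    def errors(t):
        return sum((x >= t) != label for x, label in zip(xs, labels))
    return min(sorted(xs), key=errors)

# Regression: the prediction is a number on a continuous scale.
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6.0 + intercept

# Classification: the prediction is one of a fixed set of categories.
cutoff = fit_threshold(hours, passed)
predicted_pass = 6.0 >= cutoff

print(predicted_score, predicted_pass)
```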

Training, generalisation and overfitting

A supervised model is fitted by minimising a measure of error (a loss) between its predictions and the known labels on the training set, often using gradient descent or another optimiser. The goal is not to memorise the training data but to generalise to new examples. A model that fits the training labels too closely, capturing noise rather than signal, suffers from overfitting. Overfitting shows up as error that is low on the training set but markedly higher on held-out data; it is diagnosed with a held-out set or cross-validation and reduced with techniques such as regularisation.
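
The following Python sketch illustrates both halves of this paragraph under simple assumptions. A straight line is fitted by plain gradient descent on mean squared error, and a deliberately overfitted "memoriser" (a lookup table of training labels, a hypothetical construction used only for contrast) is evaluated alongside it. The data points, learning rate, and step count are invented for illustration.

```python
# Hypothetical noisy samples of roughly y = 2x + 1 (values invented).
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
held_out = [(0.5, 2.0), (2.5, 6.1), (3.5, 7.9)]

def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

def fit_line(data, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the training MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def fit_memoriser(data):
    """Overfitted contrast case: repeat training labels verbatim,
    and fall back to the training mean for any unseen input."""
    table = dict(data)
    fallback = sum(y for _, y in data) / len(data)
    return lambda x: table.get(x, fallback)

w, b = fit_line(train)
line = lambda x: w * x + b
memoriser = fit_memoriser(train)

for name, model in [("line", line), ("memoriser", memoriser)]:
    print(name,
          "train MSE:", round(mse(model, train), 4),
          "held-out MSE:", round(mse(model, held_out), 4))
```

The memoriser reaches zero training error by construction, so training error alone cannot distinguish it from a good model; only the held-out error exposes the difference.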

Supervised learning in research

Supervised learning is widely used to build predictive models across the sciences. Its reliability rests on the quality and representativeness of the labels: biased, noisy, or unrepresentative labelled data produces biased models. Sound practice separates training, validation, and test sets, guards against data leakage, reports appropriate metrics for the task, and compares against simple baselines. Because labels can encode human assumptions, careful evaluation across subgroups is part of responsible methodology.
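
A rough sketch of that workflow in Python, using only the standard library: the data are shuffled with a fixed seed, split 60/20/20 into training, validation, and test sets, and a simple threshold model is compared against a majority-class baseline. The dataset, the split proportions, and the function names are assumptions made for illustration, not a prescribed protocol.

```python
import random

# Hypothetical labelled data: one numeric feature, label is True when x >= 60.
examples = [(float(x), x >= 60) for x in range(100)]

rng = random.Random(0)          # fixed seed so the split is reproducible
shuffled = examples[:]
rng.shuffle(shuffled)
train, validation, test = shuffled[:60], shuffled[60:80], shuffled[80:]

def accuracy(predict, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_majority_baseline(data):
    """Always predict the most common training label, ignoring the input."""
    positives = sum(y for _, y in data)
    majority = positives * 2 >= len(data)
    return lambda x: majority

def fit_threshold(data):
    """Choose the cut-off on x with the fewest training errors."""
    def errors(t):
        return sum((x >= t) != y for x, y in data)
    best = min(sorted(x for x, _ in data), key=errors)
    return lambda x: x >= best

# Both models are fitted on the training set only, so nothing from the
# validation or test sets leaks into them.
candidates = {
    "baseline": fit_majority_baseline(train),
    "threshold": fit_threshold(train),
}

# Model selection uses the validation set...
chosen = max(candidates, key=lambda name: accuracy(candidates[name], validation))

# ...and the test set is used once, for the final report.
for name, model in candidates.items():
    print(name, "validation accuracy:", accuracy(model, validation))
print("chosen:", chosen, "test accuracy:", accuracy(candidates[chosen], test))
```

Reporting the baseline next to the chosen model makes it clear how much of the score comes from the model and how much from the class balance alone.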

Key facts

At a glance

Field: subtype of machine learning
Core idea: learn from labelled input–output pairs
Classification: predicts a discrete category
Regression: predicts a continuous value
Evaluation: on held-out, unseen data
Key risk: overfitting to the training labels

Common questions

FAQ

What is the difference between classification and regression?

Both are supervised tasks. Classification predicts a discrete category, such as spam or not-spam, while regression predicts a continuous value, such as a price or temperature. The distinction is the type of output being predicted.

What is the difference between supervised and unsupervised learning?

Supervised learning trains on labelled examples, where each input has a known correct output. Unsupervised learning works with unlabelled data and instead looks for structure such as clusters. The presence or absence of labels is the key difference.

Why does supervised learning need labelled data?

The labels are the known correct answers the model learns to reproduce and generalise. Without them, there is no target to fit, which is why labelling effort and label quality are central to supervised learning.
