Sampling Bias: Definition, Meaning & Examples | CASRAI

Sampling bias is a systematic error that arises when the sample studied is not representative of the population it is meant to reflect, because some members are more likely to be included than others. It undermines the generalisability of findings.

CASRAI plain-language explainers — clear answers to recurring research-administration questions

When a sample misrepresents the population

Sampling bias is fundamentally a representativeness problem: the sample differs systematically from the population because the selection process favoured some groups over others. If a survey about working hours is conducted only during office hours, it under-samples people who are at work and unable to respond at those times, so conclusions about the population will be skewed. The hallmark is direction: sampling bias pushes estimates consistently one way, unlike random sampling error, which scatters around the true value and shrinks as the sample grows.
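To make the contrast concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption chosen for illustration: the population of weekly working hours (roughly normal, mean 38, standard deviation 8), the response model in which people working under 35 hours are three times as likely to be reached, and the function name simulate. It compares a simple random sample with a biased one at several sample sizes.

```python
import random
import statistics


def simulate(sample_sizes=(100, 1_000, 20_000), pop_size=100_000, seed=42):
    """Compare random-sample error with biased-sample error at each size."""
    rng = random.Random(seed)

    # Hypothetical population: weekly working hours, roughly normal.
    population = [rng.gauss(38, 8) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)

    def reachable(hours):
        # Assumed response model: people working fewer than 35 hours are
        # reached 90% of the time, everyone else only 30% of the time.
        return rng.random() < (0.9 if hours < 35 else 0.3)

    results = []
    for n in sample_sizes:
        # Simple random sample: every member has the same chance of selection.
        srs = rng.sample(population, n)

        # Biased sample: approach people in random order, but keep only
        # those who happen to be reachable, until n responses are collected.
        biased = []
        for hours in rng.sample(population, pop_size):
            if reachable(hours):
                biased.append(hours)
                if len(biased) == n:
                    break

        results.append((
            n,
            statistics.fmean(srs) - true_mean,      # random sampling error
            statistics.fmean(biased) - true_mean,   # error including bias
        ))
    return true_mean, results


if __name__ == "__main__":
    true_mean, results = simulate()
    print(f"true mean: {true_mean:.2f} hours")
    for n, srs_error, biased_error in results:
        print(f"n={n:>6}  random error={srs_error:+.2f}  biased error={biased_error:+.2f}")
```

Under these assumptions the random-sample error should shrink towards zero as n grows, while the biased-sample error should stay a few hours below the true mean at every size, because collecting more responses by the same flawed method reproduces the same skew.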

Where it comes from

Two common roots are a flawed sampling frame and non-probability selection. If the frame (the list from which the sample is drawn) omits or under-covers parts of the population, a problem known as coverage error, the sample inherits that gap. If selection is by convenience, self-selection, or other non-random means, those who are easiest to reach or most willing to take part dominate. Famous failures show how confidently wrong a large but biased sample can be: the 1936 Literary Digest poll drew millions of responses from sources such as telephone directories and car registration lists, and still called the US presidential election for the losing candidate.

Common forms

Sampling bias appears in several recognisable shapes. Convenience sampling over-represents whoever is easy to access. Self-selection (volunteer) bias over-represents people motivated to respond. Undercoverage omits hard-to-reach groups entirely. Non-response bias arises when those who decline differ systematically from those who answer. Each produces a sample whose composition diverges from the population in a way that no amount of additional sampling, by the same flawed method, will correct.

Preventing and detecting it

The chief defence is probability sampling — giving every member of a well-defined population a known, non-zero chance of selection — built on a complete and accurate sampling frame. Maximising response rates and following up non-respondents reduce non-response bias. After collection, researchers can compare sample characteristics with known population figures and apply weighting to correct imbalances. Transparent reporting of the sampling method, frame, and response rate lets readers judge how far results may generalise.
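The weighting step can be sketched as post-stratification: each respondent is weighted by the ratio of their group's known population share to its share of the sample. The Python sketch below is illustrative only. The function name poststratify, the group labels "day" and "shift", and all figures are made up, and real surveys often use more elaborate weighting schemes.

```python
from collections import Counter


def poststratify(sample, population_shares):
    """Return (weighted_mean, weights) for a list of (group, value) pairs.

    population_shares maps each group to its known share of the population
    (the shares are assumed to sum to 1). Both inputs are hypothetical.
    """
    counts = Counter(group for group, _ in sample)

    missing = set(population_shares) - set(counts)
    if missing:
        # Weighting cannot repair undercoverage: a group with no
        # respondents has nothing to weight up.
        raise ValueError(f"no respondents in groups: {sorted(missing)}")

    n = len(sample)
    # Weight = population share / sample share, per group.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}

    total_weight = sum(weights[g] for g, _ in sample)
    weighted_mean = sum(weights[g] * v for g, v in sample) / total_weight
    return weighted_mean, weights


# Made-up data: shift workers are half the population but only 2 of 10 respondents.
sample = [("day", 30.0)] * 8 + [("shift", 45.0)] * 2
population_shares = {"day": 0.5, "shift": 0.5}

unweighted_mean = sum(value for _, value in sample) / len(sample)
weighted_mean, weights = poststratify(sample, population_shares)
```

In this made-up example the unweighted mean is 33 hours, while reweighting to the assumed 50/50 population split gives 37.5. The correction is only as good as the population figures used and the assumption that respondents within each group resemble the non-respondents in that group. It also cannot help when a group is missing from the sample altogether, which is why the sketch raises an error in that case.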

Key facts

At a glance

Definition: Systematic error from an unrepresentative sample
Relation: A sub-type of selection bias at the sampling stage
Cause: Flawed sampling frame or non-probability selection
Examples: Convenience, self-selection, undercoverage, non-response
Threatens: External validity / generalisability of findings
Defence: Probability sampling and a complete sampling frame

Common misconceptions

What people often get wrong

Often heard: A very large sample cannot be biased.

Actually: No — size does not fix representativeness. A large sample drawn by a flawed method is systematically skewed, as famous polling failures have shown.

Often heard: Sampling bias and random sampling error are the same.

Actually: No — random error scatters around the true value and shrinks with size; sampling bias is systematic and pushes estimates consistently in one direction.

Often heard: A high response rate guarantees no sampling bias.

Actually: No — bias can still arise from an incomplete sampling frame or undercoverage, even when most of those approached respond.

Going deeper