Editorial · CASRAI · Reproducibility and computational research

The Normal Distribution Explained

The normal distribution is a symmetric, bell-shaped curve defined by its mean and standard deviation. This explainer covers the 68-95-99.7 rule, the central limit theorem, and what is and is not normally distributed in research.

By CASRAI Editorial Board

Published 18 Jun 2026 · 3 minute read

The normal distribution, also called the Gaussian distribution, is a continuous probability distribution that is symmetric about its mean and forms a bell-shaped curve. It is fully described by two parameters: the mean, which locates the centre of the curve, and the standard deviation, which controls its width. Most values lie near the mean, and values become increasingly rare as they move further away in either direction.

Shape, symmetry and parameters

A normal curve is perfectly symmetric, so its mean, median and mode coincide at the centre. The two tails extend infinitely in both directions, approaching but never touching the horizontal axis. Changing the mean shifts the curve left or right; changing the standard deviation stretches or compresses it. A larger standard deviation produces a flatter, wider bell; a smaller one produces a taller, narrower peak.
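For reference, the curve described above can be written as a density function. The symbols are notation introduced here rather than taken from the text: \mu stands for the mean and \sigma for the standard deviation.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right),
  \qquad -\infty < x < \infty, \quad \sigma > 0
```

The peak height at x = \mu is 1 / (\sigma\sqrt{2\pi}), which is why a larger standard deviation gives a lower, wider bell and a smaller one gives a taller, narrower peak.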

The 68-95-99.7 rule

For any normal distribution, a fixed proportion of values falls within a given number of standard deviations of the mean. This is known as the empirical rule, or the 68-95-99.7 rule.

Within	Approximate proportion
±1 standard deviation	68%
±2 standard deviations	95%
±3 standard deviations	99.7%

This rule underpins the interpretation of confidence intervals (a 95% interval spans roughly 1.96 standard errors either side of an estimate) and the identification of outliers: under normality, only about 0.3% of values fall more than three standard deviations from the mean.
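The proportions in the table can be checked numerically. The sketch below uses only the Python standard library; the function name proportion_within is an illustrative choice, not part of any standard.

```python
import math


def proportion_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    # For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {proportion_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```

The result does not depend on the mean or standard deviation, because any normal variable can be rescaled to the standard normal.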

The central limit theorem

The normal distribution is central to statistics largely because of the central limit theorem. In its classical form, this theorem states that the sampling distribution of the mean of a sufficiently large number of independent, identically distributed observations is approximately normal, regardless of the shape of the underlying population, provided the population has a finite variance. In practice, sample means tend towards normality as sample size increases. A common rule of thumb is n = 30 for moderately skewed data, although strongly skewed or heavy-tailed data can need considerably more. This is why many tests that compare means, such as the t-test, can be applied even when the raw data are not perfectly normal.
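A small simulation can illustrate the theorem. This is a minimal sketch under stated assumptions: the population is exponential with rate 1 (chosen here only because it is clearly skewed), and the helper names skewness and sample_means are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def sample_means(n, draws, rng):
    """Means of `draws` independent samples, each of size n, from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Raw observations from the skewed population (assumed exponential, rate 1).
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Sampling distribution of the mean at n = 30.
means_30 = sample_means(30, 5000, rng)

print(f"skewness of raw values:        {skewness(raw):.2f}")
print(f"skewness of means with n = 30: {skewness(means_30):.2f}")
```

In theory the exponential population has skewness 2, and the mean of n independent draws has skewness 2 divided by the square root of n, so the second figure should be much closer to zero than the first. It will not be exactly zero, which is the sense in which the approximation improves gradually with sample size.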

Why it matters for inference

Because the behaviour of the normal distribution is exactly known, it provides the mathematical basis for many inferential procedures, including significance tests and the calculation of p-values. Standardising a value into a z-score, by subtracting the mean and dividing by the standard deviation, lets researchers compare observations on a common scale and look up exact probabilities from the standard normal distribution.
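As a worked illustration of standardising, the sketch below converts a value to a z-score and looks up its tail probability using NormalDist from the Python standard library. The numbers (mean 100, standard deviation 15, observed value 130) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Standardise x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd


# Hypothetical values for illustration only.
mean, sd = 100.0, 15.0
x = 130.0

z = z_score(x, mean, sd)
standard_normal = NormalDist()  # mean 0, standard deviation 1

upper_tail = 1 - standard_normal.cdf(z)  # P(Z > z)
two_sided = 2 * upper_tail               # P(|Z| > |z|), by symmetry (z is positive here)

print(f"z = {z:.2f}")                    # z = 2.00
print(f"P(Z > z)   = {upper_tail:.4f}")  # P(Z > z)   = 0.0228
print(f"P(|Z| > z) = {two_sided:.4f}")   # P(|Z| > z) = 0.0455
```

The two-sided figure is the complement of the roughly 95% of values that lie within two standard deviations of the mean.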

What is and is not normally distributed

Many measurements approximate a normal distribution, including heights, blood pressure and measurement errors. However, normality should never be assumed without checking. Reaction times, incomes and counts of rare events are typically skewed, and some variables are bounded or bimodal. Always check the distribution using histograms or quantile-quantile plots before applying methods that assume normality. Defining variables and their distributions clearly supports the reproducibility standards set out in the CASRAI dictionary and our guidance for authors.
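The quantile-quantile check mentioned above can be sketched without a plotting library. The code below pairs each sorted observation with the matching standard normal quantile, which are the points a quantile-quantile plot would draw, and summarises how straight that line is with a correlation. The data are simulated, the function names are hypothetical, and the correlation is a rough numerical stand-in for visual inspection rather than a formal test.

```python
import random
from statistics import NormalDist, fmean


def qq_points(values):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n keep every probability strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, ordered


def qq_correlation(values):
    """Pearson correlation between theoretical and observed quantiles."""
    xs, ys = qq_points(values)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_like = [rng.gauss(50, 10) for _ in range(500)]  # simulated, roughly normal
skewed = [rng.expovariate(1.0) for _ in range(500)]    # simulated, right-skewed

print(f"normal-like sample: r = {qq_correlation(normal_like):.3f}")
print(f"skewed sample:      r = {qq_correlation(skewed):.3f}")
```

A value very close to 1 means the points fall near a straight line, as expected for normal data; the skewed sample gives a visibly lower value because its upper tail bends away from the line. Plotting the pairs returned by qq_points remains more informative than the single number.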

Frequently asked questions

What is the difference between the normal and standard normal distribution?

The standard normal distribution is a special case with a mean of 0 and a standard deviation of 1. Any normal distribution can be converted to the standard normal by calculating z-scores.

Does my data have to be normal to use statistics?

Not always. Thanks to the central limit theorem, tests based on means are often reasonably robust to non-normality at larger sample sizes. For small samples or strongly skewed data, non-parametric alternatives or transformations may be more appropriate.

How can I check whether data are normally distributed?

Use graphical tools such as histograms and quantile-quantile plots, supplemented by formal tests like Shapiro-Wilk. Visual inspection is often the most informative, as formal tests can be over-sensitive in large samples and can lack the power to detect departures from normality in small ones.
