Variance is a measure of how spread out a set of values is, defined as the average of the squared deviations of each value from the mean. A large variance means the data points are widely dispersed; a small variance means they cluster tightly around the mean. Because the deviations are squared, variance is always non-negative and is expressed in squared units of the original measurement.
The definition of variance
To calculate variance, you first find the mean of the data, then subtract the mean from each value to get the deviations. Squaring each deviation removes the sign (so positive and negative deviations do not cancel) and gives greater weight to values far from the mean. The average of these squared deviations is the variance.
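The steps above map directly onto code. Below is a minimal Python sketch of the definition as stated, dividing by the number of values. The function name `population_variance` is our own illustrative choice, not a library API.

```python
def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    values = list(values)
    if not values:
        raise ValueError("variance needs at least one value")
    n = len(values)
    mean = sum(values) / n                    # step 1: find the mean
    deviations = [x - mean for x in values]   # step 2: subtract the mean from each value
    squared = [d * d for d in deviations]     # step 3: square, so signs cannot cancel
    return sum(squared) / n                   # step 4: average the squared deviations


print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```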
Variance is the foundation of many statistical methods, including the analysis of variance (ANOVA), regression diagnostics and the construction of confidence intervals. Reporting it transparently supports the goals set out in our reproducibility coverage.
Population variance versus sample variance
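The formula depends on whether your data are the entire population or a sample drawn from it. For a population, you divide the sum of squared deviations by the number of values, N. For a sample of n values, you divide by n − 1 instead of n. This adjustment, known as Bessel’s correction, produces an unbiased estimate of the population variance. It is needed because the deviations are measured from the sample mean rather than the unknown population mean, and because the sample mean is the point that minimises the sum of squared deviations for that particular sample, those deviations come out slightly too small on average.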
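Written out in standard notation (here μ denotes the population mean, x̄ the sample mean and xᵢ the individual values; these symbols are our labels, not defined elsewhere in this article), the two formulas are:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```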
| Quantity | Symbol | Divisor |
|---|---|---|
| Population variance | σ² | N |
| Sample variance | s² | n − 1 |
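The underestimate that Bessel’s correction removes can be checked exactly on a toy case. This sketch assumes a hypothetical population of three values {1, 2, 3} and samples of size 2 drawn independently (with replacement), then averages each estimator over every possible sample using exact fractions. Dividing by n should average out to (n − 1)/n of the true variance, while dividing by n − 1 should match it.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
n = 2                   # sample size


def sum_sq_dev(values):
    """Sum of squared deviations from the values' own mean, in exact arithmetic."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values)


# True population variance: divide by N.
sigma2 = sum_sq_dev(population) / len(population)

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n1)  # prints: 2/3 1/3 2/3
```

On average the divide-by-n estimate is half the true value here (n = 2), while the divide-by-(n − 1) estimate is exactly right.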
A worked conceptual example
Suppose five replicate measurements give 4, 8, 6, 5 and 2. The mean is (4 + 8 + 6 + 5 + 2) / 5 = 5. The deviations from the mean are −1, 3, 1, 0 and −3. Squaring these gives 1, 9, 1, 0 and 9, which sum to 20. Treating the five values as a population, the variance is 20 / 5 = 4. Treating them as a sample, the variance is 20 / 4 = 5. The sample figure is slightly larger, reflecting Bessel’s correction.
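The same arithmetic can be reproduced in Python. This sketch uses the standard-library `fractions.Fraction` type so that every intermediate value is exact, and derives both divisors from a single sum of squared deviations.

```python
from fractions import Fraction

data = [4, 8, 6, 5, 2]  # the five replicate measurements

n = len(data)
mean = Fraction(sum(data), n)            # 25 / 5 = 5
deviations = [x - mean for x in data]    # -1, 3, 1, 0, -3
squared = [d * d for d in deviations]    # 1, 9, 1, 0, 9
ss = sum(squared)                        # sum of squared deviations = 20

population_var = ss / n                  # divide by N: 4
sample_var = ss / (n - 1)                # divide by n - 1: 5

print(f"mean={mean}, SS={ss}")
print(f"population variance={population_var}, sample variance={sample_var}")
```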
Variance and the standard deviation
Variance and the standard deviation describe the same property of spread, but in different units. The standard deviation is the square root of the variance, which returns the measure to the original units of the data. In our worked example the population standard deviation is √4 = 2, and the sample standard deviation is √5 ≈ 2.24. Because the standard deviation is easier to interpret alongside the mean, it is the figure more often reported in papers; see our companion piece on the standard deviation for detail. Variance, however, has convenient mathematical properties (for example, the variances of independent quantities add, whereas their standard deviations do not), which is why it underlies so many statistical procedures.
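A short sketch of the relationship using Python’s standard `statistics` and `math` modules, applied to the worked-example data: taking the square root of the variance gives the same result as asking for the standard deviation directly. If the measurements were in centimetres, the variance would be in cm² and the standard deviation back in cm.

```python
import math
import statistics

data = [4, 8, 6, 5, 2]  # worked-example measurements

pop_var = statistics.pvariance(data)   # population variance (divide by N)
pop_sd = math.sqrt(pop_var)            # standard deviation = square root of variance

pop_sd_direct = statistics.pstdev(data)  # the same quantity, computed directly
sample_sd = statistics.stdev(data)       # square root of the sample variance (n - 1)

print(pop_sd, round(sample_sd, 3))  # 2.0 2.236
```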
Interpreting variance correctly
Because variance is in squared units, its absolute size is hard to interpret in isolation: a variance of 4 cm² is meaningful only relative to the scale of the measurement. Variance is also sensitive to outliers. Squaring magnifies the effect of extreme values, so a single anomalous point can inflate the variance substantially. Always inspect your data distribution before reporting variance, and define the term consistently in your methods, including whether you divided by N or by n − 1. The CASRAI dictionary and our author guidance encourage precise, reproducible statistical reporting.
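To see the outlier effect concretely, this sketch takes the worked-example data and a hypothetical corrupted copy in which the reading of 8 was mis-recorded as 23. It reports the population variance of each and the share of the summed squared deviations contributed by the single most extreme point.

```python
import statistics

clean = [4, 8, 6, 5, 2]          # worked-example measurements
with_outlier = [4, 23, 6, 5, 2]  # hypothetical: the 8 mis-recorded as 23

for label, values in [("clean", clean), ("with outlier", with_outlier)]:
    mean = statistics.fmean(values)
    squared = [(x - mean) ** 2 for x in values]
    # Fraction of the total squared deviation coming from the most extreme point.
    share = max(squared) / sum(squared)
    print(f"{label}: variance={statistics.pvariance(values)}, "
          f"largest point's share={share:.0%}")
```

One changed value raises the variance from 4 to 58, and that point alone accounts for roughly 78% of the summed squared deviations (225 of 290).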
Frequently asked questions
Why is variance squared rather than absolute?
Squaring the deviations keeps the measure mathematically tractable: the squared deviation is differentiable everywhere, which makes it the natural basis for least squares estimation and many other techniques. The mean absolute deviation is an alternative, but the absolute value has no derivative at zero, so it lacks these convenient properties.
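As a brief illustration of that tractability, treat the sum of squared deviations from an arbitrary centre c as a function of c and set its derivative to zero. The minimum falls exactly at the mean x̄, which is the simplest instance of least squares.

```latex
\frac{d}{dc}\sum_{i=1}^{n}(x_i - c)^2 = -2\sum_{i=1}^{n}(x_i - c) = 0
\quad\Longrightarrow\quad
c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
```

The second derivative, 2n, is positive, so this point is a minimum. The sum of absolute deviations has no derivative wherever c equals one of the data values, so the same one-line calculation does not go through.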
When should I divide by n − 1 instead of n?
Divide by n − 1 whenever your data are a sample used to estimate the variance of a wider population. Divide by N only when your data genuinely represent the entire population of interest.
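In Python’s standard library, the `statistics` module offers both versions: `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by N. A minimal sketch with the worked-example data:

```python
import statistics

measurements = [4, 8, 6, 5, 2]

# Data are a sample used to estimate a wider population: divide by n - 1.
s2 = statistics.variance(measurements)

# Data are the entire population of interest: divide by N.
sigma2 = statistics.pvariance(measurements)

print(s2, sigma2)
```

If you use NumPy instead, note that `numpy.var` divides by N by default (`ddof=0`); pass `ddof=1` to obtain the sample variance.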
Is a high variance bad?
Not inherently. High variance simply means greater spread. Whether that is good or bad depends on context: high variance in measurement error is undesirable, but natural biological variation may be expected and informative.