Tag: variance

  • Standard Deviation in Research: A Clear Statistical Definition

    Standard deviation is a measure of how spread out a set of values is around its mean. It expresses, in the original units of the data, the typical distance of an observation from the average. A small standard deviation means values cluster tightly around the mean; a large standard deviation means they are widely dispersed. It is one of the most widely reported summary statistics in quantitative research because it captures variability that a mean alone conceals.

    Standard deviation and the mean

    Two datasets can share an identical mean yet behave very differently. Consider two classes whose mean test score is 70. In the first, scores fall between 68 and 72; in the second, they range from 40 to 100. Both means are 70, but the second class is far more variable. The standard deviation quantifies that difference, which is why reporting a mean without a measure of spread is incomplete.
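
    The contrast can be checked with a short sketch. The individual scores below are hypothetical, chosen only to match the ranges described above, and each class is treated as a complete group, so the divisor is the number of scores.

```python
import statistics

# Hypothetical scores, invented to match the ranges in the text.
class_a = [68, 69, 70, 71, 72]
class_b = [40, 55, 70, 85, 100]

for name, scores in (("A", class_a), ("B", class_b)):
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # standard deviation of the whole class
    print(f"class {name}: mean = {mean}, sd = {sd:.2f}")
# class A: mean = 70, sd = 1.41
# class B: mean = 70, sd = 21.21
```

    The means are identical, while the standard deviation of the second class is roughly fifteen times larger.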

    Standard deviation is the square root of the variance. Variance is the average of the squared deviations of each value from the mean. Squaring removes negative signs and emphasises larger departures, but it also leaves variance in squared units. Taking the square root returns the figure to the original units, making standard deviation the more interpretable companion to the mean.
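
    As a minimal sketch of that definition, the function below averages the squared deviations and then takes the square root. The name population_sd and the data are illustrative assumptions, and the divisor is simply the number of values (the population form discussed in the next section).

```python
import math

def population_sd(values):
    """Return (variance, standard deviation), dividing by the number of values."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / n  # in squared units
    return variance, math.sqrt(variance)    # square root restores the original units

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean 5
variance, sd = population_sd(data)
print(variance, sd)  # 4.0 2.0
```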

    Population versus sample

    The formula differs depending on whether the data represent an entire population or a sample drawn from one. The population standard deviation divides the sum of squared deviations by N, the number of values. The sample standard deviation divides by n minus 1 rather than n. This adjustment, known as Bessel’s correction, compensates for the tendency of a sample to underestimate the spread of the population it came from. Because most research analyses a sample and infers something about a wider population, the sample formula with n minus 1 is the one most often applied.

    Quantity                        Divisor   Used when
    Population standard deviation   N         Every member of the population is measured
    Sample standard deviation       n − 1     A sample is used to estimate the population
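
    Written out, the two formulas look as follows. The symbols are conventional notation assumed here rather than taken from the text: σ and μ for the population standard deviation and mean, s and x̄ for their sample counterparts.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```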

    The 68-95-99.7 rule

    When data follow a normal (bell-shaped) distribution, standard deviation maps onto predictable proportions of the data. This is the empirical rule, often called the 68-95-99.7 rule. Approximately 68% of values fall within one standard deviation of the mean, about 95% fall within two standard deviations, and roughly 99.7% fall within three. These figures hold only for a normal distribution and are approximations for real data that merely resemble one; skewed or heavy-tailed distributions will not obey them.

    Range from the mean      Approximate share of data (normal distribution)
    ±1 standard deviation    68%
    ±2 standard deviations   95%
    ±3 standard deviations   99.7%
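
    These proportions can be reproduced from the normal distribution itself. The sketch below uses NormalDist from Python's standard library; the helper name share_within is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```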

    A worked conceptual example

    Suppose adult resting heart rates in a sample have a mean of 70 beats per minute and a standard deviation of 8. If the distribution is roughly normal, then about 68% of people in that sample have a resting rate between 62 and 78 (the mean plus or minus one standard deviation). About 95% fall between 54 and 86 (two standard deviations), and almost everyone, around 99.7%, falls between 46 and 94 (three standard deviations). A reading of 100 would lie more than three standard deviations above the mean and would therefore be unusual relative to this sample. Examining such extreme values links directly to outlier detection, a related step in data quality assessment.
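
    The arithmetic behind these ranges is short enough to sketch. The variable and function names are illustrative; the mean of 70 and standard deviation of 8 come from the example above.

```python
mean_bpm = 70
sd_bpm = 8

# Ranges covering one, two and three standard deviations either side of the mean.
intervals = {k: (mean_bpm - k * sd_bpm, mean_bpm + k * sd_bpm) for k in (1, 2, 3)}
for k, (lower, upper) in intervals.items():
    print(f"±{k} SD: {lower} to {upper} bpm")

def z_score(reading, mean=mean_bpm, sd=sd_bpm):
    """Number of standard deviations a reading lies from the mean."""
    return (reading - mean) / sd

print(z_score(100))  # 3.75
```

    A reading of 100 sits 3.75 standard deviations above the mean, which is why it falls outside the three standard deviation range.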

    Standard deviation versus standard error

    A frequent source of confusion is the difference between standard deviation and standard error. Standard deviation describes the variability of individual observations in the data. The standard error of the mean describes the variability of the sample mean itself as an estimate of the population mean, and it equals the standard deviation divided by the square root of the sample size. Because of that division, the standard error is smaller than the standard deviation for any sample of more than one observation, and it shrinks as the sample grows.
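
    A brief sketch of that relationship, assuming a fixed standard deviation of 8 (borrowed from the heart-rate example) and a few hypothetical sample sizes:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by the square root of n."""
    return sd / math.sqrt(n)

sd = 8  # assumed to stay the same as the sample grows
for n in (4, 16, 64, 256):
    print(f"n = {n:>3}: standard error = {standard_error(sd, n)}")
# n =   4: standard error = 4.0
# n =  16: standard error = 2.0
# n =  64: standard error = 1.0
# n = 256: standard error = 0.5
```

    Quadrupling the sample size halves the standard error, while the standard deviation of the observations is unchanged.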

    The choice between them depends on what is being communicated. To describe how much individuals differ from one another, report the standard deviation. To express how precisely the mean has been estimated, report the standard error or, more informatively, a confidence interval. Reporting a standard error where a standard deviation is meant can mislead readers into thinking data are far less variable than they are. For practical reporting conventions, see the CASRAI author guidance and the CASRAI dictionary.

    Frequently asked questions

    Why divide by n minus 1 for a sample?

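    Dividing by n minus 1 corrects a bias. Deviations are measured from the sample mean, which sits closer to the sample's own values than the true population mean does, so the sum of squared deviations comes out slightly too small on average. Dividing by the smaller divisor offsets this and produces an unbiased estimate of the population variance. This is Bessel’s correction.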
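
    The bias can be seen exactly on a toy case. The sketch below is an illustration under stated assumptions, not a proof: it takes a hypothetical three-value population, lists every equally likely sample of size 2 drawn with replacement, and averages the two competing variance estimates using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]  # hypothetical toy population
N = len(population)
mu = Fraction(sum(population), N)
population_variance = sum((x - mu) ** 2 for x in population) / N

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples, with replacement
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(population_variance, avg_div_n, avg_div_n_minus_1)  # 8/3 4/3 8/3
```

    Averaged over all possible samples, dividing by n gives 4/3, well short of the true variance of 8/3, while dividing by n minus 1 recovers 8/3 exactly.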

    Can standard deviation be negative?

    No. It is the square root of an average of squared quantities, so it is always zero or positive. A standard deviation of zero means every value is identical to the mean.

    Should I report standard deviation or standard error?

    Report the standard deviation to describe variability among observations, and the standard error or a confidence interval to describe the precision of the mean. For wider context on variability and uncertainty, see our guide to confidence intervals and the reproducibility news category.

  • Variance in Statistics: Definition and Formula

    Variance is a measure of how spread out a set of values is, defined as the average of the squared deviations of each value from the mean. A large variance means the data points are widely dispersed; a small variance means they cluster tightly around the mean. Because the deviations are squared, variance is always non-negative and is expressed in squared units of the original measurement.

    The definition of variance

    To calculate variance, you first find the mean of the data, then subtract the mean from each value to get the deviations. Squaring each deviation removes the sign (so positive and negative deviations do not cancel) and gives greater weight to values far from the mean. The average of these squared deviations is the variance.
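
    The steps can be sketched one at a time. The values are the five replicate measurements used in the worked example later in this article; the variable names are illustrative, and the final division uses the number of values (the population form).

```python
values = [4, 8, 6, 5, 2]

mean = sum(values) / len(values)               # step 1: the mean
deviations = [x - mean for x in values]        # step 2: subtract the mean from each value
squared = [d ** 2 for d in deviations]         # step 3: square each deviation
population_variance = sum(squared) / len(values)  # step 4: average the squares

print(sum(deviations))       # 0.0  (raw deviations cancel out)
print(squared)               # [1.0, 9.0, 1.0, 0.0, 9.0]
print(population_variance)   # 4.0
```

    The raw deviations sum to zero, which is why they are squared before averaging.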

    Variance is the foundation of many statistical methods, including the analysis of variance (ANOVA), regression diagnostics and the construction of confidence intervals. Reporting it transparently supports the goals set out in our reproducibility coverage.

    Population variance versus sample variance

    The formula depends on whether your data are the entire population or a sample drawn from it. For a population, you divide the sum of squared deviations by the number of values, N. For a sample, you divide by n − 1 instead of n. This adjustment, known as Bessel’s correction, produces an unbiased estimate of the population variance: deviations measured from the sample mean are, on average, slightly smaller than deviations from the true population mean, so dividing by n would underestimate the spread.

    Quantity              Symbol   Divisor
    Population variance   σ²       N
    Sample variance       s²       n − 1

    A worked conceptual example

    Suppose five replicate measurements give 4, 8, 6, 5 and 2. The mean is (4 + 8 + 6 + 5 + 2) / 5 = 5. The deviations from the mean are −1, 3, 1, 0 and −3. Squaring these gives 1, 9, 1, 0 and 9, which sum to 20. Treating the five values as a population, the variance is 20 / 5 = 4. Treating them as a sample, the variance is 20 / 4 = 5. The sample figure is slightly larger, reflecting Bessel’s correction.
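
    The same figures can be cross-checked with Python's standard library, where statistics.pvariance divides by N and statistics.variance divides by n − 1. This is only a verification sketch; the variable names are assumptions.

```python
import statistics

replicates = [4, 8, 6, 5, 2]

population_variance = statistics.pvariance(replicates)  # sum of squares 20, divisor N = 5
sample_variance = statistics.variance(replicates)       # sum of squares 20, divisor n - 1 = 4

print(population_variance, sample_variance)
```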

    Variance and the standard deviation

    Variance and the standard deviation describe the same property of spread, but in different units. The standard deviation is simply the square root of the variance, which returns the measure to the original units of the data. In our worked example the population standard deviation is √4 = 2. Because the standard deviation is easier to interpret alongside the mean, it is often reported in papers; see our companion piece on the standard deviation for detail. Variance, however, has convenient mathematical properties, which is why it underlies so many statistical procedures.
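
    For the worked example, the square-root step can be sketched with the standard library's pstdev and stdev functions (variable names are illustrative):

```python
import math
import statistics

replicates = [4, 8, 6, 5, 2]

population_sd = statistics.pstdev(replicates)  # square root of the population variance, 4
sample_sd = statistics.stdev(replicates)       # square root of the sample variance, 5

print(f"{population_sd:.3f} {sample_sd:.3f}")  # 2.000 2.236
```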

    Interpreting variance correctly

    Because variance is in squared units, its absolute size is hard to interpret in isolation. A variance of 4 cm² is meaningful only relative to the scale of the measurement. Variance is also sensitive to outliers: squaring magnifies the effect of extreme values, so a single anomalous point can inflate the variance substantially. Always inspect your data distribution before reporting variance, and define the term consistently in your methods. The CASRAI dictionary and our author guidance encourage precise, reproducible statistical reporting.
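
    A small sketch shows how strongly one extreme point can move the figure. It reuses the five replicate values from the worked example and appends a single hypothetical anomalous reading of 30, which is not part of the original data.

```python
import statistics

replicates = [4, 8, 6, 5, 2]
with_outlier = replicates + [30]  # 30 is a made-up anomalous reading

before = statistics.variance(replicates)   # sample variance of the original five
after = statistics.variance(with_outlier)  # sample variance with the extra point

print(f"before: {before}, after: {after:.2f}")  # before: 5, after: 108.17
```

    One added value raises the sample variance from 5 to about 108, more than a twentyfold increase.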

    Frequently asked questions

    Why is variance squared rather than absolute?

    Squaring the deviations keeps the measure mathematically tractable and differentiable, which makes it the natural basis for least squares estimation and many other techniques. The absolute deviation is an alternative but lacks these convenient properties.
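
    For comparison, the sketch below computes the mean absolute deviation alongside the population standard deviation for the five replicate values from the worked example. The variable names are illustrative.

```python
values = [4, 8, 6, 5, 2]
mean = sum(values) / len(values)

# Absolute alternative: average distance from the mean, ignoring sign.
mean_abs_dev = sum(abs(x - mean) for x in values) / len(values)

# Squared route: average squared distance, then the square root.
population_sd = (sum((x - mean) ** 2 for x in values) / len(values)) ** 0.5

print(mean_abs_dev, population_sd)  # 1.6 2.0
```

    The standard deviation is the larger of the two here because squaring gives extra weight to the values furthest from the mean.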

    When should I divide by n − 1 instead of n?

    Divide by n − 1 whenever your data are a sample used to estimate the variance of a wider population. Divide by N only when your data genuinely represent the entire population of interest.

    Is a high variance bad?

    Not inherently. High variance simply means greater spread. Whether that is good or bad depends on context: high variance in measurement error is undesirable, but natural biological variation may be expected and informative.