Research statistics · 28 pages

Research statistics & data analysis

Clear, accurate explainers for the statistics that underpin empirical research — standardised scores, sampling distributions, measures of spread and the logic of hypothesis testing. Each page leads with a concise definition and the correct formula, then links across to the wider CASRAI standards and dictionary.

Browse the topic

All 28 research statistics & data analysis pages

Definition

Z-score

A z-score, or standard score, states how many standard deviations a value lies from the mean of its distribution. It is calculated as z = (x − μ) / σ, where x is the value, μ the population mean and σ the population standard deviation. Positive z-scores sit above the mean, negative ones below.
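As a minimal sketch of the formula above, the snippet below standardises two values. The exam mean of 70 and standard deviation of 8 are hypothetical figures chosen for illustration, and `z_score` is an illustrative helper name.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical exam: mean 70, standard deviation 8.
above = z_score(82, mu=70, sigma=8)  # positive: the value sits above the mean
below = z_score(62, mu=70, sigma=8)  # negative: the value sits below the mean
print(above, below)
```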

Definition

Boxplot

A boxplot, or box-and-whisker plot, displays a dataset’s five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3) and maximum. The box spans Q1 to Q3, covering the middle 50% of the data, and a line inside marks the median. In the simplest form the whiskers extend to the minimum and maximum; in the common Tukey form they stop at the most extreme values within 1.5 × IQR of the box, and points beyond them are plotted individually as potential outliers.

Definition

Central limit theorem

The central limit theorem (CLT) states that the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the underlying population, provided the observations are independent and the population variance is finite. The sample means cluster around the population mean μ, with a standard deviation (the standard error) of σ / √n. This holds for large samples even when the population is skewed.
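The behaviour described above can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be Exponential(1), which is strongly right-skewed with mean 1 and standard deviation 1, and the sample size, number of repetitions and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50            # observations per sample (assumed value)
n_samples = 5000  # number of repeated samples (assumed value)

# Draw many samples from a skewed population and keep each sample's mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should sit close to mu = 1
sd_of_means = statistics.stdev(sample_means)    # should sit close to sigma / sqrt(n)
predicted_se = 1.0 / math.sqrt(n)

print(mean_of_means, sd_of_means, predicted_se)
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws are heavily skewed.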

Definition

Correlation coefficient

A correlation coefficient quantifies the strength and direction of the linear relationship between two variables. The most common, Pearson’s r, ranges from −1 to +1: −1 is a perfect negative relationship, +1 a perfect positive one, and 0 no linear relationship. The sign gives direction and the magnitude gives strength. Correlation does not imply causation.
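The following sketch computes Pearson’s r directly from its definition. The three small datasets are made up for illustration; the third shows that r = 0 rules out only a linear relationship, since y is fully determined by x there.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance scaled by the product of the two spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])       # perfect positive line
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])     # perfect negative line
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x**2, no linear trend
print(r_up, r_down, r_curve)
```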

Guide

Hypothesis testing

Hypothesis testing is the procedure for deciding whether sample evidence is strong enough to reject a null hypothesis. You state the null and alternative hypotheses, choose a significance level (alpha), compute a test statistic from the data, and compare it with a critical value or convert it to a p-value. If the statistic falls beyond the critical value, or equivalently the p-value is below alpha, you reject the null; otherwise you fail to reject it.
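The steps above can be sketched with a two-sided one-sample z-test. All the numbers are hypothetical, and the population standard deviation is assumed to be known, which is what justifies the normal distribution here; with an estimated standard deviation a t-test would be the usual choice.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses. H0: mu = 50, H1: mu != 50 (hypothetical values).
mu_0 = 50.0
# Step 2: significance level.
alpha = 0.05

# Hypothetical data summary; sigma is assumed known.
sigma = 5.0
n = 25
sample_mean = 52.0

# Step 3: test statistic.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Step 4: compare with a critical value, or convert to a two-sided p-value.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: decision. The two comparisons always agree.
reject_null = p_value < alpha
print(z, critical_value, p_value, reject_null)
```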

Comparison

Descriptive vs inferential statistics

The difference is one of purpose. Descriptive statistics summarise and describe the data you have collected — through measures such as the mean, median and standard deviation, and through charts. Inferential statistics go further, using a sample to draw conclusions about a wider population, estimating parameters and testing hypotheses while quantifying the uncertainty involved.

Definition

Skewness

Skewness measures the asymmetry of a probability distribution. A symmetric distribution has zero skew. In positive (right) skew, a long tail stretches to the right and the mean is typically pulled above the median. In negative (left) skew, the long tail is on the left and the mean is typically pulled below the median.
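As a rough illustration, the sketch below computes the moment-based skewness coefficient (the third central moment divided by the second central moment raised to the power 1.5). This is one of several conventions, chosen here as an assumption, and both datasets are invented.

```python
import statistics


def skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # one large value forms a right tail
symmetric = [1, 2, 3, 4, 5]

skew_right = skewness(right_skewed)
skew_sym = skewness(symmetric)
mean_right = statistics.fmean(right_skewed)
median_right = statistics.median(right_skewed)
print(skew_right, skew_sym, mean_right, median_right)
```

In the right-skewed dataset the single large value drags the mean above the median, matching the description above.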

Definition

Degrees of freedom

Degrees of freedom are the number of independent values in a calculation that are free to vary given the constraints on the data. Estimating one quantity, such as the sample mean, uses up one degree of freedom, which is why a single sample’s degrees of freedom are often n − 1.

Definition

Standard error

The standard error of the mean measures how precisely a sample mean estimates the population mean — the typical distance between them. It equals the standard deviation divided by the square root of the sample size, SE = σ / √n, or s / √n when estimated from the sample. It shrinks as the sample grows.
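A minimal sketch of SE = s / √n on a small invented sample follows. It also shows the shrinkage numerically: for the same spread, quadrupling the sample size halves the standard error.

```python
import math
import statistics

sample = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical measurements

s = statistics.stdev(sample)         # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(len(sample))      # standard error of the mean

# Same spread, four times as many observations: the standard error halves.
se_four_times_n = s / math.sqrt(4 * len(sample))
print(s, se, se_four_times_n)
```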

Definition

Interquartile range

The interquartile range (IQR) is the spread of the middle 50% of a dataset. It equals the third quartile minus the first quartile, IQR = Q3 − Q1. Because it ignores the lowest and highest quarters, the IQR resists outliers and underpins the standard 1.5 × IQR outlier rule.
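The sketch below applies the 1.5 × IQR rule to an invented dataset. Quartile conventions differ between textbooks and software; `statistics.quantiles` with `method="inclusive"` is used here as one reasonable choice, so other tools may report slightly different quartiles.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values with one extreme point

# Three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences for the 1.5 x IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, median, q3, iqr, lower_fence, upper_fence, outliers)
```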

Comparison

Standard deviation vs variance

Both measure how spread out a dataset is. Variance is the average of the squared deviations from the mean, so it is expressed in squared units. The standard deviation is the square root of the variance (SD = √variance), which returns it to the data’s original units and makes it easier to interpret.
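To make the relationship concrete, this sketch computes both measures for an invented set of lengths in centimetres. It also contrasts the population versions (divide by n) with the sample versions (divide by n − 1), a distinction the summary above does not spell out.

```python
import math
import statistics

lengths_cm = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

pop_variance = statistics.pvariance(lengths_cm)  # in cm squared, divides by n
pop_sd = statistics.pstdev(lengths_cm)           # in cm, square root of the above

sample_variance = statistics.variance(lengths_cm)  # divides by n - 1
sample_sd = statistics.stdev(lengths_cm)

print(pop_variance, pop_sd, sample_variance, sample_sd)
```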

Comparison

Z-score vs t-score

Both are standardised scores, but they differ in what is known. A z-score is used when the population standard deviation is known and uses the standard normal distribution. A t-score is used when the standard deviation is estimated from the sample, especially with small samples; it uses the t-distribution, which has heavier tails and depends on the degrees of freedom.

Definition

P-value

A p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. A small p-value (typically below the significance level α, often 0.05) suggests the data are unlikely under the null, so researchers reject it. The p-value does not give the probability that the null is true, nor does it measure the size or importance of an effect.

Definition

Normal distribution

A normal distribution is a symmetric, bell-shaped continuous probability distribution defined entirely by two parameters: its mean (μ), which fixes the centre, and its standard deviation (σ), which fixes the spread. Its mean, median and mode coincide at the peak, and it follows the empirical 68-95-99.7 rule. Many statistical tests assume normality, partly because of the central limit theorem.
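The 68-95-99.7 rule can be reproduced from the normal cumulative distribution function. This sketch uses the standard normal (μ = 0, σ = 1); the shares are the same for any normal distribution when measured in units of σ.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within = {k: share_within(k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} sd: {share:.4f}")
```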

Definition

Mean, median and mode

Mean, median and mode are the three measures of central tendency — single values that summarise the centre of a dataset. The mean is the arithmetic average, the median is the middle value when data are ordered, and the mode is the most frequently occurring value. The range, the difference between the largest and smallest values, is a basic measure of spread that complements them.
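A short sketch using Python’s standard `statistics` module on an invented dataset shows all four summaries side by side.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical data, already ordered

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # midpoint of the two middle values (3 and 5)
mode = statistics.mode(values)      # most frequent value
value_range = max(values) - min(values)

print(mean, median, mode, value_range)
```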

Definition

Standard deviation

Standard deviation is a measure of the average spread, or dispersion, of values around the mean of a dataset. It is calculated as the square root of the variance, so it is expressed in the same units as the data. A small standard deviation means values cluster tightly near the mean; a large one means they are widely scattered. It is one of the most widely used measures of spread.

Definition

Variance

Variance is a measure of dispersion equal to the average of the squared deviations of each value from the mean. It tells you how spread out a dataset is: the larger the variance, the further values lie from the mean. Because the deviations are squared, variance is expressed in squared units, which is why its square root — the standard deviation — is often reported instead.

Definition

Confidence interval

A confidence interval is a range of plausible values for an unknown population parameter, calculated from sample data. The confidence level, such as 95%, describes the method: over many repeated samples, that percentage of the intervals produced would contain the true parameter. It conveys both an estimate and its uncertainty — a wider interval signals a less precise estimate, often from a smaller sample.
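The sketch below builds an interval for a mean from summary statistics. It assumes a sample large enough for the normal critical value (about 1.96 at 95%) to be reasonable; small samples would normally use a t critical value instead. The figures and the helper name `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_crit * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical study: mean 100, standard deviation 15.
low_large, high_large = mean_confidence_interval(100.0, 15.0, n=225)
low_small, high_small = mean_confidence_interval(100.0, 15.0, n=25)

print(low_large, high_large)  # narrower interval from the larger sample
print(low_small, high_small)  # wider interval from the smaller sample
```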

Definition

Correlation

Correlation describes the strength and direction of a linear association between two variables, summarised by a correlation coefficient (r) ranging from −1 to +1. A positive value means the variables rise together, a negative value means one rises as the other falls, and a value near zero means little or no linear relationship. Crucially, correlation does not prove that one variable causes the other.

Definition

Effect size

Effect size is a standardised measure of the magnitude of a result (how large a difference between groups is, or how strong a relationship is), independent of sample size. Unlike a p-value, which indicates only how surprising the data would be if there were no effect, effect size shows how big the effect actually is. Common measures include Cohen’s d, Pearson’s r and the odds ratio. Reporting standards such as APA now require effect sizes alongside p-values.
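As one example of these measures, the sketch below computes Cohen’s d for two independent groups using the pooled standard deviation. The group scores are invented, and pooling assumes the two groups have similar variances.

```python
import math
import statistics


def cohens_d(group_1, group_2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)  # sample variances (n - 1 divisor)
    var2 = statistics.variance(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.fmean(group_1) - statistics.fmean(group_2)) / pooled_sd


treatment = [5, 6, 7, 8, 9]  # hypothetical scores
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
print(round(d, 3))
```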

Definition

Sample size

Sample size is the number of observations, participants or units included in a study. Ideally it is set before data collection through a power analysis that balances the desired statistical power (commonly 0.80), the significance level (α, often 0.05), the expected effect size and the variability in the data. Too small a sample risks an underpowered study that misses real effects (Type II errors); an excessively large one can make trivial effects appear statistically significant.
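A rough sketch of such a calculation, for comparing two independent group means with a two-sided test, uses the normal-approximation formula n per group ≈ 2 (z₁₋α/₂ + z_power)² / d², where d is the standardised effect size. This approximation is an assumption of the example; exact t-based calculations in dedicated software usually give a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


n_medium = n_per_group(0.5)  # medium standardised effect
n_small = n_per_group(0.2)   # small effect needs far more participants
print(n_medium, n_small)
```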

Definition

t-test

A t-test is a parametric statistical test that compares means to determine whether an observed difference is likely to reflect a real effect rather than chance. There are three common forms: the one-sample t-test (a mean against a known value), the independent-samples t-test (two separate groups), and the paired t-test (the same subjects measured twice). It assumes the data are approximately normally distributed, and for independent samples that the groups have similar variances. For three or more groups, ANOVA is used instead.

Definition

ANOVA (analysis of variance)

ANOVA (analysis of variance) is a statistical test used to compare the means of three or more groups at once. It works by partitioning the total variance in the data into variation between groups and variation within groups, expressing their ratio as an F-statistic. A large F suggests the group means differ more than chance would predict. A one-way ANOVA tests one factor; a two-way ANOVA tests two factors and their interaction. A significant result is followed by post-hoc tests to locate which groups differ.
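The partitioning described above can be sketched for a one-way design as follows. The three groups are invented, and the function returns only the F-statistic and its degrees of freedom; turning F into a p-value needs the F distribution, which the Python standard library does not provide.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f_stat, df_between, df_within)
```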

Definition

Chi-square test

A chi-square (χ²) test is a non-parametric test used with categorical data, comparing observed frequencies against the frequencies expected if a hypothesis were true. Its two main forms are the goodness-of-fit test, which checks whether one categorical variable matches an expected distribution, and the test of independence, which checks whether two categorical variables are associated within a contingency table. A large discrepancy between observed and expected counts yields a large χ² statistic and a small p-value.
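For the test of independence, the statistic is the sum over all cells of (observed − expected)² / expected, where each expected count is row total × column total / grand total. The sketch below applies this to an invented 2 × 2 table, without a continuity correction, and stops at the statistic and its degrees of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are two groups, columns are two outcomes.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), df)
```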

Definition

Linear regression

Linear regression is a statistical technique that models the linear relationship between a dependent (outcome) variable and one or more independent (predictor) variables. It fits a straight line — defined by a slope and an intercept — that best predicts the outcome, and reports R², the proportion of variance the model explains. Simple linear regression uses one predictor; multiple linear regression uses several. It is used both to predict outcomes and to quantify how predictors relate to the outcome.
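For simple linear regression, the least-squares slope is Sxy / Sxx and the intercept is ȳ − slope × x̄. The sketch below fits a line to five invented points and reports R²; `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_residual / syy
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted_at_6 = intercept + slope * 6  # prediction for a new x value
print(slope, intercept, r_squared, predicted_at_6)
```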

Comparison

Correlation vs causation

The difference is that correlation describes two variables that move together, whereas causation means one variable directly produces a change in the other. A correlation can arise without causation — through coincidence, reverse causation, or a hidden third variable (confounder) driving both. This is why "correlation does not imply causation". Establishing causation requires more than an association: ideally a controlled experiment such as a randomised controlled trial, plus evidence of temporal order and the ruling out of alternative explanations.

Definition

Statistical power

Statistical power is the probability that a test correctly rejects a false null hypothesis — the chance it detects a true effect when one genuinely exists. It equals 1 − β, where β is the Type II (false-negative) error rate. Researchers conventionally aim for power of at least 0.80. Power rises with larger effect sizes, larger samples, a higher significance level, and lower variability, and a priori power analysis uses these inputs to plan the sample size needed.
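The relationships above can be explored with a sketch of the power of a two-sided one-sample z-test, where the standardised effect size d shifts the test statistic by d√n. The z-test setting (known standard deviation) is an assumption made to keep the example within the standard library.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardised effect size."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)
    # Probability the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power_n32 = z_test_power(0.5, n=32)
power_n16 = z_test_power(0.5, n=16)              # smaller sample, less power
power_alpha10 = z_test_power(0.5, n=32, alpha=0.10)  # higher alpha, more power
power_no_effect = z_test_power(0.0, n=32)        # with no effect, equals alpha
print(power_n32, power_n16, power_alpha10, power_no_effect)
```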

Comparison

t-test vs ANOVA

A t-test compares the means of two groups (or one group against a value, or two paired conditions), whereas ANOVA compares the means of three or more groups at once. Running several t-tests instead inflates the family-wise Type I error rate, which ANOVA controls with a single F-test. A t-test yields a t-statistic; ANOVA yields an F-statistic, and a significant ANOVA needs post-hoc tests to say which groups differ. An independent-samples t-test with pooled variance is equivalent to a one-way ANOVA with two groups, where F = t².
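The F = t² identity can be checked numerically. This sketch uses two invented groups, the pooled-variance (equal variances assumed) form of the independent-samples t-test, and a one-way ANOVA computed by hand.

```python
import math
import statistics

group_a = [5, 6, 7, 8, 9]  # hypothetical scores
group_b = [3, 4, 5, 6, 7]
n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)

# Pooled-variance t-statistic.
pooled_var = (
    (n_a - 1) * statistics.variance(group_a)
    + (n_b - 1) * statistics.variance(group_b)
) / (n_a + n_b - 2)
t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA with two groups: between-groups df is 1, and the
# within-groups mean square is the same quantity as the pooled variance.
grand_mean = statistics.fmean(group_a + group_b)
ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
f_stat = (ss_between / 1) / pooled_var

print(t_stat, f_stat, t_stat ** 2)
```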