An effect size is a standardised measure of the magnitude of a difference or relationship, telling you how large an effect is rather than merely whether it is statistically detectable. Where a p-value addresses “is there evidence of an effect?”, an effect size answers the more useful question “how big is it?”. Reporting effect sizes is now expected by major journals and statistical bodies, because significance alone can mislead.
Why a p-value is not enough
A p-value depends heavily on sample size. With a large enough sample, a trivially small difference can become statistically significant; with a small sample, a substantial effect can fail to reach significance. A significant result therefore tells you that the data are hard to reconcile with no effect at all, but nothing about whether the effect is large enough to matter in practice. The American Statistical Association’s 2016 statement on p-values explicitly cautioned against treating statistical significance as a measure of importance and urged researchers to report effect sizes and uncertainty. For the foundations, see our explainer on p-values and statistical significance.
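The dependence on sample size can be made concrete with a minimal sketch. It assumes a two-sample z-test with equal group sizes and a known common standard deviation, which is a simplification of the t-tests used in practice; the function name `two_sided_p` and the example values are illustrative only.

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p-value for a standardised mean difference d.

    Assumes a two-sample z-test with equal group sizes and a known
    common standard deviation (a simplifying assumption).
    """
    # The standard error of the standardised difference is sqrt(2 / n).
    z = abs(d) * math.sqrt(n_per_group / 2)
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# The same trivially small effect (d = 0.05) at growing sample sizes.
for n in (20, 200, 2000, 20000):
    print(f"d=0.05, n per group={n:>6}: p={two_sided_p(0.05, n):.4g}")

# A large effect (d = 0.8) in a very small study.
print(f"d=0.80, n per group=     5: p={two_sided_p(0.8, 5):.4g}")
```

Under these assumptions the tiny effect eventually crosses any conventional significance threshold as the sample grows, while the large effect in the five-per-group study does not.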
Common effect size measures
Different designs call for different effect size statistics. The table below summarises the most widely used.
| Measure | Used with | What it expresses | Rough benchmarks |
|---|---|---|---|
| Cohen’s d | Difference between two means | Difference in standard-deviation units | 0.2 small, 0.5 medium, 0.8 large |
| Eta-squared | ANOVA | Proportion of variance explained by a factor | 0.01 small, 0.06 medium, 0.14 large |
| Pearson’s r | Correlation between two variables | Strength and direction of association | 0.1 small, 0.3 medium, 0.5 large |
| Cramér’s V | Categorical association | Strength of relationship in a contingency table | Depends on table dimensions |
These benchmarks, popularised by Jacob Cohen, are useful starting points but are not universal laws. What counts as a meaningful effect depends on the field: a small standardised effect in a public-health intervention can have enormous real-world value, while a large effect in a tightly controlled lab study may be unremarkable.
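As an illustration of the first row of the table, here is a minimal sketch of Cohen’s d using the pooled standard deviation. The function name `cohens_d` and the two small samples are made up for the example, and the pooled form assumes the two groups have roughly equal variances.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: the mean difference in pooled-standard-deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores, for illustration only.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

The sign of d follows the order of the arguments, so it is worth stating which group was subtracted from which when reporting it.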
Effect size in context: ANOVA and categorical data
Effect sizes pair naturally with the tests that produce p-values. After an ANOVA, eta-squared (or partial eta-squared) quantifies how much variance each factor explains. After a chi-square test, Cramér’s V or the phi coefficient gives the strength of association, which the chi-square statistic alone cannot convey because it grows with sample size. Reporting the test statistic and the effect size together turns “there is an effect” into “there is an effect of this size”.
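Both measures can be computed directly from their definitions. The sketch below assumes a one-way design for eta-squared (between-group sum of squares divided by total sum of squares) and an uncorrected chi-square statistic for Cramér’s V; the function names and data are hypothetical.

```python
import math


def eta_squared(groups):
    """Eta-squared for a one-way ANOVA: SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


def cramers_v(table):
    """Cramér's V from a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square statistic, without continuity correction.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical data, for illustration only.
print(f"eta-squared = {eta_squared([[1, 2, 3], [2, 3, 4], [5, 6, 7]]):.4f}")
print(f"Cramér's V  = {cramers_v([[10, 20], [20, 10]]):.4f}")
```

For a 2×2 table, Cramér’s V equals the absolute value of the phi coefficient, so the two measures coincide in that case.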
Practical versus statistical significance
Statistical significance concerns whether an effect is distinguishable from chance. Practical significance concerns whether the effect is large enough to matter for decisions, policy or theory. The two can diverge sharply. A drug that lowers blood pressure by a statistically significant but clinically negligible amount has a detectable effect, not a meaningful one. Effect sizes, ideally reported with confidence intervals, are what let readers judge practical importance for themselves.
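A confidence interval around an effect size shows this divergence directly. The sketch below uses a common large-sample approximation to the standard error of Cohen’s d together with a normal critical value; exact intervals based on the noncentral t distribution will differ slightly, and the function name and inputs are assumptions for the example.

```python
import math
from statistics import NormalDist


def cohens_d_ci(d, n_a, n_b, level=0.95):
    """Approximate confidence interval for Cohen's d.

    Uses the large-sample standard error
    sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))).
    """
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


# Hypothetical large trial: d = 0.1 with 5,000 participants per arm.
low, high = cohens_d_ci(0.1, 5000, 5000)
print(f"d = 0.10, 95% CI [{low:.3f}, {high:.3f}]")
```

In this hypothetical trial the interval excludes zero, so the effect is statistically detectable, yet the whole interval sits below Cohen’s “small” benchmark of 0.2. Whether an effect of that size matters is then a judgement for the field, not for the test.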
Reporting standards and reproducibility
Effect size reporting is not optional in many venues. The APA Publication Manual has long called for effect sizes alongside test results, and reporting guidelines across disciplines echo this. Effect sizes also power meta-analysis and a priori power analysis: you cannot plan an adequately powered study without an expected effect size, as our guide to sample size and statistical power explains. Recording effect sizes, confidence intervals and the measure used is part of the transparent reporting we champion across our reproducibility coverage and codify in our guidance for authors.
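The link between expected effect size and study planning can be sketched with the standard normal-approximation formula for a two-group comparison, n per group ≈ 2((z₁₋α/₂ + z_power) / d)². This is an approximation: dedicated power software based on the t distribution typically returns a slightly larger n. The function name `n_per_group` and the defaults are assumptions for the example.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    rounded up to the next whole participant.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Required n per group for Cohen's small, medium and large benchmarks.
for d in (0.2, 0.5, 0.8):
    print(f"expected d = {d}: about {n_per_group(d)} per group")
```

Because n scales with 1/d², halving the expected effect size roughly quadruples the required sample, which is why an honest effect size estimate matters so much at the planning stage.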
Frequently asked questions
What is the difference between a p-value and an effect size?
A p-value indicates how surprising the observed data would be if there were no true effect; it is not the probability that the effect is real. An effect size indicates how large the effect is. They answer different questions and should be reported together.
Which effect size should I report?
Match the measure to the design: Cohen’s d for two-group mean differences, eta-squared for ANOVA, Pearson’s r for correlations, and Cramér’s V for categorical associations. Always state which measure you used.
Can a result be statistically significant but practically meaningless?
Yes. With a large enough sample, even tiny differences can become statistically significant. The effect size, especially with a confidence interval, reveals whether the difference is large enough to matter in the real world.
Why do journals now require effect sizes?
Because significance alone gives an incomplete picture and contributes to overstated findings. Bodies such as the American Statistical Association and APA emphasise effect sizes to improve transparency and reproducibility. See the CASRAI dictionary for the standardised terms used in reporting.







