Editorial · CASRAI · Reproducibility and computational research

Effect Size: Why It Matters Beyond Statistical Significance


Effect size measures how large a difference or relationship actually is, independent of sample size. This guide explains Cohen’s d, eta-squared and correlation r, why they complement p-values, and how reporting standards now expect them.

By CASRAI Editorial Board

Published 19 Jun 2026 · 4 minute read

An effect size is a standardised measure of the magnitude of a difference or relationship, telling you how large an effect is rather than merely whether it is statistically detectable. Where a p-value addresses whether the data are compatible with there being no effect at all, an effect size answers the more useful question “how big is it?”. Reporting effect sizes is now expected by major journals and statistical bodies, because significance alone can mislead.

Why a p-value is not enough

A p-value depends heavily on sample size. With a large enough sample, a trivially small difference can become statistically significant; with a small sample, a substantial effect can fail to reach significance. This means a significant result tells you the data are hard to reconcile with no effect at all, but nothing about whether the effect is large enough to matter in practice. The American Statistical Association’s 2016 statement on p-values explicitly cautioned against treating statistical significance as a measure of the size or importance of an effect, and encouraged approaches that emphasise estimation and uncertainty. For the foundations, see our explainer on p-values and statistical significance.
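The dependence on sample size can be illustrated with a minimal sketch. It assumes the simplest possible case: a two-sided z-test comparing two equal-sized groups with known unit variance, so that the test statistic is d × √(n/2). The function name and the example values of d and n are hypothetical and chosen only for illustration.

```python
import math


def two_sample_z_p_value(d, n_per_group):
    """Two-sided p-value for a standardised mean difference d.

    Assumes two groups of equal size n_per_group and known unit
    variance, so the z statistic is d * sqrt(n_per_group / 2).
    """
    z = d * math.sqrt(n_per_group / 2)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scenarios: the same tiny effect at two sample sizes,
# and a large effect observed in a very small study.
scenarios = [
    ("tiny effect, small sample", 0.05, 50),
    ("tiny effect, huge sample", 0.05, 20_000),
    ("large effect, tiny sample", 0.8, 8),
]

for label, d, n in scenarios:
    p = two_sample_z_p_value(d, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: d = {d}, n per group = {n}, p = {p:.3g} ({verdict})")
```

Under these assumptions the tiny effect crosses the 0.05 threshold only once the sample is very large, while the large effect in the eight-per-group study does not, even though d is identical within each pair of comparisons.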

Common effect size measures

Different designs call for different effect size statistics. The table below summarises the most widely used.

Measure	Used with	What it expresses	Rough benchmarks
Cohen’s d	Difference between two means	Difference in standard-deviation units	0.2 small, 0.5 medium, 0.8 large
Eta-squared	ANOVA	Proportion of variance explained by a factor	0.01 small, 0.06 medium, 0.14 large
Pearson’s r	Correlation between two variables	Strength and direction of association	0.1 small, 0.3 medium, 0.5 large
Cramér’s V	Categorical association	Strength of relationship in a contingency table	Depends on table size (the smaller of the row and column counts, minus one)

These benchmarks, popularised by Jacob Cohen, are useful starting points but are not universal laws. What counts as a meaningful effect depends on the field: a small standardised effect in a public-health intervention can have enormous real-world value, while a large effect in a tightly controlled lab study may be unremarkable.
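As a rough sketch of how the first and third measures in the table are computed, the following uses only the Python standard library. It assumes Cohen’s d is calculated with the pooled standard deviation of the two groups, which is one common convention among several. The helper names, the labelling function and the sample data are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def cohen_label(value, small, medium, large):
    """Map a magnitude onto Cohen's rough benchmarks (a heuristic only)."""
    magnitude = abs(value)
    if magnitude < small:
        return "below small"
    if magnitude < medium:
        return "small"
    if magnitude < large:
        return "medium"
    return "large"


# Hypothetical data, for illustration only.
treated = [2, 4, 6]
control = [1, 3, 5]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f} ({cohen_label(d, 0.2, 0.5, 0.8)})")

hours = [1, 2, 3, 4, 5]
scores = [2, 1, 4, 3, 5]
r = pearson_r(hours, scores)
print(f"Pearson's r = {r:.2f} ({cohen_label(r, 0.1, 0.3, 0.5)})")
```

The labels are only as meaningful as the benchmarks themselves, so in a real report the numeric value and the measure used matter more than the verbal category.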

Effect size in context: ANOVA and categorical data

Effect sizes pair naturally with the tests that produce p-values. After an ANOVA, eta-squared (or partial eta-squared) quantifies how much variance each factor explains. After a chi-square test, Cramér’s V (or, for a 2×2 table, the phi coefficient) gives the strength of association that the chi-square statistic alone cannot. Reporting the test statistic and the effect size together turns “there is an effect” into “there is an effect of this size”.
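Both statistics can be sketched in a few lines. The example below assumes a one-way design for eta-squared (between-group sum of squares divided by total sum of squares) and a plain Pearson chi-square without continuity correction for Cramér’s V. The function names and the data are hypothetical, and a statistics package would normally be used in practice.

```python
import math


def eta_squared(groups):
    """One-way eta-squared: SS_between / SS_total."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


def cramers_v(table):
    """Cramer's V from a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Scale by the smaller table dimension minus one so V lies in [0, 1].
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical data, for illustration only.
conditions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f"eta-squared = {eta_squared(conditions):.2f}")

counts = [[30, 10], [10, 30]]
print(f"Cramer's V = {cramers_v(counts):.2f}")
```

For a 2×2 table the divisor k equals one, so the value returned matches the absolute value of the phi coefficient.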

Practical versus statistical significance

Statistical significance concerns whether an effect is distinguishable from chance. Practical significance concerns whether the effect is large enough to matter for decisions, policy or theory. The two can diverge sharply. A drug that lowers blood pressure by a statistically significant but clinically negligible amount is significant without being meaningful. Effect sizes, ideally reported with confidence intervals, are what let readers judge practical importance for themselves.
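One way to make that judgement concrete is to compare the confidence interval for a raw difference with a threshold that the field regards as meaningful. The sketch below assumes a large-sample normal approximation for the interval. Every number in it, including the 5 mmHg threshold, is hypothetical and is not drawn from any study.

```python
import math
from statistics import NormalDist


def mean_difference_ci(mean_a, mean_b, sd_a, sd_b, n_a, n_b, level=0.95):
    """Difference in means with a normal-approximation confidence interval.

    Returns (difference, lower bound, upper bound).
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical trial summary: mean reduction in blood pressure (mmHg).
diff, lower, upper = mean_difference_ci(
    mean_a=1.0, mean_b=0.0, sd_a=10.0, sd_b=10.0, n_a=5000, n_b=5000
)

# Hypothetical threshold for a clinically meaningful reduction.
MEANINGFUL_REDUCTION = 5.0

print(f"difference = {diff:.2f} mmHg, 95% CI [{lower:.2f}, {upper:.2f}]")
print("excludes zero:", lower > 0)
print("reaches meaningful threshold:", upper >= MEANINGFUL_REDUCTION)
```

In this made-up scenario the interval excludes zero, so the result is statistically significant, yet its upper bound falls well short of the assumed threshold, which is the divergence described above.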

Reporting standards and reproducibility

Effect size reporting is not optional in many venues. The APA Publication Manual has long required effect sizes alongside test results, and reporting guidelines across disciplines echo this. Effect sizes also power meta-analysis and a priori power analysis: you cannot plan an adequately powered study without an expected effect size, as our guide to sample size and statistical power explains. Recording effect sizes, confidence intervals and the measure used is part of the transparent reporting we champion across our reproducibility coverage and codify in our guidance for authors.
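To show why an expected effect size is indispensable for planning, here is a minimal sketch of an a priori sample size calculation for comparing two group means. It assumes the textbook normal approximation, n per group ≈ 2 × ((z for α/2 + z for power) / d)², with a two-sided test and equal group sizes. The function name and the default values are illustrative assumptions, and dedicated power software based on the t distribution will typically return slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group test.

    Uses the normal approximation, so it slightly understates what a
    t-based calculation would give for small samples.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Cohen's rough benchmarks for d, used here only as planning inputs.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} effect (d = {d}): about {n_per_group(d)} per group")
```

Because the required n scales with 1/d², halving the expected effect size roughly quadruples the sample needed, which is why an honest estimate of d matters so much at the planning stage.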

Frequently asked questions

What is the difference between a p-value and an effect size?

A p-value indicates how surprising the observed data would be if there were no real effect; it is not the probability that the effect is real. An effect size indicates how large the effect is. They answer different questions and should be reported together.

Which effect size should I report?

Match the measure to the design: Cohen’s d for two-group mean differences, eta-squared for ANOVA, Pearson’s r for correlations, and Cramér’s V for categorical associations. Always state which measure you used.

Can a result be statistically significant but practically meaningless?

Yes. With a large enough sample, even tiny differences can become statistically significant. The effect size, especially with a confidence interval, reveals whether the difference is large enough to matter in the real world.

Why do journals now require effect sizes?

Because significance alone gives an incomplete picture and contributes to overstated findings. Bodies such as the American Statistical Association and APA emphasise effect sizes to improve transparency and reproducibility. See the CASRAI dictionary for the standardised terms used in reporting.
