Effect size

Effect size is a quantitative measure of the magnitude of a difference or the strength of a relationship — the practical size of a result, independent of how many participants were studied.

Why effect size complements the p-value

A p-value answers only one question: how probable data at least as extreme as those observed would be if there were truly no effect. It says nothing about how large or important the effect is, and it is heavily influenced by sample size: with a large enough sample, even a trivial difference becomes "statistically significant". Effect size separates magnitude from significance. Reporting both lets readers judge not just whether an effect is unlikely to be chance, but whether it is large enough to matter in practice. This is why statistical significance must never be equated with practical importance.
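A minimal Python sketch shows how magnitude and significance come apart. It assumes two equal-sized groups with a known common standard deviation. Under that assumption the test statistic is approximately z = d·√(n/2), and the two-sided p-value follows from the normal distribution. The function names are hypothetical and chosen only for illustration.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def approx_p_for_fixed_d(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a two-group comparison.

    Assumption: two equal groups of n_per_group participants with a known
    common SD, so the difference in means has standard error sigma*sqrt(2/n)
    and the test statistic is z = d * sqrt(n / 2) (normal approximation).
    """
    z = d * math.sqrt(n_per_group / 2)
    return two_sided_p_from_z(z)


if __name__ == "__main__":
    d = 0.1  # a small standardised difference, held fixed throughout
    for n in (20, 200, 2000, 20000):
        p = approx_p_for_fixed_d(d, n)
        print(f"n per group = {n:>6}: d = {d}, approximate p = {p:.3g}")
```

With d held at 0.1, the approximate p-value falls from about 0.75 at 20 participants per group to well below 0.001 at 2,000 per group, even though the effect size never changes. A real analysis would use a t-test rather than this normal approximation.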

Common measures of effect size

Most effect sizes fall into two broad families. Differences between means are expressed as standardised mean differences such as Cohen’s d, which divides the gap between two group means by their pooled standard deviation, so the result is in standard-deviation units. Strength of association is expressed by correlation coefficients such as Pearson’s r, or by the coefficient of determination R², the proportion of variance in the outcome explained by the predictor(s). For binary outcomes, ratio measures such as the odds ratio and the risk ratio quantify how the odds or risk of an event differ between groups. Each measure suits a particular design, but all share the goal of describing magnitude on an interpretable scale.
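The measures above can be computed directly from their definitions. The following sketch uses only the Python standard library. The function names, the 2×2 table layout and the numbers in the usage block are illustrative assumptions, not a prescribed method. Cohen’s d here uses the pooled sample standard deviation, which is one of several common variants.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def pearson_r(x, y):
    """Pearson correlation coefficient between paired observations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def odds_and_risk_ratio(exposed_events, exposed_non, unexposed_events, unexposed_non):
    """Odds ratio and risk ratio from a 2x2 table (hypothetical layout):

        exposed:   exposed_events with the outcome, exposed_non without it
        unexposed: unexposed_events with the outcome, unexposed_non without it
    """
    odds_ratio = (exposed_events / exposed_non) / (unexposed_events / unexposed_non)
    risk_exposed = exposed_events / (exposed_events + exposed_non)
    risk_unexposed = unexposed_events / (unexposed_events + unexposed_non)
    return odds_ratio, risk_exposed / risk_unexposed


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only
    treatment = [5, 6, 7, 8, 9]
    control = [4, 5, 6, 7, 8]
    print("Cohen's d:", round(cohens_d(treatment, control), 3))

    hours = [1, 2, 3, 4, 5]
    score = [2, 4, 5, 4, 5]
    r = pearson_r(hours, score)
    # With a single predictor, R^2 is simply r squared
    print("Pearson's r:", round(r, 3), "R^2:", round(r ** 2, 3))

    print("Odds ratio, risk ratio:", odds_and_risk_ratio(20, 80, 10, 90))
```

For a single predictor, R² is just the square of Pearson’s r. For that reason the usage block squares r instead of defining a separate function.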

Interpreting the size

Cohen proposed rough benchmarks for d of about 0.2 (small), 0.5 (medium) and 0.8 (large), with comparable conventions for r of about 0.1, 0.3 and 0.5. These are conventional rules of thumb, not laws: what counts as a meaningful effect depends on the field, the outcome and the cost of the intervention. A small effect on mortality may matter far more than a large effect on a trivial outcome. Wherever possible, interpret an effect size against domain knowledge and previous studies rather than relying on the generic labels alone.
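When a first-pass label is useful, Cohen’s conventions can be encoded as a simple lookup. This is a hedged sketch. Treating 0.2, 0.5 and 0.8 (or 0.1, 0.3 and 0.5 for r) as the lower edges of each band is an assumption, because Cohen gave approximate reference points rather than hard cut-offs. The "below small" label and the function and constant names are hypothetical choices made for this example. The output is a starting point for interpretation, not a substitute for domain judgement.

```python
# Hypothetical names; the thresholds are Cohen's rough conventions,
# treated here (by assumption) as lower edges of each band.
D_BENCHMARKS = (0.2, 0.5, 0.8)
R_BENCHMARKS = (0.1, 0.3, 0.5)


def cohen_label(value, thresholds):
    """Return a conventional label for the magnitude of an effect size.

    The sign is ignored: an effect of -0.9 is as large as one of +0.9.
    """
    small, medium, large = thresholds
    magnitude = abs(value)
    if magnitude >= large:
        return "large"
    if magnitude >= medium:
        return "medium"
    if magnitude >= small:
        return "small"
    return "below small"


if __name__ == "__main__":
    print(cohen_label(0.45, D_BENCHMARKS))
    print(cohen_label(-0.9, D_BENCHMARKS))
    print(cohen_label(0.35, R_BENCHMARKS))
```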

Key facts

At a glance

Definition: the magnitude of a difference or strength of a relationship
Key property: does not systematically change with sample size (unlike the p-value)
Mean-difference measure: Cohen’s d (standardised mean difference)
Association measures: Pearson’s r, R², odds ratio, risk ratio
Cohen benchmarks: d ≈ 0.2 small, 0.5 medium, 0.8 large (conventions, not laws)
Standards: APA style and many reporting guidelines ask for effect sizes to be reported alongside p-values

Common misconceptions

What people often get wrong

Often heard: A statistically significant result (small p-value) means the effect is large or important.

Actually: Significance and magnitude are separate. A tiny, unimportant effect can be significant in a large sample, while a large effect can be non-significant in a small one. Effect size, not the p-value, tells you how big the result is.

Often heard: Cohen’s benchmarks of 0.2, 0.5 and 0.8 are fixed rules for what counts as small, medium and large.

Actually: They are conventional rules of thumb. What is meaningful depends on the field, the outcome and the stakes; a "small" effect on an important outcome can be highly valuable, so interpret effect sizes in context.

Often heard: Effect size depends on how many participants you recruit.

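Actually: A sample effect size estimates the true magnitude of an effect, and recruiting more participants does not make it systematically larger. A larger sample makes the estimate more precise (a narrower confidence interval) without inflating the effect size itself. The one caveat is that Cohen’s d is slightly biased upward in very small samples, which is why Hedges’ g applies a small-sample correction.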
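A short sketch illustrates the difference between magnitude and precision. It assumes the common large-sample approximation for the standard error of Cohen’s d with group sizes n1 and n2, SE ≈ √((n1 + n2)/(n1·n2) + d²/(2(n1 + n2))), together with a normal-based 95% interval. It also applies the usual approximate Hedges’ correction factor 1 − 3/(4(n1 + n2) − 9), which shows that the small-sample adjustment fades as n grows. The function names are hypothetical.

```python
import math


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d.

    Assumes the large-sample standard error
    SE = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


def hedges_g(d, n1, n2):
    """Apply the approximate small-sample bias correction J to Cohen's d."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


if __name__ == "__main__":
    d = 0.5  # the effect size is held fixed; only the sample size changes
    for n in (10, 100, 1000):
        low, high = d_confidence_interval(d, n, n)
        g = hedges_g(d, n, n)
        print(f"n per group = {n:>4}: d = {d}, 95% CI ({low:.2f}, {high:.2f}), g = {g:.3f}")
```

With d fixed at 0.5, each tenfold increase in group size narrows the interval by a factor of about √10. The interval runs from roughly −0.39 to 1.39 at 10 per group and from about 0.41 to 0.59 at 1,000 per group, while the point estimate stays at 0.5.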
