Statistical power

Statistical power is the probability that a hypothesis test correctly detects a real effect — that is, rejects a null hypothesis that is actually false. It equals 1 − β, where β is the Type II error rate.

What statistical power measures

Power is the sensitivity of a test: the long-run probability that, when a real effect exists in the population, the test returns a statistically significant result and correctly rejects the null hypothesis. Formally, power = 1 − β, where β is the probability of a Type II error (failing to detect a true effect, a false negative). A test with power of 0.80 will, on average, detect a genuine effect of the assumed size in eighty per cent of repeated studies, missing it in the remaining twenty per cent. Power complements the significance level α, which governs the Type I (false-positive) error rate.
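The long-run reading of power can be illustrated with a small simulation. This is a minimal sketch, not a general tool: it assumes two groups of normally distributed data with a known standard deviation of 1 (so a z-test applies), and the function name `simulate_power`, the effect size of 0.5 and the group size of 64 are illustrative choices, not values from the text above.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_power(d, n_per_group, alpha=0.05, n_studies=2000, seed=1):
    """Estimate power as the share of repeated studies that reject the null.

    Assumption: normal data with known standard deviation 1 in both groups.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sqrt(2 / n_per_group)                    # standard error of the mean difference
    rejections = 0
    for _ in range(n_studies):
        treated = [rng.gauss(d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies

power = simulate_power(d=0.5, n_per_group=64)
beta = 1 - power  # estimated Type II error rate
print(f"estimated power = {power:.2f}, estimated beta = {beta:.2f}")
```

Under these assumptions the estimate should land close to 0.8: roughly four in five simulated studies detect the effect, and the rest are false negatives.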

What determines power

Four interlocking quantities determine the power of a test: the effect size (larger true effects are easier to detect), the sample size (more observations sharpen estimates and raise power), the significance level α (a more lenient α, such as 0.05 rather than 0.01, raises power but also the false-positive rate), and the variability in the data (lower variance, or a more precise design, raises power). When the effect size is standardized (the raw effect divided by its variability), power, effect size, sample size and α are linked so that fixing any three determines the fourth. Because these trade off against one another, power is best treated as a design decision made before data are collected, not a property discovered afterwards.
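The direction of each of these effects can be checked with a closed-form approximation. The sketch below assumes a two-sided comparison of two equal-sized groups and uses the normal (z) approximation instead of the exact t distribution; the function name `power_two_group` and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_group(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized effect (mean_diff / sd) scaled by the information in the sample.
    shift = (mean_diff / sd) * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value when the effect is real.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

baseline = power_two_group(mean_diff=5, sd=10, n_per_group=64)
larger_effect = power_two_group(mean_diff=8, sd=10, n_per_group=64)
larger_sample = power_two_group(mean_diff=5, sd=10, n_per_group=128)
stricter_alpha = power_two_group(mean_diff=5, sd=10, n_per_group=64, alpha=0.01)
less_noise = power_two_group(mean_diff=5, sd=7, n_per_group=64)

for label, value in [
    ("baseline", baseline),
    ("larger effect", larger_effect),
    ("larger sample", larger_sample),
    ("stricter alpha (0.01)", stricter_alpha),
    ("less variability", less_noise),
]:
    print(f"{label:>22}: power = {value:.3f}")
```

Only the stricter α lowers power relative to the baseline; a larger effect, a larger sample and less variability each raise it.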

Power analysis and sample-size planning

A priori power analysis reverses the relationship to plan a study: the researcher fixes a target power (commonly 0.80), a significance level α (commonly 0.05), and a smallest effect size worth detecting, then solves for the minimum sample size required. This is the standard, defensible way to justify a sample size and is increasingly expected by funders, ethics committees and journals following reporting reforms. Tools such as G*Power compute these values for common designs. Planning power in advance guards against running an underpowered study that cannot fairly test its own hypothesis.
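For a simple two-group comparison, the "solve for sample size" step can be sketched with the standard normal-approximation formula n = 2((z₁₋α/₂ + z_power) / d)² per group. The function name `n_per_group` and the example effect sizes are illustrative assumptions; dedicated tools that use the t distribution typically report a slightly larger number.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Minimum observations per group for a two-sided, two-group comparison.

    d is the standardized effect size (mean difference divided by the SD).
    Uses the normal approximation, so results may run slightly below t-based tools.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

With the conventional targets (power 0.80, α 0.05) and d = 0.5, this approximation gives 63 per group. Because the required sample grows with 1/d², halving the smallest effect worth detecting roughly quadruples the sample.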

Why low power is dangerous

An underpowered study is likely to miss real effects, wasting effort and inviting the false conclusion that no effect exists. Less obviously, low power also undermines the significant findings it does produce: when power is low, a smaller share of statistically significant results reflect real effects, so any one of them is more likely to be a false positive, and genuine effects that clear the threshold tend to be overestimated in magnitude (an inflation sometimes called the winner’s curse). Chronic low power across a field therefore reduces the reliability and reproducibility of its published literature, which is why power planning is now a core element of responsible research design.
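The inflation of significant estimates can be seen in a simulation. This is a sketch under stated assumptions: each simulated study estimates a mean difference whose sampling distribution is normal with a known standard deviation of 1 per observation, and the true effect of 0.2, the group size of 20 and the name `winners_curse` are hypothetical choices that make power deliberately low.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def winners_curse(true_d=0.2, n_per_group=20, alpha=0.05, n_studies=20000, seed=7):
    """Return (estimated power, mean size of the significant estimates)."""
    rng = random.Random(seed)
    se = sqrt(2 / n_per_group)                    # standard error of the mean difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_d, se)          # one study's estimated effect
        if abs(estimate) / se > z_crit:
            significant.append(estimate)
    power = len(significant) / n_studies
    # Magnitudes are used because a few significant estimates have the wrong sign.
    mean_significant = fmean([abs(e) for e in significant])
    return power, mean_significant

power, mean_significant = winners_curse()
print(f"estimated power: {power:.2f}")
print(f"true effect: 0.20, mean significant estimate: {mean_significant:.2f}")
```

In this setup an estimate must exceed about 0.62 in size to reach significance, so every significant result overstates the true effect of 0.2 by a factor of three or more.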

Key facts

At a glance

Definition: probability a test correctly rejects a false null (detects a true effect)
Formula: power = 1 − β, where β is the Type II error rate
Convention: aim for power ≥ 0.80 (β ≤ 0.20)
Determined by: effect size, sample size, significance level α, variability
Main use: a priori power analysis to set required sample size
Risk if low: missed real effects plus unreliable, inflated significant findings

Common misconceptions

What people often get wrong

Often heard: Power of 0.80 means there is an 80% chance the study’s result is correct.

Actually: Power is conditional on a real effect of the assumed size existing. It is the probability the test detects that effect — not the probability that any given result is true. It says nothing about results when the null is actually true.
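The gap between power and "the chance a result is true" can be made concrete with Bayes' rule. The sketch below rests on a loud assumption: that only 10% of tested hypotheses involve a real effect. That prior is a hypothetical input, not a figure from this page, and the function name `positive_predictive_value` is likewise illustrative.

```python
def positive_predictive_value(power, alpha, prior):
    """Share of significant results that reflect a real effect.

    prior is the assumed proportion of tested hypotheses where an effect exists.
    """
    true_positives = power * prior          # real effects that are detected
    false_positives = alpha * (1 - prior)   # true nulls rejected by chance
    return true_positives / (true_positives + false_positives)

# Hypothetical prior: 1 in 10 tested effects is real.
high = positive_predictive_value(power=0.80, alpha=0.05, prior=0.10)
low = positive_predictive_value(power=0.20, alpha=0.05, prior=0.10)
print(f"power 0.80: {high:.2f} of significant results are real")
print(f"power 0.20: {low:.2f} of significant results are real")
```

Under that assumed prior, a study with power 0.80 yields significant results that are real about 64% of the time, not 80%, and the share falls to about 31% at power 0.20.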

Often heard: You can fix low power after the fact by running a post-hoc power calculation on your observed effect.

Actually: Observed (post-hoc) power computed from your own result is a redundant restatement of the p-value and adds no information. Power should be planned a priori, using an effect size you care about detecting, before data collection.
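The redundancy is easy to demonstrate for a two-sided z-test, where observed power can be computed from the p-value alone. This is a minimal sketch under that assumption (other tests also need the degrees of freedom); the function name `observed_power` is hypothetical.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, computed from the p-value only."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)  # size of the observed test statistic
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Treat the observed statistic as if it were the true standardized effect.
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)

for p in (0.001, 0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:<5} -> observed power = {observed_power(p):.2f}")
```

No data beyond the p-value enter the calculation, and a result sitting exactly at p = 0.05 always has observed power of about 0.5. A non-significant result therefore always shows low observed power, which is why the calculation cannot explain or rescue it.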

Often heard: Power and the significance level α are essentially the same threshold.

Actually: They control different errors. α is the Type I (false-positive) rate: the chance of rejecting a true null. β is the Type II (false-negative) rate, the chance of missing a true effect, and power (1 − β) is the chance of detecting it. Lowering α, all else equal, lowers power.
