Statistical power is the probability that a study will correctly detect an effect when one truly exists. It is formally defined as one minus the Type II error rate, written as power = 1 − β. A study with high power is likely to find a real effect; an underpowered study may miss it, producing a false negative. Power is closely tied to sample size, which is why power analysis is a core part of study planning.
Type I and Type II errors
Hypothesis testing can go wrong in two ways. A Type I error, with probability α, occurs when the test detects an effect that is not really there, a false positive. A Type II error, with probability β, occurs when the test fails to detect an effect that is genuinely present, a false negative.
| | Effect truly exists | No effect exists |
|---|---|---|
| Test is significant | Correct (power = 1 − β) | Type I error (α) |
| Test is not significant | Type II error (β) | Correct |
The significance threshold α is usually set at 0.05. A result is declared statistically significant when its p-value falls below α, so the choice of α fixes the Type I error rate the researcher is willing to accept.
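The two error rates can be estimated by simulation. The sketch below is illustrative only: it assumes two groups of normally distributed data with a known standard deviation of 1, so that a two-sided z-test applies, and the function name `rejection_rate`, the group size of 30 and the true difference of 0.5 are choices made for this example rather than recommendations.

```python
import random
from statistics import NormalDist


def rejection_rate(true_diff, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated two-group studies that reach significance.

    Assumption: normal data with known SD = 1, analysed with a two-sided z-test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sims


# No effect exists: every significant result is a Type I error.
type_i_rate = rejection_rate(true_diff=0.0, n_per_group=30)
# An effect exists: the rejection rate is the power, and the rest are Type II errors.
power = rejection_rate(true_diff=0.5, n_per_group=30)
type_ii_rate = 1 - power

print(f"Type I rate ~ {type_i_rate:.3f}")
print(f"Power ~ {power:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

With no true effect the rejection rate should sit near α, while with a true effect it estimates the power for that particular design.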
The 0.8 convention
By widespread convention, researchers aim for a power of at least 0.8, meaning the study has an 80% chance of detecting the effect of interest if it exists. This corresponds to a Type II error rate of 0.2. The figure is a pragmatic standard rather than a law: some fields demand higher power, such as 0.9, particularly when missing an effect would be costly. The key point is to choose and justify a target before data collection.
What determines power?
Four quantities are linked: the sample size, the effect size, the significance level α and the power. Fixing any three determines the fourth. Power increases with a larger sample size, a larger true effect and a less stringent α. Lower data variance also raises power, because it makes the same raw difference a larger standardized effect. Because researchers usually cannot change the effect size or the desired α, the practical lever is the sample size.
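The relationship between these quantities can be sketched with a normal approximation for a two-sided, two-group comparison with equal group sizes. The function `approx_power` and the numbers passed to it are assumptions made for illustration; `effect_size` is taken to be a standardized difference (difference in means divided by the standard deviation), so lower variance enters as a larger effect size.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumption: equal group sizes and a standardized effect size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


baseline = approx_power(0.5, 50)
larger_n = approx_power(0.5, 100)                 # larger sample
larger_effect = approx_power(0.8, 50)             # larger true effect
looser_alpha = approx_power(0.5, 50, alpha=0.10)  # less stringent alpha

for label, value in [("baseline", baseline), ("larger n", larger_n),
                     ("larger effect", larger_effect), ("alpha = 0.10", looser_alpha)]:
    print(f"{label}: power ~ {value:.2f}")
```

Each change relative to the baseline raises the computed power, which matches the directions described above. A t-based calculation would give slightly lower values at small sample sizes.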
A priori power analysis
An a priori power analysis is performed before data collection to determine the sample size needed to achieve the desired power for a plausible effect size. Researchers specify the target power (often 0.8), the significance level (often 0.05) and the smallest effect size they consider meaningful, then calculate the required number of participants. This prevents the common mistake of recruiting too few subjects, and is increasingly expected by funders, ethics committees and journals. The same logic applies whether the planned analysis is a t-test, a regression or another test.
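As a rough illustration of the calculation, the sketch below inverts the normal approximation to get a per-group sample size for a two-sided, two-group comparison. The function name `required_n_per_group` is hypothetical, and the approximation is an assumption of this example: dedicated power software based on the t distribution typically returns a slightly larger number.

```python
import math
from statistics import NormalDist


def required_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    Assumption: normal approximation, equal group sizes, standardized effect size.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_medium = required_n_per_group(0.5)  # smallest meaningful effect of 0.5 SD
n_small = required_n_per_group(0.2)   # smallest meaningful effect of 0.2 SD

print(f"Effect size 0.5: about {n_medium} participants per group")
print(f"Effect size 0.2: about {n_small} participants per group")
```

Because the required sample size scales with the inverse square of the effect size, halving the smallest effect of interest roughly quadruples the number of participants needed.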
Why underpowered studies harm reproducibility
Underpowered studies are a major threat to reproducibility. They frequently miss real effects, and when they do reach significance the estimated effect is often exaggerated, a phenomenon known as the winner’s curse: only estimates that happen to be large clear the significance threshold. Such inflated estimates often shrink or fail to replicate in larger studies. Conducting and reporting a power analysis, and pre-specifying the sample size, makes research more credible. The CASRAI dictionary and our author guidance encourage transparent reporting of these design choices, ideally alongside a confidence interval that conveys the precision of the estimate.
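The winner’s curse can be seen in a small simulation. The sketch below assumes normal data with a known standard deviation of 1, a true difference of 0.2 and only 20 participants per group. These numbers, and the function name `mean_significant_estimate`, are chosen purely to illustrate an underpowered design.

```python
import random
from statistics import NormalDist


def mean_significant_estimate(true_diff, n_per_group, alpha=0.05,
                              n_sims=4000, seed=1):
    """Average effect estimate among simulated studies that reach significance.

    Assumption: normal data with known SD = 1 and a two-sided z-test.
    Returns (mean significant estimate, share of studies that were significant).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5
    significant = []
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        estimate = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(estimate / se) > z_crit:
            significant.append(estimate)
    return sum(significant) / len(significant), len(significant) / n_sims


true_diff = 0.2
avg_sig, share_sig = mean_significant_estimate(true_diff, n_per_group=20)

print(f"True effect: {true_diff}")
print(f"Share of studies reaching significance: {share_sig:.2f}")
print(f"Mean estimate among significant studies: {avg_sig:.2f}")
```

In this setup few studies reach significance, and those that do report an average effect well above the true value, because an estimate must be large to clear the threshold at this sample size.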
Frequently asked questions
What is a good level of statistical power?
A power of 0.8 is the common minimum, giving an 80% chance of detecting a true effect. Higher targets such as 0.9 are preferable when feasible, especially for confirmatory studies.
Can I calculate power after the study is finished?
Post-hoc power calculated from the observed effect is generally uninformative, because for a given test and α it is a direct function of the p-value and so adds no new information. Power analysis is most useful when done in advance to plan sample size.
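To see why, the sketch below computes "observed" power for a two-sided z-test using nothing but the p-value. The z-test setting and the function name `observed_power` are assumptions made for this example. The point is only that no information beyond the p-value and α is needed.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, treating the observed effect as true.

    Assumption: the test statistic is normally distributed.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The absolute observed z statistic is recovered from the p-value alone.
    z_obs = std_normal.inv_cdf(1 - p_value / 2)
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power ~ {observed_power(p):.2f}")
```

A result sitting exactly at p = α corresponds to observed power of about 0.5 in this setting, and any non-significant result necessarily yields lower observed power, which is why the number says nothing the p-value has not already said.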
What is the relationship between sample size and power?
Larger samples increase power because they reduce the standard error, making real effects easier to detect. This is the main reason a priori power analysis focuses on choosing an adequate sample size.







