Editorial · CASRAI · Reproducibility and computational research

P-Values and Statistical Significance Explained Correctly

A p-value is the probability of observing data at least as extreme as those seen, assuming the null hypothesis is true. This guide explains p-values precisely, summarises the ASA 2016 statement, and corrects common misinterpretations.

By CASRAI Editorial Board

Published 19 Jun 2026 · 4 minute read

A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. It is a measure of how compatible the data are with a specified statistical model in which there is no effect or no difference. A small p-value indicates that the observed data would be unusual if the null hypothesis held; it does not, by itself, prove that the null hypothesis is false or that an effect is real or important.
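To make the definition concrete, here is a minimal sketch that computes an exact one-sided p-value for a hypothetical coin-flipping experiment. The scenario (16 heads in 20 flips, with a fair coin as the null hypothesis) and the function name are assumptions chosen for illustration.

```python
from math import comb


def binomial_tail_p_value(n_trials: int, n_successes: int, null_prob: float = 0.5) -> float:
    """One-sided p-value: P(X >= n_successes) when X ~ Binomial(n_trials, null_prob)."""
    return sum(
        comb(n_trials, k) * null_prob**k * (1 - null_prob) ** (n_trials - k)
        for k in range(n_successes, n_trials + 1)
    )


# Hypothetical data: 16 heads in 20 flips, null hypothesis "the coin is fair".
p_one_sided = binomial_tail_p_value(20, 16)
print(f"{p_one_sided:.4f}")  # 0.0059
```

The result says that a fair coin would produce 16 or more heads in roughly 0.6% of such experiments. It does not say there is a 0.6% chance that the coin is fair.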

What the null hypothesis represents

Hypothesis testing begins with a null hypothesis, typically a statement of no effect, no difference or no association. The test asks how surprising the observed data would be if that null hypothesis were true. The p-value quantifies that surprise: the smaller it is, the less compatible the data are with the null model. Critically, the p-value is calculated under the assumption that the null is true, which is why it cannot be read as the probability that the null is true.
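One way to see what "calculated under the assumption that the null is true" means is a permutation test, sketched below. If group membership makes no difference, the labels are interchangeable, so shuffling them simulates the null model directly. The data, function name and parameter defaults are hypothetical, and this is one illustrative approach rather than a recommended default analysis.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null, labels are arbitrary, so reassign them at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
control = [4.2, 4.5, 4.1, 4.8, 4.4, 4.6]
print(permutation_p_value(treated, control))
```

The returned value is the share of label shufflings that produce a difference at least as large as the observed one. Every shuffled data set is generated as if the null were true, which is why the result cannot be read as the probability that the null is true.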

The American Statistical Association’s 2016 statement

In 2016 the American Statistical Association (ASA) published a formal statement on p-values, the first time it had issued such guidance, in response to widespread misuse. The statement set out six principles. In summary, it affirmed that p-values can indicate how incompatible data are with a specified model, but warned that a p-value does not measure the probability that the hypothesis under study is true, nor the probability that the data arose by chance alone. It cautioned that scientific conclusions should not be based only on whether a p-value passes a threshold, that proper reporting requires full transparency, that a p-value does not measure the size or importance of an effect, and that by itself a p-value is a poor measure of evidence regarding a model or hypothesis.

Common misinterpretations

Several persistent errors surround p-values. Avoiding them is essential for sound, reproducible reporting.

Misinterpretation	Why it is wrong
The p-value is the probability the null hypothesis is true	It is calculated assuming the null is true; it cannot also be that probability
p = 0.05 means a 5% chance the result is a fluke	The p-value is computed assuming that chance alone (the null model) produced the data, so it cannot be the probability that chance produced them
A non-significant result proves no effect exists	Absence of significance is not evidence of absence; the study may simply lack power
A small p-value means a large or important effect	The p-value depends on sample size as well as effect magnitude, so a trivial effect can yield a small p-value in a large sample
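The last two rows of the table can be illustrated with a short sketch. It assumes a two-sample z-test with equal group sizes and a known, equal standard deviation, which is a simplification chosen to keep the arithmetic visible. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(standardized_diff: float, n_per_group: int) -> float:
    """Two-sided p-value for a two-sample z-test with equal group sizes.

    standardized_diff is the difference in means divided by the (assumed known)
    common standard deviation, so the test statistic is d * sqrt(n / 2).
    """
    z = standardized_diff * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same very small standardised difference (0.05) at two sample sizes.
p_small_study = two_sample_z_p_value(0.05, 100)
p_large_study = two_sample_z_p_value(0.05, 10_000)
print(p_small_study, p_large_study)
```

With 100 participants per group the p-value is far above 0.05, and with 10,000 per group it is far below, although the effect is identical in both cases. The non-significant result reflects low power rather than the absence of an effect, and the significant result does not make the effect large.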

The limits of the 0.05 convention

The threshold of 0.05 for declaring statistical significance is a convention, not a law of nature. Treating 0.05 as a bright line encourages dichotomous thinking in which a result at p = 0.049 is celebrated and one at p = 0.051 dismissed, despite the negligible difference between them. This convention has fed practices such as selective reporting and p-hacking, in which analyses are adjusted until a result crosses the threshold. Both are serious threats to reproducibility. The ASA statement explicitly warned against basing conclusions solely on whether a p-value clears a cut-off.
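A rough sense of why trying many analyses undermines the threshold comes from the arithmetic below. The sketch assumes that every null hypothesis is true and that the tests are independent. Real p-hacking usually involves correlated analyses of the same data, so treat the figures as illustrative rather than exact. The function name is hypothetical.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests of true nulls has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    print(n_tests, round(chance_of_any_false_positive(n_tests), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

Under these assumptions, running twenty analyses and reporting only the one that clears 0.05 produces a "significant" finding about 64% of the time even when no effect exists.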

Effect sizes and intervals

Because a p-value says nothing about magnitude, it should be accompanied by an effect size, which describes how large the observed effect is, and ideally a confidence interval, which expresses the precision of the estimate. Reporting these alongside, or instead of, a bare p-value gives readers far more information for judging whether a finding matters. Reporting all three transparently also supports reproducibility, because readers can see the size and uncertainty of a result rather than only whether it crossed a threshold. For terminology and reporting conventions, consult the CASRAI dictionary.
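As a minimal sketch of what such reporting might involve, the code below computes a standardised effect size (Cohen's d with a pooled standard deviation) and an approximate confidence interval for the difference in means. It uses a normal critical value, which is a large-sample approximation. For small samples a t-based interval would be more appropriate. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def effect_size_and_interval(group_a, group_b, confidence=0.95):
    """Return Cohen's d and an approximate confidence interval for the mean difference."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances

    # Cohen's d: mean difference scaled by the pooled standard deviation.
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    cohens_d = diff / pooled_sd

    # Normal-approximation interval (assumption: samples are reasonably large).
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return cohens_d, (diff - z * standard_error, diff + z * standard_error)


# Hypothetical measurements for two groups.
d, (low, high) = effect_size_and_interval(
    [5.1, 4.9, 5.6, 5.8, 6.0, 5.4],
    [4.2, 4.5, 4.1, 4.8, 4.4, 4.6],
)
print(f"d = {d:.2f}, 95% CI for the difference = ({low:.2f}, {high:.2f})")
```

A report built on these quantities tells readers how large the difference is and how precisely it was estimated, which a p-value alone cannot convey.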

Frequently asked questions

Does a p-value below 0.05 prove an effect is real?

No. It indicates the data would be unusual if the null hypothesis were true, but it does not prove the null is false, nor that the effect is large or important. Replication, effect sizes and intervals are needed to judge that.

What did the ASA 2016 statement conclude?

The statement set out six principles emphasising that p-values measure compatibility with a model, are not the probability the hypothesis is true, do not measure effect size, and should never be the sole basis for scientific conclusions. It urged full transparency in reporting.

Should we abandon p-values altogether?

Not necessarily. P-values can be informative when interpreted correctly and reported alongside effect sizes and confidence intervals. The problem lies in misuse and over-reliance on a single threshold, not in the statistic itself. See the CASRAI author guidance for reporting practices.
