Tag: statistical significance

  • Confidence Intervals in Research, Explained Precisely

    A confidence interval is a range of values, calculated from sample data, that is designed to contain the true value of an unknown population parameter with a stated level of confidence. A 95% confidence interval is produced by a procedure that, over many repeated samples, would yield intervals containing the true parameter about 95% of the time. It conveys both an estimate of the parameter and the uncertainty around that estimate, expressed as the width of the interval.

    The correct interpretation

    The confidence level is a property of the long-run procedure, not of any single interval. Once a specific interval has been calculated, the true parameter either lies inside it or it does not; there is no probability left to assign. It is therefore incorrect to say there is a 95% probability that the parameter lies within a particular calculated interval. The accurate statement is that if the study were repeated many times and an interval computed each time, about 95% of those intervals would contain the true value. This frequentist interpretation is subtle but important, and misstating it is one of the most common errors in applied statistics.

    | Statement | Correct? |
    |---|---|
    | 95% of intervals from repeated samples contain the true parameter | Yes |
    | There is a 95% probability this specific interval contains the parameter | No |
    | The interval shows a range of plausible values for the parameter | Yes, a reasonable informal reading |
    | 95% of the data fall within the interval | No, that confuses it with a data range |
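
The long-run reading can be checked by simulation. The sketch below is illustrative only: it assumes normally distributed data with a known population standard deviation, so that a simple z-interval applies, and the function names `z_interval` and `coverage` and all parameter values are invented for this example.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, level=0.95):
    """Two-sided z-interval for a mean, assuming the population SD `sigma` is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sigma=2.0, n=30, repeats=10_000, level=0.95, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        # Each individual interval either contains the true mean or it does not.
        hits += low <= true_mean <= high
    return hits / repeats


print(coverage())
```

With these settings the printed proportion should land close to 0.95. The 95% describes the share of hits across the whole loop, not any one interval inside it.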

    Width, precision and sample size

    The width of a confidence interval reflects the precision of the estimate. A narrow interval indicates a precise estimate; a wide one signals substantial uncertainty. Width depends chiefly on the variability in the data and on the sample size. Larger samples generally produce narrower intervals because the standard error of an estimate such as a mean shrinks in proportion to the square root of the sample size. Raising the confidence level, say from 95% to 99%, widens the interval, because demanding greater confidence requires admitting a broader range of plausible values.
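
As a rough illustration of how sample size and confidence level drive width, the following sketch computes the half-width of a z-interval for a mean. It assumes a known standard deviation (here an arbitrary 2.0), and `half_width` is a hypothetical helper rather than a library function.

```python
from statistics import NormalDist


def half_width(sigma, n, level=0.95):
    """Half-width of a two-sided z-interval for a mean (known SD assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / n ** 0.5


# Columns: sample size, 95% half-width, 99% half-width.
for n in (25, 100, 400):
    print(n, round(half_width(2.0, n), 3), round(half_width(2.0, n, level=0.99), 3))
```

Each fourfold increase in `n` halves the half-width, and the 99% column is always wider than the 95% column for the same `n`.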

    Relationship to statistical significance

    Confidence intervals and significance tests are closely linked. For a comparison such as a difference between two means, if a 95% confidence interval for the difference excludes zero, the corresponding two-sided test is statistically significant at the 0.05 level; if the interval includes zero, it is not. The interval therefore delivers the same significance verdict as the p-value while adding crucial context: the estimated size of the effect and the range of values compatible with the data.
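
The link between the two can be seen in a small sketch. It assumes a large-sample normal approximation, takes the estimated difference and its standard error as given, and uses a hypothetical helper named `diff_summary`; the input numbers are invented for illustration.

```python
from statistics import NormalDist


def diff_summary(diff, se, level=0.95):
    """Return a z-interval and a two-sided p-value for an estimated difference."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    interval = (diff - z_crit * se, diff + z_crit * se)
    p_value = 2 * (1 - std_normal.cdf(abs(diff / se)))
    return interval, p_value


for diff, se in ((1.0, 0.4), (0.5, 0.4)):
    (low, high), p = diff_summary(diff, se)
    excludes_zero = low > 0 or high < 0
    print(f"diff={diff}: interval=({low:.2f}, {high:.2f}), p={p:.3f}, "
          f"excludes zero: {excludes_zero}, p < 0.05: {p < 0.05}")
```

In both cases the last two flags agree, because the interval and the test are built from the same estimate, standard error and critical value.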

    Why intervals are often more informative

    Reporting a confidence interval communicates more than a bare p-value because it shows magnitude and precision together. A result may be statistically significant yet have an interval spanning only trivial effects, or be non-significant yet have an interval wide enough to include important ones. Many methodologists, including the authors of the American Statistical Association’s 2016 guidance on p-values, encourage reporting estimates with intervals rather than relying on significance thresholds alone. This practice supports clearer interpretation and stronger reproducibility, themes tracked in our reproducibility category. The underlying methods belong to the broader discipline of statistics, and consistent reporting terminology is documented in the CASRAI dictionary.

    Frequently asked questions

    What does a 95% confidence interval really mean?

    It means that the method used to build the interval would capture the true population value in about 95% of repeated samples. It is not a 95% probability that the true value lies in one particular calculated interval.

    Does a narrower interval always mean a better study?

    A narrow interval indicates a precise estimate, usually from a large or low-variability sample, but precision is not the same as validity. A precise estimate from a biased study can still be wrong. Width describes uncertainty from sampling, not freedom from bias.

    Should I report a confidence interval or a p-value?

    Where possible, report an effect estimate with its confidence interval, optionally alongside a p-value. The interval shows both the size and the precision of the effect, which is generally more informative for readers. See the CASRAI author guidance for reporting recommendations.

  • P-Values and Statistical Significance Explained Correctly

    A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. It is a measure of how compatible the data are with a specified statistical model in which there is no effect or no difference. A small p-value indicates that the observed data would be unusual if the null hypothesis held; it does not, by itself, prove that the null hypothesis is false or that an effect is real or important.

    What the null hypothesis represents

    Hypothesis testing begins with a null hypothesis, typically a statement of no effect, no difference or no association. The test asks how surprising the observed data would be if that null hypothesis were true. The p-value quantifies that surprise: the smaller it is, the less compatible the data are with the null model. Critically, the p-value is calculated under the assumption that the null is true, which is why it cannot be read as the probability that the null is true.
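
A concrete case may help. The sketch below computes an exact two-sided p-value for a coin-flip experiment under the null hypothesis that the coin is fair. The scenario (15 heads in 20 flips) and the function name `two_sided_binomial_p` are assumptions made for illustration.

```python
from math import comb


def two_sided_binomial_p(heads, flips):
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected value as the observed one.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    # Under a fair coin, each of the 2**flips sequences is equally likely.
    return extreme / 2 ** flips


print(two_sided_binomial_p(15, 20))
```

For 15 heads in 20 flips this gives about 0.041. The whole calculation is carried out assuming the coin is fair, so the result describes how unusual the data are under that assumption, not how likely the assumption is.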

    The American Statistical Association’s 2016 statement

    In 2016 the American Statistical Association (ASA) published a formal statement on p-values, the first time it had issued such guidance, in response to widespread misuse. The statement set out six principles. In summary, it affirmed that p-values can indicate how incompatible data are with a specified model, but warned that a p-value does not measure the probability that the hypothesis under study is true, nor the probability that the data arose by chance alone. It cautioned that scientific conclusions should not be based only on whether a p-value passes a threshold, that proper reporting requires full transparency, that a p-value does not measure the size or importance of an effect, and that by itself a p-value is a poor measure of evidence regarding a model or hypothesis.

    Common misinterpretations

    Several persistent errors surround p-values. Avoiding them is essential for sound, reproducible reporting.

    | Misinterpretation | Why it is wrong |
    |---|---|
    | The p-value is the probability the null hypothesis is true | It is calculated assuming the null is true; it cannot also be that probability |
    | p = 0.05 means a 5% chance the result is a fluke | The p-value is not the probability that the finding is due to chance |
    | A non-significant result proves no effect exists | Absence of significance is not evidence of absence; the study may simply lack power |
    | A small p-value means a large or important effect | The p-value depends on sample size and variability as well as effect magnitude, so it cannot measure the size of an effect on its own |
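
The last misinterpretation can be illustrated numerically. This sketch holds a very small true difference fixed and varies only the sample size, assuming a two-group z-test with a known, equal standard deviation. The helper `p_value_two_groups` and the numbers are hypothetical.

```python
from statistics import NormalDist


def p_value_two_groups(mean_diff, sigma, n_per_group):
    """Two-sided z-test p-value for a difference between two group means."""
    se = sigma * (2 / n_per_group) ** 0.5
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))


tiny_effect = 0.02  # observed difference, in units of sigma
for n in (100, 10_000, 100_000):
    print(n, p_value_two_groups(tiny_effect, 1.0, n))
```

The effect is identical in every row, yet the p-value falls from far above 0.05 to far below it as the groups grow. Under these assumptions the p-value is tracking the amount of data as much as the size of the effect.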

    The limits of the 0.05 convention

    The threshold of 0.05 for declaring statistical significance is a convention, not a law of nature. Treating 0.05 as a bright line encourages dichotomous thinking in which a result at p = 0.049 is celebrated and one at p = 0.051 dismissed, despite the negligible difference between them. This convention has fed selective reporting and p-hacking, in which analyses are adjusted until a result crosses the threshold; both are serious threats to reproducibility. The ASA statement explicitly warned against basing conclusions solely on whether a p-value clears a cut-off.
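
One way to see why repeated analysis inflates false positives is a simple calculation. It assumes the tests are independent and that the null hypothesis is true in every one of them, which is a simplification of real p-hacking; the function name is invented for this example.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of `n_tests` independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    print(n_tests, round(chance_of_false_positive(n_tests), 3))
```

With a single test the chance is 0.05 by construction. With 20 such tests it rises to roughly 0.64, so a "significant" finding selected from many attempts carries far less weight than the threshold suggests.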

    Effect sizes and intervals

    Because a p-value says nothing about magnitude, it should be accompanied by an effect size, which describes how large the observed effect is, and ideally a confidence interval, which expresses the precision of the estimate. Reporting these alongside, or instead of, a bare p-value gives readers far more information for judging whether a finding matters. The underpinning ideas come from the wider discipline of statistics, and transparent reporting of all of them supports the goals tracked in our reproducibility category. For terminology and reporting conventions, consult the CASRAI dictionary.
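
A minimal sketch of such a report follows. It uses Cohen's d as one common standardised effect size (the text above does not prescribe a particular measure) and a normal-approximation interval for the raw difference, which is only rough for samples as small as these. The data are invented numbers for illustration, and `effect_report` is a hypothetical helper.

```python
from statistics import NormalDist, mean, variance


def effect_report(group_a, group_b, level=0.95):
    """Raw difference in means, Cohen's d, and an approximate interval for the difference."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = variance(group_a), variance(group_b)
    diff = mean(group_a) - mean(group_b)
    # Cohen's d: the difference divided by the pooled standard deviation.
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    # Normal-approximation interval; a t-based interval is preferable for small samples.
    se = (var_a / n_a + var_b / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "difference": diff,
        "cohens_d": diff / pooled_sd,
        "interval": (diff - z * se, diff + z * se),
    }


treated = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]  # invented example values
control = [4.8, 4.6, 5.0, 5.1, 4.7, 4.9]  # invented example values
print(effect_report(treated, control))
```

The three numbers answer different questions: how big the difference is in original units, how big it is relative to the spread of the data, and how precisely it has been estimated.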

    Frequently asked questions

    Does a p-value below 0.05 prove an effect is real?

    No. It indicates the data would be unusual if the null hypothesis were true, but it does not prove the null is false, nor that the effect is large or important. Replication, effect sizes and intervals are needed to judge that.

    What did the ASA 2016 statement conclude?

    The statement set out six principles emphasising that p-values measure compatibility with a model, are not the probability the hypothesis is true, do not measure effect size, and should never be the sole basis for scientific conclusions. It urged full transparency in reporting.

    Should we abandon p-values altogether?

    Not necessarily. P-values can be informative when interpreted correctly and reported alongside effect sizes and confidence intervals. The problem lies in misuse and over-reliance on a single threshold, not in the statistic itself. See the CASRAI author guidance for reporting practices.