Editorial · CASRAI · Reproducibility and computational research

The Chi-Square Test for Categorical Data: A Practical Guide

The chi-square test compares observed counts with the counts expected under a hypothesis, making it the standard tool for categorical data. This guide covers goodness-of-fit versus independence, assumptions, and how to interpret results for reproducible research.

By CASRAI Editorial Board

Published 18 Jun 2026 · 4 minute read

The chi-square test is a statistical method for categorical data that compares the frequencies you actually observe with the frequencies you would expect if a given hypothesis were true. The larger the gap between observed and expected counts, the larger the chi-square statistic, and the stronger the evidence against that hypothesis. It is the workhorse test for counts, proportions and contingency tables across the social, biological and medical sciences.

Observed versus expected frequencies

Every chi-square test rests on the same intuition. You record how many cases fall into each category (the observed frequencies), then calculate how many should fall there under your null hypothesis (the expected frequencies). The statistic sums the squared difference between observed and expected, divided by expected, across all cells:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed count and $E_i$ the expected count in cell $i$.

A value near zero means observation matches expectation. A large value, evaluated against the chi-square distribution with the appropriate degrees of freedom, produces a small p-value and signals a statistically significant departure from the null hypothesis. For background on interpreting those probabilities, see our explainer on p-values and statistical significance.
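The calculation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the die-roll counts are hypothetical, the function names `chi_square_statistic` and `chi_square_p_value` are our own, and the p-value uses a simple series expansion that is adequate for teaching but not for extremely small probabilities.

```python
import math

def chi_square_statistic(observed, expected):
    """Return the sum over categories of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    Fine for illustration; precision degrades for very small p-values.
    """
    if statistic <= 0:
        return 1.0
    a, x = df / 2.0, statistic / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Hypothetical data: 120 rolls of a die, tested against a fair-die null.
observed = [18, 22, 16, 25, 19, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1  # goodness-of-fit: number of categories minus one
p_value = chi_square_p_value(statistic, df)
print(f"chi-square({df}, N = {sum(observed)}) = {statistic:.2f}, p = {p_value:.3f}")
```

With these made-up counts the statistic is 2.5 on 5 degrees of freedom, giving a large p-value (about 0.78), so there is no evidence that the die is unfair. In practice a maintained statistics library is the better choice than a hand-rolled p-value.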

Two common forms of the test

There are two principal versions, which answer different questions.

Feature	Goodness-of-fit	Test of independence
Variables	One categorical variable	Two categorical variables
Question	Do observed counts match an expected distribution?	Are the two variables associated?
Data layout	Single row of category counts	Contingency (cross-tabulation) table
Expected counts from	A theoretical or known distribution	Row and column marginal totals
Example	Is a die fair across its six faces?	Is treatment outcome related to dosage group?

The goodness-of-fit test checks whether a single variable follows a hypothesised distribution. The test of independence checks whether two variables in a contingency table are related or vary independently. A closely related variant, the test of homogeneity, asks whether several populations share the same category distribution.
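For the test of independence, the expected count in each cell comes from the marginal totals: row total times column total, divided by the grand total. The sketch below shows that step on a hypothetical 2×3 table of treatment outcome by dosage group; the counts and the function names are illustrative assumptions, not data from any study.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def independence_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)
    return stat, df

# Hypothetical counts. Rows: improved / not improved. Columns: low / medium / high dose.
table = [
    [20, 30, 40],
    [40, 30, 20],
]

expected = expected_counts(table)
stat, df = independence_statistic(table)
print(expected)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Note that the degrees of freedom differ between the two forms: categories minus one for goodness-of-fit, and (rows − 1) × (columns − 1) for a contingency table.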

Assumptions and small-sample cautions

The chi-square test relies on a handful of conditions. The data must be frequency counts, not percentages or means. Observations should be independent, with each case appearing in only one cell. And expected counts should be reasonably large: a common rule of thumb is that every cell should have an expected frequency of at least 5. A widely used relaxation, attributed to Cochran, tolerates up to 20% of cells below 5 as long as none falls below 1. When tables are small or sparse, Fisher’s exact test is the safer choice, and for 2×2 tables Yates’s continuity correction is sometimes applied. Reporting which test variant and corrections were used is part of transparent, replicable analysis, a theme across our reproducibility coverage.
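Checking the expected-count condition is easy to automate before running the test. The following is a rough sketch under stated assumptions: the 2×2 table is hypothetical, `sparse_cells` is a name we made up, and the threshold of 5 simply encodes the rule of thumb rather than any hard statistical boundary.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def sparse_cells(table, threshold=5.0):
    """Return (row index, column index, expected count) for cells below threshold."""
    expected = expected_counts(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]

# Hypothetical small 2x2 table of observed counts.
table = [
    [2, 8],
    [10, 20],
]

flagged = sparse_cells(table)
if flagged:
    print("Expected counts below 5 in cells:", flagged)
    print("Consider Fisher's exact test or merging categories.")
```

The check runs on expected counts, not observed ones: a cell can hold a small observed count and still be acceptable if its expected count is large enough.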

Interpreting and reporting the result

A significant chi-square tells you that an association or departure exists, but not how strong it is. Because the statistic scales with sample size, even trivial differences become significant in very large datasets. For this reason you should accompany the test with a measure of association such as Cramér’s V or, for 2×2 tables, the phi coefficient, both of which act as effect sizes for categorical data. Report the chi-square value, degrees of freedom, sample size and p-value together, for example: chi-square(2, N = 240) = 11.3, p = .004.
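Cramér’s V rescales the statistic by the sample size and the table dimensions: V = sqrt(chi-square / (N × min(rows − 1, columns − 1))). The sketch below applies it to the reporting example above. It assumes that result came from a 2×3 contingency table, which is consistent with 2 degrees of freedom but is our assumption, and the function name is illustrative.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi-square / (N * min(rows - 1, columns - 1)))."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need a positive sample size and at least a 2x2 table")
    return math.sqrt(chi_square / (n * k))

# Values from the reporting example; the 2x3 table shape is an assumption.
v = cramers_v(11.3, 240, n_rows=2, n_cols=3)
print(f"Cramér's V = {v:.2f}")
```

For a 2×2 table the same formula gives the absolute value of the phi coefficient, so a single function covers both measures.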

Adequate planning matters too. As with mean comparisons in ANOVA, the power to detect a true association depends on having enough observations, a point we expand on in our guide to sample size and statistical power.

Frequently asked questions

When should I use a chi-square test rather than a t-test or ANOVA?

Use chi-square when your outcome is categorical and you are working with counts in categories. Use a t-test or ANOVA when your outcome is a continuous measurement whose means you want to compare across groups.

What is the difference between goodness-of-fit and the test of independence?

Goodness-of-fit examines one variable against an expected distribution. The test of independence examines whether two variables in a contingency table are associated. They share the same formula but answer different questions.

What happens if my expected counts are too small?

The chi-square approximation becomes unreliable when expected cell counts fall below about 5. In that case, combine sparse categories where it makes sense, or use Fisher’s exact test, which is valid for small samples.
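For a 2×2 table, Fisher’s exact test can be computed directly from the hypergeometric distribution. The sketch below is one common two-sided definition (summing the probabilities of all tables no more likely than the observed one, with margins held fixed); other two-sided conventions exist, the function name is ours, and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )

# Hypothetical small table with 8 observations in total.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Fisher exact p = {p:.4f}")
```

Because the calculation enumerates exact probabilities rather than relying on the chi-square approximation, it stays valid however small the cell counts are.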

Does a significant chi-square tell me how strong the relationship is?

No. It only indicates that counts as far from expectation as yours would be unlikely if there were no relationship. To judge strength, report an association measure such as Cramér’s V alongside the result. The CASRAI dictionary and our author guidance describe the reporting metadata that keeps such analyses auditable.
