Definition · Plain-language

Chi-square test

A chi-square test is a non-parametric statistical test that examines categorical, count-based data by comparing the frequencies observed against the frequencies expected under a hypothesis.

A test for categorical, count-based data

Unlike t tests and ANOVA, which compare means of numerical data, the chi-square test works with categorical variables — data that fall into groups or categories such as yes/no, blood type or product choice. It is non-parametric, meaning it does not assume the data follow a normal distribution. The test always works from frequencies (counts), not percentages or averages. It compares the counts actually observed with the counts that would be expected under the null hypothesis, and sums the standardised discrepancies into a single χ² statistic.

Goodness-of-fit vs test of independence

The chi-square goodness-of-fit test involves a single categorical variable. It asks whether the observed distribution across categories matches an expected distribution: for instance, whether a die is fair, or whether sampled proportions match known population proportions.

The chi-square test of independence involves two categorical variables arranged in a contingency table. It asks whether the two variables are associated or independent: for example, whether voting preference is related to age group.

Both compare observed against expected counts, but they answer different questions: fit to a distribution versus association between variables.
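The two tests differ mainly in where the expected counts come from. The sketch below illustrates this in plain Python. The function names and the data (60 die rolls, and a 2 × 2 table of age group by voting preference) are hypothetical and chosen only for illustration.

```python
def expected_goodness_of_fit(observed, proportions):
    """Expected counts for one variable under hypothesised proportions."""
    total = sum(observed)
    return [total * p for p in proportions]


def expected_independence(table):
    """Expected counts for a contingency table if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Under independence: expected = row total * column total / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Goodness-of-fit: hypothetical counts for faces 1..6 over 60 rolls of a die.
die_rolls = [8, 12, 9, 11, 10, 10]
die_expected = expected_goodness_of_fit(die_rolls, [1 / 6] * 6)

# Independence: hypothetical counts, rows = two age groups,
# columns = two voting preferences.
votes = [[30, 20],
         [20, 30]]
vote_expected = expected_independence(votes)

print([round(e, 2) for e in die_expected])
print(vote_expected)
```

For the goodness-of-fit case the expected counts come from an outside hypothesis (a fair die). For the independence case they are derived from the table's own row and column totals.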

How the statistic is calculated and read

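For each cell, the test takes the difference between the observed count O and the expected count E, squares it, and divides by the expected count; summing these values over all cells gives the statistic, χ² = Σ (O − E)² / E. The result is compared against the chi-square distribution with the appropriate degrees of freedom: the number of categories minus one for a goodness-of-fit test (when no parameters are estimated from the data), and (rows − 1) × (columns − 1) for a test of independence. A large χ² indicates the observed pattern departs markedly from what the null hypothesis predicts. The chi-square distribution is only an approximation here, and it becomes unreliable when expected counts are small; a common rule of thumb asks for an expected count of at least 5 in each cell. When expected counts are very small, an exact test such as Fisher’s is preferred.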
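As a minimal sketch of this calculation, the plain-Python example below computes χ² and the degrees of freedom for one goodness-of-fit case and one test of independence, then compares each against the upper 5% critical value from a standard chi-square table. The data and all names are hypothetical, and no continuity correction is applied to the 2 × 2 table.

```python
# Upper 5% critical values of the chi-square distribution (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Goodness-of-fit: hypothetical counts for 60 rolls of a die, fair die expected.
die_observed = [8, 12, 9, 11, 10, 10]
die_expected = [sum(die_observed) / 6] * 6
die_stat = chi_square(die_observed, die_expected)
die_df = len(die_observed) - 1                      # categories - 1
die_reject = die_stat > CRITICAL_05[die_df]

# Independence: hypothetical 2 x 2 contingency table.
table = [[30, 20],
         [20, 30]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
flat_observed = [cell for row in table for cell in row]
flat_expected = [r * c / grand_total for r in row_totals for c in col_totals]
table_stat = chi_square(flat_observed, flat_expected)
table_df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1)
table_reject = table_stat > CRITICAL_05[table_df]

print(round(die_stat, 3), die_df, die_reject)
print(round(table_stat, 3), table_df, table_reject)
```

With these made-up counts the die data stay well below the critical value for 5 degrees of freedom, while the table's statistic just exceeds the critical value for 1 degree of freedom. In practice a statistics library would normally report an exact p-value instead of a table lookup.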

Key facts

At a glance

Definition: a non-parametric test for categorical (frequency) data
Compares: observed counts vs expected counts under the null hypothesis
Goodness-of-fit: one variable vs an expected distribution
Test of independence: association between two categorical variables
Data form: uses raw counts, never percentages or means
Caveat: needs adequate expected counts per cell (else use Fisher’s exact)

Common misconceptions

What people often get wrong

Often heard: A chi-square test can be run on means or percentages.

Actually: It must use raw frequency counts of categorical data. The χ² statistic depends on the sample size, so feeding it percentages, proportions or means discards that information and gives a misleading result. Convert back to the actual observed counts before calculating χ².
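To illustrate why the counts matter, the sketch below applies the same 60% / 40% split to three hypothetical sample sizes, testing each against an expected 50 / 50 split, and then shows what happens if the percentages themselves are passed in. The helper name and the data are assumptions made for this example.

```python
def chi_square_from_counts(observed, proportions):
    """Goodness-of-fit chi-square from raw counts and hypothesised proportions."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


even_split = [0.5, 0.5]

# The same 60% / 40% pattern observed in samples of 20, 100 and 1000.
stat_n20 = chi_square_from_counts([12, 8], even_split)
stat_n100 = chi_square_from_counts([60, 40], even_split)
stat_n1000 = chi_square_from_counts([600, 400], even_split)

# Passing percentages instead of counts silently treats the sample size as 100.
stat_from_percentages = chi_square_from_counts([60.0, 40.0], even_split)

for label, stat in [("n=20", stat_n20), ("n=100", stat_n100),
                    ("n=1000", stat_n1000), ("percentages", stat_from_percentages)]:
    print(label, round(stat, 3))
```

The statistic grows in proportion to the sample size, so identical percentages can fall on either side of the 5% critical value for 1 degree of freedom (3.841). Percentages alone always reproduce the n = 100 result, whatever the real sample size was.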

Often heard: The goodness-of-fit test and the test of independence are the same thing.

Actually: They differ in structure. Goodness-of-fit checks one variable against an expected distribution; the test of independence checks whether two categorical variables are associated in a contingency table.

Often heard: A significant chi-square test of independence proves one variable causes the other.

Actually: It shows only that the two variables are associated, not that one causes the other. Establishing causation requires controlled experimental design, not a test of association.

Going deeper