Definition · Plain-language

Correlation coefficient

A correlation coefficient measures the strength and direction of the linear relationship between two variables; Pearson’s r ranges from −1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).

The −1 to +1 scale

A correlation coefficient is a single number summarising how two variables move together. Pearson’s r, the most widely used, is bounded between −1 and +1. The sign indicates direction: a positive value means the variables rise together, while a negative value means one falls as the other rises. The magnitude indicates strength: values near ±1 indicate a tight linear relationship in which the points lie close to a straight line, while values near 0 indicate little or no linear relationship. An r of exactly +1 or −1 means the points fall perfectly on a line; an r of 0 means no straight-line trend at all.
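To make the scale concrete, here is a minimal sketch of how Pearson’s r could be computed by hand, using only the Python standard library. The function name pearson_r and the small data sets are illustrative assumptions, not part of any standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squares.
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data: points that lie exactly on a straight line.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(x, rising)     # 1.0: perfect positive relationship
r_down = pearson_r(x, falling)  # -1.0: perfect negative relationship
print(r_up, r_down)
```

Real data rarely falls exactly on a line, so values strictly between −1 and +1 are the norm; the two extremes here are only meant to show what the ends of the scale look like.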

Judging strength and direction

Interpreting the magnitude depends on context and discipline, but a common informal guide treats magnitudes around 0.1 as weak, around 0.3 as moderate and 0.5 or above as strong, with the social sciences often tolerating smaller correlations than the physical sciences. The crucial point is that strength and direction are read separately: an r of −0.8 describes a strong relationship, just a negative one, and is stronger than an r of +0.4. Because r measures only the linear component, it should always be interpreted alongside a scatterplot, which reveals curvature, clusters or influential points that a single number conceals.
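Reading strength and direction separately can be sketched as a small helper. The function name describe_r is hypothetical, and the hard cut-points at 0.1, 0.3 and 0.5 are an assumption that turns the informal guide above into fixed boundaries; a given discipline may well draw them elsewhere.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    # Strength comes from the magnitude only (assumed cut-points).
    magnitude = abs(r)
    if magnitude >= 0.5:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    elif magnitude >= 0.1:
        strength = "weak"
    else:
        strength = "negligible"

    # Direction comes from the sign only.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    return strength, direction


print(describe_r(-0.8))  # ('strong', 'negative')
print(describe_r(0.4))   # ('moderate', 'positive')
```

The labels are a convenience for prose, not a substitute for looking at the scatterplot.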

The coefficient of determination, r²

Squaring the correlation coefficient gives r², the coefficient of determination. Conceptually, r² represents the proportion of the variance in one variable that is shared with, or statistically explained by, the other in a linear relationship. An r of 0.7, for instance, gives r² = 0.49, meaning about 49% of the variation is shared. Because squaring removes the sign, r² is always between 0 and 1 and says nothing about direction. It usefully tempers interpretation: a respectable-looking r of 0.5 corresponds to only r² = 0.25, so just a quarter of the variation is accounted for.
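One way to see why r² is described as explained variance: for a least-squares straight line fitted to the data, the fraction of variation the line accounts for equals r squared. The sketch below checks this on a small made-up data set; the data and variable names are illustrative assumptions.

```python
import math

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

# Pearson's r and its square.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Variation left over after the line has been fitted.
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Share of the variation in y that the line accounts for.
explained = 1 - ss_residual / s_yy

print(round(r, 2), round(r_squared, 2), round(explained, 2))
```

Here r is about 0.8, and both r² and the explained share come out at about 0.64, so roughly 64% of the variation in y is accounted for by the line and 36% is not.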

Linear only, and not causation

Two important limits govern the correlation coefficient. First, Pearson’s r captures only linear relationships. A strong but curved relationship can yield an r near zero, so a low coefficient does not prove the variables are unrelated — only that they are not linearly related. Second, and most fundamentally, correlation does not imply causation. A high correlation may arise because one variable causes the other, because both are driven by a confounding third variable, or by coincidence. Establishing cause requires controlled study designs, not a correlation coefficient alone, however large it is.
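The first limit is easy to demonstrate. In the sketch below, y is completely determined by x (it is simply x squared), yet Pearson’s r is zero because the relationship is U-shaped rather than linear. The helper pearson_r and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# A perfect but non-linear (U-shaped) relationship: y = x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curved = pearson_r(x, y)
print(r_curved)  # 0.0, even though y depends entirely on x
```

The falling left half of the curve cancels the rising right half, so the linear summary reports nothing. A scatterplot of the same points would show the pattern immediately.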

Key facts

At a glance

Definition: a measure of the strength and direction of a linear relationship
Common form: Pearson’s correlation coefficient, r
Range: −1 to +1, where 0 means no linear relationship
Sign: positive means variables rise together; negative means they move oppositely
r squared: the proportion of shared (explained) variance, between 0 and 1
Key caveat: measures linear association only and does not imply causation

Common misconceptions

What people often get wrong

Often heard: A high correlation coefficient proves that one variable causes the other.

Actually: Correlation does not imply causation. A strong r can reflect a confounding variable or coincidence; establishing cause needs a controlled design, not a correlation alone.

Often heard: A correlation coefficient near zero means the two variables are unrelated.

Actually: It means no linear relationship. A strong non-linear (for example, U-shaped) relationship can still produce an r near zero, which is why a scatterplot should always be inspected.

Often heard: A negative correlation is weaker or worse than a positive one.

Actually: The sign only gives direction. An r of −0.8 indicates a stronger relationship than +0.4; strength is read from the magnitude, regardless of sign.
