Linear regression

Linear regression is a statistical method that models the linear relationship between a dependent variable (the outcome) and one or more independent variables (the predictors). With a single predictor, the relationship is summarised by a fitted straight line.

Slope, intercept and the fitted line

Simple linear regression summarises the relationship between two variables as a straight line of the form y = a + bx, where y is the outcome and x is the predictor. The intercept (a) is the predicted value of the outcome when the predictor is zero, and the slope (b) is the average change in the predicted outcome for each one-unit increase in the predictor. The line is usually fitted by the method of least squares, which chooses the slope and intercept that minimise the sum of the squared vertical distances (residuals) between the observed points and the line. The slope is the central result: its sign gives the direction of the relationship and its magnitude gives the steepness.
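
As a minimal sketch of the least-squares fit described above, the slope can be computed as the sum of cross-products of deviations divided by the sum of squared deviations of the predictor, and the intercept follows from the two means. The function name `fit_line` and the hours/scores data are made up for illustration and are not part of the original text.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of the predictor, and sum of cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has no variation, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative data: hours studied (predictor) and test score (outcome).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
print(f"score = {a:.1f} + {b:.1f} * hours")  # score = 47.7 + 4.1 * hours
```

In this toy data the slope of 4.1 reads as "about 4.1 more points per extra hour", and the intercept of 47.7 is the predicted score at zero hours.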

R² and model fit

R², the coefficient of determination, measures the proportion of the variation in the outcome that the regression model explains. For a least-squares fit with an intercept it ranges from 0 (the predictors explain none of the variability) to 1 (they explain all of it). A higher R² means the points lie closer to the fitted line, so the model's predictions are closer to the observed values. R² should be interpreted alongside the slope and its statistical significance. A high R² does not by itself show that the model's assumptions hold: linearity, independent errors, constant variance and approximately normal residuals must still be checked for the results to be trustworthy.
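
A minimal sketch of how R² can be computed: one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The helper name `r_squared` and the data are illustrative assumptions; the intercept 47.7 and slope 4.1 are the least-squares values for this made-up sample.

```python
def r_squared(observed, predicted):
    """Proportion of variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Total variation of the outcome around its mean.
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    # Variation left unexplained by the model (squared residuals).
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


# Made-up illustrative data and its least-squares line.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
intercept, slope = 47.7, 4.1

predicted = [intercept + slope * h for h in hours]
r2 = r_squared(scores, predicted)
print(f"R^2 = {r2:.3f}")  # R^2 = 0.989
```

Predicting the mean score for every observation would give an R² of 0, and predicting every score exactly would give 1.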

Simple vs multiple regression

Simple linear regression uses a single predictor to model the outcome. Multiple linear regression extends this to two or more predictors, each with its own slope (coefficient), allowing the model to estimate the effect of one predictor while holding the others constant. This makes multiple regression valuable for examining several influences at once and for statistical control of confounding. Regression differs from correlation in being directional and predictive: it specifies an outcome and predictors and produces an equation for prediction, whereas correlation simply measures the symmetric strength of association between two variables.
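
To illustrate statistical control, here is a hedged sketch that fits a multiple regression by solving the normal equations with plain Gaussian elimination. The data are constructed, not real: the outcome `y` depends only on `z`, while `x` is merely correlated with `z`. The names `fit_multiple` and `solve_linear_system` are hypothetical helpers; in practice a statistics library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple(predictors, y):
    """Least-squares fit; returns [intercept, coef_1, coef_2, ...]."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in col] for col in predictors]
    k = len(cols)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * yi for p, yi in zip(cols[i], y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Constructed data: y = 1 + 2*z exactly; x tracks z but has no effect of its own.
z = [0, 1, 2, 3, 4, 5]
x = [1, 0, 3, 2, 5, 4]
y = [1, 3, 5, 7, 9, 11]

simple = fit_multiple([x], y)        # y on x alone
adjusted = fit_multiple([x, z], y)   # y on x, holding z constant
print("x alone:     slope for x =", round(simple[1], 3))
print("x and z:     slope for x =", round(adjusted[1], 3),
      "| slope for z =", round(adjusted[2], 3))
```

On its own, `x` gets a slope of about 1.66 because it stands in for `z`. Once `z` is included, the coefficient for `x` falls to essentially zero and `z` takes a coefficient of about 2, which is what "holding the others constant" means in this constructed case.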

Key facts

At a glance

Definition: models the linear relationship between an outcome and predictors
Fitted line: y = intercept + slope × predictor (least squares)
Slope: average change in outcome per one-unit change in predictor
R²: proportion of variance in the outcome explained (0–1)
Simple: one predictor; Multiple: two or more predictors
Vs correlation: regression is directional and predictive, not symmetric

Common misconceptions

What people often get wrong

Often heard: A high R² means the regression has proved that the predictor causes the outcome.

Actually: R² measures how much variance the model explains, not causation. A strong fit on observational data can still reflect confounding; causal claims require experimental design or careful causal-inference methods.

Often heard: Linear regression and correlation are the same thing.

Actually: Correlation measures the symmetric strength of association between two variables. Regression is directional: it designates an outcome and predictors, estimates slopes, and produces an equation to predict the outcome.
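
The difference can be seen in a small sketch, using made-up data and hypothetical helper names: swapping the two variables leaves the correlation coefficient unchanged but gives a different regression slope.

```python
from math import sqrt


def centred_sums(x, y):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def correlation(x, y):
    """Pearson correlation: symmetric in its two arguments."""
    sxx, syy, sxy = centred_sums(x, y)
    return sxy / sqrt(sxx * syy)


def slope(predictor, outcome):
    """Least-squares slope of outcome on predictor: depends on the roles."""
    sxx, _, sxy = centred_sums(predictor, outcome)
    return sxy / sxx


# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

print("r(hours, scores)  =", round(correlation(hours, scores), 4))
print("r(scores, hours)  =", round(correlation(scores, hours), 4))
print("slope scores~hours =", round(slope(hours, scores), 4))
print("slope hours~scores =", round(slope(scores, hours), 4))
```

Both correlations are identical, while the two slopes answer different questions (points per hour versus hours per point). For simple regression their product equals the squared correlation.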

Often heard: Linear regression can be applied to any data without checking anything.

Actually: It assumes a roughly linear relationship, independent observations, constant error variance and approximately normal residuals. Ignoring these assumptions, or extrapolating beyond the observed range, can give misleading slopes and predictions.
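
A hedged sketch of why these checks matter: below, a straight line is fitted to constructed data that actually follow a curve (y = x²). The residuals show a systematic pattern instead of random scatter, and a prediction far outside the observed range is badly wrong. The data and the helper name `fit_line` are illustrative assumptions.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Constructed data: the true relationship is curved, not linear.
x = [0, 1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Residuals are positive at both ends and negative in the middle:
# a U-shaped pattern that signals the linearity assumption is violated.
print("residuals:", [round(r, 2) for r in residuals])

# Extrapolating to x = 10, well beyond the observed range of 0 to 5.
x_new = 10
predicted = a + b * x_new
print(f"predicted at x={x_new}: {predicted:.1f}, true value: {x_new ** 2}")
```

Here the line predicts roughly 46.7 at x = 10, less than half the true value of 100, even though the fit looks reasonable within the observed range.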
