v2026.1714 entries · CC-BY 4.0
CASRAI

Definition · Plain-language

Linear regression

Linear regression is a statistical method that models the linear relationship between a dependent variable and one or more independent variables, summarised by a fitted straight line.

CASRAI research-methods explainer — Linear regression

Slope, intercept and the fitted line

Simple linear regression summarises the relationship between two variables as a straight line of the form y = a + bx. The intercept (a) is the predicted value of the outcome when the predictor is zero, and the slope (b) is the average change in the outcome for each one-unit increase in the predictor. The line is usually fitted by the method of least squares, which chooses the slope and intercept that minimise the sum of the squared vertical distances (the residuals) between the observed points and the line. The slope is the central result: its sign gives the direction of the relationship and its magnitude gives the steepness.
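As a minimal sketch of the least-squares fit described above, the slope can be computed as the sum of cross-products of the centred values divided by the sum of squares of the centred predictor, and the intercept then follows from the two means. The function name fit_line and the hours/scores data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of the centred predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor has no variation")
    # Sum of cross-products of the centred predictor and outcome.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied (predictor) and test score (outcome).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
print(f"score = {a:.1f} + {b:.1f} * hours")
```

For this made-up sample the slope is 4.1, so each extra hour is associated with an average increase of 4.1 points, and the intercept of about 47.7 is the predicted score at zero hours.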

R² and model fit

R², the coefficient of determination, measures the proportion of the variation in the outcome that the regression model explains. For a least-squares fit with an intercept it ranges from 0 (the predictors explain none of the variability) to 1 (they explain all of it). A higher R² means the points lie closer to the fitted line, so the model reproduces the observed outcomes more precisely. R² should be interpreted alongside the slope and its statistical significance: a high R² does not show that the model's assumptions (linearity, independent errors, constant variance and approximately normal residuals) hold, and they must hold for its results to be trustworthy.
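One common way to compute R² is as one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below assumes a simple least-squares fit with an intercept. The function name r_squared and the data are hypothetical.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line (squared residuals).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the outcome around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("the outcome has no variation")
    return 1 - ss_res / ss_tot


# Hypothetical data: hours studied (predictor) and test score (outcome).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r2 = r_squared(hours, scores)
print(f"R^2 = {r2:.4f}")
```

For this sample the line explains about 98.9% of the variation in the scores. That figure describes fit only; it does not check any of the assumptions listed above.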

Simple vs multiple regression

Simple linear regression uses a single predictor to model the outcome. Multiple linear regression extends this to two or more predictors, each with its own slope (coefficient), allowing the model to estimate the association between one predictor and the outcome while holding the others constant. This makes multiple regression valuable for examining several influences at once and for statistical control of measured confounders. Regression differs from correlation in being directional and predictive: it specifies an outcome and predictors and produces an equation for prediction, whereas correlation simply measures the symmetric strength of association between two variables.
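The idea of holding other predictors constant can be illustrated with a small sketch that fits a multiple regression by solving the normal equations in plain Python. The helper names solve and fit_multiple are hypothetical, and the data are constructed (not observed) so that the outcome is exactly 1 + 2·x1 + 3·x2. Statistical libraries generally use more numerically stable decompositions than the normal equations, so this is an illustration of the principle rather than a recommended implementation.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Constructed data: x1 and x2 are correlated, and y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

both = fit_multiple(list(zip(x1, x2)), y)   # y on x1 and x2
alone = fit_multiple([[u] for u in x1], y)  # y on x1 only

print("multiple:", [round(c, 6) for c in both])
print("simple:  ", [round(c, 6) for c in alone])
```

With both predictors in the model, the coefficient on x1 is 2, the value built into the data. With x1 alone, its slope is 5, because it also absorbs the contribution of the correlated x2.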

Key facts

At a glance

  • Definition: models the linear relationship between an outcome and predictors
  • Fitted line: y = intercept + slope × predictor (least squares)
  • Slope: average change in outcome per one-unit change in predictor
  • R²: proportion of variance in the outcome explained (0–1)
  • Simple: one predictor; Multiple: two or more predictors
  • Vs correlation: regression is directional and predictive, not symmetric

Common misconceptions

What people often get wrong

Often heard: A high R² means the regression has proved that the predictor causes the outcome.

Actually: R² measures how much variance the model explains, not causation. A strong fit on observational data can still reflect confounding; causal claims require experimental design or careful causal-inference methods.

Often heard: Linear regression and correlation are the same thing.

Actually: Correlation measures the symmetric strength of association between two variables. Regression is directional: it designates an outcome and predictors, estimates slopes, and produces an equation to predict the outcome.
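The difference can be checked numerically. In the sketch below, swapping the two variables leaves the Pearson correlation unchanged but gives a different regression slope, and the product of the two slopes equals the squared correlation. The function names and data are hypothetical.

```python
import math


def centred_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squares and cross-products
    of the values centred on their means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def correlation(xs, ys):
    """Pearson correlation: the same whichever variable comes first."""
    sxx, syy, sxy = centred_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def slope(predictor, outcome):
    """Least-squares slope for predicting outcome from predictor."""
    sxx, _, sxy = centred_sums(predictor, outcome)
    return sxy / sxx


# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r_hs = correlation(hours, scores)
r_sh = correlation(scores, hours)
b_scores_on_hours = slope(hours, scores)  # points per extra hour
b_hours_on_scores = slope(scores, hours)  # hours per extra point

print(f"r = {r_hs:.4f} in both directions: {math.isclose(r_hs, r_sh)}")
print(f"slope of scores on hours: {b_scores_on_hours:.4f}")
print(f"slope of hours on scores: {b_hours_on_scores:.4f}")
```

The two slopes answer different questions and carry different units, which is why regression requires choosing which variable is the outcome.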

Often heard: Linear regression can be applied to any data without checking anything.

Actually: It assumes a roughly linear relationship, independent observations, constant error variance and approximately normal residuals. Ignoring these assumptions, or extrapolating beyond the observed range, can give misleading slopes and predictions.
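A quick, informal way to check the linearity assumption is to look at the signs of the residuals in order of the predictor. The sketch below fits a straight line to deliberately curved, hypothetical data (y = x² for x from 1 to 10), then shows the residual pattern and an extrapolated prediction. The function name fit_line is illustrative, and in practice a residual plot would serve the same purpose.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical curved data: the true relationship is y = x**2.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

mean_y = sum(ys) / len(ys)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

# Under a correct linear model the signs should look unpatterned.
signs = "".join("+" if r > 0 else "-" for r in residuals)

# Extrapolating to x = 20, well outside the observed range of 1 to 10.
predicted_20 = a + b * 20
actual_20 = 20 ** 2

print(f"R^2 = {r2:.3f}")
print(f"residual signs: {signs}")
print(f"prediction at x = 20: {predicted_20:.0f} (true value {actual_20})")
```

R² is about 0.95 here, yet the residuals are positive at both ends and negative in the middle, a sign of curvature that a straight line cannot capture. The prediction at x = 20 is 198 against a true value of 400, showing how extrapolation compounds the problem.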