

Reproducibility of Machine Learning Research

ML reproducibility is the ability to obtain consistent results from the same code, data and configuration. This article explains why ML results are hard to reproduce and the practical standards that help: random seeds, data and model versioning, compute reporting, sharing code and weights, and reproducibility checklists.

By CASRAI Editorial Board
Published 20 Jun 2026 · 4 minute read

Machine-learning (ML) reproducibility is the ability of an independent party to obtain results consistent with a published study using the same code, data and computational configuration. It is a persistent challenge: many ML papers report results that others cannot reproduce, not through misconduct but because critical details, such as random seeds, data versions and compute settings, go unrecorded. Fixing this is a matter of disciplined reporting rather than new science, and a set of practical standards has emerged to make ML results reliably reproducible.

Why ML results are hard to reproduce

Several sources of variation work against reproducibility. ML training is inherently stochastic: random weight initialisation, data shuffling and randomised algorithms mean two runs of the same code can yield different models. Results are also sensitive to the exact data version and preprocessing, to hyperparameters, and to the software and hardware environment, since different library versions or nondeterministic GPU operations can change outcomes. When a paper omits any of these details, the reported numbers cannot be regenerated. The train/validation/test discipline that guards against inflated results is covered in our explainer on machine learning concepts and methods.

Random seeds and reporting variance

Setting and recording random seeds for every source of randomness makes a single run repeatable, at least on the same software and hardware. But a fixed seed is not the whole story: because results vary across seeds, robust practice is to report performance across multiple seeds with a measure of spread, such as the standard deviation, not a single best run. This distinguishes a genuine improvement from one that merely got a lucky initialisation.
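As an illustration of reporting across seeds, the sketch below uses only the Python standard library. The function `run_experiment` is a hypothetical stand-in for a real training run, and the 0.80 baseline and noise level are invented for the example. In a real project each library that draws random numbers has its own generator, and each would need to be seeded and recorded as well.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for a training run: returns a noisy score for a seed."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the run
    return 0.80 + rng.gauss(0.0, 0.01)


def summarise(seeds: list[int]) -> dict:
    """Run once per seed and report the mean and sample standard deviation."""
    scores = [run_experiment(s) for s in seeds]
    return {
        "seeds": seeds,
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # requires at least two seeds
    }


if __name__ == "__main__":
    report = summarise([0, 1, 2, 3, 4])
    print(
        f"score = {report['mean']:.4f} ± {report['stdev']:.4f} "
        f"over {len(report['seeds'])} seeds"
    )
```

Recording the list of seeds next to the summary lets a reader rerun any individual run as well as the aggregate.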

Data and model versioning

Reproducibility requires knowing exactly which data and which model produced a result. Data versioning records the precise dataset snapshot, including any cleaning, filtering and splits, so the same inputs can be reconstructed. Model versioning records the trained weights and the configuration that produced them. This provenance is the engineering counterpart to the documentation artefacts described in our piece on AI model documentation: datasheets and model cards describe what the data and model are, while versioning lets others retrieve the exact instances used.
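One simple way to pin down "the exact instances used" is to record a content hash for each artefact. The sketch below is a minimal illustration, not a substitute for a dedicated versioning tool. The function names and the manifest layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets or weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files: dict[str, Path], config: dict) -> dict:
    """Record one content hash per artefact alongside the configuration used."""
    return {
        "artefacts": {name: sha256_of_file(path) for name, path in sorted(files.items())},
        "config": config,
    }


def verify_manifest(manifest: dict, files: dict[str, Path]) -> list[str]:
    """Return the names of artefacts that are missing or whose hash has changed."""
    return [
        name
        for name, expected in manifest["artefacts"].items()
        if name not in files or sha256_of_file(files[name]) != expected
    ]
```

A manifest built this way can be published with the paper. Anyone who later obtains the data splits and weights can call `verify_manifest` to check that they hold the same bytes the authors used.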

| Practice | What it captures | Why it matters |
| --- | --- | --- |
| Random seeds | All sources of randomness | Makes a run repeatable; report across seeds for variance |
| Data versioning | Exact dataset snapshot and splits | Lets others reconstruct the same inputs |
| Model versioning | Trained weights and configuration | Identifies exactly which model produced a result |
| Environment reporting | Library versions, hardware, compute | Controls for software and hardware variation |
| Shared code and weights | The implementation itself | Enables direct re-execution and scrutiny |

Environment and compute reporting

Results depend on the computational environment, so reproducible studies report the software stack (framework and library versions), the hardware used (such as the GPU type and count), and the compute budget, including training time and the number of runs. Capturing dependencies, for example through a pinned environment file or a container, lets others recreate the conditions rather than guess at them. Compute reporting also supports honest comparison, since a method that wins only with vastly more compute is a different claim from one that wins under equal budgets.
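Part of the software stack can be captured automatically at run time. The sketch below records the interpreter, operating system and the installed versions of a chosen set of packages, using only the standard library. The package names passed in are examples. GPU type and count, training time and number of runs are not covered here and would have to be recorded separately.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages: list[str]) -> dict:
    """Collect interpreter, OS and package versions for inclusion in a run record."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed: record that rather than guess
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names only; list whatever the experiment actually imports.
    print(json.dumps(capture_environment(["numpy", "torch"]), indent=2))
```

A record like this complements a pinned environment file or container: the pin states what should be installed, while the record states what was present when the result was produced.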

Sharing code, weights and reproducibility checklists

The single most effective step is to share the code and trained weights alongside the paper, so reviewers and readers can re-run the experiments directly. To make expectations concrete, the community has adopted reproducibility checklists, such as the machine-learning reproducibility checklist used by major conferences, which prompt authors to confirm that they have reported data, code, hyperparameters, compute and statistical significance. Treating these checklists as standard practice raises the floor for the whole field. We track these standards across our AI and ML research outputs coverage, with shared terminology anchored in the casrai.org research dictionary and contribution credit handled through CRediT.
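A checklist of this kind can also be enforced mechanically before submission. The sketch below checks a submission record against the five items named above. The record format and function name are assumptions for illustration and do not reproduce any conference's actual checklist.

```python
# The five items listed in the text; the key spellings are illustrative.
REQUIRED_ITEMS = (
    "data",
    "code",
    "hyperparameters",
    "compute",
    "statistical_significance",
)


def missing_items(submission: dict[str, str]) -> list[str]:
    """Return checklist items that are absent or left blank in a submission record."""
    return [item for item in REQUIRED_ITEMS if not submission.get(item, "").strip()]


if __name__ == "__main__":
    draft = {"data": "dataset snapshot and splits described", "code": "repository linked"}
    print("Still to report:", missing_items(draft))
```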

Frequently asked questions

Why are machine-learning results often hard to reproduce?

Because training is stochastic and results depend on random seeds, exact data versions, hyperparameters and the software and hardware environment. When papers omit these details, the reported numbers cannot be regenerated.

Is setting a random seed enough for reproducibility?

No. A fixed seed makes one run repeatable, but because results vary across seeds, robust practice is to report performance over multiple seeds with a measure of spread, not a single best run.

What is a reproducibility checklist?

It is a structured list, adopted by major ML venues, that prompts authors to confirm they have reported data, code, hyperparameters, compute and statistical significance, raising the baseline standard for the field.

What is the single most effective reproducibility step?

Sharing the code and trained weights alongside the paper, together with the exact data and environment, so that others can directly re-run and scrutinise the experiments.
