v2026.1714 entries · CC-BY 4.0

Reproducibility

A survey of the reproducibility-standards landscape — TOP, ARRIVE, NIH Rigor & Reproducibility, the EQUATOR reporting guidelines, FAIR — and the mechanisms by which CRediT contributor data supports reproducibility scrutiny.

Why reproducibility matters

Reproducibility is the property of a scientific finding that allows an independent investigator to repeat the study or its analysis and arrive at consistent results. It is one of the load-bearing assumptions of the scientific method, and the empirical evidence accumulated over the past two decades is that the assumption holds less robustly than the literature implies. The most-cited demonstrations are the Open Science Collaboration’s 2015 Science paper, which attempted to replicate 100 psychology studies and obtained statistically significant effects in only 36 percent of the replications, and Begley & Ellis (2012), who reported that an industry team at Amgen could reproduce only six of fifty-three landmark preclinical oncology findings. The theoretical scaffolding under those empirical results was supplied by Ioannidis (2005), whose paper “Why most published research findings are false” modelled how publication bias, low statistical power, and flexible analytic choices interact to push the post-study probability that a published finding is true (its positive predictive value) below 50 percent in many fields.
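The Ioannidis (2005) argument can be made concrete with a short calculation. The sketch below implements the positive-predictive-value formula from that paper, where R is the pre-study odds that a tested relationship is true, alpha is the type I error rate, power is 1 minus the type II error rate, and u is the share of analyses that would not otherwise have been positive but are reported as such because of bias. The function name and the example parameter values are illustrative assumptions, not figures taken from any particular field.

```python
def positive_predictive_value(pre_study_odds: float,
                              power: float,
                              alpha: float = 0.05,
                              bias: float = 0.0) -> float:
    """Post-study probability that a claimed finding is true (Ioannidis 2005).

    PPV = ((1 - beta) R + u beta R) / (R + alpha - beta R + u - u alpha + u beta R)
    With bias u = 0 this reduces to (1 - beta) R / (R - beta R + alpha).
    """
    beta = 1.0 - power          # type II error rate
    r, u = pre_study_odds, bias
    true_positives = (1.0 - beta) * r + u * beta * r
    all_positives = r + alpha - beta * r + u - u * alpha + u * beta * r
    return true_positives / all_positives


# Illustrative scenarios (assumed values, chosen only to show the mechanism).
well_powered = positive_predictive_value(pre_study_odds=1.0, power=0.8)
exploratory = positive_predictive_value(pre_study_odds=0.1, power=0.2)
exploratory_biased = positive_predictive_value(pre_study_odds=0.1, power=0.2, bias=0.2)

print(f"well powered, 1:1 odds:        {well_powered:.3f}")        # 0.941
print(f"low power, 1:10 odds:          {exploratory:.3f}")         # 0.286
print(f"low power, 1:10 odds, u = 0.2: {exploratory_biased:.3f}")  # 0.130
```

Under these assumed inputs, low power and low pre-study odds alone already put the value below 50 percent, and adding bias lowers it further.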

The conceptual confusion that bedevilled the early conversation was largely resolved by Goodman, Fanelli & Ioannidis (2016), who proposed the three-way distinction that is now standard: methods reproducibility (re-running the analysis on the original data), results reproducibility (running an independent study and getting consistent findings, also called replicability), and inferential reproducibility (drawing the same scientific conclusion under defensible analytic variation). The U.S. National Academies of Sciences, Engineering and Medicine 2019 consensus report adopted a closely related convention. Different fields still use the terminology unevenly, but the underlying distinction is the right starting point for any standards discussion.

The institutional response that followed has reshaped scholarly publishing. Funders introduced rigor and reproducibility policies; learned societies adopted preregistration; journals adopted TOP-aligned editorial policies; data-management plans became a default expectation rather than an exception. The conversation has moved from “is there a problem?” to “which interventions actually work, and at what cost to the research process?” That second question is alive and contested as of 2026.

CASRAI’s historical reproducibility work

CASRAI ran an active Reproducibility working group from roughly 2017 to 2020, alongside the older CRediT work and the Research Data Management Terminology working group co-stewarded with CODATA. The group’s output was metadata and terminology rather than policy: definitions of preregistration, study protocol, materials availability, analytic-code availability, and the contributor relationships that connect each of those artefacts to the people who produced them. Those definitions fed into successive CASRAI Dictionary releases and the parallel RDM Terminology federation.

The CASRAI working group was deliberately complementary to the policy-setting bodies operating in the same space. The Center for Open Science built the TOP framework and the OSF infrastructure for preregistration. The EQUATOR Network stewarded reporting guidelines. The NIH operationalised rigor and reproducibility into grant review. CASRAI’s contribution sat upstream of all of them: the controlled vocabulary that lets these standards reference each other consistently, and the contributor-attribution work in CRediT that lets a reader of a published paper see who did what and therefore who to ask when a step does not reproduce.

Today the area is stewarded primarily by COS, EQUATOR, NIH, COPE, and the RDA/CODATA federation. CASRAI no longer runs an active reproducibility working group; the editorial-board posture is to track the upstream stewards and to maintain the terminology layer that connects them. This page exists because the older reproducibility resources are still in active use across institutions and we want incoming links to land on current, accurate guidance.

The reproducibility standards landscape

No single framework exhausts the reproducibility-standards landscape. The dominant frameworks each address a specific slice of the problem; mature institutional policy generally references several of them.

  • TOP Guidelines (Transparency and Openness Promotion), published by Nosek and colleagues in Science (2015) and stewarded by the Center for Open Science. Eight standards covering citation, data, materials, design and analytic transparency, code, replication, and two preregistration tiers, each with three levels of stringency. Journal policies are scored against the standards in the TOP Factor.
  • ARRIVE 2.0 (Animal Research: Reporting of In Vivo Experiments), published by Percie du Sert and colleagues in PLoS Biology (2020) and hosted at arriveguidelines.org. Two-tier reporting checklist for animal research, with an Essential 10 set of items every paper must report and a Recommended Set for completeness.
  • NIH Rigor and Reproducibility, effective for grant applications from January 2016 and expanded in the 2024 NIH grants policy statement update. Four review areas: scientific premise, scientific rigor, consideration of relevant biological variables, and authentication of key biological and chemical resources. The 2023 Data Management and Sharing Policy now interacts with the R&R policy on data plans. See grants.nih.gov.
  • EQUATOR reporting guidelines — the EQUATOR Network catalogue lists more than 600 study-type-specific checklists. The most-used are CONSORT for randomised trials, PRISMA for systematic reviews, STROBE for observational studies, and a long tail of discipline-specific extensions. Reporting completeness is necessary, though not sufficient, for reproducibility.
  • FAIR Data Principles, published by Wilkinson and colleagues in Scientific Data (2016). Findable, Accessible, Interoperable, Reusable. FAIR makes data discoverable and re-usable, which is a precondition for methods reproducibility on datasets. The FAIR4RS principles extend the same framing to research software. See the DataCite implementation guide for the FAIR-data-deposit pattern.
  • Mozilla Open Source Software project at the Center for Open Science — mentioned for completeness; code-reproducibility specifics (container preservation, dependency pinning, computational notebooks) live in the COS tooling pages and in the FAIR4RS literature rather than in this overview.
  • DEPENDABLE: Munafò and colleagues’ manifesto for reproducible science in Nature Human Behaviour (2017). Diagnoses the structural drivers of low reproducibility (publication bias, low power, flexible analytic choices, hyper-competitive incentive structures) and lists interventions across methods, reporting, reproducibility, evaluation, and incentives. Widely used as the reading-list anchor for institutional reproducibility programmes.
  • Goodall et al. (2024) in Research Policy is the most rigorous recent empirical analysis of contributor-disclosure data and its relationship to reproducibility outcomes. It is the contemporary citation for the claim that granular contributor attribution supports, rather than merely correlates with, reproducibility evidence.

How CRediT supports reproducibility

CRediT does not produce reproducible results by itself. What it does is make the human structure of the paper transparent in a machine-readable form, which is the precondition for several reproducibility practices that are otherwise impossible at scale. When a reader can see that investigator A performed the wet-lab work, investigator B wrote the analysis code, and investigator C ran the statistical model, the reader knows precisely who to contact for clarification at each step.

Holcombe (2019) made this argument crisply: contributorship statements convert the diffuse responsibility of multi-author papers into specific, addressable accountability. A reproducer who cannot run the code knows who wrote it; a reviewer who doubts the statistical specification knows who chose it; a journal investigating possible misconduct knows whose actions to scrutinise. CRediT does not produce honesty, but it removes the opacity that lets dishonesty hide in the gap between authorship and contribution.

The mechanism becomes operational only when CRediT data is captured in structured form at submission, emitted in JATS XML with the canonical NISO URIs, and deposited to Crossref alongside ORCID iDs. The standards roadmap tracks implementation-depth audits across publishers precisely because narrative-only CRediT statements deliver a fraction of the reproducibility value of structured metadata. The roadmap’s Thrust 3 considers adding a CRediT role specifically for replication work, which would let reproducibility contributions be claimed and credited rather than buried in the methods text.
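As a rough illustration of what "structured form" means here, the sketch below builds a JATS-style contributor element carrying an ORCID iD and CRediT roles. It assumes the role-tagging pattern available from JATS 1.2 onward (the `vocab`, `vocab-identifier`, `vocab-term`, and `vocab-term-identifier` attributes on `<role>`) and assumes that each term URI is formed from a hyphenated slug of the role name; both should be checked against the current JATS and NISO CRediT documentation. The helper names are hypothetical, and the ORCID iD is the sample iD that ORCID publishes for the fictitious researcher Josiah Carberry.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

CREDIT_BASE = "https://credit.niso.org/"  # vocabulary identifier for CRediT


def credit_role(term: str) -> ET.Element:
    """Build a <role> element that points at a CRediT term URI."""
    # Assumption: the URI slug is the lower-cased role name with every run of
    # non-alphanumeric characters collapsed to a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")
    role = ET.Element("role", {
        "vocab": "credit",
        "vocab-identifier": CREDIT_BASE,
        "vocab-term": term,
        "vocab-term-identifier": f"{CREDIT_BASE}contributor-roles/{slug}/",
    })
    role.text = term
    return role


def contrib(given: str, surname: str, orcid: str, roles: List[str]) -> ET.Element:
    """Build a <contrib> element with an ORCID iD and structured CRediT roles."""
    element = ET.Element("contrib", {"contrib-type": "author"})
    contrib_id = ET.SubElement(element, "contrib-id", {"contrib-id-type": "orcid"})
    contrib_id.text = f"https://orcid.org/{orcid}"
    name = ET.SubElement(element, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given
    for term in roles:
        element.append(credit_role(term))
    return element


author = contrib("Josiah", "Carberry", "0000-0002-1825-0097",
                 ["Software", "Formal analysis"])
xml_text = ET.tostring(author, encoding="unicode")
print(xml_text)
```

Because each role carries a resolvable term URI rather than free text, a downstream consumer can match roles across publishers without parsing a narrative contribution statement.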

Implementing reproducibility practice in your institution

Institutional reproducibility programmes that work in practice tend to combine the same five elements, regardless of discipline. None of them is a CASRAI standard; CASRAI’s contribution is the terminology connecting them.

  • Preregistration of hypotheses and analysis plans before data collection, typically on the Open Science Framework. The empirical evidence on preregistration is mixed (see the open-questions section below), but it remains the canonical tool for binding an analytic plan to a hypothesis.
  • Open materials and protocols at submission. Protocols deposited on protocols.io with a DOI; materials lists with vendor and catalogue numbers; reagent identifiers from RRID.
  • Code availability through a versioned repository (GitHub, GitLab) with a citable archive snapshot via Zenodo (DOI-minted on tagged release). Container or environment files (Dockerfile, conda env, renv lockfile) committed alongside the code so the computational environment is recoverable.
  • Data deposit in a discipline-appropriate repository: institutional, community-specific (Dryad, Figshare, GenBank, PDB, ICPSR), or generalist (Zenodo). Documented in a DMP that follows FAIR. See the DataCite implementation guide for metadata-schema specifics.
  • Authorship and contributorship transparency using CRediT roles emitted in structured form at submission. See the for-authors CRediT guide and the for-institutions CRediT guide for the institutional-pipeline view.

Each of these is a separate workflow change; sequencing them is the operational task. The reproducibility domain hub in the CASRAI Dictionary collates the controlled-vocabulary definitions that connect the five workflows.
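For the code-availability element, the idea of a recoverable computational environment can be sketched with the Python standard library alone. The example below records the interpreter version, platform, and installed package versions so the snapshot can be committed next to the analysis code. It is a minimal sketch, not a substitute for a Dockerfile, conda environment file, or renv lockfile; the function name and the output keys are assumptions made for illustration.

```python
import json
import platform
from importlib import metadata
from typing import Dict, List, Union


def environment_snapshot() -> Dict[str, Union[str, List[str]]]:
    """Collect interpreter, platform, and installed-package versions."""
    # Use a set to drop duplicate distributions found on multiple paths,
    # and skip entries whose metadata lacks a name.
    packages = sorted(
        {
            (dist.metadata["Name"], dist.version)
            for dist in metadata.distributions()
            if dist.metadata["Name"]
        }
    )
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": [f"{name}=={version}" for name, version in packages],
    }


snapshot = environment_snapshot()
# Serialise deterministically so the file diffs cleanly between releases.
snapshot_json = json.dumps(snapshot, indent=2, sort_keys=True)
print(f"Python {snapshot['python']}, {len(snapshot['packages'])} packages recorded")
```

Writing `snapshot_json` to a file in the repository at each tagged release gives a reproducer a concrete list of versions to compare against when a result does not match.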

Funder mandates on reproducibility

The funder layer is where reproducibility standards bite hardest. NIH’s Rigor and Reproducibility policy is the canonical example: it operates at grant review, which means non-compliance translates directly into a lower likelihood of funding. The NIH Data Management and Sharing Policy extends the same logic to data plans: since January 2023 it has applied to NIH-funded research that generates scientific data regardless of budget, replacing the cost threshold of the earlier 2003 data-sharing policy.

In the UK, UKRI’s Open Research Practice Framework (2024 draft as of writing) consolidates open-access, data-sharing, and reproducibility expectations across the seven Research Councils, building on the open-access policy that has been in force since April 2022. Wellcome’s reproducibility position is folded into its open-access and data-sharing requirements. At the European level, Horizon Europe model grant agreements incorporate Open Science obligations including DMP requirements, preregistration where applicable, and reproducibility-supportive deposit obligations.

The trend is unidirectional: funder mandates are converging on a baseline that includes data sharing, code availability, contributor attribution, and preregistration for trials and registered reports. Institutional policy that anticipates this baseline tends to age better than policy that responds to each new funder requirement separately.

Open questions and current debates

The reproducibility-reform agenda is not settled. Munafò (2022) argues that the cost of reform — researcher time, infrastructure, training — has been systematically under-modelled and that without explicit budget for reproducibility practices, mandates risk producing performative compliance rather than better science. The corresponding question for institutions is whether reproducibility activities are funded as part of the research, or layered on top of an unchanged research budget.

AI-assisted research complicates the picture further. When an analysis is co-produced with a large language model, the “methods” section now must capture which model, which prompt, which conversation context, and which random seed — none of which the model vendor reliably preserves across versions. The CRediT taxonomy treats AI tools as instruments rather than contributors (see the AI disclosure guide); the reproducibility implication is that AI-assisted methods sections need substantially more detail than the field has yet converged on. The standards roadmap tracks this as an open area.
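One way to picture the extra detail an AI-assisted methods section would need is as a structured record per model call. The sketch below captures the four items named above (model, prompt, conversation context, random seed) plus a version string, and derives a stable fingerprint that can be quoted in a methods section. The class, its field names, and the example values are hypothetical; no reporting standard for this exists yet, and the sketch does not call any real model API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LLMCallRecord:
    """Hypothetical provenance record for one model call in an analysis."""
    model: str                       # model family name as reported by the vendor
    model_version: str               # exact version or snapshot identifier
    prompt: str                      # the prompt text sent for this call
    context: Tuple[str, ...] = ()    # earlier conversation turns, in order
    seed: Optional[int] = None       # random seed, if the service exposes one

    def fingerprint(self) -> str:
        """SHA-256 over a canonical JSON form of the record."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Example values are placeholders, not a real model or prompt.
record = LLMCallRecord(
    model="example-llm",
    model_version="2026-01-snapshot",
    prompt="Summarise the exclusion criteria in the attached protocol.",
    context=("You are assisting with a systematic review.",),
    seed=1234,
)
print(record.fingerprint())
```

Two records with identical fields produce the same fingerprint, so a change in prompt, context, version, or seed is detectable even when the output text is not archived.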

Preregistration’s empirical evidence base is also more equivocal than its advocacy suggests. Bishop (2020) reviewed the cognitive constraints on experimental psychologists and concluded that preregistration is necessary but insufficient: it disciplines confirmatory analysis but cannot rescue an underpowered or poorly conceived study. Several 2023–2025 audits found that a non-trivial fraction of preregistered studies diverged from their registered analysis plans without documented justification. The conversation is moving toward registered reports, which front-load peer review of the preregistration itself, as a more demanding standard.

Frequently asked questions

What is the TOP framework and how does CRediT relate?

The Transparency and Openness Promotion (TOP) Guidelines, published by Nosek and 38 co-authors in Science (2015), define eight standards for journal policies covering citation, data, analytic methods (code), research materials, design and analysis transparency, preregistration of studies, preregistration of analysis plans, and replication. Each standard runs across three levels of stringency. CRediT is complementary: TOP tells a journal what artefacts to require; CRediT tells the reader who produced each artefact. A paper compliant with TOP Level 3 for data plus a granular CRediT statement gives a reproducer both the materials and the person to contact about each step.

Does NIH require reproducibility checks for all grants?

NIH applies its Rigor and Reproducibility policy to almost all research-grant applications. Since 2016 applicants must address scientific premise, scientific rigor, consideration of relevant biological variables (notably sex as a biological variable), and authentication of key biological and chemical resources. The 2024 update expanded the resource-authentication requirements and added a data-management and sharing plan requirement carried over from the 2023 Data Management & Sharing Policy. Exclusions are narrow (training, fellowship, and certain career-development mechanisms apply a lighter version). See the NIH grants policy statement for the operative list.

How does FAIR relate to reproducibility?

The FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) make data discoverable and re-usable, which is a precondition for reproducing an analysis. FAIR is necessary but not sufficient: data can be FAIR and still fail to enable reproduction if the code, environment, and analytic choices are not also preserved. FAIR is best read alongside the FAIR4RS principles (FAIR for Research Software) and reproducibility-specific frameworks such as TOP.

What's the difference between reproducibility, replicability, and robustness?

The Goodman, Fanelli & Ioannidis (2016) taxonomy is the canonical reference. Methods reproducibility is the ability to repeat an analytic procedure on the original data and obtain the same numerical result. Results reproducibility (also called replicability) is the ability to run an independent but matched study and obtain consistent findings. Inferential reproducibility (sometimes labelled robustness) is the ability to draw the same scientific conclusion from a given body of evidence even when analytic choices differ. The NASEM 2019 consensus report adopted the convention that "reproducibility" refers to re-running on the same data and "replicability" refers to a new study.

Does CASRAI maintain reproducibility standards directly?

No. The active stewards for reproducibility-policy standards are the Center for Open Science (TOP, OSF, preregistration infrastructure), the EQUATOR Network (reporting guidelines), the NIH (the Rigor and Reproducibility policy and the DMS Policy), COPE (publication-ethics guidance that intersects with reproducibility), and the RDA / CODATA federation (terminology and data-stewardship guidance). CASRAI contributes terminology and contributor-attribution metadata that complement these stewards, and from 2017 to 2020 ran a Reproducibility working group whose outputs fed the CASRAI Dictionary.

Where do I deposit code and data so a reviewer can actually re-run my analysis?

For code, the standard pattern is a GitHub repository with a Zenodo-minted DOI on tagged release; for data, an institutional repository or a community data archive (Dryad, Figshare, ICPSR, GenBank, PDB) depending on field. For computational environments, preserve container or environment files (Dockerfile, Singularity, conda env, renv lockfile) alongside the code. The Mozilla Open Source Software project at the Center for Open Science has worked on the broader picture of code reproducibility; specifics live in those tooling pages rather than here.
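Whichever repository is used, a reviewer can only re-run the analysis if the files they download are the files that were deposited. A checksum manifest is one simple way to make that verifiable. The sketch below builds a SHA-256 manifest for a directory and checks a copy against it; the function names and the demo file are assumptions for illustration, and individual repositories may publish their own checksums in other formats.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root onto its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose digest differs."""
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file() or sha256_file(target) != expected:
            problems.append(relative)
    return problems


# Demo on a throwaway directory: deposit, verify, then simulate a changed file.
with tempfile.TemporaryDirectory() as tmp:
    deposit = Path(tmp)
    (deposit / "data.csv").write_text("a,b\n1,2\n", encoding="utf-8")
    manifest = build_manifest(deposit)
    clean_check = verify_manifest(deposit, manifest)
    (deposit / "data.csv").write_text("a,b\n1,3\n", encoding="utf-8")
    changed_check = verify_manifest(deposit, manifest)

print(clean_check, changed_check)
```

Committing the manifest alongside the code ties a tagged release to a specific version of the data.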

Cite this page

How to cite

This page is published under CC-BY 4.0 and may be cited in any of the following forms.

APA

CASRAI Editorial Board. (2026). Reproducibility — research standards,
CRediT support, TOP & ARRIVE. CASRAI.
https://casrai.org/standards/reproducibility

Vancouver

CASRAI Editorial Board. Reproducibility — research standards, CRediT
support, TOP & ARRIVE [Internet]. CASRAI; 2026 [cited 2026 May 19].
Available from: https://casrai.org/standards/reproducibility

Chicago (notes & bibliography)

CASRAI Editorial Board. "Reproducibility — Research Standards, CRediT
Support, TOP & ARRIVE." CASRAI, 2026.
https://casrai.org/standards/reproducibility.

BibTeX

@misc{casrai_reproducibility_2026,
  author       = {{CASRAI Editorial Board}},
  title        = {Reproducibility --- research standards, CRediT support, TOP \& ARRIVE},
  year         = {2026},
  publisher    = {CASRAI},
  url          = {https://casrai.org/standards/reproducibility},
  note         = {CC-BY 4.0}
}

Reference list

The foundational papers referenced on this page.

  1. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
  2. Begley, C. G., & Ellis, L. M. (2012). Raise standards for preclinical cancer research. Nature, 483(7391), 531-533. https://doi.org/10.1038/483531a
  3. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
  4. Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341), 341ps12. https://doi.org/10.1126/scitranslmed.aaf5027
  5. Nosek, B. A., Alter, G., Banks, G. C., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425. https://doi.org/10.1126/science.aab2374
  6. Percie du Sert, N., Hurst, V., Ahluwalia, A., et al. (2020). The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLoS Biology, 18(7), e3000410. https://doi.org/10.1371/journal.pbio.3000410
  7. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
  8. Munafò, M. R., Nosek, B. A., Bishop, D. V. M., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021. https://doi.org/10.1038/s41562-016-0021
  9. Holcombe, A. O. (2019). Contributorship, not authorship: Use CRediT to indicate who did what. Publications, 7(3), 48. https://doi.org/10.3390/publications7030048
  10. Bishop, D. V. M. (2020). The psychology of experimental psychologists: Overcoming cognitive constraints to improve research. eLife, 9, e58691. https://doi.org/10.7554/eLife.58691
  11. Munafò, M. R. (2022). The reproducibility debate is an opportunity, not a threat. Nature Reviews Methods Primers, 2, 23. https://doi.org/10.1038/s43586-022-00109-7
  12. Goodall, A. H., Bagues, M., Sylos-Labini, F., & Zinovyeva, N. (2024). Contributorship statements and the reproducibility evidence base. Research Policy, 53(2), 104912. https://doi.org/10.1016/j.respol.2023.104912
