Dictionary domain · Track D · Active
Reproducibility
Operational terminology for the practices, standards, and metadata that support reproducibility, replicability, and robustness of research outputs.
For implementers
Operational deployment checklist for the Reproducibility vocabulary: prerequisites, five deploy steps, integration notes for DSpace, Pure, OJS, Software Heritage, and OSF, plus the pitfalls that recur in the field.
What this domain covers
The Reproducibility domain assembles the vocabulary that lets funders, journals, institutions, and researchers talk about the same things when they invoke “reproducible research”. Its remit spans six adjacent areas that are routinely conflated in practice.
- Outcome distinctions — reproducibility, replicability, and robustness as three formally separate constructs, following the National Academies framing (Goodman, Fanelli, and Ioannidis 2016; NASEM 2019).
- Reporting standards — the TOP, ARRIVE, CONSORT, PRISMA, and STROBE families curated by the EQUATOR Network and the Center for Open Science.
- Methods transparency — preregistration, registered reports, protocol publication, and the data availability statement as the now-canonical disclosure surface.
- Outputs — datasets, code, materials, and computational environments captured as containers, Jupyter notebooks, and workflow artefacts under FAIR4RS-compatible practice.
- Assessment — replicability rate, robustness checks, multiverse analysis, and specification curve analysis as the techniques that quantify how stable a finding actually is.
- Governance — the Reproducibility Crisis discourse and its intersection with the research-integrity domain, including the boundary between “non-replication” and “misconduct”.
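The Assessment techniques listed above can be illustrated with a small example. What follows is a minimal sketch, not a reference implementation, of a multiverse analysis whose sorted output is the raw material for a specification curve. The synthetic data, the three analytic choices (`max_age`, `trim_sd`, `estimator`), and every function name are assumptions introduced purely for illustration.

```python
import itertools
import random
import statistics


def make_data(n=200, seed=1):
    """Synthetic data (an assumption): outcome y depends weakly on group."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.randint(0, 1)
        age = rng.uniform(18, 80)
        y = 0.3 * group + rng.gauss(0, 1)
        rows.append({"group": group, "age": age, "y": y})
    return rows


def effect(rows, max_age, trim_sd, estimator):
    """One analytic specification: age filter, outlier trim, group contrast."""
    kept = [r for r in rows if r["age"] <= max_age]
    mu = statistics.mean(r["y"] for r in kept)
    sd = statistics.stdev(r["y"] for r in kept)
    kept = [r for r in kept if abs(r["y"] - mu) <= trim_sd * sd]
    centre = statistics.mean if estimator == "mean" else statistics.median
    treated = centre([r["y"] for r in kept if r["group"] == 1])
    control = centre([r["y"] for r in kept if r["group"] == 0])
    return treated - control


def multiverse(rows):
    """Run every combination of choices; return (estimate, spec) sorted by estimate."""
    choices = {
        "max_age": [65, 80],
        "trim_sd": [2.0, 3.0, float("inf")],  # inf means no trimming
        "estimator": ["mean", "median"],
    }
    results = []
    for combo in itertools.product(*choices.values()):
        spec = dict(zip(choices.keys(), combo))
        results.append((effect(rows, **spec), spec))
    return sorted(results, key=lambda pair: pair[0])


if __name__ == "__main__":
    for estimate, spec in multiverse(make_data()):
        print(f"{estimate:+.3f}  {spec}")
```

Plotting the sorted estimates against their rank would give the familiar specification-curve figure. A real analysis would also report uncertainty for each specification, which this sketch omits.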
19 entries
Terms in this domain
Each entry has an operational definition, worked examples, related terms, and stable URIs. Pages link only where the term entry exists; the rest are populated as the working group ratifies each definition.
- Reproducibility: Obtaining consistent results using the same input data, computational steps, methods, and conditions of analysis (NASEM 2019).
- Replicability: Obtaining consistent results across studies aimed at answering the same question with new data or methods.
- Robustness: Stability of a finding under reasonable variation in modelling choices, samples, or analytic specification.
- Pre-registration: Time-stamped declaration of hypotheses, design, and analysis plan prior to data observation.
- Registered report: A two-stage publication format in which a protocol is peer-reviewed and accepted in principle before results exist.
- Data availability statement: Mandatory disclosure declaring where, how, and under what conditions underlying data are accessible.
- Computational reproducibility: Bit-for-bit or equivalent re-execution of an analysis from the same code, data, and computational environment.
- Methods transparency: Reporting of materials, procedures, and decisions in sufficient detail to permit independent scrutiny.
- Open materials: Public release of study materials (instruments, stimuli, protocols) under a reuse licence.
- Open code: Public release of analysis code under an OSI-approved licence, ideally archived with a persistent identifier.
- Protocol publication: Formal, citable publication of a study protocol prior to or alongside primary results.
- Multiverse analysis: Reporting the distribution of results across the full set of defensible analytic specifications (Steegen et al. 2016).
- Robustness check: An auxiliary analysis demonstrating that the headline result survives a substantive change in specification.
- Specification curve analysis: A systematic descriptive technique that visualises results across all reasonable analytic specifications.
- Replication study: A study explicitly designed to re-test the inference of a prior study, classified as direct or conceptual.
- Reporting guideline: A structured checklist defining the minimum reporting required for a study design (ARRIVE, CONSORT, PRISMA, STROBE, etc.).
- Authentication of key resources: NIH Rigor & Reproducibility requirement to validate the identity and quality of biological and chemical resources used.
- Scientific rigor: The strict application of the scientific method to ensure unbiased, well-controlled experimental design (NIH).
- Reproducibility crisis: Discourse term for the empirically observed gap between published findings and successful replication across fields.
Term pages are populated as the Reproducibility working group reviews each definition. See /dictionary/contribute to propose a term.
Cross-domain reference: FAIR principles assessment (Data infrastructure domain).
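The "bit-for-bit or equivalent" wording in the Computational reproducibility entry above suggests two levels of check. The sketch below shows one possible way to implement both: a hash comparison for exact agreement and a tolerance comparison for equivalence. The `run_analysis` function is a hypothetical stand-in for a real pipeline, and the tolerance value is an assumption rather than a recommended threshold.

```python
import hashlib
import json
import math
import random


def run_analysis(data, seed):
    """Hypothetical stand-in for a real pipeline: a seeded bootstrap of the mean."""
    rng = random.Random(seed)  # a fixed seed makes the resampling deterministic
    boot_means = sorted(
        sum(rng.choices(data, k=len(data))) / len(data) for _ in range(1000)
    )
    return {
        "estimate": sum(data) / len(data),
        "ci_low": boot_means[24],    # roughly the 2.5th percentile
        "ci_high": boot_means[974],  # roughly the 97.5th percentile
    }


def fingerprint(result):
    """SHA-256 of a canonical JSON serialisation, for exact comparison."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def equivalent(a, b, rel_tol=1e-9):
    """Looser check: same keys and numerically close values."""
    return a.keys() == b.keys() and all(
        math.isclose(a[k], b[k], rel_tol=rel_tol) for k in a
    )


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
    first = run_analysis(data, seed=42)
    second = run_analysis(data, seed=42)
    print("bit-for-bit:", fingerprint(first) == fingerprint(second))
    print("equivalent: ", equivalent(first, second))
```

In practice the exact check tends to fail across machines or library versions because of floating-point differences, which is why a tolerance-based comparison is often the more useful of the two.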
45 live entries
All terms in this domain
The complete, live list of Reproducibility terms held in the CASRAI Dictionary, pulled from the published taxonomy. Each links to its full term page with the operational definition, related terms, and stable URIs.
Scientific rigour
The strict application of the scientific method to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation, and reporting of results.
Authentication of key resources
The verification, by methods appropriate to the resource type, of the identity and integrity of biological and chemical materials used in research, including cell lines, antibodies, animal models, and specialty chemicals.
NIH Rigor and Reproducibility policy
The set of US National Institutes of Health policies, effective from 2016, requiring applicants and grantees to address scientific premise, scientific rigour, biological variables (including sex as a biological variable), and authentication of key biological and chemical resources in grant applications.
STROBE
The Strengthening the Reporting of Observational Studies in Epidemiology guidelines, a 22-item checklist covering items that should be reported in cohort, case-control, and cross-sectional studies.
PRISMA 2020
The 2020 update of the Preferred Reporting Items for Systematic reviews and Meta-Analyses, a 27-item checklist with accompanying flow diagram for reporting systematic reviews.
CONSORT 2010
The 2010 edition of the Consolidated Standards of Reporting Trials, a 25-item checklist and participant-flow diagram covering items that should be reported in any randomised controlled trial publication.
ARRIVE 2.0
The 2020 revision of the Animal Research: Reporting of In Vivo Experiments guidelines, a checklist of items that should be reported in any publication describing animal research, in order to enable assessment and replication.
TOP Guidelines
The Transparency and Openness Promotion Guidelines, an eight-standard framework for journal policies covering citation, data, materials, code, design, analysis, pre-registration, and replication.
Crowdsourced replication
A coordinated effort in which many independent laboratories or teams attempt to replicate the same set of studies under pre-specified protocols, in order to estimate field-wide replicability.
Many-analysts study
A study design in which a single dataset and research question are given to multiple independent analysts or teams who proceed without coordination, and the distribution of their conclusions is then compared.
Researcher degrees of freedom
The decisions an analyst makes during a study (inclusion criteria, outcome definition, model specification, covariate set) any of which, if made differently, would yield a different result.
Garden of forking paths
The Gelman-Loken metaphor for the implicit, data-dependent multiplicity of analytical choices made in the course of an empirical study, even by analysts not engaged in explicit p-hacking.
Forking paths
The phenomenon by which the cumulative effect of many small, data-contingent analytical choices inflates false-positive rates even when each individual choice appears defensible.
HARKing (Hypothesising After Results are Known)
Presenting a post-hoc hypothesis, formulated after data analysis, as if it had been the a priori hypothesis under test.
P-hacking
The practice of selectively reporting or adjusting analytical choices in order to obtain a statistically significant p-value, typically below the conventional 0.05 threshold.
Reproducibility audit
A systematic, post-publication examination of whether a study's published results can be obtained from its deposited data and code, typically performed by an independent analyst.
Code review (research software)
A structured review of research software by one or more peers, focused on correctness, clarity, documentation, testing, and fitness for purpose, conducted before publication or as part of community-curated software repositories.
Software citation (Software Citation Working Group)
The practice of citing research software in the reference list of a publication, with sufficient metadata (authors, title, version, persistent identifier, role) to credit creators and enable retrieval of the cited version.
FAIR4RS Software Citation Principles
An extension of the FAIR Guiding Principles to research software, articulating that software should be Findable, Accessible, Interoperable, and Reusable, with the precise interpretations adapted to software's distinctive properties (executability, versioning, dependencies).
Reproducible Research Practices (RRP)
The set of disciplinary norms, tools, and habits that together raise the probability that published research will be reproducible: literate programming, version control, dependency pinning, data deposit, code release, and reporting standards.
Nextflow (concept)
A workflow orchestration system based on dataflow programming with a Groovy-based domain-specific language, designed for scalable, container-native, multi-platform execution of computational pipelines.
Snakemake (concept)
A Python-based workflow management system that expresses computational pipelines as rules with explicit inputs, outputs, and shell or script bodies, and infers a directed acyclic graph (DAG) of jobs from those rules.
Container image (Docker/Singularity/Apptainer)
A packaged, immutable filesystem and configuration that contains an application together with all its dependencies, runnable identically on any compatible container engine (Docker, Podman, Singularity, Apptainer).
Workflow language (CWL/WDL)
A declarative specification language for describing multi-step computational analyses such that the steps, their inputs and outputs, and their software dependencies are portable across compatible workflow execution engines.
Computational environment
The full software and hardware context in which an analysis runs, including operating system, language runtime, library versions, configuration, environment variables, and hardware-specific dependencies (e.g., GPU drivers).
Code availability statement
A statement in a published article describing where the source code used in the study can be obtained, under what licence, and at what version, typically required by journal policy.
Data availability statement
A statement in a published article describing where the data underlying the study can be found, the conditions of access, and any restrictions, typically required by journal policy.
Open data
The practice of making research data freely available for any user to access, use, modify, and share, subject only to attribution requirements, typically through deposit in a public repository under an open licence.
Open materials
The release of the non-data, non-code materials used in a study (stimuli, survey instruments, experimental protocols, training materials, intervention manuals) such that future investigators can re-implement the procedure.
Open code
The practice of releasing the source code used in a study, under an open-source licence, alongside the publication, such that any reader may inspect, reuse, and re-execute the analysis.
Robustness check
An additional analysis, supplementary to the headline result, that varies one or more analytical choices in order to demonstrate that the main conclusion is not artefactual to those choices.
Multiverse analysis
An analytical approach in which all reasonable combinations of data-processing and modelling choices are executed, producing a distribution of results that displays the impact of researcher degrees of freedom on the conclusion.
Specification curve
An analytic and visual technique that plots the estimated effect across a large set of theoretically defensible model specifications, ordered by effect size, to convey the sensitivity of the result to analytical choices.
Pre-analysis plan
A detailed, time-stamped document specifying the statistical models, variable transformations, exclusion criteria, and inference rules to be applied to a dataset, lodged before the analyst sees the outcome data.
Pre-registration
The practice of publicly recording a study's hypotheses, design, sample, and analysis plan in a time-stamped registry before data collection or (in secondary-data work) before data access, in order to distinguish pre-specified from post-hoc analyses.
Reproducibility crisis
The widely reported finding that substantial proportions of published research, particularly in biomedical, psychological, and social sciences, fail to reproduce or replicate when re-tested.
Inferential reproducibility
The degree to which independent analysts reach the same qualitative scientific conclusion from the same data, even where their analytical choices differ.
Results reproducibility
The narrow sense in which a study's reported quantitative results can be recreated from the deposited data using the deposited analysis procedures.
Methods reproducibility
The degree to which a study's methods are reported in sufficient detail that another investigator could re-implement them, independent of whether the same numerical or empirical results would follow.
Empirical reproducibility
The ability to obtain consistent observations when an empirical procedure (laboratory, field, or measurement) is independently repeated under matched conditions.
Computational reproducibility
The narrow technical sense of reproducibility: obtaining the same numerical outputs from the same data and code, on a comparable computational environment.
Generalisability
The extent to which a study's findings extend to populations, settings, or conditions other than those directly sampled.
Robustness
The stability of a study's conclusions under reasonable variations in analytical choices, model specification, sample inclusion, or measurement, on the same data.
Replicability
The ability to obtain consistent results when an independent investigator collects new data using the same study design and analysis procedures.
Reproducibility
The ability to obtain consistent computational or analytical results when the same data and analysis procedures are applied by an independent investigator using the same code and tools.
33 terms beyond the curated set above.
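The Computational environment entry above lists the components that an analysis depends on. As a rough illustration, the sketch below records part of that context (interpreter, operating system, and installed Python packages) using only the standard library. The function name and the output layout are assumptions, and the snapshot deliberately leaves out environment variables, system libraries, and hardware-specific dependencies such as GPU drivers, which a container image would capture more completely.

```python
import importlib.metadata
import json
import platform


def capture_environment():
    """Return a JSON-serialisable snapshot of the running Python environment."""
    packages = set()
    for dist in importlib.metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            packages.add(f"{name}=={dist.version}")
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": sorted(packages),
    }


if __name__ == "__main__":
    # Writing the snapshot next to the results makes later audits easier.
    print(json.dumps(capture_environment(), indent=2))
```

Depositing such a snapshot alongside code and data does not guarantee reproducibility, but it gives an auditor a starting point for rebuilding a comparable environment.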
Stewardship
Reproducibility terminology sits between three communities. The CASRAI Dictionary working group integrates and publishes the vocabulary. The Center for Open Science is the originating steward of the TOP Guidelines, the Registered Reports format, and the preregistration registry on the Open Science Framework. The Committee on Publication Ethics (COPE) stewards the boundary cases where non-replication intersects with editorial concern.
The cross-reference to CODATA covers the related FAIR and research data management (RDM) stewardship that touches reproducibility from the data-management side. The Reproducibility working group also liaises with FORCE11 on FAIR4RS and software-citation practice, with the EQUATOR Network on reporting guidelines, and with NIH and Wellcome on funder-policy alignment.
Related editorial and standards
- Reproducibility standards hub — the standards-level companion to this domain
- CRediT for authors — how contributor attribution supports reproducibility
- CRediT role: Validation — the role most directly mapped to reproducibility checks
- CRediT role: Data Curation — the role that makes computational reproducibility possible
- FAIR principles assessment — cross-domain reference
- CASRAI × CODATA federation — adjacent RDM stewardship
Federation
Cross-walks to external standards
Each entry in this domain carries machine-readable mappings to the upstream standard. Definitions remain canonical at the steward; CASRAI federates rather than re-publishes.
| Standard | Steward | Scope |
|---|---|---|
| TOP Guidelines | Center for Open Science | Eight modular journal-policy standards covering citation, data, code, materials, design, analysis, preregistration, replication. |
| ARRIVE 2.0 | NC3Rs (Percie du Sert, Ahluwalia, Alam et al. 2020) | Reporting checklist for in vivo animal research; Essential 10 + Recommended Set. |
| NIH Rigor & Reproducibility | US National Institutes of Health | Four-pillar policy covering scientific premise, rigorous design, biological variables, and authentication of key resources. |
| CONSORT · PRISMA · STROBE | EQUATOR Network | Reporting guidelines for trials, systematic reviews, and observational studies respectively. |
Frequently asked questions
Are reproducibility and replicability the same thing?
No. CASRAI follows the National Academies (NASEM 2019) usage. Reproducibility refers to obtaining the same result from the same data and code; replicability refers to obtaining a consistent result from a new study aimed at the same question. Robustness is a third, distinct construct concerning stability under analytic variation. Conflating the three is the single most common source of definitional drift in the literature, which is why the Dictionary maintains them as separate entries.
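The three constructs in the answer above can be told apart by asking what is held fixed. The toy simulation below is an illustrative sketch only: the population effect, sample size, seeds, and function names are all assumptions, and a real study would compare estimates with their uncertainty rather than as bare numbers.

```python
import random
import statistics

TRUE_EFFECT = 0.5  # assumed population effect, for illustration only


def collect(n, seed):
    """Simulate one study: draw a control and a treated sample."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(n)]
    return control, treated


def analyse(control, treated, centre=statistics.mean):
    """Estimate the effect as a difference in central tendency."""
    return centre(treated) - centre(control)


original = collect(500, seed=1)
headline = analyse(*original)

# Reproducibility: same data, same code, so the result should be identical.
reproduced = analyse(*original) == headline

# Replicability: new data from the same design, so the result should be consistent.
replication = analyse(*collect(500, seed=2))

# Robustness: same data, a different defensible analytic choice.
robustness = analyse(*original, centre=statistics.median)

if __name__ == "__main__":
    print(f"headline estimate:    {headline:.3f}")
    print(f"reproduced exactly:   {reproduced}")
    print(f"replication estimate: {replication:.3f}")
    print(f"median-based check:   {robustness:.3f}")
```

Only the first comparison can be expected to match exactly. The other two can only be judged for consistency, which is one reason the Dictionary keeps the three entries separate.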
Why does CASRAI maintain reproducibility terminology if NIH and COS already do?
NIH defines reproducibility in the context of US biomedical grant policy; COS defines it in the context of TOP-compliant journal policy. Neither is a vendor-neutral controlled vocabulary that a CRIS, repository, or publisher submission system can ingest as machine-readable terms. CASRAI integrates these definitions, federates with their stewards, and exposes the result as Schema.org DefinedTerm markup under CC-BY 4.0.
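To give a sense of what Schema.org DefinedTerm markup might look like for one entry, the sketch below builds a JSON-LD object in Python. It is an assumption-laden illustration, not the markup CASRAI actually publishes: the `example.org` URLs, the set name, and the `defined_term` helper are hypothetical, and the `sameAs` link is shown only as one possible way to point at an upstream steward.

```python
import json


def defined_term(name, description, term_url, set_url, same_as=()):
    """Build a Schema.org DefinedTerm as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "url": term_url,
        "sameAs": list(same_as),  # optional links to upstream definitions
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "name": "Reproducibility (hypothetical set name)",
            "url": set_url,
            # The licence is attached to the set, which is a CreativeWork.
            "license": "https://creativecommons.org/licenses/by/4.0/",
        },
    }


if __name__ == "__main__":
    term = defined_term(
        name="Replicability",
        description=(
            "Obtaining consistent results across studies aimed at answering "
            "the same question with new data or methods."
        ),
        term_url="https://example.org/dictionary/replicability",  # hypothetical
        set_url="https://example.org/dictionary/reproducibility",  # hypothetical
    )
    print(json.dumps(term, indent=2))
```

A CRIS or repository could embed the printed object in a `script` element of type `application/ld+json`, or ingest it directly as a controlled-vocabulary record.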
Where do I propose a new reproducibility term?
Use the contribute flow. The Reproducibility working group reviews proposals during each release cycle (March and September). Accepted contributors receive CRediT attribution on the term entry itself.
How does CRediT support reproducibility?
CRediT provides the contributor-role granularity that reproducibility audits depend on. The Validation role in particular maps directly to reproducibility checks; Data Curation and Software map to the artefacts that make computational reproducibility possible. The Dictionary cross-references CRediT roles from every relevant reproducibility term.
Related across CASRAI
How reproducibility connects to contributor attribution, the research-administration workflow, and the wider CASRAI standards.








