computational reproducibility – CASRAI Dictionary

A computational result that no one else can re-run is, strictly speaking, a claim rather than a finding. The gap between “the figures are in the paper” and “anyone with the code and data can regenerate the figures” is the gap that computational reproducibility exists to close, and over the past decade a practical toolkit has emerged to close it. This article walks through that toolkit (containers, workflow languages, source-code identifiers, and the FAIR4RS principles), drawing on the terms of the dictionary’s reproducibility domain.

What computational reproducibility means, precisely

Computational reproducibility is the property that a computational result can be regenerated from the code and data provided with it. It is a narrower and more achievable target than replication: replication asks whether a finding holds when the study is run afresh; computational reproducibility asks only whether the same inputs and the same code produce the same outputs. That sounds trivial and notoriously is not, because a computation depends on far more than the script the author remembers to share: library versions, the operating system, environment variables, random seeds, and sometimes the hardware.
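The "same inputs, same code, same outputs" test can be made concrete with a small sketch. The example below is illustrative only: `run_analysis` is a hypothetical stand-in for a real analysis step, and comparing SHA-256 digests of a canonical JSON serialisation is one assumed way of checking that two runs agree, not a prescribed method.

```python
import hashlib
import json
import random


def run_analysis(seed: int, n: int = 1000) -> dict:
    """Hypothetical analysis step: the mean of n seeded Gaussian samples."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {"seed": seed, "n": n, "mean": sum(samples) / n}


def digest(result: dict) -> str:
    """Hash a canonical (sorted-key) JSON serialisation of the result."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = digest(run_analysis(seed=42))
second = digest(run_analysis(seed=42))
other = digest(run_analysis(seed=43))

print(first == second)  # same seed and same code: the digests match
print(first == other)   # a different seed is recorded in the result, so they differ
```

Fixing the seed removes only one source of variation. The digests can still differ across machines if the library versions or floating-point behaviour differ, which is why the environment also has to be captured.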

The artefact that makes reproducibility possible is the reproducibility package (some communities say replication package): a bundle of code, data, and instructions sufficient to reproduce the results of an output. A good package is not a folder of scripts; it is a self-contained, documented environment that a stranger can execute.
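One way a package can help a stranger confirm that nothing has drifted is a checksum manifest. The following is a minimal sketch under stated assumptions: the function names `build_manifest` and `changed_files` are hypothetical, and SHA-256 over file bytes is an illustrative choice rather than a convention the dictionary prescribes.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(root: Path, manifest: dict) -> list:
    """List files that were added, removed, or modified since the manifest was built."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

A package author would store the manifest alongside the code and data. A reviewer would then call `changed_files` on the unpacked bundle and expect an empty list before attempting a reproduction.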

Containers: capturing the environment

The single largest source of irreproducibility is the environment, and the most effective response is the container image: a packaged, reproducible computational environment that bundles the code together with the exact operating system libraries and dependencies it needs. The modern standard is the OCI (Open Container Initiative) image format, familiar to most through Docker. In high-performance computing, where users typically do not have root privileges, Singularity/Apptainer images serve the same purpose.

A container is not the only way to pin an environment. A Conda environment with an exported specification, or a requirements lockfile recording exact dependency versions, achieves much of the same for interpreted-language work. The principle is constant: the environment is part of the result, and must be captured as deliberately as the code itself. Recording it in a structured compute environment record — what ran, on what, with which versions — is what lets a reviewer distinguish a genuine reproduction from an accidental match.
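As a rough illustration of what a compute environment record might contain, the sketch below collects the interpreter version, platform details, and installed package versions using only the Python standard library. The field names are assumptions made for this example, not a schema defined by the dictionary or by any standard.

```python
import json
import platform
from importlib import metadata


def environment_record() -> dict:
    """Collect a JSON-serialisable snapshot of the running Python environment."""
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip distributions with broken metadata
    )
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": packages,
    }


record = environment_record()
print(json.dumps(record, indent=2)[:300])  # preview of the record
```

Saving this record next to the outputs of each run gives a reviewer something to compare against their own environment. It describes the environment but does not recreate it, so it complements a container or lockfile rather than replacing one.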

Workflows: capturing the steps

Capturing the environment is necessary but not sufficient; the steps matter too. A multi-stage analysis run by hand, in an order the author holds in their head, is not reproducible no matter how well the environment is pinned. This is the problem a workflow definition solves: a formal, executable specification of the computational steps and their dependencies.

Several workflow languages are in wide use, and the dictionary treats them as variants of a single concept rather than picking a winner. Common Workflow Language (CWL), the Broad Institute’s WDL, Nextflow, and Snakemake each express a pipeline as a declarative graph of steps, so that the whole analysis can be re-executed with one command. Expressed this way, the workflow is itself a citable research output — the structured record of how the result was produced, not merely a description of it.
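The idea of a workflow as a graph of steps with declared dependencies can be sketched in a few lines of Python. This is not CWL, WDL, Nextflow, or Snakemake syntax; the step names and the `WORKFLOW` structure are hypothetical, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) stands in for the scheduling a real workflow engine performs.

```python
from graphlib import TopologicalSorter


def load(_inputs):
    """Hypothetical first step: produce raw data."""
    return [3, 1, 2]


def clean(inputs):
    """Hypothetical second step: sort the loaded values."""
    return sorted(inputs["load"])


def summarise(inputs):
    """Hypothetical final step: compute summary statistics."""
    data = inputs["clean"]
    return {"n": len(data), "max": data[-1]}


# Each step declares the steps it depends on and the function that runs it.
WORKFLOW = {
    "summarise": (("clean",), summarise),
    "clean": (("load",), clean),
    "load": ((), load),
}


def run(workflow):
    """Execute every step after its dependencies and return the order and results."""
    graph = {name: set(deps) for name, (deps, _func) in workflow.items()}
    results, order = {}, []
    for name in TopologicalSorter(graph).static_order():
        deps, func = workflow[name]
        results[name] = func({dep: results[dep] for dep in deps})
        order.append(name)
    return order, results


order, results = run(WORKFLOW)
print(order)                 # ['load', 'clean', 'summarise']
print(results["summarise"])  # {'n': 3, 'max': 3}
```

The steps are declared in reverse on purpose: the execution order comes from the declared dependencies, not from the order in which the author happened to write or run them.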

Identifying the software: Software Heritage and SWHID

Reproducibility presupposes that the code still exists and can be referred to unambiguously, and this is where source-code identifiers come in. Software Heritage is the universal archive of source code, harvesting and preserving public code repositories at scale. It issues the SWHID (Software Heritage Identifier): a persistent identifier that is content-derived and immutable — it identifies an exact state of the code by its content, so the same SWHID always resolves to byte-identical source.
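What "content-derived" means can be shown for the simplest case, a single file. For content objects, the SWHID hash is computed the same way Git computes a blob hash: SHA-1 over a short header followed by the file's bytes. The sketch below covers only that case; the function name is hypothetical, identifiers for directories, revisions, releases, and snapshots are derived differently, and Software Heritage's own tooling should be preferred in practice.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID of a single file's content (a 'cnt' object)."""
    # Same scheme as a Git blob: "blob <length>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


print(swhid_for_content(b""))
# swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier is a function of the bytes alone, anyone holding the file can recompute it and check it against the cited SWHID without consulting a registry.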

This intrinsic property distinguishes a SWHID from a DOI minted for a software release. A DOI (via DataCite, often through a GitHub–Zenodo deposit) gives the release a citable handle and rich metadata; a SWHID guarantees that the specific code referenced is exactly the code archived. The two are complementary, and a robust reproducibility package can carry both: a DOI for citation and a SWHID for byte-level fidelity.

FAIR4RS: software is not just data

The FAIR principles (Findable, Accessible, Interoperable, Reusable) were written with data in mind, and applying them naively to software misses what is distinctive about code. FAIR4RS, the FAIR Principles for Research Software, is the adaptation developed by a working group convened jointly by the RDA, FORCE11, and the Research Software Alliance (ReSA), and it takes software’s particular nature seriously: software is executable, it has versions and dependencies, it is composed of and depends on other software, and it evolves. FAIR4RS reframes each principle for these realities: findability through a persistent identifier and rich metadata, accessibility of both the software and its description, interoperability through standard formats and dependencies, and reusability through clear licensing, provenance, and documentation. It is the conceptual bridge between the data-centric FAIR data principles and the practical work of making research software reproducible.

Recognition: reviewing the artefacts

None of this happens without incentives, and the incentive structure is slowly maturing. Artifact evaluation — peer review of the code, data, and environments behind a paper — is now a standard track at many computer-science venues, and the ACM Artifact Review and Badging programme attaches visible badges to papers whose artefacts have been checked. A reproducibility review targeting the computational results specifically is becoming a recognised contribution in its own right, the kind of work that responsible assessment frameworks aim to make visible alongside conventional outputs.

Where shared vocabulary fits

The reproducibility toolkit is mature, but its terms are used loosely across communities — “reproducibility package” and “replication package” name the same thing, workflow languages proliferate, and “reproducible” itself means different things to different fields. A shared, federated vocabulary that defines these terms and points back to the RDA for FAIR4RS and to Software Heritage for the SWHID is what lets a reproducibility claim in one field be understood in another. Supplying that definitional layer is the role the CASRAI dictionary exists to play.

What to do now

For researchers: ship a reproducibility package with a pinned environment (a container or lockfile), an executable workflow, and persistent identifiers — a DOI for citation and a SWHID for the exact code. For reviewers and venues: treat artifact evaluation and reproducibility review as first-class, badge-worthy contributions. For standards work: align software vocabulary on FAIR4RS and the persistent-identifier ecosystem rather than letting each community coin its own.


Computational reproducibility: containers, workflows and FAIR4RS