  • Computational notebooks and reproducible analysis: Jupyter, R Markdown and executable environments

    For most of the history of scholarly publishing, the analysis behind a result and the account of that result lived in separate places. The computation happened in scripts, spreadsheets and statistical software; the write-up happened in a word processor; and the two were stitched together by hand, numbers copied across and figures pasted in. The seams were invisible in the final paper but they were also where reproducibility quietly broke, because nothing connected the published claim to the code that produced it. Computational notebooks close that gap by weaving executable code, its outputs and explanatory narrative into a single document. They have become one of the most important tools for reproducible analysis, and they sit at the heart of the reproducibility domain of the CASRAI Dictionary.

    What a computational notebook is

    A computational notebook is a document that interleaves three things: code that performs an analysis, the output that code produces (tables, figures, statistics), and narrative text that explains what is being done and why. The most widely used is the Jupyter notebook, popular across data science and many scientific fields, which supports a range of languages and presents an analysis as a sequence of executable cells interleaved with prose. In the R world, R Markdown serves a similar purpose, embedding R code in a Markdown document that renders code, results and text together into a finished report. The appeal is that a notebook is not a static description of an analysis but the analysis itself: the code that generated each figure sits right beside it, so a reader can see exactly how a result was produced and, in principle, run it themselves. Narrative and computation become one artefact rather than two loosely related ones.
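
    To make the "sequence of cells" idea concrete, the sketch below builds a tiny notebook-like document by hand using only the Python standard library. The field names follow the general shape of the Jupyter notebook JSON format (version 4) as we understand it; the helper functions `markdown_cell` and `code_cell` and the example content are hypothetical, and nothing here is validated against the official schema.

```python
import json


def markdown_cell(text):
    """Narrative cell: prose explaining what is being done and why."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source, execution_count=None, outputs=None):
    """Code cell: the source plus whatever output it produced when run."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": execution_count,
        "source": source,
        "outputs": outputs or [],
    }


# A two-cell document: one narrative cell, one code cell with its stored result.
# The stored output is written by hand here; a real notebook records it on execution.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("We compute the mean of three measurements."),
        code_cell(
            "sum([2, 4, 6]) / 3",
            execution_count=1,
            outputs=[
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "4.0"},
                }
            ],
        ),
    ],
}

# Serialise to JSON, the on-disk form of the document.
serialized = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
```

    The point of the sketch is only that code, output and narrative live side by side in one serialisable structure, which is what lets a single file be both read and re-run.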

    Quarto and the next generation

    The notebook idea has continued to evolve. Quarto is an open-source publishing system that builds on the lessons of R Markdown and generalises them: it works across multiple languages, including R and Python, and can render the same source into many output formats — articles, websites, slides, books — from a single document that combines code, output and text. Tools of this kind reflect a maturing of the literate-programming idea, where a piece of work is authored once as an executable document and then produced in whatever form is needed. The analysis and its presentation are increasingly treated as a single reproducible source rather than a manuscript assembled from separate parts — so the figures and numbers in a report, generated by the code in the same document, cannot silently drift out of step with the analysis they describe.

    The reproducibility gap notebooks do not close

    For all their power, notebooks carry a serious and frequently underestimated limitation: a notebook captures the code and its narrative, but it is only as reproducible as the environment it runs in. Code does not run in a vacuum; it depends on a particular language version, on specific libraries at specific versions, and sometimes on system-level components. A notebook shared without that environment may simply fail to run on someone else’s machine or, more insidiously, run but produce different results because a dependency has changed. This is the gap between sharing a notebook and sharing a reproducible analysis. The notebook is necessary but not sufficient; what is also needed is a way to capture and reconstruct the executable environment in which the notebook was meant to run.
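
    One way to see this gap is to record the environment at the time an analysis is run and compare it later. The following is a minimal sketch, assuming a Python-based notebook: it uses the standard library's `importlib.metadata` to list installed packages, while the function names `snapshot_environment` and `diff_environments` and the version numbers in the example are hypothetical, chosen for illustration only.

```python
import platform
from importlib import metadata


def snapshot_environment():
    """Record the interpreter version and the version of every installed package."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata has no recorded name
            packages[name.lower()] = dist.version
    return {"python": platform.python_version(), "packages": packages}


def diff_environments(recorded, current):
    """Return {name: (recorded_version, current_version)} for every mismatch."""
    changes = {}
    if recorded["python"] != current["python"]:
        changes["python"] = (recorded["python"], current["python"])
    names = set(recorded["packages"]) | set(current["packages"])
    for name in sorted(names):
        old = recorded["packages"].get(name)
        new = current["packages"].get(name)
        if old != new:  # None on either side means the package is missing there
            changes[name] = (old, new)
    return changes


# Illustrative snapshots (made-up values): same code, one dependency has moved on.
recorded = {"python": "3.10.4", "packages": {"numpy": "1.22.0", "pandas": "1.4.2"}}
current = {"python": "3.10.4", "packages": {"numpy": "1.26.0", "pandas": "1.4.2"}}

drift = diff_environments(recorded, current)
print(drift)  # {'numpy': ('1.22.0', '1.26.0')}
```

    A snapshot like this only detects drift; it does not rebuild the original environment, and it says nothing about system-level components outside the Python package set.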

    Executable environments: Binder and Code Ocean

    Several tools and platforms exist precisely to close this environment gap, by packaging a notebook together with the environment it needs so that it can be re-run reliably:

    • Binder takes a code repository containing notebooks and configuration describing its dependencies, and builds a live, executable environment in which anyone can run those notebooks in a browser — turning a static repository into something a reader can actually execute without installing anything.
    • Code Ocean provides a platform where code, data and the computational environment are bundled into a self-contained, executable “capsule” that can be run and re-run, supporting the publication of analyses that others can reproduce.

    The common principle is that the environment must travel with the notebook. By capturing the dependencies — whether through configuration files, containerisation or a managed platform — these tools let an analysis be re-executed as its author intended, rather than left to break against whatever happens to be installed on the next person’s computer.
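
    As an illustration of dependencies captured "through configuration files", the sketch below renders a set of packages as pinned `name==version` lines, the form used by a pip `requirements.txt` file, which is one of the configuration files Binder can read from a repository. The function name `pinned_requirements` and the package versions shown are hypothetical; a real project would pin the versions it was actually run with.

```python
def pinned_requirements(packages):
    """Render {name: version} as sorted 'name==version' lines, one per package."""
    lines = [f"{name}=={version}" for name, version in sorted(packages.items())]
    return "\n".join(lines) + "\n"


# Illustrative package set (made-up selection, not taken from any real study).
example_packages = {"pandas": "1.4.2", "numpy": "1.22.0", "matplotlib": "3.5.1"}

requirements_text = pinned_requirements(example_packages)
print(requirements_text, end="")
```

    Writing this text to a `requirements.txt` file alongside the notebooks is what lets the dependency list travel with them; exact pins trade flexibility for a better chance that a later rebuild matches the original run.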

    Notebooks as citable outputs

    Because a notebook captures real intellectual work — the design and implementation of an analysis — it deserves to be treated as an output in its own right, not merely as private working material. Deposited in a repository, given a persistent identifier and paired with its environment, a notebook becomes a citable, reusable artefact that documents exactly how a study’s results were obtained. This sits within the wider recognition of software and analysis code as first-class research outputs, a theme developed in our resources on the full range of scholarly outputs and at the CASRAI learning hub. A well-preserved, executable notebook is among the strongest forms of reproducibility a computational study can offer: not a description of the analysis, but the analysis itself, ready to run.

    A consistent vocabulary for reproducible work

    For notebooks, environments and the analyses they contain to be shared, found and credited across platforms, they must be described consistently: output types, software and environment information, relationships to data, and contributor roles. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that a computational notebook and its executable environment are understood the same way wherever they are deposited or cited. And because building an analysis is a genuine contribution, the work behind it can be described in the same shared framework, the CRediT taxonomy, whose Software and Formal analysis roles map directly onto notebook-based work. Computational notebooks brought code and narrative together; pairing them with reproducible environments is what turns a readable analysis into a re-runnable one.