research code – CASRAI Dictionary

The data management plan (DMP) has become a fixture of modern research. Funders ask for one, repositories expect data to be deposited, and researchers increasingly treat planning for data as a normal part of designing a project. Yet a quieter gap has opened alongside this success. The data a study produces is now planned for and curated, but the software that generates, cleans and analyses that data (the scripts, pipelines, models and bespoke tools that do the actual computational work) is frequently left undocumented, unversioned and unpreserved. A dataset deposited without the code that produced it is often only half of a reproducible result. The software management plan exists to close that gap by extending the discipline of the DMP to research code, a development that sits naturally within the machine-actionable DMP domain of the CASRAI Dictionary.

Why data planning is not enough

A data management plan answers questions about what data will be collected, how it will be documented, where it will be stored and how it can be reused. Those questions matter, but they assume the data can stand on its own. In computational research it frequently cannot. The meaning of a dataset often depends on exactly how it was processed: which version of which analysis script, with which parameters, in which software environment. If that processing is captured only as an informal collection of files on a researcher’s laptop, then the data, however well curated, cannot be fully understood or reproduced once the project ends and the person who wrote the code moves on. Planning for the data while ignoring the code that gives it meaning leaves a reproducibility gap that no amount of careful data deposit can fill.
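As a minimal sketch of what capturing that processing context could look like, the Python function below records which script was run, a hash of its exact contents, the parameters used and the interpreter version. The function name `provenance_record` and the field names are assumptions made for illustration; they do not come from any standard or from the CASRAI Dictionary.

```python
import hashlib
import platform
import sys
from pathlib import Path


def provenance_record(script_path, parameters):
    """Describe how a dataset was processed (hypothetical field names)."""
    script = Path(script_path)
    return {
        # Which script did the processing
        "script": script.name,
        # Hash of the script's bytes, so the exact version can be checked later
        "script_sha256": hashlib.sha256(script.read_bytes()).hexdigest(),
        # The parameters the script was run with
        "parameters": parameters,
        # A coarse description of the software environment
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }
```

A record like this could be serialised with `json.dumps` and deposited next to the dataset. It is no substitute for version control or a full environment specification, but it ties a particular output to a particular state of the code.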

What a software management plan covers

A software management plan asks of code the kinds of questions a DMP asks of data, adapted to the nature of software. Typical concerns include:

Identification and versioning. How will the software be version-controlled, and how will specific versions be identified so that a paper can cite the exact version that produced its results?
Documentation. What documentation will let someone else install, run and understand the software — dependencies, environment, usage, and the assumptions built into it?
Licensing. Under what licence will the software be released, so that others know what they are permitted to do with it?
Preservation and availability. Where will the software be deposited for the long term, and how will it be given a persistent identifier so it remains findable and citable after the project closes?
Sustainability. Who maintains the software during the project, and what is the realistic plan for it afterwards — archived as a snapshot, handed on, or actively maintained?

The aim is not to turn every analysis script into a maintained software product. It is to make a deliberate, honest decision about how each piece of research software will be handled, proportionate to its importance.
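The concerns listed above can be sketched as a small structured record, one per piece of software, in which an unanswered question stays visibly unanswered. The class name `SoftwarePlanEntry` and its fields are hypothetical, chosen to mirror the list rather than to follow any published schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SoftwarePlanEntry:
    """Decisions recorded for one piece of research software (illustrative only)."""

    name: str
    version_control: Optional[str] = None     # where the code is versioned
    version_identifier: Optional[str] = None  # the version a paper would cite
    documentation: Optional[str] = None       # where install and usage notes live
    licence: Optional[str] = None             # the licence the code is released under
    preservation: Optional[str] = None        # long-term deposit and identifier
    sustainability: Optional[str] = None      # snapshot, handed on, or maintained

    def undecided(self) -> List[str]:
        """Return the names of the concerns that have no recorded decision yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

For a one-off analysis script, an entry might record only a licence and the decision to archive a snapshot. The point of `undecided()` is to show what has not been considered yet, not to demand that every field be filled in.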

The FAIR4RS principles

Underpinning this is the recognition that software is itself a research output deserving the same care as data. The FAIR principles — that outputs should be Findable, Accessible, Interoperable and Reusable — were written with data in mind, and applying them to software required adaptation, because code differs from data in important ways: it executes, it has dependencies, it changes through versions, and its reusability depends on documentation and environment as much as on access. The FAIR4RS principles (FAIR for Research Software) provide that adaptation, articulating what findability, accessibility, interoperability and reusability mean for software specifically. A software management plan is, in effect, the practical instrument for delivering FAIR4RS within a project: it is where the abstract principles become concrete commitments about identifiers, documentation, licensing and preservation.

Making plans machine-actionable

Across management planning as a whole, the most significant shift is the move from prose documents to machine-actionable formats. A traditional plan is a narrative that a human reads once and files away; nothing checks whether its commitments were kept. A machine-actionable plan (a maDMP, and by extension a machine-actionable software management plan) expresses its commitments as structured, machine-readable statements that systems can act on. This matters for software because so many of its commitments are checkable. Whether code was deposited in a repository, given a persistent identifier, assigned a licence and linked to the paper that uses it are all facts a system can verify rather than merely record. A machine-actionable plan can therefore connect the promise to the evidence: it can be updated automatically as a repository is created and an identifier minted, and it can flag where a stated commitment has not yet been met. The plan becomes a living part of the research workflow rather than a document written to satisfy a requirement and then forgotten.
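To illustrate connecting the promise to the evidence, the sketch below compares a plan's structured commitments against evidence gathered elsewhere and lists what is still outstanding. The JSON layout, the commitment names and the `example.org` URL are all invented for this example; they are not taken from any maDMP standard or real repository API.

```python
import json

# Hypothetical plan: each commitment is a machine-readable true/false statement.
PLAN_JSON = """
{
  "software": "survey-cleaning-pipeline",
  "commitments": {
    "deposited_in_repository": true,
    "persistent_identifier": true,
    "licence_assigned": true,
    "linked_to_publication": true
  }
}
"""


def unmet_commitments(plan, evidence):
    """Return the promised commitments that the evidence does not yet support."""
    return sorted(
        name
        for name, promised in plan["commitments"].items()
        if promised and not evidence.get(name)
    )


plan = json.loads(PLAN_JSON)

# Hypothetical evidence, as it might be collected from a repository record.
evidence = {
    "deposited_in_repository": "https://example.org/repo/survey-cleaning-pipeline",
    "licence_assigned": "MIT",
    "persistent_identifier": None,  # not minted yet
}

print(unmet_commitments(plan, evidence))
# prints ['linked_to_publication', 'persistent_identifier']
```

In a real workflow the evidence would come from the systems themselves rather than a hand-written dictionary. The design choice shown here is only that commitments and evidence share the same keys, which is what makes the comparison mechanical.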

Software as a first-class output

All of this rests on treating software as a genuine research output, not an invisible by-product. The broader movement to recognise the full range of scholarly outputs — data, software, protocols and more — is the subject of our research outputs domain. When research software is identified, versioned, licensed, preserved and citable, it can be found and reused by others, and its development can be recognised as the substantial intellectual work it often is. A software management plan is the planning instrument that makes those properties likely rather than accidental.

Crediting and standardising software work

Because building research software is a real contribution, it belongs in the structured account of who did what. The CRediT taxonomy, whose full set of contribution types is described in our overview of the CRediT roles, captures this through its Software role, which recognises programming, implementation and the development and testing of code, alongside Data curation for the data that code produces. For a software management plan to be machine-actionable across the systems that consume it (repositories, CRIS platforms, funder reporting), the elements it contains must mean the same thing everywhere. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the licences, identifiers, output types and roles flowing through a software management plan are understood identically wherever they travel. Data planning made data a managed asset; software management planning extends the same care to the code that makes the data mean something.
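The value of a shared vocabulary can be shown with a small sketch that maps the free-text licence labels different systems might record onto one agreed term. The mapping table and the function `normalise_licence` are hypothetical and are not drawn from the CASRAI Dictionary; the target values follow the style of SPDX licence identifiers.

```python
# Hypothetical lookup from locally used labels to one shared term per licence.
SHARED_LICENCE_TERMS = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
}


def normalise_licence(label):
    """Map a free-text licence label to the shared term, or None if unknown."""
    # Lower-case the label and collapse runs of whitespace before the lookup
    key = " ".join(label.lower().split())
    return SHARED_LICENCE_TERMS.get(key)
```

Returning `None` for an unrecognised label, rather than guessing, keeps the gap visible so that a person can decide which shared term applies.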

Software management plans: extending the DMP to research code