Tag: software citation

  • How the Software role applies to code-only outputs

    A growing fraction of research output is code: software libraries that implement a method, computational notebooks that demonstrate an analysis, simulation frameworks that enable a body of work, infrastructure tooling that supports a research community. When the output is primarily code, the CRediT Software role carries more weight than its brief definition anticipates. This post is a practical guide to assigning Software in code-centric contexts.

    The Software role, briefly

    The CRediT Software role is defined as: Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components. The definition is short and was written with software-as-tool-for-a-paper in mind, not software-as-the-paper.

    For a conventional research paper where someone wrote analysis code that supported the science, Software is straightforward: the person who wrote the analysis code gets the role. For a paper whose primary scholarly contribution is the code itself — a JOSS paper, a software-methods paper, a tool announcement — Software is the dominant role and the brevity of the definition starts to bite.

    What the Software role should cover in a code-only context

    Our recommendation, distilled from the practice of JOSS, the Software Sustainability Institute, the Research Software Engineers community, and several years of CASRAI editorial work, is to read Software in code-only contexts as encompassing the following five sub-activities, all of which should be visible in the contributorship statement even if they share the role.

    Implementation: writing the production code itself. This is the core of Software and is what people most naturally associate with the role.

    Architecture and design: the higher-level decisions about how the code is structured, what its dependencies are, and how its modules interact. In a code-only paper, architecture is part of the intellectual contribution, and the architect should be a co-author with the Software role.

    Testing: writing the test suite, including unit tests, integration tests, and regression tests. A code-only paper with a credible test suite has someone who built it.

    Documentation: user-facing documentation, developer-facing documentation, README, examples, tutorials. For code intended for reuse, documentation is part of the deliverable; the documentation contributor gets the Software role.

    Packaging and release: the engineering work of making the code installable, citable, and citation-resolvable. CI/CD configuration, dependency management, release-tagging, DOI registration. For long-lived code with multiple releases, this is sustained work; for a one-off code release accompanying a paper, it is still non-trivial.

    Each of these is a meaningful contribution that the Software role captures. A code-only paper’s CRediT statement should make the distribution of these activities across contributors visible, using the lead/equal/supporting qualifier to express relative magnitude.
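    As a minimal sketch of what a visible distribution might look like, the Python below records each Software credit as a (contributor, sub-activity, qualifier) triple and renders one statement line per sub-activity. The sub-activity labels, the class and function names, and the contributor names in the example are hypothetical illustrations, not part of CRediT or of any journal's submission format.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five Software sub-activities from this post; the labels are shorthand, not CRediT terms.
SUB_ACTIVITIES = (
    "implementation",
    "architecture and design",
    "testing",
    "documentation",
    "packaging and release",
)
QUALIFIERS = ("lead", "equal", "supporting")


@dataclass(frozen=True)
class SoftwareContribution:
    contributor: str
    sub_activity: str
    qualifier: str

    def __post_init__(self) -> None:
        # Reject labels outside the two controlled lists above.
        if self.sub_activity not in SUB_ACTIVITIES:
            raise ValueError(f"unknown sub-activity: {self.sub_activity!r}")
        if self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier!r}")


def software_statement(contributions: list[SoftwareContribution]) -> list[str]:
    """One line per sub-activity, listing each contributor with their qualifier."""
    by_activity: dict[str, list[str]] = defaultdict(list)
    for c in contributions:
        by_activity[c.sub_activity].append(f"{c.contributor} ({c.qualifier})")
    # Emit sub-activities in the fixed order above so statements are comparable.
    return [
        f"Software, {activity}: " + ", ".join(by_activity[activity])
        for activity in SUB_ACTIVITIES
        if activity in by_activity
    ]


def uncovered_sub_activities(contributions: list[SoftwareContribution]) -> list[str]:
    """Sub-activities nobody is credited for: a prompt to ask who did that work."""
    covered = {c.sub_activity for c in contributions}
    return [a for a in SUB_ACTIVITIES if a not in covered]
```

    With hypothetical contributors A. Rivera (implementation, lead), B. Chen (implementation, supporting; testing, lead) and C. Okafor (documentation, lead), software_statement returns three lines, and uncovered_sub_activities flags architecture and design and packaging and release as unassigned.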

    Where Software overlaps with other roles

    Three overlaps deserve attention.

    First, Software versus Methodology. If the code implements a novel method, the method itself is a Methodology contribution; the implementation is a Software contribution. The same person often discharges both, and the contributorship statement should assign both roles to them. The error to avoid is conflating the two: assigning Software while omitting Methodology under-represents the intellectual contribution.

    Second, Software versus Validation. Writing tests is Software (per the definition); validating the code against reference implementations or independent data is Validation. The distinction is genuine: tests verify that the code does what the developer intended; validation verifies that the code does what is scientifically correct. Both belong in a code-only paper’s contributorship.

    Third, Software versus Writing – original draft. The README, the developer documentation, the API reference — these are documentation, captured under Software. The paper itself, including its method description and its discussion of design choices, is captured under Writing – original draft. The boundary is the publication artefact: anything in the paper is Writing; anything in the code repository is Software.
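    The three overlap rules can be encoded as a small decision function. This is a sketch under stated assumptions: the activity kinds ("implement", "write_tests", "validate", "write_prose") and the Activity class are hypothetical labels chosen for illustration, and real contributions rarely fall so cleanly into a single kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    kind: str                     # hypothetical: "implement", "write_tests", "validate", "write_prose"
    novel_method: bool = False    # does the code implement a novel method?
    location: str = "repository"  # for prose: "paper" or "repository"


def suggest_roles(activity: Activity) -> set[str]:
    """Apply the three overlap rules described above to one activity."""
    if activity.kind == "implement":
        roles = {"Software"}
        if activity.novel_method:
            # Overlap 1: the method itself is Methodology; its implementation is Software.
            roles.add("Methodology")
        return roles
    if activity.kind == "write_tests":
        # Overlap 2a: tests check that the code does what the developer intended.
        return {"Software"}
    if activity.kind == "validate":
        # Overlap 2b: checks against reference implementations or independent data.
        return {"Validation"}
    if activity.kind == "write_prose":
        # Overlap 3: the publication artefact decides the role.
        if activity.location == "paper":
            return {"Writing – original draft"}
        if activity.location == "repository":
            return {"Software"}
        raise ValueError(f"unknown location: {activity.location!r}")
    raise ValueError(f"unknown activity kind: {activity.kind!r}")
```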

    Cross-referencing with CITATION.cff

    The CITATION.cff convention, increasingly standard in scientific software repositories, provides a richer contributor model than CRediT alone. CFF supports author, contact, and contributor entries with type-of-contribution fields; integrators have extended it with CRediT-aligned vocabularies. The recommended pattern for a code-only paper is to maintain both: a CRediT statement in the paper (for the paper-level contributorship) and a CITATION.cff in the repository (for the per-version, per-component contributorship that CRediT cannot express).

    The two should be consistent. A contributor named in the paper with Software role should appear in the CITATION.cff with at least equivalent contribution; a contributor named in the CITATION.cff but not in the paper should be acknowledged in the paper’s acknowledgements section. The CASRAI CITATION.cff entry walks through the integration patterns.
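    The two consistency rules can be checked mechanically. The sketch below assumes the CITATION.cff has already been parsed into a dictionary (for example with a YAML library, not imported here), reads only the authors list with its given-names, family-names, name and orcid keys, and represents the paper's CRediT statement as hypothetical dictionaries with an added roles list. It checks presence only; comparing contribution levels would depend on whichever role extension a given CFF file uses.

```python
from collections.abc import Iterable


def person_key(person: dict) -> str:
    """Match on ORCID iD when present, else on a lower-cased name."""
    orcid = person.get("orcid")
    if orcid:
        # Accept both "https://orcid.org/0000-..." and a bare iD.
        return orcid.rstrip("/").rsplit("/", 1)[-1]
    if "name" in person:  # CFF entities (e.g. organisations) use a single name field
        return person["name"].strip().lower()
    family = person.get("family-names", "").strip().lower()
    given = person.get("given-names", "").strip().lower()
    return f"{family}, {given}"


def consistency_problems(
    paper_contributors: Iterable[dict],
    cff: dict,
    acknowledged: Iterable[dict] = (),
) -> list[str]:
    """Presence checks between the paper's CRediT statement and CITATION.cff."""
    paper_contributors = list(paper_contributors)
    cff_authors = cff.get("authors", [])
    cff_keys = {person_key(p) for p in cff_authors}
    credited = {person_key(p) for p in paper_contributors}
    credited |= {person_key(p) for p in acknowledged}
    problems = []
    # Rule 1: anyone with the Software role in the paper must appear in CITATION.cff.
    for p in paper_contributors:
        if "Software" in p.get("roles", ()) and person_key(p) not in cff_keys:
            problems.append(f"missing from CITATION.cff: {person_key(p)}")
    # Rule 2: anyone in CITATION.cff must be an author or acknowledged in the paper.
    for p in cff_authors:
        if person_key(p) not in credited:
            problems.append(f"not credited or acknowledged in paper: {person_key(p)}")
    return problems
```

    Matching falls back to names only when a record has no ORCID iD, so a person with an iD on one side and none on the other is reported as a mismatch; the sketch errs toward a false alarm rather than a silent miss.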

    The maintenance question

    An unresolved aspect of Software in code-only contexts is how to credit maintenance over time. A research software package may have a paper at first release, with a CRediT statement reflecting the founding contributors. Five years and several major versions later, the package has new maintainers, new contributors, and a substantially different code base. The original paper’s CRediT statement is increasingly out of date.

    The current pragmatic answer is: the paper’s CRediT statement freezes at publication; the CITATION.cff in the repository tracks current contributorship; downstream citation should reference both, with the paper as the publication-of-record and the CFF as the current-contributor record. This works but is imperfect. The Software Citation Working Group has been chewing on whether per-version CRediT statements, deposited to Crossref via the related-identifier mechanism, would be a cleaner answer; the proposal is technically viable but not yet a community consensus.
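    The freeze-and-track arrangement can be made concrete by comparing the contributor set captured at the paper's release with that of the current release. The Release class, the version strings and the placeholder DOI in the example are hypothetical; in practice the contributor sets would come from the CITATION.cff at each tagged version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: str
    contributors: frozenset[str]


def contributor_drift(paper_release: Release, current: Release) -> dict[str, list[str]]:
    """Compare the contributors frozen in the paper with those of the current release."""
    return {
        "joined_since_paper": sorted(current.contributors - paper_release.contributors),
        "no_longer_listed": sorted(paper_release.contributors - current.contributors),
        "continuing": sorted(paper_release.contributors & current.contributors),
    }


def citation_pair(paper_doi: str, current: Release) -> dict:
    """Cite both records: the paper as publication-of-record, the CFF as current record."""
    return {
        "publication_of_record": paper_doi,
        "current_version": current.version,
        "current_contributors": sorted(current.contributors),
    }
```

    The joined_since_paper list is exactly the group the frozen CRediT statement cannot credit, which is why the current-contributor record has to live somewhere else.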

    What journals should do

    For journals publishing software papers, the recommended editorial practices are: require CRediT with qualifiers in the paper; require a CITATION.cff in the linked repository; verify that the two are consistent; for major software packages, accept and publish supplementary contributor records that go beyond the byline.

    JOSS is the maturity reference here, and most other software-paper venues are moving toward similar practices. The CASRAI CRediT for software papers guide is updated quarterly with current practice.

    What authors should do

    For authors of code-only papers, four practical steps. First, distribute the Software role across the five sub-activities visibly, using the qualifier. Second, assign Methodology when the code implements a novel method. Third, maintain the CITATION.cff in the repository in parallel with the paper’s CRediT statement. Fourth, plan for the maintenance-credit question: who will maintain the code, how their contribution will be recognised over time, where the credit will live.

    The CRediT taxonomy can support code-only outputs well if it is applied with care. The work is in using the Software role thoughtfully, in interlocking it with Methodology and Writing where appropriate, and in maintaining the parallel record in the repository.

  • Data papers, software papers, and the limits of CRediT

    The 14 roles of CRediT were designed against the model of a conventional research article reporting empirical work: a study with a hypothesis, a method, data, analysis, and a written argument. Data papers and software papers fit this model awkwardly. A data paper describes a dataset; a software paper describes a piece of software. The intellectual contribution is the artefact itself, not the prose around it. The CRediT roles, applied to these papers, produce statements that are technically valid but substantively misleading. This post catalogues the friction and suggests where the taxonomy could be extended.

    What a data paper actually is

    A data paper, as the genre has developed in venues like Scientific Data, Earth System Science Data, GigaScience, and the data-paper streams of disciplinary journals, is a peer-reviewed description of a dataset: its provenance, its collection method, its quality, its access conditions, and its potential reuse. The dataset itself lives in a repository with its own DOI; the data paper provides the citable, peer-reviewed scholarly record that the dataset exists, that it was collected with rigour, and that it is fit for reuse.

    The intellectual labour behind a data paper is mostly not in the paper. It is in the years of fieldwork or instrument operation that produced the data, the protocols that ensured comparability across collection events, the curation work that turned raw observations into a structured deposit, the documentation that lets a stranger understand what the data mean. The paper is a summary record of that work.

    Where CRediT falls short for data papers

    Three friction points. First, Investigation and Data curation bear most of the load and they are not differentiated finely enough. A field ecologist who spent years collecting samples, a lab technician who processed them, a data manager who normalised the schema, and a metadata specialist who wrote the documentation are all plausibly Investigation or Data curation; the roles do not distinguish them. The result is that two papers with very different actual contributorship patterns can have identical-looking CRediT statements.
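    To see how identical-looking statements arise, the sketch below maps fine-grained data-paper activities onto CRediT roles and compares two hypothetical two-person teams. The crosswalk is one plausible assignment invented for illustration, not an official mapping; the point is only that very different work collapses to the same role pattern.

```python
# One plausible (hypothetical) crosswalk from fine-grained activities to CRediT roles.
TO_CREDIT = {
    "multi-year field collection": "Investigation",
    "lab processing": "Investigation",
    "schema normalisation": "Data curation",
    "metadata documentation": "Data curation",
}


def credit_signature(team: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Name-free view of a CRediT statement: each person's sorted role set, sorted."""
    return sorted(
        tuple(sorted({TO_CREDIT[activity] for activity in activities}))
        for activities in team.values()
    )


def activity_signature(team: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """The same view, computed over the fine-grained activities instead."""
    return sorted(tuple(sorted(activities)) for activities in team.values())


# Two hypothetical teams whose actual work differs substantially.
field_team = {"P1": ["multi-year field collection"], "P2": ["schema normalisation"]}
lab_team = {"P1": ["lab processing"], "P2": ["metadata documentation"]}
```

    Both teams produce the same CRediT signature (one Investigation, one Data curation), while their activity signatures differ: the years of fieldwork are invisible at the CRediT level.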

    Second, Resources overlaps with Investigation in a confusing way. A data paper describing a long-term ecological observatory has a Resources contribution (the observatory itself) that is distinct from the per-sample Investigation. CRediT does not currently cleanly separate “provided the infrastructure that produced the data” from “provided the samples that went into the data.”

    Third, Writing – original draft is often the smallest contribution, not the largest, and a Lead qualifier on it can make the paper's writer look like its principal contributor. The person who wrote the paper is often a relatively junior team member, not the senior person whose intellectual contribution was the protocol and the multi-year campaign.

    Software papers and the JOSS model

    Software papers, exemplified by the Journal of Open Source Software (JOSS), face an analogous problem from a different direction. A JOSS paper is short — often under 1,000 words — and is paired with a peer-reviewed software repository. The intellectual contribution is the software: its design, its implementation, its tests, its documentation, its maintenance over time. The paper is a stub.

    JOSS itself uses CRediT for its papers and has done so since 2020. The community has converged on a set of mappings:

    • Conceptualization covers software design and architectural decisions.
    • Software covers implementation. This is the central role for most JOSS contributors.
    • Validation covers testing, both unit tests and validation against reference implementations.
    • Methodology covers the algorithmic content, where the software implements a non-trivial method.
    • Writing – original draft covers the paper itself. The README, the developer documentation, and the user docs are also writing work, but they are not the JOSS paper.
    • Supervision covers project leadership; Project administration covers maintenance and coordination.

    The friction in this mapping is that the Software role is overloaded. It conflates the initial implementation, ongoing code maintenance, bug-fixing, refactoring, and tooling. A contributor who implemented the core algorithm and a contributor who maintains the CI/CD pipeline both get “Software” with no further distinction. For long-lived software with many contributors over years, the role assignment ends up giving everyone Software (lead/equal/supporting), and the differentiation lives in the GitHub commit history, not in CRediT.
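    The overload can be made visible by writing the mapping down as a lookup table and inverting it. The activity labels below paraphrase the list and paragraph above and are our own shorthand, not a published crosswalk; code maintenance is placed under Software and coordination under Project administration, which is one reading of how the two roles divide maintenance work.

```python
from collections import defaultdict

# Hypothetical activity labels paraphrasing the JOSS-style mapping above.
ACTIVITY_TO_ROLE = {
    "software design": "Conceptualization",
    "architectural decisions": "Conceptualization",
    "initial implementation": "Software",
    "code maintenance": "Software",
    "bug-fixing": "Software",
    "refactoring": "Software",
    "tooling (CI/CD)": "Software",
    "testing": "Validation",
    "validation against reference implementations": "Validation",
    "algorithmic content": "Methodology",
    "writing the paper": "Writing – original draft",
    "project leadership": "Supervision",
    "coordination": "Project administration",
}


def activities_per_role(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert the mapping to see how many distinct activities each role absorbs."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for activity, role in mapping.items():
        inverted[role].append(activity)
    return dict(inverted)


def roles_for(activities: list[str]) -> set[str]:
    """The CRediT roles a contributor receives for a list of activities."""
    return {ACTIVITY_TO_ROLE[a] for a in activities}
```

    Inverting the table shows Software absorbing five distinct activities while no other role absorbs more than two, and roles_for returns the same set for the core-algorithm author and the CI/CD maintainer.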

    The FAIR4RS angle

    The FAIR4RS Principles for research software, finalised in 2022, set out what FAIR means for software: findable, accessible, interoperable, reusable. They explicitly acknowledge that software citation needs a richer model than data citation, because software has versions, dependencies, and ongoing development that data typically does not.

    FAIR4RS implies, though does not directly require, a richer contributorship taxonomy for software. The Software Citation Implementation Working Group has been chewing on this for several years. Their working position is that CRediT remains the right vocabulary for software paper contributorship, but that the software repository itself should carry its own contributor metadata using a complementary scheme — typically CITATION.cff with extended fields — that captures the per-version, per-component contributorship that CRediT cannot.

    The mapping problem

    For data papers and software papers, the operational reality is that two parallel records exist: the paper’s CRediT statement and the dataset or software repository’s contributor metadata. They overlap but do not align cleanly. The dataset DOI and software DOI live in DataCite; the paper DOI lives in Crossref; the relations between them are declared in the metadata but not always reciprocally.
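    The reciprocity gap is easy to detect once each record's declared relations are in hand. The sketch below assumes they have already been fetched into memory as (relation type, target) pairs; the relation names follow DataCite's relationType vocabulary (only a few inverse pairs are included), and the 10.0000 DOIs are placeholders. Fetching from the Crossref or DataCite APIs is deliberately left out.

```python
# A subset of inverse relation pairs from DataCite's relationType vocabulary.
INVERSE = {
    "Describes": "IsDescribedBy",
    "IsDescribedBy": "Describes",
    "Documents": "IsDocumentedBy",
    "IsDocumentedBy": "Documents",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


def missing_reciprocals(records: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str, str]]:
    """Return (record, relation it should declare, target) for each one-way link.

    records maps an identifier to the (relation_type, target) pairs that its own
    metadata declares. Targets absent from records count as declaring nothing.
    """
    missing = []
    for source, relations in records.items():
        for relation_type, target in relations:
            inverse = INVERSE.get(relation_type)
            if inverse is None:
                continue  # relation type not covered by this sketch
            if (inverse, source) not in records.get(target, []):
                missing.append((target, inverse, source))
    return sorted(missing)
```

    In a typical case the paper declares that it describes the dataset, but the dataset's own record never declares the inverse, and the check reports exactly that one-way link.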

    The CASRAI research outputs domain tracks the mapping conventions in current use. Our recommendation, for now, is that data papers and software papers should publish a CRediT statement covering the paper’s contributorship and should additionally publish a richer contributor metadata file with the dataset or software, using CRediT roles plus the discipline-specific extensions that have emerged.

    Possible extensions

    Three extensions would meaningfully improve the situation. First, sub-roles within Software: an extended taxonomy with implementation, testing, documentation, maintenance, and integration as sub-roles would give a software paper a more truthful contributorship statement. This work has been drafted by the FORCE11 software citation working group but not formally proposed as a CRediT extension.

    Second, distinguished Investigation roles for data papers: collection, processing, curation, documentation as sub-roles of Investigation and Data curation would let a data paper describe its contributorship more faithfully. The challenge here is keeping the taxonomy usable; an over-elaborate vocabulary loses adoption.

    Third, artefact-level role assignments: the current CRediT statement applies at the paper level. For a paper that describes a dataset and a software package, it might be more useful to have role assignments at the artefact level (paper, dataset, software each get their own statement) with cross-references. This would require schema work in Crossref, DataCite, and ORCID.
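    Artefact-level assignment is mainly a data-modelling change. As a hypothetical sketch (none of these classes or fields exist in the Crossref, DataCite or ORCID schemas), each artefact carries its own role statement plus cross-references to its siblings, and a roll-up reproduces today's single paper-level statement, which shows exactly what that roll-up discards.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ArtefactStatement:
    identifier: str                # placeholder DOI for the artefact
    kind: str                      # "paper", "dataset" or "software"
    roles: dict[str, set[str]]     # contributor -> CRediT roles for this artefact only
    related: tuple[str, ...] = ()  # identifiers of the sibling artefacts


def paper_level_rollup(statements: list[ArtefactStatement]) -> dict[str, set[str]]:
    """Collapse artefact-level roles into today's single paper-level statement."""
    merged: dict[str, set[str]] = defaultdict(set)
    for s in statements:
        for person, roles in s.roles.items():
            merged[person] |= roles
    return dict(merged)


def where_credited(statements: list[ArtefactStatement], person: str, role: str) -> list[str]:
    """The detail the roll-up loses: which artefacts a given role was earned on."""
    return [s.kind for s in statements if role in s.roles.get(person, set())]


def dangling_references(statements: list[ArtefactStatement]) -> list[tuple[str, str]]:
    """Cross-references that point at an artefact with no statement of its own."""
    known = {s.identifier for s in statements}
    return [(s.identifier, ref) for s in statements for ref in s.related if ref not in known]
```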

    What to do now

    For authors of data papers, the practical advice is: use CRediT for the paper; deposit a complementary contributors.json with the dataset that captures finer-grained roles; cross-reference the two in the related-identifier blocks. For authors of software papers, use CRediT for the paper and CITATION.cff for the repository, with the CFF carrying the rich per-component contributor data. The CASRAI data and software papers guide has worked examples.
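    This guide does not fix a schema for contributors.json, so the field names below are hypothetical, as is the crosswalk from the sub-roles suggested earlier (collection, processing, curation, documentation) to their CRediT parent roles. The sketch keeps both levels side by side, so the dataset's file carries the finer detail while still matching the paper's CRediT statement, and it adds an IsDescribedBy cross-reference to the paper modelled on DataCite's relation types.

```python
import json

# Hypothetical crosswalk from fine-grained sub-roles to their CRediT parent role.
PARENT_ROLE = {
    "collection": "Investigation",
    "processing": "Investigation",
    "curation": "Data curation",
    "documentation": "Data curation",
}


def contributors_json(dataset_doi: str, paper_doi: str,
                      assignments: list[tuple[str, str, str]]) -> str:
    """Serialise (name, sub_role, qualifier) triples to a hypothetical contributors.json."""
    document = {
        "dataset": dataset_doi,
        # Cross-reference back to the data paper, mirroring the related-identifier block.
        "relatedIdentifiers": [{"relationType": "IsDescribedBy", "identifier": paper_doi}],
        "contributors": [
            {
                "name": name,
                "creditRole": PARENT_ROLE[sub_role],  # what the paper's statement shows
                "subRole": sub_role,                  # the finer-grained detail
                "degree": qualifier,
            }
            for name, sub_role, qualifier in assignments
        ],
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```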

    For the CRediT stewardship group, the recommendation is to prioritise the data-paper and software-paper mapping problem in the v2026.3 revision discussion. The friction is real, the workarounds are working but ugly, and the taxonomy will be strengthened by a thoughtful extension.
