Editorial · CASRAI · Research outputs (expanded)

DataCite, GitHub, Zenodo: the three-cornered software-citation stack

Software citation in 2026 runs on a three-cornered stack. This post covers the roles of DataCite, GitHub, and Zenodo, and what integrators should do about the seams.

By CASRAI Editorial Board

Published 18 Jun 2026 · Last updated 10 Jul 2026 · 5 minute read

Software citation in 2026 mostly runs on a three-cornered stack: a code repository (typically GitHub), an archiving service that issues DOIs (typically Zenodo), and the DataCite infrastructure that registers and resolves the DOIs. The integration between the three is more polished than it was five years ago and substantially less polished than it could be. This post walks through the current state and what integrators should do.

The pattern that works

The community has converged on the following operational pattern. A research-software project lives in a Git repository (often on GitHub, increasingly on GitLab or other forges). At each release, the repository is archived to Zenodo, which creates a DOI for that release; Zenodo also issues a concept DOI for the project as a whole, which resolves to the latest release. The repository carries a CITATION.cff file specifying how to cite the software, including the Zenodo DOI and the contributor list. The published paper (if any) cites the software via the Zenodo DOI. The result is an operationally clean path from the paper to the archived release.
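To make the CITATION.cff step concrete, here is a minimal Python sketch that renders such a file as text. The keys follow the Citation File Format 1.2.0 schema; the function name, the project details, and the DOI (a placeholder, not a real Zenodo record) are assumptions for illustration only.

```python
def render_citation_cff(title, version, doi, date_released, authors):
    """Render a minimal CITATION.cff (CFF 1.2.0) as YAML text.

    Illustrative only: quoting is naive and does not escape double
    quotes inside values.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f'doi: "{doi}"',
        f"date-released: {date_released}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


# Hypothetical project; the DOI below is a placeholder.
example_cff = render_citation_cff(
    title="example-tool",
    version="1.4.0",
    doi="10.5281/zenodo.0000000",
    date_released="2026-06-18",
    authors=[("Doe", "Jane"), ("Roe", "Richard")],
)
print(example_cff)
```

In practice most projects write the file by hand or with a dedicated CFF tool; the point of the sketch is only to show how little metadata the pattern requires.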

The integration works at the technical layer. GitHub-Zenodo integration is documented and stable. CITATION.cff is supported by GitHub’s repository UI for human-readable citations and by an increasing number of tools (Zenodo, JOSS, R packages’ references) for machine processing. DataCite’s metadata supports the software-type record with CRediT-aligned contributor roles where the depositor provides them.

What’s good

Three things this stack does well.

First, versioning. Software is versioned; citation should be versionable. The concept-DOI plus per-version-DOI pattern lets a paper cite either the specific version it used or the project conceptually, with the appropriate DOI. This is the right design for software citation and the community has converged on it.
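The choice between the two kinds of DOI can be sketched in a few lines of Python. The class, the function, and the DOIs are hypothetical; the sketch assumes only what is described above, namely one concept DOI per project and one DOI per archived version.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareDois:
    """Concept DOI plus per-version DOIs for one project (illustrative)."""
    concept_doi: str
    version_dois: dict = field(default_factory=dict)  # version -> DOI


def doi_to_cite(dois, used_version=None):
    """Return the version DOI when a specific version was used,
    otherwise the concept DOI for the project as a whole."""
    if used_version is None:
        return dois.concept_doi
    if used_version not in dois.version_dois:
        raise ValueError(f"no archived DOI for version {used_version!r}")
    return dois.version_dois[used_version]


# Placeholder identifiers, not real records.
example_dois = SoftwareDois(
    concept_doi="10.5281/zenodo.0000000",
    version_dois={"1.3.0": "10.5281/zenodo.0000001",
                  "1.4.0": "10.5281/zenodo.0000002"},
)
print(doi_to_cite(example_dois, "1.4.0"))  # the paper ran version 1.4.0
print(doi_to_cite(example_dois))           # the paper discusses the project
```

Raising an error for an unarchived version, rather than silently falling back to the concept DOI, is a deliberate choice in this sketch: a reproducibility citation that points at the wrong version is worse than a missing one.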

Second, open infrastructure. Zenodo is operated by CERN as a public infrastructure; DataCite is a community-governed organisation. The depositor’s investment in software citation does not lock them into a commercial vendor. This matters for sustainability.

Third, integration with FAIR4RS. The FAIR4RS Principles and the resulting software citation principles are operationalised by this stack. A FAIR-aligned software project should have an archived release with a DOI, with rich metadata, with a contributor record, all of which the stack supports.

What’s still rough

Four issues at the seams.

First, the GitHub dependency. The dominant code-hosting platform is a commercial service owned by a major tech company. The Zenodo integration is GitHub-specific in important ways (the auto-archival webhook, the metadata propagation from the GitHub release to Zenodo). GitLab and other forges have lighter-weight integration patterns. The community’s reliance on GitHub for the code-hosting corner of the stack creates a single-vendor risk of which the FAIR-software community is increasingly aware. Software Heritage’s archive of public repositories provides some long-term resilience but is not a substitute for the operational integration.

Second, metadata fidelity at deposit. The GitHub-Zenodo automatic deposit captures repository metadata, but the fidelity is variable. CITATION.cff is honoured if present and well-formed; in its absence, Zenodo defaults to repository-level metadata that may not reflect the contributor structure the developers intended. Projects without CITATION.cff end up with lower-quality Zenodo records.
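A project can catch the most common gaps before tagging a release. The sketch below assumes the CITATION.cff has already been parsed into a Python dict (for example with a YAML library) and checks the four top-level keys that CFF 1.2.0 requires. The per-author check is stricter than the schema and is a house rule assumed here, not part of the format.

```python
REQUIRED_CFF_KEYS = ("cff-version", "message", "title", "authors")


def cff_problems(cff):
    """Return human-readable problems found in a parsed CITATION.cff dict.

    An empty list means the required top-level keys are present and
    every author entry can be displayed by name.
    """
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_CFF_KEYS
        if not cff.get(key)
    ]
    # House rule (assumption): each author needs a displayable name,
    # either family-names (person) or name (entity).
    for index, author in enumerate(cff.get("authors") or []):
        if not (author.get("family-names") or author.get("name")):
            problems.append(f"author {index} has no family-names or name")
    return problems


parsed = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "title": "example-tool",
    "authors": [{"family-names": "Doe", "given-names": "Jane"}, {}],
}
for problem in cff_problems(parsed):
    print(problem)
```

A check of this kind fits naturally into a release checklist or CI job, so that the deposit does not silently fall back to repository-level metadata.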

Third, the CRediT-CITATION.cff alignment. CITATION.cff supports a contributors list with type-of-contribution; the type-of-contribution vocabulary has converged on a CRediT-aligned set but the alignment is not strict. Tools that translate CITATION.cff to CRediT-compliant DataCite metadata produce slightly different results. The Software Citation Working Group has been working on the formal alignment; the work is partly complete.
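The ambiguity is easy to see in code. The sketch below normalises free-text contribution labels to the 14 CRediT role names; the alias table is entirely hypothetical, and it is exactly the part on which two translation tools could reasonably disagree.

```python
CREDIT_ROLES = (
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
)


def _norm(label):
    """Lower-case, treat hyphens and en dashes as spaces, collapse spaces."""
    cleaned = label.lower().replace("–", " ").replace("-", " ")
    return " ".join(cleaned.split())


_CANONICAL = {_norm(role): role for role in CREDIT_ROLES}

# Hypothetical aliases: a different tool could map these differently.
_ALIASES = {
    "code": "Software",
    "programming": "Software",
    "testing": "Validation",
    "documentation": "Writing – original draft",
}


def to_credit_role(label):
    """Map a free-text label to a CRediT role name, or None if unmapped."""
    key = _norm(label)
    return _CANONICAL.get(key) or _ALIASES.get(key)


for label in ("data-curation", "Code", "writing - review & editing", "mascot design"):
    print(label, "->", to_credit_role(label))
```

Returning None for unmapped labels, instead of guessing, keeps the ambiguity visible to whoever curates the deposit.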

Fourth, versioning of the contributor record. CITATION.cff in the repository captures current contributorship; the Zenodo deposit captures contributorship as of the deposit. A project that adds contributors after a release has a stale Zenodo record for that release until the next release. The trade-off (mutable vs immutable per-version records) is a real one; the community has accepted immutable per-version records as the better default.
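A maintainer can at least detect the staleness. This sketch compares the author list archived with a release against the current one, both given as lists of dicts with CFF-style family-names and given-names keys. The function names and the matching rule (case-insensitive comparison of both name parts) are assumptions; real projects may prefer to match on ORCID iDs.

```python
def author_key(author):
    """Case-insensitive (family, given) key for one CFF-style author dict."""
    return (
        author.get("family-names", "").strip().casefold(),
        author.get("given-names", "").strip().casefold(),
    )


def contributors_added_since(release_authors, current_authors):
    """Authors present in the current list but absent from the release."""
    archived = {author_key(a) for a in release_authors}
    return [a for a in current_authors if author_key(a) not in archived]


release_authors = [{"family-names": "Doe", "given-names": "Jane"}]
current_authors = [
    {"family-names": "doe", "given-names": "jane"},    # same person, new casing
    {"family-names": "Roe", "given-names": "Richard"}, # joined after the release
]
print(contributors_added_since(release_authors, current_authors))
```

A non-empty result does not mean the archived record is wrong, since it accurately describes the release; it only signals that the next release will carry a different contributor list.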

What integrators should do

For software-paper authors and software developers, the practical advice in 2026 is: maintain a CITATION.cff in every research-software repository; archive every meaningful release to Zenodo; cite the specific Zenodo DOI in publications that use the software; cite the concept DOI in publications that reference the project conceptually. The CASRAI software-citation authors guide walks through the patterns.

For journals publishing software papers, the recommendation is to require CITATION.cff and a Zenodo (or equivalent) deposit at submission, to verify the consistency between the CITATION.cff and the paper’s contributorship statement, and to cite the Zenodo DOI in the published paper. JOSS does all of this; other software-paper venues should follow.

For institutions, the recommendation is to ingest software-DOI records into CRIS systems as a first-class research output, to surface them in researcher dashboards alongside publications, and to recognise software contribution in promotion and tenure assessment. The CASRAI research outputs domain tracks the institutional implementation patterns.
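For the ingestion step, a minimal sketch of filtering software records out of a batch of DOI metadata might look like the following. It assumes records shaped like the DataCite REST API's JSON (an attributes object holding doi, titles, creators, version, and types.resourceTypeGeneral); the sample records are invented, and the output row layout is a hypothetical CRIS import format.

```python
def software_outputs(records):
    """Extract one flat row per record whose general type is Software."""
    rows = []
    for record in records:
        attrs = record.get("attributes", {})
        if attrs.get("types", {}).get("resourceTypeGeneral") != "Software":
            continue
        titles = attrs.get("titles") or [{}]
        rows.append({
            "doi": attrs.get("doi"),
            "title": titles[0].get("title"),
            "version": attrs.get("version"),
            "creators": [c.get("name") for c in attrs.get("creators", [])],
        })
    return rows


# Invented sample records with placeholder DOIs.
sample_records = [
    {"attributes": {
        "doi": "10.5281/zenodo.0000002",
        "titles": [{"title": "example-tool"}],
        "version": "1.4.0",
        "creators": [{"name": "Doe, Jane"}],
        "types": {"resourceTypeGeneral": "Software"},
    }},
    {"attributes": {
        "doi": "10.5281/zenodo.0000003",
        "titles": [{"title": "example survey data"}],
        "creators": [{"name": "Roe, Richard"}],
        "types": {"resourceTypeGeneral": "Dataset"},
    }},
]
for row in software_outputs(sample_records):
    print(row)
```

Keeping the version in the row matters for the dashboard use case: it lets the CRIS group per-version records under one project instead of listing each release as a separate output.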

For the broader infrastructure community, two priorities. First, support non-GitHub code-hosting integration with Zenodo; the single-vendor concentration is a real risk. Second, complete the CRediT-CITATION.cff alignment work; the operational ambiguity is small but real.

What’s coming

Two developments to watch in 2026-2027. First, the Software Heritage citation integration: Software Heritage archives the world’s public source code and assigns SWHIDs (Software Heritage Identifiers). The integration of SWHIDs as a complementary identifier alongside Zenodo DOIs is in progress; the relationship between SWHID and DOI for the same software release is in design. Second, per-version contributor records: the community has been chewing on whether per-version CRediT statements deposited to Crossref or DataCite would be useful for software. The technical viability is clear; the community-consensus and tool-support work is in motion.
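Integrators who want to start storing SWHIDs next to DOIs can validate them cheaply. The sketch below checks only the core SWHID syntax (the swh prefix, scheme version 1, an object type, and a 40-character hexadecimal hash) and deliberately ignores optional qualifiers such as origin; the function name and the sample value are illustrative.

```python
import re

# Core SWHID: swh:1:<type>:<40 lowercase hex chars>, where type is one of
# cnt (content), dir (directory), rev (revision), rel (release), snp (snapshot).
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def is_core_swhid(value):
    """True if value is a syntactically valid core SWHID (no qualifiers)."""
    return SWHID_RE.fullmatch(value) is not None


sample = "swh:1:rel:" + "0" * 40  # placeholder hash, not a real release
print(is_core_swhid(sample))
print(is_core_swhid("swh:1:rel:" + "Z" * 40))
```

A syntax check says nothing about whether the object exists in the archive, and how a release SWHID should relate to the DOI for the same release is, as noted above, still in design.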

For the moment, the three-cornered stack does the job. The seams are real but workable. Software citation has moved from being a research-software-engineering aspiration to an operational practice; the further refinements are about polish, not foundation.
