Tag: Zenodo

  • DataCite, GitHub, Zenodo: the three-cornered software-citation stack

    Software citation in 2026 mostly runs on a three-cornered stack: a code repository (typically GitHub), an archiving service that issues DOIs (typically Zenodo), and the DataCite infrastructure that registers and resolves the DOIs. The integration between the three is more polished than it was five years ago and substantially less polished than it could be. This post walks through the current state and what integrators should do.

    The pattern that works

    The community has converged on one operational pattern. A research-software project lives in a Git repository (often on GitHub, increasingly on GitLab or other forges). At each release, the repository is archived to Zenodo, which mints a DOI for that release; Zenodo also issues a concept DOI for the project as a whole, which resolves to the latest release. The repository carries a CITATION.cff file specifying how to cite the software, including the Zenodo DOI and the contributor list. The published paper (if any) cites the software via the Zenodo DOI. The result is an operationally clean citation pattern.

    The integration works at the technical layer. GitHub-Zenodo integration is documented and stable. CITATION.cff is supported by GitHub’s repository UI for human-readable citations and by a growing number of tools (Zenodo, JOSS, reference tooling for R packages) for machine processing. DataCite’s metadata supports the software-type record with CRediT-aligned contributor roles where the depositor provides them.

    What’s good

    Three things this stack does well.

    First, versioning. Software is versioned; citation should be versionable. The concept-DOI plus per-version-DOI pattern lets a paper cite either the specific version it used or the project conceptually, with the appropriate DOI. This is the right design for software citation and the community has converged on it.
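
    The choice between the two kinds of DOI can be sketched as a small lookup. This is an illustration only: the record structure and function names are assumptions made for the example, and the DOIs are placeholders. The one external fact relied on is that a DOI resolves under https://doi.org/.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SoftwareRecord:
    """Hypothetical local view of one project's DOIs."""
    concept_doi: str              # resolves to the latest release
    version_dois: Dict[str, str]  # release tag -> DOI for that release


def doi_to_cite(record: SoftwareRecord, used_version: Optional[str] = None) -> str:
    """Pick the version DOI when a specific release was used, else the concept DOI."""
    if used_version is None:
        return record.concept_doi
    try:
        return record.version_dois[used_version]
    except KeyError:
        raise ValueError(f"no archived DOI for version {used_version!r}") from None


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    record = SoftwareRecord(
        concept_doi="10.5281/zenodo.1000000",            # placeholder
        version_dois={"1.0.0": "10.5281/zenodo.1000001"},  # placeholder
    )
    print(doi_url(doi_to_cite(record, "1.0.0")))  # a paper that ran release 1.0.0
    print(doi_url(doi_to_cite(record)))           # a paper discussing the project
```

    Raising an error for an unarchived version, rather than silently falling back to the concept DOI, is a deliberate choice in this sketch: a paper that used a specific release should not end up citing whatever the latest release happens to be.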

    Second, open infrastructure. Zenodo is operated by CERN as a public infrastructure; DataCite is a community-governed organisation. The depositor’s investment in software citation does not lock them into a commercial vendor. This matters for sustainability.

    Third, integration with FAIR4RS. The FAIR4RS Principles, and the software citation principles that preceded them, are operationalised by this stack. A FAIR-aligned software project should have an archived release with a DOI, rich metadata and a contributor record, all of which the stack supports.

    What’s still rough

    Four issues at the seams.

    First, the GitHub dependency. The dominant code-hosting platform is a commercial service owned by a major tech company. The Zenodo integration is GitHub-specific in important ways (the auto-archival webhook, the metadata propagation from the GitHub release to Zenodo). GitLab and other forges have lighter-weight integration patterns. The community’s reliance on GitHub for the code-hosting corner of the stack creates a single-point-of-vendor risk that the FAIR-software community has been increasingly aware of. Software Heritage’s archive of public repositories provides some long-term resilience but is not a substitute for the operational integration.

    Second, metadata fidelity at deposit. The GitHub-Zenodo automatic deposit captures repository metadata, but the fidelity is variable. CITATION.cff is honoured if present and well-formed; in its absence, Zenodo defaults to repository-level metadata that may not reflect the contributor structure the developers intended. Projects without a CITATION.cff end up with poorer Zenodo records.
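
    A pre-release check along these lines can catch the "absent or malformed" case before a deposit happens. The sketch below is deliberately crude: it scans for top-level keys with a regular expression rather than parsing YAML, and it only checks the four keys CFF 1.2.0 requires. The function name is hypothetical, and a real pipeline would more likely validate against the CFF schema.

```python
import re
from typing import List

# Keys that CFF 1.2.0 requires at the top level of CITATION.cff.
REQUIRED_KEYS = ("cff-version", "message", "title", "authors")

# A top-level key starts in column 0; indented lines and comments are ignored.
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):")


def missing_required_keys(cff_text: str) -> List[str]:
    """Return the required keys that do not appear at the top level."""
    present = set()
    for line in cff_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            present.add(match.group(1))
    return [key for key in REQUIRED_KEYS if key not in present]


if __name__ == "__main__":
    incomplete = 'cff-version: 1.2.0\ntitle: "Example Tool"\n'
    problems = missing_required_keys(incomplete)
    if problems:
        print("CITATION.cff is missing:", ", ".join(problems))
```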

    Third, the CRediT-CITATION.cff alignment. CITATION.cff supports a contributors list with type-of-contribution; the type-of-contribution vocabulary has converged on a CRediT-aligned set but the alignment is not strict. Tools that translate CITATION.cff to CRediT-compliant DataCite metadata produce slightly different results. The Software Citation Working Group has been working on the formal alignment; the work is partly complete.

    Fourth, versioning of the contributor record. CITATION.cff in the repository captures current contributorship; the Zenodo deposit captures contributorship as of the deposit. When a project adds contributors after a release, no Zenodo record reflects them until the next release, and the record for the earlier release never will. The trade-off (mutable vs immutable per-version records) is a real one; the community has accepted immutable per-version records as the better default.
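
    One way to make that gap visible is to compare the contributor list frozen at the last release with the list currently in the repository. The sketch below assumes both lists have already been extracted as plain names; the function name and the example names are hypothetical.

```python
from typing import Iterable, List


def contributors_added_since(
    release_contributors: Iterable[str],
    current_contributors: Iterable[str],
) -> List[str]:
    """Return contributors in the repository now but absent from the release record.

    Order follows the current list, so the output is stable for reporting.
    """
    released = set(release_contributors)
    return [name for name in current_contributors if name not in released]


if __name__ == "__main__":
    at_release = ["Jane Doe", "Sam Roe"]               # frozen in the per-version record
    in_repository = ["Jane Doe", "Sam Roe", "Ana Poe"]  # current CITATION.cff
    pending = contributors_added_since(at_release, in_repository)
    print("Not yet in any archived record:", pending)
```

    A non-empty result is not an error under the immutable-record default; it is simply a signal that the next release will carry a different contributor list.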

    What integrators should do

    For software-paper authors and software developers, the practical advice in 2026 is: maintain a CITATION.cff in every research-software repository; archive every meaningful release to Zenodo; cite the specific Zenodo DOI in publications that use the software; cite the concept DOI in publications that reference the project conceptually. The CASRAI software-citation authors guide walks through the patterns.

    For journals publishing software papers, the recommendation is to require CITATION.cff and a Zenodo (or equivalent) deposit at submission, to verify the consistency between the CITATION.cff and the paper’s contributorship statement, and to cite the Zenodo DOI in the published paper. JOSS does all of this; other software-paper venues should follow.

    For institutions, the recommendation is to ingest software-DOI records into CRIS systems as a first-class research output, to surface them in researcher dashboards alongside publications, and to recognise software contribution in promotion and tenure assessment. The CASRAI research outputs domain tracks the institutional implementation patterns.

    For the broader infrastructure community, two priorities. First, support non-GitHub code-hosting integration with Zenodo; the single-vendor concentration is a real risk. Second, complete the CRediT-CITATION.cff alignment work; the operational ambiguity is small but real.

    What’s coming

    Two developments to watch in 2026-2027. First, the Software Heritage citation integration: Software Heritage archives the world’s public source code and assigns SWHIDs (Software Heritage Identifiers). The integration of SWHIDs as a complementary identifier alongside Zenodo DOIs is in progress; the relationship between SWHID and DOI for the same software release is in design. Second, per-version contributor records: the community has been chewing on whether per-version CRediT statements deposited to Crossref or DataCite would be useful for software. The technical viability is clear; the community-consensus and tool-support work is in motion.
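
    For readers who have not handled SWHIDs, the sketch below checks the core identifier form: the scheme swh, schema version 1, an object type (cnt, dir, rev, rel or snp) and a 40-character lowercase hex digest. It ignores the optional qualifiers that can follow a semicolon, and the function name is an assumption made for this example. It says nothing about how a SWHID and a DOI for the same release will eventually be related, which, as noted above, is still in design.

```python
import re
from typing import Optional, Tuple

# Core SWHID: swh:1:<object type>:<40 lowercase hex characters>
_SWHID_CORE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


def parse_swhid(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (object_type, hex_digest) for a core SWHID, or None if malformed."""
    match = _SWHID_CORE.fullmatch(identifier)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    candidate = "swh:1:rev:" + "0" * 40  # placeholder digest, not a real revision
    print(parse_swhid(candidate))
    print(parse_swhid("swh:1:rev:not-a-digest"))
```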

    For the moment, the three-cornered stack does the job. The seams are real but workable. Software citation has moved from being a research-software-engineering aspiration to an operational practice; the further refinements are about polish, not foundation.

  • Conference outputs as part of the scholarly record: proceedings, posters and presentations

    For a great deal of research, the conference is where it first meets the world. A finding is presented in a talk, a method shown on a poster, a work-in-progress debated long before it appears in a journal — and in some fields, notably parts of computer science and engineering, the peer-reviewed conference paper is itself a primary, prestigious form of publication. Yet the outputs that conferences generate have an uneasy relationship with the formal scholarly record. A poster rolled back into its tube, a set of slides shared only with the people in the room, a proceedings paper that never receives a stable identifier: these represent real scholarly work that too often slips through the cracks of citation, discovery and recognition. This article looks at how conference outputs can take their proper place in the record, drawing on the research outputs domain of the CASRAI Dictionary.

    The range of conference outputs

    “Conference output” covers several distinct things, and treating them as one blurs important differences:

    • Proceedings papers. Full written papers published as part of a conference’s proceedings, frequently peer-reviewed and, in some disciplines, the main venue for significant work — carrying prestige comparable to or exceeding journal articles.
    • Extended abstracts. Shorter written contributions that summarise work presented at a meeting.
    • Posters. Visual presentations of research, often of preliminary or focused findings, displayed and discussed during a conference.
    • Presentations and slides. The talks given at conferences and the slide decks that accompany them, which capture how work was framed and communicated at a particular moment.

    Each of these is a genuine output reflecting real intellectual contribution. The problem has rarely been their value; it has been their persistence and findability. A journal article is deposited, identified, indexed and citable almost automatically. A poster or a set of slides, historically, was not — and so excellent work could effectively vanish after the event that occasioned it.

    The persistence problem

    The core difficulty is that conference outputs have often lacked the infrastructure that makes other outputs durable. Without a stable home and a persistent identifier, a poster or presentation cannot be reliably cited, because there is nothing stable to cite; it cannot be easily discovered, because it is not indexed; and it cannot be properly credited, because it leaves no fixed trace. The result is a systematic under-recognition of a large category of work, and a loss to the record itself, since conference outputs frequently contain early results or methodological details that never reach a later paper. Solving this requires the same two things that make any output durable: a stable place to live and a persistent identifier to name it.

    Repositories and DOIs for conference outputs

    This is exactly what general-purpose research repositories now provide. Platforms such as Zenodo and Figshare allow researchers to deposit a wide range of outputs — including posters, presentations, slides and proceedings papers — and, crucially, to mint a DOI for each one. The effect is transformative. A poster deposited in Zenodo with a DOI is no longer an ephemeral object that existed for one afternoon; it is a permanently archived, uniquely identified, citable output with its own landing page and metadata. The same applies to a slide deck or an extended abstract. By depositing conference outputs and obtaining persistent identifiers for them, researchers turn fleeting presentations into durable parts of the scholarly record — findable, linkable and citable like any article or dataset. The infrastructure that was once reserved for formal publications is now readily available for the full range of conference work.

    Citing and connecting conference outputs

    Once a conference output has a persistent identifier, it can participate fully in the scholarly graph. It can be cited in later work, so that the poster which first presented an idea, or the proceedings paper that established a method, receives proper credit. It can be linked to related outputs — connected to the dataset it draws on, the eventual journal article it grew into, or the software it demonstrated — so that the relationships between a project’s outputs are visible. And it can be attributed to its creators through their identifiers, so the contribution attaches to the right people. This connectivity matters because research outputs are most valuable when they are linked rather than isolated. A conference output with a DOI is not a dead end; it is a node in the network of scholarship, able to cite and be cited like any other.
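
    In DataCite-style metadata, links of this kind are expressed as related identifiers, each with an identifier, an identifier type and a relation type. The sketch below builds such a list for a poster. The property names and relation types follow the DataCite schema, but the mapping from kind of output to relation type is only one plausible choice, and the function name and DOIs are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple

# One plausible mapping, seen from the poster's side (an assumption, not a rule):
#   the dataset the poster draws on       -> IsDerivedFrom
#   the journal article it grew into      -> IsContinuedBy
#   the software it demonstrated          -> Describes
RELATION_FOR: Dict[str, str] = {
    "dataset": "IsDerivedFrom",
    "article": "IsContinuedBy",
    "software": "Describes",
}


def related_identifiers(links: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (kind, doi) pairs into DataCite-style relatedIdentifier entries."""
    entries = []
    for kind, doi in links:
        entries.append(
            {
                "relatedIdentifier": doi,
                "relatedIdentifierType": "DOI",
                "relationType": RELATION_FOR[kind],
            }
        )
    return entries


if __name__ == "__main__":
    # Placeholder DOIs; none of these refer to real records.
    for entry in related_identifiers(
        [
            ("dataset", "10.5281/zenodo.2000001"),
            ("article", "10.1234/example.article"),
            ("software", "10.5281/zenodo.2000002"),
        ]
    ):
        print(entry)
```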

    Recognition and assessment

    Making conference outputs persistent and citable also makes them visible to the systems that recognise and assess research. When a poster or presentation has a stable identifier and clear metadata, it can appear in a researcher’s ORCID record, flow into institutional systems, and be counted as part of their contribution — particularly important for early-career researchers, whose conference work may precede their first journal publications, and for disciplines where the peer-reviewed proceedings paper is the principal output. Recognising the full breadth of scholarly work, rather than only the journal article, is a recurring theme of the research outputs domain and of fair approaches to authorship and contribution. Conference outputs deserve to count, and persistent identification is what lets them.

    A consistent vocabulary for conference work

    For conference outputs to be deposited, cited, linked and credited consistently across repositories and research systems, they must be described in a shared way — the type of output, its relationship to an event and to other outputs, and the roles of those who created it. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that a proceedings paper, a poster or a presentation is understood for what it is wherever it appears. And because each rests on genuine contribution, the work behind it can be described in the same shared framework — the CRediT taxonomy. The conference is where research is so often first shared; giving its outputs persistent identifiers ensures that what is shared there takes its rightful place in the lasting record.