CASRAI Dictionary

Category: Analysis

Explanatory deep-dives on standards, frameworks, and the open-research landscape.

DataCite, GitHub, Zenodo: the three-cornered software-citation stack
Software citation in 2026 mostly runs on a three-cornered stack: a code repository (typically GitHub), an archiving service that issues DOIs (typically Zenodo), and the DataCite infrastructure that registers and resolves the DOIs. The integration between the three is more polished than it was five years ago and substantially less polished than it could be. This post walks through the current state and what integrators should do.

The pattern that works

The community has converged on the following operational pattern. A research-software project lives in a Git repository (often on GitHub, increasingly on GitLab or other forges). At each release, the repository is archived to Zenodo, which creates a DOI for that release; a concept DOI for the project as a whole is also issued and resolves to the latest release. The repository carries a CITATION.cff file specifying how to cite the software, including the Zenodo DOI and the contributor list. The published paper (if any) cites the software via the Zenodo DOI. The resulting citation pattern is operationally clean.
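As a rough illustration of the CITATION.cff corner of this pattern, the sketch below renders a minimal file as text. The field names (cff-version, message, title, version, doi, date-released, authors) follow the Citation File Format 1.2.0 schema; the helper name render_citation_cff, the Author class, and the project details and DOI in the usage lines are hypothetical placeholders, not a real deposit.

```python
from dataclasses import dataclass


@dataclass
class Author:
    family_names: str
    given_names: str
    orcid: str  # full https://orcid.org/... URL, the form CFF uses


def render_citation_cff(title, version, version_doi, date_released, authors):
    """Render a minimal CITATION.cff (CFF 1.2.0) as a text string."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f'doi: "{version_doi}"',
        f'date-released: "{date_released}"',
        "authors:",
    ]
    for author in authors:
        lines.append(f'  - family-names: "{author.family_names}"')
        lines.append(f'    given-names: "{author.given_names}"')
        lines.append(f'    orcid: "{author.orcid}"')
    return "\n".join(lines) + "\n"


# Hypothetical project details, for illustration only.
cff_text = render_citation_cff(
    title="example-toolkit",
    version="1.4.0",
    version_doi="10.5281/zenodo.0000000",  # placeholder, not a real DOI
    date_released="2026-01-15",
    authors=[Author("Carberry", "Josiah", "https://orcid.org/0000-0002-1825-0097")],
)
print(cff_text)
```

In practice the file is usually written by hand or with a dedicated tool; the point here is only which fields carry the release DOI and the contributor list.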

The integration works at the technical layer. The GitHub-Zenodo integration is documented and stable. CITATION.cff is supported by GitHub’s repository UI for human-readable citations and by an increasing number of tools (Zenodo, JOSS, reference handling for R packages) for machine processing. DataCite’s metadata schema supports software-type records, with CRediT-aligned contributor roles where the depositor provides them.

What’s good

Three things this stack does well.

First, versioning. Software is versioned; citation should be versionable. The concept-DOI plus per-version-DOI pattern lets a paper cite either the specific version it used or the project conceptually, with the appropriate DOI. This is the right design for software citation and the community has converged on it.
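The choice between the two DOIs can be sketched in a few lines. The example assumes a DataCite-style attributes dictionary in which a version record points at its concept DOI through a relatedIdentifiers entry with relationType "IsVersionOf"; that is our understanding of how Zenodo version records are registered, but treat it as an assumption and check a real record. The function names and the DOIs are hypothetical.

```python
def concept_doi_for(attributes):
    """Return the concept DOI a version record points at, or None."""
    for rel in attributes.get("relatedIdentifiers", []):
        if (rel.get("relationType") == "IsVersionOf"
                and rel.get("relatedIdentifierType") == "DOI"):
            return rel["relatedIdentifier"]
    return None


def doi_to_cite(attributes, cite_exact_version):
    """Pick the version DOI for reproducibility, the concept DOI otherwise."""
    version_doi = attributes["doi"]
    if cite_exact_version:
        return version_doi
    # Fall back to the version DOI if no concept DOI is declared.
    return concept_doi_for(attributes) or version_doi


# Hypothetical record, shaped like DataCite JSON attributes.
record = {
    "doi": "10.5281/zenodo.1000002",
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.5281/zenodo.1000001",
            "relatedIdentifierType": "DOI",
            "relationType": "IsVersionOf",
        }
    ],
}

print(doi_to_cite(record, cite_exact_version=True))
print(doi_to_cite(record, cite_exact_version=False))
```

A paper that reports results computed with a given release would take the first branch; a paper that merely mentions the project would take the second.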

Second, open infrastructure. Zenodo is operated by CERN as a public infrastructure; DataCite is a community-governed organisation. The depositor’s investment in software citation does not lock them into a commercial vendor. This matters for sustainability.

Third, integration with FAIR4RS. The FAIR4RS Principles and the resulting software citation principles are operationalised by this stack. A FAIR-aligned software project should have an archived release with a DOI, with rich metadata, with a contributor record, all of which the stack supports.

What’s still rough

Four issues at the seams.

First, the GitHub dependency. The dominant code-hosting platform is a commercial service owned by a major tech company. The Zenodo integration is GitHub-specific in important ways (the auto-archival webhook, the metadata propagation from the GitHub release to Zenodo). GitLab and other forges have lighter-weight integration patterns. The community’s reliance on GitHub for the code-hosting corner of the stack creates a single-point-of-vendor risk that the FAIR-software community has been increasingly aware of. Software Heritage’s archive of public repositories provides some long-term resilience but is not a substitute for the operational integration.

Second, metadata fidelity at deposit. The GitHub-Zenodo automatic deposit captures repository metadata but the fidelity is variable. CITATION.cff is honoured if present and well-formed; in its absence, Zenodo defaults to repository-level metadata that may not reflect the contributor structure the developers intended. Projects without CITATION.cff get less-good Zenodo records.
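A cheap pre-release check can catch the most common cause of a degraded record. The sketch below assumes the CITATION.cff has already been parsed into a dictionary by a YAML parser of your choice, and checks only the four top-level keys that CFF 1.2.0 marks as required. It is not a full schema validation, and the function name is ours.

```python
# Top-level keys required by the CFF 1.2.0 schema.
REQUIRED_CFF_KEYS = ("cff-version", "message", "title", "authors")


def missing_cff_keys(cff):
    """Return the required keys that are absent or empty in a parsed CITATION.cff."""
    return [key for key in REQUIRED_CFF_KEYS if not cff.get(key)]


# Hypothetical parsed files, for illustration only.
complete = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "title": "example-toolkit",
    "authors": [{"family-names": "Carberry", "given-names": "Josiah"}],
}
incomplete = {"cff-version": "1.2.0", "title": "example-toolkit", "authors": []}

print(missing_cff_keys(complete))
print(missing_cff_keys(incomplete))
```

Running a check like this in continuous integration, before the release tag is pushed, keeps a malformed file from silently triggering the repository-level fallback.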

Third, the CRediT-CITATION.cff alignment. CITATION.cff supports a contributors list with type-of-contribution; the type-of-contribution vocabulary has converged on a CRediT-aligned set but the alignment is not strict. Tools that translate CITATION.cff to CRediT-compliant DataCite metadata produce slightly different results. The Software Citation Working Group has been working on the formal alignment; the work is partly complete.

Fourth, versioning of the contributor record. CITATION.cff in the repository captures current contributorship; the Zenodo deposit captures contributorship as of the deposit. A project that adds contributors after a release has a stale Zenodo record for that release until the next release. The trade-off (mutable vs immutable per-version records) is a real one; the community has accepted immutable per-version records as the better default.
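The drift between the two records is easy to measure even though, under the immutable-record default, it is not something to correct in place. A minimal sketch, assuming contributors are identified by ORCID iD strings on both sides; the function name and the iDs in the usage lines are placeholders.

```python
def contributor_drift(repo_contributors, deposit_contributors):
    """Compare current repository contributors with those frozen in a release deposit."""
    repo = set(repo_contributors)
    deposit = set(deposit_contributors)
    return {
        "added_since_release": sorted(repo - deposit),
        "removed_since_release": sorted(deposit - repo),
    }


# Placeholder identifiers, for illustration only.
in_repo_now = ["0000-0002-1825-0097", "0000-0001-5109-3700"]
in_last_deposit = ["0000-0002-1825-0097"]

print(contributor_drift(in_repo_now, in_last_deposit))
```

A non-empty "added_since_release" list is a reasonable prompt to cut a new release rather than to edit the old deposit.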

What integrators should do

For software-paper authors and software developers, the practical advice in 2026 is: maintain a CITATION.cff in every research-software repository; archive every meaningful release to Zenodo; cite the specific Zenodo DOI in publications that use the software; cite the concept DOI in publications that reference the project conceptually. The CASRAI software-citation authors guide walks through the patterns.

For journals publishing software papers, the recommendation is to require CITATION.cff and a Zenodo (or equivalent) deposit at submission, to verify the consistency between the CITATION.cff and the paper’s contributorship statement, and to cite the Zenodo DOI in the published paper. JOSS does all of this; other software-paper venues should follow.

For institutions, the recommendation is to ingest software-DOI records into CRIS systems as a first-class research output, to surface them in researcher dashboards alongside publications, and to recognise software contribution in promotion and tenure assessment. The CASRAI research outputs domain tracks the institutional implementation patterns.

For the broader infrastructure community, two priorities. First, support non-GitHub code-hosting integration with Zenodo; the single-vendor concentration is a real risk. Second, complete the CRediT-CITATION.cff alignment work; the operational ambiguity is small but real.

What’s coming

Two developments to watch in 2026-2027. First, the Software Heritage citation integration: Software Heritage archives the world’s public source code and assigns SWHIDs (Software Heritage Identifiers). The integration of SWHIDs as a complementary identifier alongside Zenodo DOIs is in progress; the relationship between SWHID and DOI for the same software release is in design. Second, per-version contributor records: the community has been chewing on whether per-version CRediT statements deposited to Crossref or DataCite would be useful for software. The technical viability is clear; the community-consensus and tool-support work is in motion.

For the moment, the three-cornered stack does the job. The seams are real but workable. Software citation has moved from being a research-software-engineering aspiration to an operational practice; the further refinements are about polish, not foundation.

June 18, 2026
Open peer review: signals, identifiers, attribution
Open peer review has moved from radical experiment to mainstream option. Some form of transparency — published reviewer identity, published review content, post-publication peer review — is now offered by a majority of major journals as either default or opt-in. The CASRAI open peer review entry tracks the policy landscape; this post focuses on the integration layer that makes the practice work.

What open peer review actually is

The phrase covers several distinct practices that share a transparency commitment but differ in what is made transparent. Open identities: reviewer names are disclosed to authors or published with the paper. Open reports: review content is published alongside the paper. Open participation: peer review is conducted in public, with the broader community able to contribute. Open final-version commenting: post-publication commenting is supported as a continuation of review. Different journals combine these differently.

The current state in 2026: open reports are widely offered (eLife, EMBO Journal, PLOS, Nature Communications, BMJ, F1000Research, Royal Society Open Science, and many others), open identities are less common as default but offered as an opt-in by many, open participation remains the most experimental, and open final-version commenting is supported on most major platforms but lightly used.

The signal infrastructure

For open peer review to integrate with the research-information ecosystem, three signals need to be carried through the metadata.

First, the review-as-output signal. A peer review is itself a scholarly output. Crossref’s review-type DOIs, introduced in 2017 and now widely used, give each review a citable identifier. The review DOI is linked to the article DOI through relationship metadata in which the review record declares which article it reviews. A reviewer can be credited for the review as a structured output, separately from any co-authorship.
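To make the review-to-article link concrete, the sketch below reads the reviewed article's DOI out of a Crossref-style work record. It assumes the JSON shape we understand the Crossref REST API to return for review records: a type of "peer-review" and a relation object whose "is-review-of" list holds entries with "id-type" and "id". Verify those key names against a live record before relying on them. The function name and both DOIs are hypothetical.

```python
def reviewed_article_dois(work):
    """Return the DOIs of the articles that a peer-review work record reviews."""
    if work.get("type") != "peer-review":
        return []
    relations = work.get("relation", {}).get("is-review-of", [])
    return [rel["id"].lower() for rel in relations if rel.get("id-type") == "doi"]


# Hypothetical record, shaped like the "message" part of a Crossref works response.
review_record = {
    "DOI": "10.9999/review.0001",
    "type": "peer-review",
    "relation": {
        "is-review-of": [
            {"id-type": "doi", "id": "10.9999/ARTICLE.0001", "asserted-by": "subject"}
        ]
    },
}

print(reviewed_article_dois(review_record))
```

DOIs are case-insensitive, so the sketch lowercases them before any comparison with article records held elsewhere.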

Second, the reviewer-identifier signal. ORCID’s peer-review activity record carries reviews by ORCID iD. A reviewer whose name is disclosed and who has consented to ORCID-record deposit gets the review entered into their ORCID profile, with the journal as the source and the verification provided by the publisher’s deposit. The CASRAI ORCID implementation guide walks through the deposit patterns.

Third, the review-credit signal. The 2024 work on a structured taxonomy for peer-review contribution — distinguishing the actions of a reviewer (read, queried, recommended changes, validated computation, validated data) — has produced a working vocabulary that several journals now apply at review-submission time. The vocabulary is in the CASRAI research integrity domain.

The attribution layer

The attribution layer is where open peer review interlocks with broader research recognition. Before the acquisition of Publons, the dominant pattern was that journal-published reviews counted as a recognised scholarly output, but cross-journal aggregation was patchy. Publons consolidated some of this; its post-acquisition state (owned by Clarivate and now part of Web of Science) is functional but not as integrated as the community would prefer.

The current best practice is: review with open identity disclosed; review content published under a CC BY licence with a review DOI; review deposited to ORCID via the publisher’s member API; review surfaced in the reviewer’s narrative CV with appropriate context. The result is a reviewer recognition trail that supports promotion, tenure, and career-development assessments.

For institutional research-administration offices, the implication is to capture peer-review contribution in CRIS systems and in researcher reporting. Several institutions have built peer-review dashboards from ORCID-deposit data; the practice is becoming standard at research-intensive universities.

The CRediT interlock

The CRediT taxonomy as currently constituted does not include a Peer Review role; peer review is treated as separate from authorship-related contributorship. There is a structural reason for this: peer review is a per-paper recognition that does not produce co-authorship; CRediT is the co-authorship contributorship taxonomy. Conflating them would muddy both.

The clean separation is: CRediT for paper contributorship (author roles); review DOIs and ORCID peer-review records for reviewer recognition (separately). The two structures are complementary; a researcher’s CV should surface both. The CASRAI peer-review credit guide walks through the integration.

The gaps still open

Three gaps deserve attention in 2026.

First, the cross-journal aggregation gap. Reviews live with the journals that solicited them; ORCID provides the per-reviewer view; but the cross-journal picture (what fraction of reviews in field X are openly published, what review-to-acceptance lag distribution exists, who is reviewing for whom) is harder to assemble. The OpenAIRE Graph has begun ingesting review-DOI data; the picture is improving but not complete.

Second, the quality-signal gap. Open review content is variable in quality; the integration ecosystem treats all reviews as equivalent. A short, perfunctory open review and a substantial methodological critique both get a review DOI and an ORCID entry. The community has not yet developed quality signals for review content; doing so without producing perverse incentives is genuinely difficult.

Third, the uneven adoption gap. The major open-publishing platforms have committed to transparency; many traditional journals offer open review as opt-in but with low uptake. A reviewer’s open-review track record is incomplete if many of their reviews are at journals that do not support open review. The trajectory is positive but uneven.

What CASRAI recommends

Four recommendations. First, journals should default to open reports with reviewer-identity opt-in; the default-opt-in distinction matters for uptake. Second, publishers should deposit reviews to Crossref and ORCID consistently, with the review-credit metadata. Third, institutions should capture peer-review contribution in their CRIS systems and surface it in researcher recognition. Fourth, the responsible-assessment community should treat substantial peer-review work as a legitimate and recognised contribution in narrative CVs and promotion dossiers.

For reviewers, the practical advice is to opt for open identity where journal policy allows, to take the time to write reviews that are substantive enough to count as contributions in their own right, and to maintain their ORCID peer-review record. For authors, the practical advice is to engage seriously with open reviews when received — the public-facing nature is a feature, not a threat.

The longer arc

Open peer review’s mainstreaming is happening alongside, and partly in tension with, the broader concerns about reviewer burden and the sustainability of peer review as an unpaid scholarly contribution. The integration improvements — review DOIs, ORCID deposit, structured credit signals — make peer review more visible, but visibility alone does not solve the volume problem. The responsible-assessment community’s recognition of peer review as legitimate contribution is necessary; it is not sufficient. The next phase of the conversation will likely centre on reviewer compensation, reviewer-load capping, and the integration of peer review into institutional workload models.

June 4, 2026
Cross-institutional CRIS interoperability: the CERIF-Pure-VIVO triangle
Current Research Information Systems (CRIS) have been a critical institutional infrastructure layer for two decades, capturing researchers, their outputs, their funding, their collaborations, their projects. The three dominant data models in 2026 are CERIF (the European standard model, maintained by euroCRIS), Pure (the Elsevier-operated CRIS, with the largest market share in research-intensive universities), and VIVO (the open-source community-maintained CRIS with strong North American adoption). The three are convergent in intent and divergent in detail. This post is a practical guide to interoperating across them.

What each model is

CERIF is the Common European Research Information Format, maintained by euroCRIS since 2002. CERIF is a data model, not a system; it specifies the entities and relationships a CRIS should track and how they should be expressed in XML or RDF. CERIF-CRIS systems exist in many forms (the original CERIF reference implementations, Elsevier’s Pure with CERIF compliance, various national-system implementations) and CERIF compatibility is the lingua-franca claim in the European CRIS market.

Pure is the dominant commercial CRIS product, used by hundreds of research-intensive universities globally. Pure has its own data model, which is broadly CERIF-compatible but with vendor-specific extensions and refinements. Pure’s market position means its data model functions as a de facto standard regardless of its formal status.

VIVO is the open-source community-maintained CRIS originally developed at Cornell and now maintained by an international community under the DuraSpace umbrella (DuraSpace merged with LYRASIS in 2019). VIVO is built on a semantic-web foundation with an RDF/OWL ontology and explicit federation-friendly design. VIVO has strong adoption in US research universities and a growing international community.

Where the models align

The three converge on the core entities. All three model researchers (with ORCID iDs), organisations (with ROR IDs), publications (with DOIs), projects (with funding metadata), and the relationships between them. The CRediT roles can be expressed in all three. The funder-grant-output structure is representable in all three. For the 80% of routine queries against a CRIS, all three produce comparable answers.
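Because every cross-CRIS join on researchers hinges on the ORCID iD, it is worth validating iDs at ingestion. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, and the sketch below implements that check. The function names are ours; the iD in the usage lines is ORCID's well-known fictional test researcher.

```python
def orcid_check_character(first_15_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, character set, and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    body, check = compact[:15], compact[15].upper()
    if not (body.isascii() and body.isdigit()):
        return False
    return orcid_check_character(body) == check


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A checksum pass only shows the iD is well-formed; it does not show that the iD belongs to the researcher in the record, which still needs an authenticated ORCID connection.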

The convergence has been substantially driven by external standards. ORCID, ROR, DOIs, Crossref Funder Registry, CRediT, RDA DMP Common Standard — these external persistent-identifier and metadata standards have pulled the CRIS models toward common representations even where the internal models differ. The CASRAI research information systems domain tracks the convergence.

Where the models diverge

Three areas of substantial divergence.

First, granularity of activities. CERIF models a wide range of research activities at different granularities (projects, work packages, deliverables, milestones); Pure focuses on the publication-centric workflow with project as a supporting entity; VIVO’s ontology accommodates both but is community-extended in ways that vary by deployment. An institution moving CRIS from one platform to another typically loses or transforms activity-level data in ways that require careful migration planning.

Second, contribution and contributorship. CERIF’s contributor structure has evolved to carry CRediT roles natively. Pure carries CRediT but with vendor-specific extensions. VIVO’s ontology can express CRediT but the per-deployment representation varies. A research output with structured contributorship in one CRIS may lose detail when exported to another.

Third, extension and customisation. Pure customers heavily customise their deployments with institution-specific fields and workflows; VIVO sites likewise extend the ontology. The customisations are valuable locally and problematic for cross-institutional interoperability. A federated query that works at one institution may return different fields at another, even where both claim CERIF compliance.

The interoperability layer

The practical interoperability layer in 2026 runs through three exchange mechanisms.

OpenAIRE-CRIS is the European interoperability profile for CERIF, defining a subset of CERIF that all participating CRIS systems can emit and consume. OpenAIRE consumes CERIF-CRIS feeds via OpenAIRE-CRIS and incorporates them into the OpenAIRE Graph. Most European institutional CRIS systems can produce OpenAIRE-CRIS-compliant feeds with modest configuration.

ORCID-CRIS integration is the per-researcher exchange channel. A CRIS depositing publication and affiliation data to ORCID, and consuming corrections back from ORCID, becomes a node in the ORCID-anchored researcher record. All three major CRIS models support ORCID integration, though the depth varies.

Crossref event data and citation feeds provide the publication-level exchange. A CRIS that ingests Crossref event data picks up post-publication corrections, citations, and relationship updates that the local CRIS would otherwise miss.

The three exchange mechanisms together cover most of what cross-institutional interoperability requires. They do not cover the activity-level data that diverges across CRIS models; that data remains harder to interoperate.

What institutions should do

For institutions selecting or migrating a CRIS, the practical recommendations are: prioritise CERIF compliance regardless of vendor; require ORCID integration; require Crossref event-data ingestion; verify OpenAIRE-CRIS compliance for institutions with European funder reporting obligations; insist on data-export capability that includes the full activity-level data, not just the publication-centric subset.

For institutions operating an established CRIS, the priorities are to keep the integration layers current (ORCID 4.0 transition, OpenAIRE-CRIS profile updates, Crossref REST API consumption), to invest in metadata-quality QA, and to participate in the CERIF, Pure, or VIVO community work to influence the data-model evolution.

For CRIS vendors, the priorities are to honour the convergent standards (ORCID, ROR, CRediT, OpenAIRE-CRIS) without burying them under vendor-specific extensions, and to make data export and import paths reliable across customer transitions. The market would benefit from less lock-in friction; the standards work supports that direction.

The euroCRIS-DuraSpace-Elsevier triangle

Beneath the technical layer is an organisational layer. euroCRIS as a standards body, DuraSpace as the VIVO open-source community steward, Elsevier as the Pure operator — these three together substantially set the direction of CRIS evolution. The 2024-2025 coordination work (visible in the joint CERIF-VIVO ontology alignment, the Pure CERIF-compliance certification process, the OpenAIRE-CRIS profile refinement) has been more productive than the prior decade.

The convergence is incomplete and uneven, but the direction is clear. By 2028, cross-CRIS interoperability for the standard entities (researchers, outputs, projects, funding) should be a routine technical exercise, not a multi-year integration project. The activity-level interoperability will follow more slowly.

Related dictionary entries
- CERIF
- Pure CRIS
- VIVO
- euroCRIS
- OpenAIRE-CRIS profile
- CRIS
- OpenAIRE Graph
- DuraSpace
May 19, 2026
Carbon-aware computing for academic HPC clusters
Academic high-performance computing has a material climate footprint. A modern HPC cluster running at scale draws power in the megawatt range; the embodied carbon of the hardware, the operational carbon of the grid electricity, and the cooling overhead together produce annual emissions comparable to a mid-sized industrial facility. The sustainable-research community has been working on this since the late 2010s; 2026 is the year that carbon-aware computing moved from research interest to operational practice at academic clusters. This post walks through what’s happening and what cluster operators should be doing.

What carbon-aware computing means

Carbon-aware computing is a family of techniques for reducing the carbon footprint of computational work without reducing the work itself. The techniques include: temporal shifting, running non-urgent jobs during periods of low-carbon-intensity grid electricity; geographic shifting, running jobs at facilities with cleaner local grids; load-following, scaling cluster capacity with grid carbon intensity; efficiency improvements, doing more work per kilowatt-hour through hardware and software optimisations; demand reduction, eliminating redundant or wasteful computation.

The CASRAI carbon-aware computing entry tracks the terminology and the academic community’s evolving vocabulary.

What’s changed in 2025-2026

Three things converged in 2025-2026 to move carbon-aware computing into practical academic deployment.

First, real-time grid carbon-intensity data became reliable. The Electricity Maps API, Tomorrow’s national emissions data, and several regional grid operators’ direct data feeds now provide sub-hourly carbon-intensity data for most major grids. Scheduling decisions can be made on near-real-time information, not on average historical data.

Second, scheduler integrations matured. Slurm, PBS Pro, and the major HPC schedulers now have plugin or integration paths for carbon-aware scheduling decisions. The plugins consume carbon-intensity feeds and influence job dispatch decisions based on configurable policies. The integrations are not yet universal but are no longer bespoke.

Third, institutional commitments matured. The major UK research councils’ joint commitment to net-zero research by 2040, the EU’s broader sustainability-in-research push under the European Green Deal, several US universities’ institutional net-zero commitments — these created the policy mandate that aligns with the technical capability.

What clusters are doing

What follows is a non-exhaustive tour of the patterns we see at academic clusters in 2026.

Temporal scheduling for batch jobs. Most clusters have substantial batch workloads where the deadline is days or weeks out. Carbon-aware schedulers shift these jobs to grid-low-carbon windows. The University of Edinburgh’s ARCHER2, the Stuttgart HLRS cluster, and the Berkeley Lab NERSC system have all reported carbon savings in the 15-25% range from temporal shifting without measurable impact on time-to-result for affected jobs.
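The core of temporal shifting is a small search problem: given an hourly carbon-intensity forecast and a deadline, find the start time that minimises the average intensity over the job's run. A minimal sketch, assuming a constant power draw over the run and a forecast expressed in gCO2/kWh per hour from now; the function name and the forecast values are illustrative and do not reflect any scheduler's actual plugin interface.

```python
def best_start_hour(forecast, duration_h, deadline_h):
    """Return (start_hour, mean_intensity) for the cleanest window that meets the deadline.

    forecast    -- carbon intensity in gCO2/kWh for each hour from now
    duration_h  -- job run time in whole hours
    deadline_h  -- the job must finish within this many hours from now
    """
    latest_start = min(deadline_h, len(forecast)) - duration_h
    if duration_h <= 0 or latest_start < 0:
        raise ValueError("job does not fit inside the forecast/deadline window")

    # Sliding-window sum: add the hour entering the window, drop the one leaving.
    window = sum(forecast[:duration_h])
    best_start, best_sum = 0, window
    for start in range(1, latest_start + 1):
        window += forecast[start + duration_h - 1] - forecast[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum / duration_h


# Illustrative forecast: a midday dip in carbon intensity.
forecast = [400, 380, 200, 150, 160, 390, 420, 410]
start, mean_intensity = best_start_hour(forecast, duration_h=2, deadline_h=8)
print(start, mean_intensity)
```

A production scheduler also has to weigh queue fairness, node availability, and forecast uncertainty; the sketch isolates only the carbon term.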

Geographic shifting for cloud-burst capacity. Clusters with cloud-burst arrangements for peak loads are increasingly directing burst capacity to cloud regions with cleaner grids. The carbon savings here are large per job but only apply to the burst fraction.

Idle reduction. The least glamorous and most impactful intervention. Clusters typically have substantial idle capacity due to scheduling fragmentation; running fewer nodes more efficiently produces direct emissions reduction. The pattern is to consolidate workload onto fewer nodes during low-demand periods and power down the rest, which requires the ability to bring nodes back up reliably when demand rises.
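The consolidation step is a bin-packing problem. The sketch below uses the first-fit-decreasing heuristic to estimate how many nodes must stay powered on for a set of single-node jobs. It packs on core counts only and ignores memory, GPUs, and multi-node jobs, all of which a real scheduler must respect; the function name and the job sizes are illustrative.

```python
def nodes_needed(job_cores, cores_per_node):
    """Estimate powered-on nodes for single-node jobs using first-fit decreasing."""
    free = []  # free cores remaining on each powered-on node
    for cores in sorted(job_cores, reverse=True):
        if cores > cores_per_node:
            raise ValueError("job is larger than a node; multi-node jobs are out of scope")
        for i, available in enumerate(free):
            if available >= cores:
                free[i] -= cores  # place the job on the first node with room
                break
        else:
            free.append(cores_per_node - cores)  # power on another node
    return len(free)


# Illustrative low-demand period: six jobs on 64-core nodes.
jobs = [32, 16, 16, 48, 8, 8]
print(nodes_needed(jobs, cores_per_node=64))
```

The gap between this estimate and the number of nodes currently powered on is a first approximation of the capacity that could be switched off during the period.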

Hardware efficiency. The energy-per-FLOP trajectory in HPC hardware has been favourable; recent-generation hardware is materially more efficient than five-year-old hardware. The cluster-refresh-cycle question becomes a sustainability question: when does the embodied carbon of new hardware get amortised by the operational savings? Mark Allen and the Green Software Foundation have published useful frameworks here.
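The amortisation question reduces to a payback calculation: embodied carbon of the new hardware divided by the operational carbon saved per year. The sketch below is a simplification that assumes a constant workload and a constant grid carbon intensity over the period. All numbers are illustrative rather than measurements, and the function name is ours.

```python
import math


def payback_years(embodied_kg, old_kwh_per_year, new_kwh_per_year, grid_kg_per_kwh):
    """Years until operational savings offset the embodied carbon of new hardware."""
    annual_saving_kg = (old_kwh_per_year - new_kwh_per_year) * grid_kg_per_kwh
    if annual_saving_kg <= 0:
        return math.inf  # the refresh never pays back on carbon grounds
    return embodied_kg / annual_saving_kg


# Illustrative figures only.
years = payback_years(
    embodied_kg=100_000,         # embodied carbon of the replacement hardware
    old_kwh_per_year=1_000_000,  # energy the old hardware uses for the workload
    new_kwh_per_year=600_000,    # energy the new hardware uses for the same workload
    grid_kg_per_kwh=0.25,        # grid carbon intensity
)
print(years)
```

The cleaner the grid, the longer the payback, so the same refresh can be justified on carbon grounds at one site and not at another.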

Software efficiency. This is often overlooked. A scientific code that uses 30% less compute for the same result delivers, other things being equal, a 30% emissions saving. Code-efficiency efforts at HPC centres (profiling, algorithmic improvements, library updates) have outsized impact. The Software Sustainability Institute has been advocating this for years and is finally getting traction.

The reporting and accounting layer

An emerging challenge is how to report computational carbon to funders and institutional sustainability offices. The CodeCarbon library, ML CO2 calculator, and several others provide per-job carbon-estimation tools. The estimates are approximate but useful at the order-of-magnitude level. Major HPC centres are now publishing annual carbon reports; the methodology varies and harmonisation work is underway via the Green HPC working group.
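Most per-job estimates share the same arithmetic: energy drawn by the nodes, scaled by the facility's power usage effectiveness (PUE) and multiplied by the grid carbon intensity. The sketch below is that order-of-magnitude calculation, not the method of any particular tool; the function name, parameter names, and figures are illustrative assumptions, and embodied carbon is excluded.

```python
import math


def job_emissions_kg(node_hours, avg_node_power_kw, pue, grid_g_per_kwh):
    """Estimate operational emissions of a job in kg CO2e."""
    energy_kwh = node_hours * avg_node_power_kw * pue  # facility-level energy
    return energy_kwh * grid_g_per_kwh / 1000.0        # grams to kilograms


# Illustrative job: 1,000 node-hours at 0.5 kW per node, PUE 1.2, 200 gCO2/kWh.
estimate = job_emissions_kg(
    node_hours=1000,
    avg_node_power_kw=0.5,
    pue=1.2,
    grid_g_per_kwh=200,
)
print(round(estimate, 1))
```

Publishing the inputs alongside the result (average node power, PUE, intensity source) is what makes estimates from different centres comparable.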

The CASRAI sustainable research domain is tracking the reporting standards. Our recommendation is that funders should ask for computational carbon estimates in proposals for compute-intensive work, with the estimate framed as a planning aid rather than a hard constraint.

What researchers should do

Three practical recommendations for researchers running compute-intensive work.

First, profile your code. The single highest-impact intervention is identifying the parts of the workflow that consume disproportionate resources. The Performance Optimisation and Productivity (POP) network in Europe and similar initiatives elsewhere provide free or low-cost profiling support. A well-profiled and reasonably optimised code typically achieves 1.5-3x the throughput per kWh of an unprofiled version of the same workflow.

Second, use carbon-aware schedulers where available. If your cluster supports temporal shifting, mark jobs as deadline-flexible where they genuinely are. The scheduler will exploit the flexibility; the carbon savings accrue without effort on your part.

Third, report and account. Include computational-carbon estimates in your project’s environmental reporting. Make the cost visible. The cultural shift that follows visibility is the longest-term impact.

What institutions should do

For institutional HPC operations, the 2026 priorities are: deploy carbon-aware scheduling; publish annual carbon reports with methodology disclosure; integrate computational-carbon estimation into the user-facing portal; participate in the inter-institutional benchmarking and best-practice exchange via the Green HPC working group.

For institutional sustainability offices, the priority is to bring research computing into the institutional carbon accounting. Many institutional net-zero commitments under-count or omit research computing; this is a material reporting gap.

For funders, the priority is to recognise sustainability as a legitimate cost item in compute-intensive grants and to use the proposal-stage carbon estimation as a planning input rather than a punitive metric. UKRI’s 2024 sustainability-in-research guidance is a useful model.

The honest limits

Carbon-aware computing reduces but does not eliminate HPC’s footprint. A genuinely net-zero research-computing posture requires either grid decarbonisation (largely outside HPC operators’ control) or computational-demand reduction. The demand-reduction conversation is uncomfortable because it touches workloads such as large language model training, climate modelling at very high resolution, and large-scale molecular dynamics, but it is increasingly unavoidable. The sustainable-research community needs to have it without flinching, while continuing the technical work that makes the unavoidable computational work as low-impact as feasible.

May 13, 2026
Diamond OA at the inflection: SciELO + Latindex + AmeliCA
The Diamond OA conversation in 2026 is increasingly framed as a new direction for global scholarly publishing. From a Latin American perspective, this framing is roughly two decades late. SciELO, launched in Brazil in 1998, Latindex operating since 1995, Redalyc from 2003, and AmeliCA consolidating the regional infrastructure from 2018 — together these have operated a working Diamond OA ecosystem at regional scale for a quarter-century. This post looks at what the rest of the world is finally learning from this experience.

The Latin American model

The Latin American scholarly-communication model emerged from a different starting position than the Anglo-American one. Subscription publishing never dominated; commercial publisher penetration was limited; learned societies, universities, and national research councils operated journals as a public-good function. When the open-access conversation arrived in the 2000s, the question for Latin America was not how to flip a subscription system to OA but how to strengthen and federate the already-open infrastructure.

SciELO emerged from FAPESP (the São Paulo research funder) and BIREME (the regional Pan American Health Organisation library) as a quality-controlled regional federation of journals with shared technical infrastructure, peer-review standards, and indexing. Latindex emerged from UNAM as a regional catalogue of scholarly journals with quality criteria. Redalyc emerged from the Universidad Autónoma del Estado de México as a full-text repository of Latin American journals. AmeliCA, launched in 2018, federated the three with explicit Diamond-OA positioning.

The model is community-led, publicly funded, multilingual (Spanish, Portuguese, English, with growing Indigenous-language presence), and operates without article-processing charges. It indexes thousands of journals; the federated catalogue holds over a million articles; the technical infrastructure (XML production, DOI registration, COUNTER-compliant usage statistics) meets international standards.

What worked, and why

Three structural features explain the model’s durability.

First, institutional anchoring. SciELO, Latindex, Redalyc, and AmeliCA are each hosted by major research institutions (FAPESP, UNAM, UAEM) with stable funding. The infrastructure is not project-grant-dependent; it is institutionally sustained. This is the contrast with the European Diamond OA conversation, which has struggled with project-grant precarity, and one of the lessons that the 2024 Plan Diamond declaration explicitly acknowledged.

Second, quality through federation. The journal-level quality criteria (Latindex’s catalogue criteria, SciELO’s collection criteria) are operated as community-standards bodies, not as gatekeepers. A journal that meets the criteria is indexed; the criteria are public; appeals are possible. The federated catalogue is the quality signal; reputation is built through inclusion rather than through individual journal brand.

Third, technical infrastructure shared at scale. The SciELO publishing infrastructure (XML production, web hosting, DOI registration) is offered as a service to participating journals. Journals do not each reinvent the technical layer. This reduces per-journal cost dramatically and is the model that the European Diamond OA capacity centre is now trying to replicate.

What the global North is learning

Three lessons are being absorbed, slowly.

First, institutional funding is the sustainable model. APC-based gold OA reproduces commercial publishing’s economics; transformative agreements concentrate funding in well-resourced consortia; Diamond OA funded by institutions is the cost-effective alternative for the scholarly-communication public good. The OPERAS network in Europe, the cOAlition S 2024 strategic refresh, and the MIT Framework on principles for scholarly communication all explicitly endorse institutional funding for OA infrastructure.

Second, bibliodiversity is a feature, not a bug. The Latin American model publishes in multiple languages, with regional editorial leadership, addressing regional research priorities. The dominant-language, Global-North-centred model that emerged from the subscription era is a historical accident, not a quality standard. The bibliodiversity framing from the Jussieu Call (2017), the Helsinki Initiative on Multilingualism in Scholarly Communication (2019), and the UNESCO Recommendation on Open Science (2021) all draw on the Latin American experience.

Third, regional infrastructure is legitimate research infrastructure. The bibliometric assessment patterns that treated SciELO and similar venues as second-tier indexing did so on assumptions (Web of Science and Scopus as global standards) that are themselves historically and geographically specific. The 2024 Helsinki Initiative implementation guidance and the CoARA reform agenda push assessment systems to recognise regional infrastructure on its own terms.

What still needs work

The Latin American model is not without its tensions. Three deserve mention.

First, discoverability beyond the region. SciELO is indexed in major international databases; Latindex and AmeliCA less so. Articles published in regional Diamond OA venues are easy to find for those who know to look there and much harder to find for those defaulting to Web of Science or Scopus. The integration with Crossref and DataCite has improved this, but the discovery-default question remains.

Second, discipline coverage. The Latin American Diamond OA ecosystem is stronger in humanities, social sciences, and applied health than in laboratory natural sciences and engineering, where researchers under bibliometric pressure publish externally. The model needs reinforcement in disciplines where it is currently thinner.

Third, language equity within the region. Indigenous-language and Portuguese publication is growing but is still well behind Spanish. The 2024 AmeliCA strategic refresh prioritises multilingual expansion.

What CASRAI recommends

For Global North funders and institutions considering Diamond OA investment, the operating advice is to learn from the Latin American experience and to support, where possible, integration with the existing regional infrastructure rather than build parallel structures. The 2024 Plan Diamond signatory commitments include several explicit channels for funding regional infrastructure; the CASRAI Diamond OA funder guide walks through the options.

For institutions evaluating their researchers’ contributions, the operating advice is to recognise publication in regional Diamond OA venues on the same terms as publication in international venues. This requires updating bibliometric tools to include the regional indices, updating promotion-and-tenure committees’ reading lists, and treating responsible assessment commitments seriously rather than performatively.

For researchers, the operating advice is to publish where the work fits the venue, not where the bibliometric pressure points. A regional-language paper in a SciELO or Redalyc journal is a legitimate scholarly output and should be claimed and cited as such. The CASRAI bibliodiversity for authors guide discusses the practicalities.

The longer arc

The next ten years of scholarly publishing will be shaped by whether the global system absorbs the Latin American lessons or continues to treat them as regional exceptions. The signs are tentatively positive. cOAlition S’s strategic refresh, the OPERAS work in Europe, the institutional re-investment as transformative agreements expire — all point toward a less commercial, less APC-centred, more bibliodiverse system. The infrastructure to operate that system already exists in Latin America; the rest of the world is catching up.

April 29, 2026
CARE-FAIR tension and how the GIDA Manifesto resolves it
The CARE Principles for Indigenous Data Governance (Collective benefit, Authority to control, Responsibility, Ethics) and the FAIR data principles (Findable, Accessible, Interoperable, Reusable) are often presented as complementary. In practice they have a real tension: FAIR maximises openness and access; CARE centres Indigenous community authority over data, including over what counts as accessible and to whom. The Global Indigenous Data Alliance’s 2024 manifesto on CARE-FAIR integration is the most developed framework for reconciling them. This post walks through the tension, the GIDA Manifesto’s resolution, and what implementers should do.

What FAIR says and what it does not

FAIR, articulated by Wilkinson and colleagues in 2016, is a framework for making data more useful. Findable: data have rich metadata and persistent identifiers. Accessible: data can be retrieved by their identifier using a standardised protocol that is open and free, and that allows for authentication and authorisation where necessary. Interoperable: data use shared vocabularies and standards. Reusable: data have clear provenance, licensing, and usage information.

What FAIR does not directly address is who decides what is findable, accessible, interoperable, and reusable. FAIR is technically permissive about access controls — the principles allow that authentication-and-authorisation may restrict access — but the dominant interpretation of FAIR has been maximalist: open by default, restricted only with clear justification. This has produced an implementation pattern where Indigenous data are often treated as candidates for openness with restrictions, rather than as community-governed assets whose access decisions sit with the community.

What CARE says

The CARE Principles, articulated by the Global Indigenous Data Alliance in 2019 and formally published in 2020, are a counterweight to FAIR rather than a contradiction of it. Collective benefit: data ecosystems are designed and function in ways that enable Indigenous peoples to derive benefit from the data. Authority to control: Indigenous peoples’ rights and interests in Indigenous data must be recognised, and they must have the authority to control such data. Responsibility: those working with Indigenous data have a responsibility to share how those data are used. Ethics: Indigenous peoples’ rights and wellbeing should be the primary concern at all stages of the data lifecycle.

CARE applies to Indigenous data, defined broadly to include data about Indigenous peoples, Indigenous lands, Indigenous resources, and Indigenous knowledge. The principles are not anti-openness; they place authority over openness decisions with the community.

The tension in practice

Three illustrative tensions.

First, a researcher working with an Indigenous community produces a dataset documenting traditional ecological knowledge. FAIR-maximalist implementation would push for open deposit with a CC BY licence. CARE-aligned implementation would defer to the community’s governance: the community may choose to share openly, may choose to share with use restrictions, may choose to restrict access entirely, may choose to share with attribution requirements via Traditional Knowledge Labels. The community’s decision is determinative under CARE; the FAIR-maximalist instinct is to nudge toward openness.

Second, a population health dataset includes Indigenous community-level data. FAIR-maximalist implementation would push for de-identified open deposit. CARE-aligned implementation asks whether de-identification is sufficient to prevent community-level identification (often it is not), whether the community has consented to research uses beyond the original study, and whether the planned uses generate collective benefit. The answers may permit open deposit, may require controlled access, or may require negotiated terms.

Third, a museum collection includes Indigenous cultural objects with associated metadata. FAIR-maximalist implementation would push for full metadata openness. CARE-aligned implementation defers to the community on what metadata is appropriate to share, what should be retained but restricted, and what should be returned to community governance.

The GIDA Manifesto

The GIDA Manifesto on CARE-FAIR integration, published in 2024 after extended consultation across the international Indigenous data networks (Te Mana Raraunga in Aotearoa New Zealand, Maiam nayri Wingara in Australia, the United States Indigenous Data Sovereignty Network, the First Nations Information Governance Centre in Canada, and others), articulates a reconciliation framework.

The framework’s core proposition is that FAIR and CARE are sequenced, not simultaneous. CARE comes first: the community’s governance decisions determine what data exist, who has rights in them, what uses are permitted, and what access conditions apply. FAIR then operates within the CARE-determined envelope: findable to those who should find them, accessible under the access conditions the community has set, interoperable for the uses the community has permitted, reusable subject to community-defined terms.

This is not a watering-down of FAIR; the manifesto is explicit that all four FAIR principles are honoured within their proper scope. It is a re-ordering of the implementation question. The pre-FAIR step is not assumed-open; it is community-determined.

Operational implications

For repositories, the operational implications are concrete. Repositories holding or potentially holding Indigenous data need governance arrangements that surface CARE compliance. This means: identifying Indigenous data at deposit; verifying community authorisation; recording the community’s access decisions in machine-readable form; honouring those decisions in the access-control layer; providing for community-initiated access changes over time.
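The repository steps listed above can be sketched in code. This is a minimal illustration under stated assumptions, not a reference implementation: the AccessDecision record, its field names, and the three access levels are hypothetical and are not drawn from the GIDA Manifesto or from any published schema.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class AccessLevel(Enum):
    """Hypothetical access levels a community might choose."""
    OPEN = "open"
    CONTROLLED = "controlled"   # approved requesters only
    RESTRICTED = "restricted"   # no external access


@dataclass
class AccessDecision:
    """Machine-readable record of a community's access decision (illustrative)."""
    dataset_id: str
    community: str
    level: AccessLevel
    decided_on: date
    approved_requesters: set[str] = field(default_factory=set)

    def permits(self, requester: str) -> bool:
        """Honour the community's decision in the access-control layer."""
        if self.level is AccessLevel.OPEN:
            return True
        if self.level is AccessLevel.CONTROLLED:
            return requester in self.approved_requesters
        return False

    def revise(self, level: AccessLevel, decided_on: date) -> None:
        """Record a community-initiated change to the access decision."""
        self.level = level
        self.decided_on = decided_on


# Placeholder identifiers, for illustration only.
decision = AccessDecision(
    dataset_id="example-dataset-001",
    community="Example Community Council",
    level=AccessLevel.CONTROLLED,
    decided_on=date(2026, 1, 15),
    approved_requesters={"researcher-a"},
)
print(decision.permits("researcher-a"), decision.permits("researcher-b"))
```

The design point is the ordering: the access check reads the community's recorded decision and nothing else, so the FAIR layer operates inside whatever the record says. A real repository would also need the deposit-time identification and authorisation checks, which this sketch leaves out.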

The CASRAI Indigenous data and CARE domain tracks repository implementations. Several have led: the SOLES repository at the Smithsonian, the Indigenous-managed nodes of the OCAP-aligned First Nations data ecosystem in Canada, the Māori Data Sovereignty Network’s portal in Aotearoa New Zealand.

For researchers, the operational implications are about partnership. Research that produces Indigenous data needs to be conducted in partnership with the community, with data-governance arrangements agreed upfront, with the community holding control over downstream uses. The Free, Prior, and Informed Consent (FPIC) framework is the standard reference.

For funders and journals, the operational implications are about review and policy. Funder data-management requirements should recognise CARE-aligned deposit; journal data-availability requirements should accommodate community-governed access decisions. Several major funders and journals have updated their policies in 2024-2025 to do this; the implementation is uneven.

The OCAP and FPIC interfaces

Two adjacent frameworks deserve mention. OCAP (Ownership, Control, Access, Possession), articulated by the First Nations Information Governance Centre in Canada, predates CARE and operates in a more granular operational space; OCAP and CARE are compatible and OCAP-aligned implementations can claim CARE alignment in the relevant scope. FPIC is the consent framework derived from the UN Declaration on the Rights of Indigenous Peoples; FPIC operates at the research-design stage, before data are collected, and is upstream of both CARE and FAIR.

The integrated operational pattern: FPIC governs research design and data collection; OCAP governs the data-control arrangements during and after collection; CARE provides the data-governance framework for repository-level and ecosystem-level decisions; FAIR provides the technical-implementation framework for the openness-within-CARE-envelope work.

What CASRAI recommends

Four recommendations. First, repositories should adopt CARE-aligned governance, with community-controlled access decisions surfaced in the deposit and discovery layers. Second, researchers working with Indigenous communities should structure partnerships under FPIC and follow OCAP or equivalent arrangements. Third, funders and journals should recognise CARE-aligned deposit as fulfilling data-availability requirements. Fourth, the FAIR-data community should adopt the GIDA Manifesto’s sequencing as the default implementation pattern, with the FAIR-first interpretation explicitly identified as inappropriate for Indigenous data.

The reconciliation works. It requires more upfront attention to governance than the FAIR-maximalist default, but it produces outcomes that respect community sovereignty while delivering the technical-interoperability benefits that FAIR was designed for.

April 22, 2026
How to write a CRediT statement for medical research in 2026
Medical-research contributorship sits at an awkward intersection. The International Committee of Medical Journal Editors (ICMJE) still defines who may sign as an author of a clinical paper through its four-part test: substantial contribution to conception/design or acquisition/analysis/interpretation; drafting or critical revision; final approval; and accountability. CRediT, the Contributor Roles Taxonomy that CASRAI helped steward into NISO Z39.104-2022, sits underneath and describes what each named contributor actually did. In 2026, after another wave of journal adoption and the long-anticipated alignment with ORCID’s contributor affiliation model, a CRediT statement is no longer a discretionary nicety. It is the contributorship record of the paper.

Medical-research author lists increasingly include laboratory and diagnostic contributors whose work — for example carrier screening tests in a translational study — needs explicit CRediT attribution rather than a generic acknowledgement.

This post walks through how to write a CRediT statement that satisfies a medical journal’s submission system in 2026, with attention to the editorial conventions of NEJM, The Lancet, JAMA, and The BMJ. It assumes you have already worked out who meets the ICMJE authorship threshold; see our medical-research authors guide for that step.

Authorship versus contributorship: not the same question

The first error we see in submissions is conflating ICMJE authorship with CRediT contributorship. ICMJE answers a binary question: does this person qualify to be listed as an author and to be accountable for the work? CRediT answers a granular one: of the people who are listed, who did what? A statistician who ran the analysis but did not draft or revise the manuscript may not meet ICMJE criteria, in which case they are acknowledged separately. If they do meet the criteria, they are named on the byline and their CRediT role assignment would include Formal analysis, possibly Methodology, and possibly Software. Liz Allen and the team that originated CRediT at Wellcome Trust were explicit on this distinction; the taxonomy was designed to complement, not replace, journal authorship rules.

For medical research the second confounder is the guarantor. The BMJ has long required a named guarantor in addition to authors, and other ICMJE-following journals encourage the convention for clinical trials. The guarantor sits outside CRediT; it is closest in spirit to the Supervision role plus an accountability commitment, but it is not encoded in the taxonomy. In your CRediT statement, name the guarantor in a separate sentence; do not invent a Guarantor role.

The 14 roles in medical-research context

CRediT’s 14 roles were drafted for general research and need a brief translation when applied to clinical work. The full role definitions are normative; what follows is interpretive guidance, not a redefinition.
- Conceptualization. The research question. For a registered clinical trial this is often a Principal Investigator role; for a secondary analysis it may be a junior contributor with a novel hypothesis.
- Methodology. Study design, choice of endpoints, statistical-analysis-plan structure. A trial statistician contributing to the SAP earns this role even if a different person ran the final analysis.
- Software. Programming for data capture (REDCap configuration counts), randomisation code, custom statistical packages, any analytic script that materially shaped results.
- Validation. Reproduction of analyses, sensitivity analyses, cross-checks against an independent dataset. Often a co-author who replicates the lead analyst’s work.
- Formal analysis. The statistical analysis itself.
- Investigation. Recruitment, screening, consenting, clinical assessments, sample collection. Often the largest list of contributors in multi-site trials.
- Resources. Provision of patient samples, biobanks, animal models, instrument time. Distinct from Funding acquisition.
- Data curation. Data cleaning, harmonisation, query resolution, lock-down.
- Writing – original draft. First-draft authorship of the manuscript.
- Writing – review & editing. Substantive editorial revision, not copy-editing.
- Visualization. Figures, including Kaplan-Meier curves, forest plots, CONSORT flow diagrams.
- Supervision. Mentorship and oversight, often the senior author. A PI typically combines Supervision with Conceptualization and Funding acquisition.
- Project administration. Coordination across sites, ethics submissions, sponsor liaison.
- Funding acquisition. Grant-writing for the funded work.

The lead/equal/supporting qualifier

Adopted formally into NISO Z39.104 and now widely supported, the degree-of-contribution qualifier resolves a recurring source of disputes. For each role, exactly one contributor may be marked Lead, or several may be marked Equal; everyone else for that role is Supporting. In a multi-site oncology trial it is realistic to have a Lead Investigator (the coordinating PI) and a longer list of Supporting Investigators (site PIs, sub-investigators, research nurses who meet ICMJE thresholds); where site PIs share the role evenly with no single coordinating PI, they are marked Equal instead. The qualifier exists precisely so that the byline order does not have to encode contribution magnitude.
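A quick way to catch qualifier mistakes before submission is to check the draft assignment mechanically. The sketch below takes the rule as stated here (exactly one Lead, or two or more Equal, per role) and flags any role that breaks it. The data layout and the function name are our own illustration, not part of the NISO standard or of any submission system.

```python
from collections import defaultdict

# (contributor, role, degree) triples; names are placeholders.
assignments = [
    ("Contributor A", "Investigation", "lead"),
    ("Contributor B", "Investigation", "supporting"),
    ("Contributor A", "Writing – review & editing", "equal"),
    ("Contributor C", "Writing – review & editing", "equal"),
    ("Contributor B", "Formal analysis", "supporting"),
]


def qualifier_problems(rows):
    """Return roles whose qualifiers break the one-Lead-or-several-Equal rule."""
    degrees = defaultdict(list)
    for _contributor, role, degree in rows:
        degrees[role].append(degree)

    problems = {}
    for role, marks in degrees.items():
        leads = marks.count("lead")
        equals = marks.count("equal")
        one_lead = leads == 1 and equals == 0
        shared = leads == 0 and equals >= 2
        if not (one_lead or shared):
            problems[role] = f"{leads} lead, {equals} equal"
    return problems


print(qualifier_problems(assignments))
```

Here Formal analysis is flagged because its only contributor is marked Supporting, which leaves the role with neither a Lead nor a set of Equal contributors.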

Writing the statement

A 2026-compliant CRediT statement is rendered as prose in the manuscript and as structured data in the submission system. Most major medical journals now extract the structured form from their submission portal directly; the prose paragraph is for the published version. Here is a worked example for a four-author RCT report:

CRediT author statement. Sarah Chen: Conceptualization (lead), Methodology (lead), Funding acquisition (lead), Supervision (lead), Writing – review & editing (equal). Marcus Okonkwo: Investigation (lead), Project administration (lead), Data curation (lead), Writing – original draft (lead). Priya Raman: Formal analysis (lead), Software (lead), Validation (lead), Visualization (lead), Writing – review & editing (equal). David Holcombe: Methodology (supporting), Investigation (supporting), Writing – review & editing (supporting), Supervision (supporting). Guarantor: Sarah Chen.

Note the explicit guarantor statement, separate from CRediT. Note also that not every role appears; Resources was inapplicable here and should be omitted rather than padded.
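If the role assignment is kept as structured data in the shared working document, the prose paragraph can be generated from it rather than typed by hand, which keeps the two forms in step. The sketch below is illustrative only: the dictionary layout and the render_credit function are assumptions of ours, and it covers two of the four contributors from the example for brevity.

```python
# Contributor -> ordered list of (role, degree); a subset of the worked example.
assignment = {
    "Sarah Chen": [
        ("Conceptualization", "lead"),
        ("Methodology", "lead"),
        ("Funding acquisition", "lead"),
        ("Supervision", "lead"),
        ("Writing – review & editing", "equal"),
    ],
    "Marcus Okonkwo": [
        ("Investigation", "lead"),
        ("Project administration", "lead"),
        ("Data curation", "lead"),
        ("Writing – original draft", "lead"),
    ],
}


def render_credit(roles_by_contributor, guarantor):
    """Render the prose CRediT paragraph, with the guarantor as a separate sentence."""
    sentences = []
    for name, roles in roles_by_contributor.items():
        listed = ", ".join(f"{role} ({degree})" for role, degree in roles)
        sentences.append(f"{name}: {listed}.")
    body = " ".join(sentences)
    return f"CRediT author statement. {body} Guarantor: {guarantor}."


statement = render_credit(assignment, guarantor="Sarah Chen")
print(statement)
```

Roles that do not apply are simply absent from the data, so nothing is padded, and the guarantor is passed separately rather than encoded as a role.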

JATS XML output

For machine-actionable contributorship, journals serialise CRediT into JATS XML using the <role> element with the vocab="credit" attribute and the canonical role URI. The 2022 NISO version pinned the URIs at https://credit.niso.org/contributor-roles/<role-slug>/ with the qualifier expressed via the degree-contribution attribute (lead, equal, or supporting), which is available on <role> from JATS 1.2 onward. As an author you do not write the JATS by hand; you fill in the submission portal and the publisher’s tooling renders the XML. Where things go wrong is the round-trip: if the published HTML drops the qualifier, the JATS may also drop it and downstream Crossref deposits will be incomplete. If you care about the persistent record, check the published JATS via the publisher’s content syndication endpoint after acceptance.
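For readers who want to know what to look for when checking the published XML, the sketch below builds a single <role> element of the kind described. It is an approximation under assumptions: the vocab-identifier, vocab-term, and vocab-term-identifier attributes follow our reading of the JATS4R CRediT recommendation, the slug rule (lower-case words joined by hyphens) is inferred from the URI pattern, and the attribute carrying the qualifier is left as a parameter because publisher pipelines differ.

```python
import re
import xml.etree.ElementTree as ET

CREDIT_BASE = "https://credit.niso.org/"


def role_slug(role_name):
    """Lower-case the role and join its words with hyphens (assumed slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", role_name.lower()).strip("-")


def credit_role_element(role_name, degree, degree_attr="degree-contribution"):
    """Build one <role> element for a contributor's CRediT role."""
    uri = f"{CREDIT_BASE}contributor-roles/{role_slug(role_name)}/"
    role = ET.Element("role", {
        "vocab": "credit",
        "vocab-identifier": CREDIT_BASE,
        "vocab-term": role_name,
        "vocab-term-identifier": uri,
        degree_attr: degree,  # attribute name is an assumption; see lead-in
    })
    role.text = role_name
    return role


serialized = ET.tostring(
    credit_role_element("Formal analysis", "lead"), encoding="unicode"
)
print(serialized)
```

When inspecting a published article, the things to confirm are that each role carries the canonical URI and that the qualifier attribute survived; a <role> with only free text is the round-trip failure described above.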

Journal-specific notes

NEJM

The New England Journal of Medicine adopted CRediT in late 2023 and integrated it into its Editorial Manager workflow in 2024. NEJM’s idiosyncrasy is that it still asks separately for the prose contribution statement, then asks each author to confirm their CRediT roles, and finally requires a writing-assistance declaration that is not CRediT (it covers professional medical writers funded by sponsors). Do not list a paid medical writer who does not meet ICMJE criteria under CRediT Writing – original draft; declare them in the acknowledgements with the funding source per Good Publication Practice (GPP 2022).

The Lancet

The Lancet was an early CRediT adopter and was unusual in coupling the taxonomy to a long-standing requirement for each author to write a one-sentence prose contribution statement in their own words. Both are retained in 2026. The prose statement is what readers see in the printed acknowledgements; the structured CRediT data lives in the JATS and in Crossref. For a Lancet submission, write the structured assignment first and then have each author translate their own roles into a single readable sentence.

JAMA

JAMA and the JAMA Network journals adopted CRediT in 2022 and tied it tightly to ORCID; an author without a verified ORCID iD cannot complete the contributorship form. JAMA also asks for explicit role assignments for Statistical analysis, Obtained funding, and Administrative, technical, or material support; these are journal-specific role labels that overlap with CRediT but are tracked separately for editorial QA. If you have a Formal analysis role under CRediT you must also tick Statistical analysis on the JAMA form, otherwise the submission will not validate.

The BMJ

The BMJ adopted CRediT in 2023 and retained its long-standing guarantor requirement on top. BMJ’s submission system asks for the CRediT roles in structured form and then asks the corresponding author to identify the guarantor by name. The published article carries both: the CRediT statement as prose, and the guarantor sentence beneath it. BMJ also continues to require declarations of relationships and activities (the BMJ-specific competing interests format) which sit alongside but separately from CRediT.

Common failure modes

Three patterns recur in submissions to medical journals. First, role inflation: assigning Conceptualization to every author by reflex. CRediT is a record, not a recognition device; if a co-author did not contribute to conceptualisation, do not assign that role. Second, byline order substituting for qualifiers: a paper with five equal first-authors should mark all five as Equal on the roles they share, not just rely on a footnote saying “these authors contributed equally.” Third, missing the writing roles: every paper has someone who wrote the first draft. If your CRediT statement omits Writing – original draft, the editor will ask.
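Two of these failure modes can be caught with a simple pre-submission check on the draft assignment. The sketch below is a heuristic of our own, not a journal rule: the lint_credit function, the input layout, and the threshold of three contributors for the role-inflation warning are all assumptions chosen for illustration.

```python
def lint_credit(roles_by_contributor):
    """Flag two common problems in a draft CRediT assignment.

    roles_by_contributor maps each contributor name to a set of role names.
    """
    warnings = []
    all_roles = set().union(*roles_by_contributor.values())

    # Missing writing role: every paper has a first-draft author.
    if "Writing – original draft" not in all_roles:
        warnings.append("No contributor holds Writing – original draft.")

    # Possible role inflation: Conceptualization assigned to everyone.
    everyone = all(
        "Conceptualization" in roles for roles in roles_by_contributor.values()
    )
    if len(roles_by_contributor) >= 3 and everyone:
        warnings.append(
            "Conceptualization is assigned to every contributor; confirm each claim."
        )
    return warnings


draft = {
    "Contributor A": {"Conceptualization", "Supervision"},
    "Contributor B": {"Conceptualization", "Investigation"},
    "Contributor C": {"Conceptualization", "Formal analysis"},
}
for warning in lint_credit(draft):
    print(warning)
```

A warning is a prompt for a conversation among co-authors, not a verdict; a small team may legitimately share conceptualisation.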

Adoption status and trajectory

As of early 2026 the CRediT adoption ledger records 70+ publishers with active CRediT support and structured submission workflows in most major medical and biomedical journals. The ICMJE has not made CRediT mandatory across its full membership, but its 2024 update to the Recommendations explicitly endorses CRediT as an acceptable mechanism for describing contributions, and several ICMJE journals require it. Outside ICMJE, the trajectory is the same: PLOS, Cell Press, Springer Nature, Wiley, Taylor & Francis, Elsevier, OUP, CUP, and a long tail of society publishers now require structured CRediT at submission.

What to do next

If you are preparing a submission, work through these in order: (1) settle the authorship list against ICMJE criteria; (2) draft the CRediT role assignment in a shared document with qualifiers; (3) have each author confirm their roles in writing before submission; (4) enter the structured data in the submission portal and copy the prose statement into the manuscript; (5) declare the guarantor and any medical writers separately. The CASRAI CRediT authors guide contains a downloadable role-assignment worksheet that has saved more co-author disputes than any other artefact we publish.

References

ICMJE, Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals (2024 update).
NISO Z39.104-2022, CRediT, Contributor Roles Taxonomy.
Allen et al., Nature (2014), Publishing: Credit where credit is due.
Brand et al., Learned Publishing (2015), Beyond authorship: attribution, contribution, collaboration, and credit.
Holcombe, Publications (2019), Contributorship, not authorship.
April 15, 2026
What ‘broader impacts’ means under the new NSF policy
The US National Science Foundation’s broader-impacts criterion, in force since 1997 and refreshed periodically since, received its most significant revision in 2024 with rolling implementation through 2025-2026. The new policy sharpens what counts as a credible broader-impacts plan, what evidence applicants need to provide, and how reviewers should weight the criterion. This post is a practical orientation for applicants and research-administration offices handling NSF proposals in 2026.

Broader-impacts thinking has clear analogues in civic health policy, where bodies addressing London public health frame research value in terms of population outcomes and inequalities rather than citations alone.

Broader impacts, briefly

NSF merit review uses two co-equal criteria: intellectual merit (the scientific value of the proposed work) and broader impacts (the benefits to society beyond the immediate science). Broader impacts can be of many kinds: education and training of students, public engagement, infrastructure development, dissemination beyond peer-reviewed venues, increased participation of under-represented groups in STEM, partnerships with industry or non-academic users, contributions to national interests including economic competitiveness and security.

The criterion has been criticised since its introduction for being weakly evaluated, for being treated as an afterthought to intellectual merit, and for accepting plausibility statements in lieu of evidence-based plans. The 2024 revision is the NSF’s response to those criticisms.

What the new policy changes

Three substantive changes.

First, evidence-based planning. The new policy expects broader-impacts plans to be grounded in evidence from the broader-impacts and informal-STEM-learning literature. An applicant claiming that a proposed K-12 outreach activity will improve student interest in science needs to ground that claim in published evidence about what works in K-12 STEM engagement, not just assert it. The policy explicitly references the work of Bevan, Falk, Dierking, and others in the informal-learning research community.

Second, measurable outcomes. Plans must specify what success looks like in observable terms, what data will be collected to assess success, and how the assessment will be conducted. Vague aspirations (“will inspire the next generation of scientists”) are not sufficient; observable indicators are required. The policy stops short of requiring randomised controlled trials of outreach activities but moves substantively toward evaluation rigour.

Third, integration with intellectual merit. The policy emphasises that broader impacts should be substantively connected to the research, not bolted on as a separable activity. A proposal whose broader-impacts plan is a generic education-outreach activity unrelated to the science is now explicitly weaker than a proposal whose broader-impacts plan extends the science in directions that benefit specific communities.

What this means for applicants

The practical implications. First, build the broader-impacts plan into the proposal narrative, not into a separate annex. Show the connection between the science and the broader impact. The CASRAI NSF applicant guide walks through structural patterns that integrate the two.

Second, cite evidence. If you propose outreach to under-represented groups, cite published evidence about effective interventions for those groups. If you propose a teacher-development workshop, cite the literature on effective teacher development. The expectation is not that applicants become broader-impacts researchers; it is that the plan is grounded in someone’s research, with citations.

Third, specify the assessment. What will you measure? How will you measure it? Who will analyse the data? A typical strong assessment specifies: outcome measure (e.g., self-reported student interest in STEM career on a validated instrument); data collection (pre/post survey administration); analysis plan (paired-difference test with effect-size reporting); reporting venue (project annual report plus a conference paper).
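For applicants unfamiliar with the analysis plan named above, a paired-difference test with effect-size reporting amounts to very little code. The sketch below uses only the Python standard library; the pre and post scores are invented for illustration, and the effect size shown is Cohen's d for paired data (mean difference divided by the standard deviation of the differences), which is one common choice among several.

```python
from math import sqrt
from statistics import mean, stdev

# Invented pre/post scores for five students on the same instrument.
pre = [3, 4, 2, 5, 3]
post = [4, 5, 3, 5, 5]


def paired_summary(before, after):
    """Paired t statistic and paired-data Cohen's d for matched scores."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return {
        "n": n,
        "mean_diff": mean_diff,
        "t": mean_diff / (sd_diff / sqrt(n)),
        "df": n - 1,
        "cohen_d": mean_diff / sd_diff,
    }


summary = paired_summary(pre, post)
print(summary)
```

The p-value is left out here because the standard library has no t distribution; a statistics package would supply it from the t statistic and the degrees of freedom. With samples this small the result would be illustrative at best, which is itself a reason to plan the sample size in the proposal.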

Fourth, partner with people who do this work. Researchers who are not broader-impacts specialists should partner with informal-STEM-education professionals, with community-engagement specialists at their institution, with K-12 educators who can co-design and assess interventions. The partnerships should be visible in the proposal: who is doing the work, what their qualifications are, what their role is.

What this means for research administration

Research-administration offices supporting NSF proposals should be upskilling around the new policy. Three priorities. First, build a broader-impacts library: a curated set of evidence-based plans, vetted partners, and assessment instruments that researchers can adapt. Second, offer structured proposal-development support that includes broader-impacts review by someone qualified to assess the evidence base. Third, support post-award assessment: the broader-impacts assessment plans now in proposals need to be executed during the award and reported in the final project report.

The CASRAI institutional broader-impacts guide includes a checklist for research-administration offices building this capacity.

How reviewers should evaluate

NSF’s revised reviewer guidance asks reviewers to evaluate broader impacts on the same evidence-based and outcome-specified terms. A reviewer should ask: is the plan grounded in evidence? Are the outcomes measurable? Is the assessment credible? Is the broader-impacts work integrated with the research?

This is a substantive shift. Pre-2024 reviewer guidance often produced broader-impacts ratings that reflected the reviewer’s gestalt impression of the applicant’s commitment. The new guidance pushes toward more analytic evaluation, with the explicit recognition that broader impacts is a domain with its own expertise and its own literature.

The wider implications

Three wider implications worth noting.

First, the policy normalises broader-impacts research as a discipline. NSF has historically funded broader-impacts research thinly; the new policy implicitly raises the visibility of the field and the demand for its outputs. We expect funding for broader-impacts research itself to increase in subsequent budget cycles.

Second, the policy aligns with the international move toward structured impact reporting. The UK’s Research Excellence Framework impact case studies, the EU’s Horizon Europe expected-impact framework, and several other funder frameworks all push in similar directions. CASRAI’s engagement, impact, and SDG domain tracks the international landscape.

Third, the policy creates a soft incentive toward partnerships between research-intensive universities and the institutions (community colleges, K-12 systems, science museums, community organisations) that have broader-impacts capacity. The partnerships, where they work, are mutually beneficial; where they do not, broader impacts becomes a service-delivery problem dressed as a research-grant activity.

Open questions

Two open questions for 2026 and beyond. First, the resource implications: an evidence-based broader-impacts plan with a real assessment costs money, sometimes a substantial fraction of the project budget. NSF has signalled that meaningful broader-impacts costs are budgetable and reviewable on their merits, but the budget pressure is real, particularly for small grants. Second, the equity implications: applicants from research-intensive universities have easier access to broader-impacts capacity than applicants from less-resourced institutions. The new policy may inadvertently widen the gap between institution categories. NSF is aware of this risk and the next policy update is expected to address it.

For applicants in 2026, the operating posture is to take the new policy seriously, partner with people who do this work, ground your plan in evidence, specify the assessment, and integrate broader impacts with the research. The bar has risen; the proposals that clear it will be substantially stronger than the pre-2024 baseline.

April 8, 2026
GenAI in scholarly authorship: the 2026 disclosure landscape
The 2023 ICMJE position that generative AI cannot be a co-author has aged into a stable consensus across scholarly publishing, but the implementation surface around it has grown fast. In 2026 the question is no longer whether to disclose AI use in a manuscript; it is how, where, and with what evidence. This post maps the current disclosure landscape, the technical mitigations that publishers expect authors to apply, and the residual uncertainty around detection.

The ICMJE 2023 position and its echoes

In January 2023 the ICMJE updated its Recommendations to add that chatbots cannot be authors because they cannot meet the accountability criterion: an LLM cannot take responsibility for the integrity of the work, cannot approve the final version in any meaningful sense, and cannot be contacted by readers seeking clarification. The position was endorsed within weeks by the World Association of Medical Editors and by COPE. By mid-2023 every major publisher had aligned. The AI co-authorship rejection is now treated as a settled norm.

What replaced the brief flurry of “ChatGPT as co-author” papers was a more nuanced question: how should authors disclose AI use when the system is a tool? This is where 2024 and 2025 brought significant fragmentation, and where 2026 has begun to consolidate.

The publisher landscape in 2026

Nature and the Springer Nature stable

Nature requires authors to declare any use of LLMs in the Methods section (for research articles) or in the acknowledgements (for editorial and review content). The declaration must specify the model, the version, the date of use, and the purpose. Nature does not permit AI-generated images or figures except where the AI generation itself is the subject of the research. Springer Nature has cascaded a similar policy across its journals with light variation.

Cell Press and Elsevier

Cell Press and Elsevier journals require disclosure of AI-assisted writing in a dedicated declaration that sits alongside competing interests and funding. The declaration is structured: type of tool, purpose (e.g., language polishing, literature search, code generation, image analysis), and a confirmation that the authors take full responsibility. Elsevier additionally requires that AI-generated text be reviewed and edited by the authors and explicitly forbids using AI for peer review.

Wiley

Wiley’s policy distinguishes between using AI as a tool (allowed with disclosure) and using AI to generate substantive intellectual content (not allowed). The distinction is fuzzy at the boundary, and Wiley’s submission system asks authors to self-classify. Wiley also publishes its Best practice guidelines on research integrity and publishing ethics, which were updated in 2024 to cover GenAI in detail.

PLOS, eLife, F1000Research

The open-publishing platforms have generally taken the position that AI use must be disclosed and that authors are responsible for verification, but they have been more permissive about disclosed and reviewed AI use than the closed-access incumbents. eLife in particular has experimented with AI-assisted peer review summaries, with disclosure to authors and readers.

What “disclosure” actually requires

The fragmentation across publisher policies has converged on a common five-element disclosure, which CASRAI’s AI disclosure helper assembles into a publisher-specific declaration:
1. Tool and version. Not “ChatGPT” but “GPT-4o (OpenAI), version of 2025-12-04.”
2. Purpose. One of: language polishing, translation, literature search, code generation, data extraction, image analysis, hypothesis generation, draft writing. If the use spanned multiple purposes, list each.
3. Scope. Which sections or artefacts were involved. “Abstract and discussion polished” is meaningfully different from “first draft written.”
4. Human verification. A statement that named authors have reviewed and verified the output and take responsibility for it.
5. Prompt and output retention. Increasingly, journals are asking authors to retain prompts and outputs for audit. Cell Press now formally asks; Nature recommends. Treat this as a 5-year retention obligation.

See our AI disclosure for authors guide for the publisher-by-publisher decision tree.
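
The five elements map naturally onto a small structured record. The sketch below is one way to hold and sanity-check them before rendering a declaration; the class name, field names, and the wording of the rendered statement are assumptions made for illustration, not the CASRAI helper's actual schema or any publisher's required text.

```python
from dataclasses import dataclass
from datetime import date

# Purposes named in the list above; the set itself is an illustrative vocabulary.
PURPOSES = {
    "language polishing", "translation", "literature search", "code generation",
    "data extraction", "image analysis", "hypothesis generation", "draft writing",
}


@dataclass
class AIDisclosure:  # hypothetical name, not a published schema
    tool: str                # e.g. "GPT-4o (OpenAI)"
    version: str             # e.g. "version of 2025-12-04"
    purposes: list[str]      # one or more entries from PURPOSES
    scope: str               # sections or artefacts involved
    verified_by: list[str]   # named authors who reviewed the output
    retention_until: date    # end of the prompt/output retention period

    def problems(self) -> list[str]:
        """Return the missing or invalid elements (empty list if complete)."""
        issues = []
        if not self.tool or not self.version:
            issues.append("tool and version are both required")
        unknown = [p for p in self.purposes if p not in PURPOSES]
        if not self.purposes or unknown:
            issues.append(f"purposes missing or unrecognised: {unknown}")
        if not self.scope:
            issues.append("scope is required")
        if not self.verified_by:
            issues.append("at least one verifying author is required")
        return issues

    def statement(self) -> str:
        """Render a plain-text declaration; the wording is a placeholder."""
        return (
            f"The authors used {self.tool}, {self.version}, for "
            f"{', '.join(self.purposes)} ({self.scope}). "
            f"{', '.join(self.verified_by)} reviewed and verified the output and "
            f"take responsibility for it. Prompts and outputs are retained until "
            f"{self.retention_until.isoformat()}."
        )


example = AIDisclosure(
    tool="GPT-4o (OpenAI)",
    version="version of 2025-12-04",
    purposes=["language polishing"],
    scope="Abstract and discussion polished",
    verified_by=["A. Author"],            # placeholder name
    retention_until=date(2031, 4, 8),     # illustrative five-year horizon
)
print(example.problems())   # an empty list means all five elements are present
print(example.statement())
```

Keeping the purposes as a closed vocabulary makes self-classification errors visible early; a real integration would swap in each publisher's own category list.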

The hallucination problem

The single largest editorial concern in 2026 remains hallucination: an LLM fabricating a citation, a method, or a result and the authors failing to catch it. Retraction Watch tracked over 200 retractions in 2024 and 2025 attributable in whole or in part to undisclosed AI-generated fabrications, primarily fictitious references but increasingly fabricated quantitative results in tables.

The mitigations are well-known and surprisingly under-applied:
- Citation verification. Every citation in an LLM-generated draft must be checked against the actual source. Tools like Scite, Semantic Scholar’s citation graph, and Crossref’s metadata API help. The bare minimum: every DOI must resolve and the paper at that DOI must say what the LLM claims.
- Numerical verification. If an LLM produces a number, the human author must reproduce the number from the underlying source. “The LLM said it” is not provenance.
- Retrieval-augmented generation (RAG). Grounding an LLM in a fixed corpus of verified sources, with citation chaining, reduces but does not eliminate hallucination. RAG-based research-writing tools (Elicit, Consensus, scite Assistant) have an accuracy edge over raw LLMs precisely because they constrain the model to a verifiable corpus.

Munafò and colleagues at the UK Reproducibility Network have argued, correctly in our view, that AI-assisted writing should sit inside the same reproducibility envelope as the rest of the work: prompts and outputs are part of the methods, not part of the prose.
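
The first half of the citation-verification minimum (every DOI must resolve) is easy to automate. The sketch below pulls DOI-like strings out of a draft and reports the ones a resolver rejects. It assumes the public `https://doi.org/` resolver and a deliberately loose DOI pattern; the function names are ours, and the resolver is injectable so the check can be exercised without network access.

```python
import re
from typing import Callable
from urllib.request import Request, urlopen

# Loose pattern for modern DOIs: good enough for a first pass, not a validator.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text: str) -> list[str]:
    """Return unique DOI-like strings in order of first appearance."""
    found: list[str] = []
    for match in DOI_RE.findall(text):
        # Trim punctuation that usually belongs to the sentence, not the DOI.
        # (A few legacy DOIs do end in ')', so treat a miss as a manual check.)
        doi = match.rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


def resolves_online(doi: str, timeout: float = 10.0) -> bool:
    """Ask the doi.org resolver whether the DOI exists (performs a network call)."""
    request = Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # Covers HTTP errors, DNS failures, and timeouts. Some landing pages
        # reject HEAD requests, so False means "check by hand", not "fabricated".
        return False


def unresolved(text: str, resolver: Callable[[str], bool] = resolves_online) -> list[str]:
    """Return the DOIs in the text that the resolver could not confirm."""
    return [doi for doi in extract_dois(text) if not resolver(doi)]
```

A resolving DOI only shows that the reference exists. The second half of the minimum, confirming that the paper says what the LLM claims, still needs a human reading the source.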

Detection and watermarking

The detection problem has not been solved. Tools that claim to identify AI-generated text by perplexity or burstiness have unacceptable false-positive rates against careful human writers and are easily defeated by simple paraphrasing. AI-assisted writing is, on the open web, essentially undetectable in 2026.

Three more promising directions exist. First, watermarking: the major LLM providers have prototyped statistical watermarks (Google’s SynthID-Text, OpenAI’s research-stage text watermark) that embed a detectable signal in token-selection statistics without affecting fluency. Adoption has been slow because authors can defeat watermarks by re-rolling with a different model, and because no publisher has committed to refusing un-watermarked submissions. Second, provenance metadata: the C2PA standard (originally for images) is being extended to text, with cryptographically signed assertions of generation source. Third, process auditing: rather than detecting AI in the output, audit the authors’ process artefacts (version history, prompt logs, draft trail). This is the direction in which institutional integrity offices are moving.
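
For the process-auditing direction, one possible shape for a prompt log is an append-only list in which each entry carries a hash of the previous one, so later edits to earlier entries are detectable. This is a sketch under our own assumptions: no publisher or integrity office is known to mandate this format, and the field names and functions are hypothetical.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list[dict], timestamp: str, model: str,
                 prompt: str, output: str) -> dict:
    """Append one prompt/output pair, chained to the previous entry's hash."""
    previous_hash = log[-1]["hash"] if log else ""
    body = {
        "timestamp": timestamp,
        "model": model,
        "prompt": prompt,
        "output": output,
        "prev": previous_hash,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Return True if no entry has been altered, removed, or reordered."""
    previous_hash = ""
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["prev"] != previous_hash or entry["hash"] != _digest(body):
            return False
        previous_hash = entry["hash"]
    return True
```

The chain only proves internal consistency. An auditor would still need the final hash deposited somewhere the author cannot rewrite, for example alongside the submission record.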

For authors, the practical takeaway is that you should not rely on undetectability. The conservative path is disclosure plus verification.

What about peer review?

The 2026 consensus is that peer reviewers may not paste unpublished manuscripts into a third-party LLM. The reason is confidentiality, not anti-AI sentiment: a paper under review is privileged information and most LLM providers retain inputs in some form. NIH and several large funders have made this an explicit policy for proposal review; publishers are catching up. eLife and a handful of others are experimenting with publisher-hosted LLM tooling that does not exfiltrate the manuscript, which threads the needle.

Where this is going

Three trajectories are visible. First, the disclosure form will converge: expect a NISO or COPE-led standardisation of GenAI disclosure within 18-24 months, modelled on the structured CRediT statement. Second, prompt-and-output retention will become mandatory for high-stakes journals (clinical, regulatory-relevant), and audited at random. Third, the line between “AI as tool” and “AI as substantive contributor” will be tested by hybrid systems where the human author’s contribution is curation, framing, and verification rather than generation. We expect the integrity community to draw a harder line on quantitative and methodological substance than on prose: an AI may polish your discussion section with disclosure, but an AI may not propose your analytic method without that proposal being independently validated and disclosed.

For now, the operating rule is straightforward. If you used AI, disclose it specifically. If the AI produced text or numbers in your paper, verify them yourself. If a publisher asks for prompts and outputs, retain them. If you are reviewing a paper, do not paste it into a chatbot. The GenAI disclosure domain at CASRAI tracks the publisher-by-publisher policy text for authors who need to comply across multiple submission targets.

References

ICMJE, Recommendations (January 2023 update, defining authorship to exclude AI). WAME, Chatbots, ChatGPT, and Scholarly Manuscripts (2023). COPE, Authorship and AI tools position statement (2023, reaffirmed 2025). Nature editorial, Tools such as ChatGPT threaten transparent science; here are our ground rules for their use (2023). Munafò et al., The reproducibility debate is an opportunity, not a crisis (PLOS Biology, 2022).
April 1, 2026
Persistent identifiers in 2026: ORCID + ROR + RAiD + DOI
By 2026 the persistent-identifier (PID) layer of scholarly infrastructure has stabilised into a quartet: ORCID for people, ROR for organisations, RAiD for projects, and DOI for almost everything else (articles, datasets, software, instruments, samples, preregistrations). Each is operationally distinct, each has its own governance, and the seams between them are where most metadata loss still happens. This post is a tour of the state in early 2026, the integrations that work well, and the crosswalk gaps that institutions and publishers are still grinding through.

Identifier discipline is not confined to scholarly PIDs. Pathogen-genomics surveillance relies on the same instinct — stable, rule-based names that machines and humans can both resolve — as seen in the SARS-CoV-2 lineage nomenclature maintained by the Pango community.

ORCID at 21 million+ registered iDs and counting

The Open Researcher and Contributor iD is by some distance the most successful PID for people. As of 2026, ORCID reports over 21 million registered iDs with active use across publishing, funding, and institutional workflows. The 2024 integration of CRediT at the ORCID-record level was the missing piece that closed the loop: a contributor’s roles on a specific work can now be carried persistently on their ORCID record, not just in the publisher’s JATS.

The CASRAI-stewarded contribution to ORCID’s data model also added affiliation history with PIDs (ROR for the organisation, RAiD for projects, DOI for grants), which means a complete ORCID record now resolves the four-way join. The ORCID federation page at CASRAI documents the per-country member organisations and the institutional integration patterns; the ORCID implementation page documents the technical integration via the public API and the member API.
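
Any integration that accepts ORCID iDs should validate them before storing them. The final character of an iD is a check character computed with ISO 7064 MOD 11-2 over the preceding fifteen digits. The sketch below implements that check locally; the function names are ours, and it says nothing about whether the iD is actually registered, which requires a call to the ORCID API.

```python
import re

# Four hyphen-separated groups of four; the last character may be 'X'.
ORCID_RE = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for character in base_digits:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Check format and checksum; accepts the bare iD or the full URI form."""
    orcid = value.removeprefix("https://orcid.org/")
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True: checksum matches
print(is_valid_orcid("0000-0002-1825-0098"))  # False: last character altered
```

Running this at the point of entry catches transposed or mistyped digits before they propagate into the CRIS or the publisher metadata.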

What’s still hard with ORCID in 2026

Coverage is uneven by discipline. Life sciences and physics push close to saturation; humanities, education, and applied fields still have large populations who have not registered. Coverage is also uneven by career stage: PIs are nearly universal, postdocs and senior researchers have high coverage, graduate students and undergraduate contributors are sparse. The policy question of whether to require ORCID at submission, mandate it for funding eligibility, or leave it voluntary remains contested. The major medical journals (NEJM, JAMA, BMJ, Lancet) effectively require it for corresponding authors; the broader landscape does not.

ROR: the organisational identifier that finally worked

The Research Organization Registry launched in 2019 and crossed 100,000 active records around 2023. By 2026 ROR is the default organisational PID across Crossref, DataCite, ORCID, and most institutional CRIS systems. The decision to keep ROR governance light, the data open under CC0, and the IDs free at all volumes has been validated by adoption.

ROR replaced a Babel of organisational identifiers (GRID, ISNI for orgs, ad-hoc Crossref Funder Registry entries, individual database keys at each tool). The integration pattern is clear: every author affiliation should resolve to a ROR ID, and that ID should travel through the metadata to Crossref, DataCite, ORCID, and the institutional repository. Most publishers now require or strongly prefer ROR at submission.

ROR’s main 2025-2026 work has been on hierarchies: capturing the parent/child relationships between research institutions, hospitals, university systems, and consortia in a queryable way. A hospital with an academic affiliation needs both identifiers; a multi-campus university needs the campus-level resolution; an INSERM unit needs both the unit ROR and the parent INSERM ROR.
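
In practice, "needs both identifiers" means walking a record's parent relationships and carrying the whole chain. The sketch below does that over an in-memory registry. The record shape (a `relationships` list whose entries have a `type` and an `id`) is modelled loosely on ROR's relationship data and should be treated as an assumption, and the identifiers in the usage example are placeholders, not real ROR IDs.

```python
def parent_chain(org_id: str, registry: dict[str, dict]) -> list[str]:
    """Return ancestors from the immediate parent up to the root.

    Simplification: follows only the first parent listed for each record,
    and stops if a cycle is detected.
    """
    chain: list[str] = []
    seen = {org_id}
    current = org_id
    while True:
        relationships = registry.get(current, {}).get("relationships", [])
        parents = [
            rel["id"] for rel in relationships
            if rel.get("type", "").lower() == "parent"
        ]
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


def affiliation_ids(org_id: str, registry: dict[str, dict]) -> list[str]:
    """The unit's own identifier followed by every ancestor's identifier."""
    return [org_id] + parent_chain(org_id, registry)


# Placeholder identifiers for illustration only.
registry = {
    "example:unit": {"relationships": [{"type": "parent", "id": "example:institute"}]},
    "example:institute": {"relationships": [{"type": "parent", "id": "example:agency"}]},
    "example:agency": {"relationships": []},
}
print(affiliation_ids("example:unit", registry))
```

An organisation with several parents (a joint research unit, for instance) would need the walk extended to follow every parent rather than the first.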

RAiD: the project-level PID

The Research Activity Identifier is the newest of the quartet and is filling a real gap. A project (a grant, a clinical trial, a multi-institution collaboration, a piece of fieldwork) was for years the orphan of the PID world: it had no canonical identifier, so its outputs (papers, datasets, software, training of students) could not be reliably linked back to it. RAiD, now an ISO standard (ISO 23527:2022) and operated as an open infrastructure with multiple national service providers, fills the gap.

The RAiD model is straightforward: a RAiD record names the project, lists its participants (with ORCID iDs), its institutions (with ROR IDs), its outputs (with DOIs and other PIDs), its funders (with ROR or Funder Registry), its dates, and its status. A funder issues or registers a RAiD on award; the awardees update it as the project evolves; the outputs reference it.
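
The lifecycle described above (registered on award, updated by awardees, referenced by outputs) can be pictured as a small mutable record. The class below is a simplified illustration only: the field names, the status values, and the placeholder identifiers are our assumptions and do not reproduce the actual RAiD metadata schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ProjectRecord:  # hypothetical stand-in for a RAiD record
    raid: str                                               # the project's own identifier
    title: str
    start: date
    participants: list[str] = field(default_factory=list)   # ORCID iDs
    organisations: list[str] = field(default_factory=list)  # ROR IDs
    funders: list[str] = field(default_factory=list)        # ROR or Funder Registry IDs
    outputs: list[str] = field(default_factory=list)        # DOIs and other PIDs
    end: Optional[date] = None
    status: str = "active"

    def add_output(self, pid: str) -> None:
        """Link an output to the project once; repeat calls are ignored."""
        if pid not in self.outputs:
            self.outputs.append(pid)

    def close(self, end: date) -> None:
        """Mark the project finished; the end date cannot precede the start."""
        if end < self.start:
            raise ValueError("end date precedes start date")
        self.end = end
        self.status = "closed"


# Funder registers the record on award (all identifiers are placeholders).
project = ProjectRecord(
    raid="raid:example-0001",
    title="Example fieldwork project",
    start=date(2026, 1, 1),
    participants=["0000-0002-1825-0097"],
    funders=["example:funder"],
)
# Awardees update it as outputs appear.
project.add_output("10.1234/example-dataset")
project.add_output("10.1234/example-dataset")  # duplicate, ignored
print(project.outputs, project.status)
```

The point of the model is the direction of the links: outputs reference the project, and the project record accumulates them, so the join can be made from either side.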

In 2026 RAiD adoption is strongest in Australia (where the ARDC operates a national RAiD service), New Zealand, the UK (where UKRI has integrated RAiD into its funding workflow), and parts of the EU. North American uptake has been slower; NIH’s evolving project-identifier approach overlaps with but is not identical to RAiD, and the harmonisation is still in negotiation.

DOIs across artefact types

The Digital Object Identifier is the workhorse. The interesting story in 2026 is not DOIs for journal articles (a settled problem) but DOIs for everything else.
- Data. DataCite issues DOIs for datasets; coverage is high in well-funded disciplines (biology, astronomy, climate, social sciences via ICPSR), lower in others. The FAIR-data push has pulled coverage up.
- Software. Software DOIs via Zenodo and via journal-specific archives (e.g., Software Impacts, JOSS) are now standard practice. Citation of software with DOIs is endorsed by the FAIR4RS principles.
- Preprints. Every major preprint server issues DOIs; the Crossref preprint relationship metadata links the preprint DOI to the eventual journal-article DOI.
- Preregistrations. OSF and AsPredicted issue preregistration DOIs.
- Samples. IGSN (International Generic Sample Number) is a specialised PID for physical samples, increasingly issued as a DOI under the DataCite umbrella.
- Instruments. The PIDINST initiative is rolling out DOIs for major research instruments, with full metadata about specifications and operators.

The crosswalk layer

The most underrated work in the PID ecosystem is the crosswalk layer: the mappings between identifier systems that make the quartet actually function as a graph. Crossref’s references and relations blocks link DOIs to other DOIs and to other PIDs. DataCite’s relatedIdentifier field plays the same role for datasets. ORCID’s work entries carry DOIs and increasingly RAiDs. ROR records carry relationships to other organisations and to Funder Registry entries.

Where crosswalks fail is at the institutional repository boundary. A paper deposited in a university IR may have a DOI, but the IR record’s author affiliation may not resolve to a ROR ID; the project funding may be in a free-text field rather than a Funder Registry ID; the dataset associated with the paper may have a DOI but no relation to the paper’s DOI declared. The result is a graph with broken edges, which then degrades discovery downstream.

The work to fix this in 2026 is mostly unglamorous metadata QA at the repository and CRIS layer. Several institutional CRIS vendors have shipped “PID completeness” dashboards that flag records with missing or unresolvable identifiers. The OpenAIRE Graph project rolls up the global picture and is the most useful external check.
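
The three failure modes at the repository boundary (unresolved affiliations, free-text funding, undeclared dataset relations) translate directly into record-level checks. The sketch below shows the kind of logic a completeness report might run; the record layout and field names are hypothetical and would need mapping onto a real IR or CRIS export.

```python
def missing_pids(record: dict) -> list[str]:
    """List the broken edges in one repository record (empty if none)."""
    problems: list[str] = []

    # Author affiliations should carry a ROR ID, not just a name string.
    for index, author in enumerate(record.get("authors", [])):
        if not author.get("ror"):
            problems.append(f"authors[{index}]: affiliation has no ROR ID")

    # Funding should carry a funder identifier, not just free text.
    for index, grant in enumerate(record.get("funding", [])):
        if not grant.get("funder_id"):
            problems.append(f"funding[{index}]: funder is free text only")

    # Each dataset should have a DOI and declare its relation to the paper.
    paper_doi = record.get("doi")
    for index, dataset in enumerate(record.get("datasets", [])):
        if not dataset.get("doi"):
            problems.append(f"datasets[{index}]: no DOI")
        elif paper_doi not in dataset.get("related", []):
            problems.append(f"datasets[{index}]: no declared relation to the paper DOI")

    return problems


def completeness(records: list[dict]) -> float:
    """Share of records with no missing or unlinked identifiers (1.0 if empty)."""
    if not records:
        return 1.0
    clean = sum(1 for record in records if not missing_pids(record))
    return clean / len(records)
```

This only checks that identifiers are present and linked. Confirming that each one actually resolves is a separate, slower pass against the registries themselves.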

What we expect to settle by 2027

Three threads are in motion. First, contributor-affiliation provenance: an ORCID record’s affiliation history with PIDs at both ends will become the canonical record, replacing the per-publisher author-affiliation strings that journals still maintain separately. Second, funder-PID standardisation: the Crossref Funder Registry and ROR are converging, with funders that are also research organisations getting unified records. Third, project-PID interop: RAiD, NIH project IDs, and the EU’s HORIZON identifiers are being mapped to allow a single project to be tracked across funder boundaries.

For institutions setting strategy now, the practical priorities are: require ORCID at hire and on every publication; map every internal organisation to ROR; integrate RAiD into the funding-receipt workflow if a RAiD service is available to you; and run quarterly QA against your CRIS for missing PIDs. The CRIS guidance at CASRAI walks through the integration patterns by vendor.

References

Meadows et al., Persistent Identifiers: The Building Blocks of the Research Information Infrastructure (Information Services and Use, 2019). Cousijn et al., A data citation roadmap for scholarly data repositories (Scientific Data, 2019). FAIR4RS Working Group, FAIR Principles for Research Software (RDA, 2022). ARDC documentation on RAiD operational service. ROR governance documentation and CC0 data dumps.
March 18, 2026