Blog

  • Reproducibility frameworks in practice: TOP, ARRIVE, CONSORT, PRISMA

    The reporting-guideline ecosystem has grown to nearly 600 distinct guidelines tracked by the EQUATOR Network. For an author or editor staring at this in 2026, the question is not which guideline to applaud but which to actually use, when, and at what depth. This post walks through the four frameworks that anchor the field, the FAIR4RS guidelines for research software, and the registered-report turn that is reshaping pre-publication reproducibility commitments.

    The four anchors

    TOP Guidelines

    The Transparency and Openness Promotion (TOP) Guidelines, developed at the Center for Open Science by Brian Nosek and colleagues, are the journal-policy framework rather than the per-paper checklist. TOP defines eight standards (citation, data transparency, analytic methods transparency, research materials transparency, design and analysis transparency, study preregistration, analysis-plan preregistration, replication) and three levels of stringency at which a journal can adopt each. A journal signing onto TOP commits to a profile of standard-by-standard adoption.

    TOP’s contribution is structural: it gave editors a vocabulary to discuss reproducibility policies and a benchmark against which their journals could be assessed. By 2026 the TOP Factor (a score of journals’ policies against the TOP standards) is widely used to compare journal reproducibility commitments, alongside the more famous and less informative Journal Impact Factor. The CASRAI reproducibility standards page tracks the current TOP adoption ledger.
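
    A journal's standard-by-standard profile can be written down as a plain mapping from standard to adoption level. The sketch below is illustrative only: encoding "not adopted" as 0 and the three stringency levels as 1 to 3, the function names, and the summed total are all assumptions made for the example, and the sum does not reproduce the official TOP Factor rubric.

```python
# Illustrative sketch only: NOT the official TOP Factor scoring rubric.
# Levels: 0 = not adopted (assumption), 1-3 = the three stringency levels.
TOP_STANDARDS = (
    "citation",
    "data transparency",
    "analytic methods transparency",
    "research materials transparency",
    "design and analysis transparency",
    "study preregistration",
    "analysis-plan preregistration",
    "replication",
)


def validate_profile(profile):
    """Raise ValueError unless every standard has exactly one level in 0..3."""
    missing = [s for s in TOP_STANDARDS if s not in profile]
    unknown = [s for s in profile if s not in TOP_STANDARDS]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    for standard, level in profile.items():
        if level not in (0, 1, 2, 3):
            raise ValueError(f"{standard}: level must be 0-3, got {level!r}")


def summed_level(profile):
    """Hypothetical summary figure: the sum of adopted levels."""
    validate_profile(profile)
    return sum(profile.values())


# A hypothetical journal that adopts two standards and nothing else.
example_profile = {standard: 0 for standard in TOP_STANDARDS}
example_profile["data transparency"] = 2
example_profile["citation"] = 1
```

    Keeping the profile explicit, rather than storing only a total, preserves the information an editor needs: which standards are adopted and at what level.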

    ARRIVE 2.0

    The ARRIVE guidelines (Animal Research: Reporting of In Vivo Experiments), revised in 2020 from the original 2010 version, are the canonical reporting standard for animal research. ARRIVE 2.0 introduced the Essential 10 (the must-report items) and the Recommended Set (the should-report items), which made the guideline more usable for both authors and reviewers.

    ARRIVE adoption in 2026 is high in funder mandates (NIH, MRC, NC3Rs) but uneven in journal enforcement. The retrospective audits keep finding that even ARRIVE-required papers miss core items (randomisation method, blinding, sample-size justification). The lesson is that requiring a guideline at submission is not the same as enforcing it at peer review.

    CONSORT 2010 and its extensions

    The CONSORT 2010 statement is the reporting standard for randomised controlled trials and the most-cited reporting guideline in scholarly publishing. A CONSORT-compliant RCT report covers the title and abstract, methods (design, participants, interventions, outcomes, sample size, randomisation, blinding, statistical methods), results (participant flow, baseline data, primary and secondary outcomes, ancillary analyses, harms), and discussion (limitations, generalisability, interpretation). The CONSORT flow diagram (enrolment, allocation, follow-up, analysis) is itself a reporting artefact that has done more for trial transparency than most policy documents.

    The 2025 revision of CONSORT (CONSORT 2025) is being finalised and is expected to integrate explicit reporting requirements for adaptive trial designs, machine-learning-derived endpoints, and patient-public involvement. The current standard is 2010 with several extensions (Cluster, Pragmatic, Non-pharmacological, Harms, Patient-reported outcomes, Outcomes, AI). Authors of any RCT should consult the relevant extension as well as the core standard.

    PRISMA 2020

    The PRISMA 2020 statement is the reporting standard for systematic reviews and meta-analyses. The 2020 revision modernised the 2009 original to reflect changes in search-and-screening practice (preprint searches, GitHub/OSF searches, ML-assisted screening), risk-of-bias assessment (ROB 2 for trials, ROBINS-I for non-randomised studies, AMSTAR-2 for review quality), and reporting formats (the PRISMA-S extension for search reporting, PRISMA-NMA for network meta-analyses).

    PRISMA’s role in the systematic-review economy is decisive: journals routinely refuse review submissions that do not include a PRISMA flow diagram and checklist. The remaining failure mode is checklist-completion-without-substance, where a paper ticks the boxes but the underlying review work is shallow.

    Why these four anchors and not others

    The four cover the bulk of submission volume in clinical and life-science journals: RCTs (CONSORT), systematic reviews (PRISMA), animal studies (ARRIVE), and the meta-question of journal policy (TOP). For observational studies, STROBE is the analogue of CONSORT; for diagnostic accuracy studies, STARD; for case reports, CARE; for qualitative research, SRQR or COREQ; for AI-clinical-prediction models, TRIPOD-AI and PROBAST-AI. The EQUATOR Network’s searchable database remains the canonical entry point.

    Computational reproducibility

    The reporting-guideline tradition was built around clinical and life-science studies. Computational reproducibility (your code, your data, your dependencies, run on your computer, gives the same answer) was historically not in scope and is now belatedly the focus of much of the methodological community’s attention.

    Practice converged in 2024-2025 around three pillars. First, data deposition in a FAIR-compliant repository with a DOI, with explicit licensing. Second, code deposition with a DOI (typically via Zenodo with a Git-tagged release), with explicit dependencies (environment files, container image hashes, or both). Third, a captured computational environment, either via a container (Docker, Singularity/Apptainer) or via a more lightweight pinned manifest (R’s renv, Python’s pip-tools, Julia’s Project.toml together with its Manifest.toml).
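
    As a minimal sketch of the "pinned manifest" idea, the snippet below records data-file checksums and installed package versions in one dictionary that could be saved alongside a deposit. The function names and the manifest layout are hypothetical, chosen for illustration; the sketch complements a lock file or container and does not replace either.

```python
import hashlib
import platform
from importlib import metadata
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, packages):
    """Collect interpreter, package versions, and data checksums in one dict.

    The layout of the returned dict is an assumption made for this example.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "data_sha256": {str(p): sha256_of(p) for p in data_files},
    }
```

    Writing the result out with `json.dump(manifest, fh, indent=2, sort_keys=True)` gives a small file that a reader can diff against their own environment before rerunning the analysis.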

    The FAIR4RS Principles, finalised by the RDA working group in 2022 and now widely cited, extend the FAIR data principles to research software. Software should be Findable (DOI, descriptive metadata), Accessible (open repository where possible), Interoperable (using standards), and Reusable (with a clear licence, documentation, and provenance). FAIR4RS is being integrated into funder data-management-plan requirements in 2026; the UK’s UKRI, the EU’s Horizon Europe, and several US funders now ask for software-management plans as a distinct artefact from data-management plans.

    Pre-registration and registered reports

    Preregistration (committing to your hypotheses and analysis plan before seeing the data) has moved from a niche reproducibility-community practice to a mainstream expectation in psychology, parts of medicine, and increasingly in economics and political science. The Center for Open Science’s preregistration tools have crossed 200,000 registered studies; clinical-trial registrations are held in ClinicalTrials.gov and the registries aggregated by the WHO ICTRP.

    The more interesting development is Registered Reports, a journal format in which a study protocol is peer-reviewed before data collection. If accepted at this Stage 1 review, the journal commits to publishing the Stage 2 manuscript regardless of whether the results are positive, negative, or null. Over 300 journals offer Registered Reports as of 2026, including several major medical journals. The empirical evidence is clear: Registered Reports show much lower positive-results rates than conventional submissions in the same fields, consistent with what we would expect if the conventional system suffers from publication bias.

    How to use this in practice

    For an author submitting a paper, the workflow is:

    1. Identify your study design and find the matching EQUATOR-listed reporting guideline (or guidelines, if multiple apply, e.g., a cluster RCT might use CONSORT plus the Cluster extension).
    2. Use the guideline’s checklist while drafting, not as a checkbox exercise at submission. The checklists are designed to prompt completeness.
    3. For computational components, deposit data and code with DOIs, declare dependencies, and consider a container if your environment is non-trivial.
    4. If your design supports it, consider preregistration or a Registered Report. The discipline of pre-specifying is itself the reproducibility intervention; the registration is the audit trail.
    5. In the methods, explicitly cite the guideline(s) you followed. Cite the deposited data and code with their DOIs in the references, not just in a parenthetical.
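
    Step 1 of the workflow amounts to a lookup from study design to guideline. The sketch below encodes only the pairings named in this post; the design labels used as keys and the function name are hypothetical, and the EQUATOR Network database remains the place to check for anything not listed.

```python
# Pairings taken from this post; the key wording is an assumption.
GUIDELINES = {
    "randomised controlled trial": ["CONSORT"],
    "cluster randomised controlled trial": ["CONSORT", "CONSORT Cluster extension"],
    "systematic review": ["PRISMA 2020"],
    "animal study": ["ARRIVE 2.0"],
    "observational study": ["STROBE"],
    "diagnostic accuracy study": ["STARD"],
    "case report": ["CARE"],
    "qualitative research": ["SRQR", "COREQ"],  # alternatives, pick one
    "ai clinical prediction model": ["TRIPOD-AI", "PROBAST-AI"],
}

FALLBACK = ["search the EQUATOR Network database"]


def guidelines_for(design: str) -> list[str]:
    """Return the guideline names for a study design, or the fallback."""
    key = " ".join(design.lower().split())  # normalise case and spacing
    return list(GUIDELINES.get(key, FALLBACK))
```

    For example, `guidelines_for("Cluster randomised controlled trial")` returns both the core CONSORT statement and the Cluster extension, which mirrors the advice to consult the extension as well as the core standard.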

    Where this all goes

    The next wave of reporting-guideline work is around AI-clinical-prediction reporting (TRIPOD-AI, finalised in 2024; CLAIM for AI imaging studies), real-world-evidence studies (RECORD-PE, STaRT-RWE), and qualitative-meta-synthesis (ENTREQ). The structural question is whether the proliferation is helping or hurting. We think the answer is that the per-method guidelines are valuable but the cross-cutting transparency standards (TOP, FAIR, FAIR4RS, the registered-report meta-format) are doing the heavier lifting. Editors who pick a TOP profile and enforce it across submissions get more reproducibility uplift than editors who require a guideline checklist and then ignore the contents.

    References

    EQUATOR Network, Reporting Guidelines for Health Research (continuously updated). Nosek et al., Promoting an open research culture (Science, 2015, introducing TOP). Page et al., The PRISMA 2020 statement (BMJ, 2021). Percie du Sert et al., The ARRIVE guidelines 2.0 (PLOS Biology, 2020). Chambers, The Seven Deadly Sins of Psychology (Princeton, 2017, on Registered Reports). RDA FAIR4RS Working Group, FAIR Principles for Research Software (2022).

  • EOSC Federation governance: what changed in 2026

    The European Open Science Cloud (EOSC) entered its second formal phase at the start of 2026 with a substantially revised governance and funding model. The transition from the EOSC Association-led implementation phase (2021-2025) to the EOSC Federation operating model (2026 onwards) is the most significant infrastructure-governance change in EU open science since the FAIR principles were articulated. This post walks through what changed, who is affected, and how integrators should reposition.

    What EOSC was, and what it is becoming

    EOSC was conceived in 2016 as a federated cloud-based infrastructure providing researchers across the EU and associated countries with access to data, services, and computing for open science. The 2018 implementation phase, the 2021 launch of the EOSC Association, and the cascade of EU-funded projects (EOSC Future, EOSC-Pillar, FAIRsFAIR, EOSC Synergy, and many others) built out the technical layer: the EOSC Portal, the EOSC Marketplace, the AAI federation, the metadata aggregation via OpenAIRE, the persistent-identifier infrastructure connections.

    The challenge that has dogged EOSC since its inception is the gap between the technical layer and the operational sustainability layer. EU project funding built impressive infrastructure on time-limited grant terms; what happens after the grants run out has been the open question. The 2024-2025 EOSC strategic review concluded that the project-grant model was not sustainable and that EOSC needed a federated operating model funded by recurring contributions from member states and associated members.

    The federation model

    The 2026 EOSC Federation is structured as a tiered membership organisation. Core nodes are member-state-designated infrastructures providing strategic services (data repositories of national scale, compute infrastructure, identity providers, metadata aggregators). Core-node operation is co-funded by the home member state and by EOSC central funds; the criteria for core-node status include FAIR data implementation, sustainability commitments, and federation-API conformance.

    Federated nodes are participating infrastructures meeting a lighter set of conformance criteria; they integrate with the EOSC federation via documented APIs and contribute to the federated graph but are not core-funded. Most existing EOSC-listed services move into federated-node status by default.

    Affiliated services are third-party infrastructures (commercial cloud providers, international partners, non-EU regional clouds) that participate via specific agreements without being members of the federation governance.

    The governance structure has three tiers: a Strategic Council of member-state representatives setting policy; an Operational Board of core-node operators making day-to-day decisions; a Stakeholder Forum of federated nodes, researchers, and user-community representatives providing input.

    What changed for technical integrators

    Three concrete changes matter for technical integrators. First, the EOSC Interoperability Framework v2 ships with the federation launch. It refines the v1 framework with tightened metadata-quality requirements, mandatory ROR organisational IDs for service providers, and standardised provenance metadata for data-services interactions. The CASRAI EOSC IF entry has been updated.

    Second, the EOSC AAI consolidation moves the federation onto a unified authentication-and-authorisation infrastructure based on eduGAIN, MyAccessID, and ORCID. The previous federation of identity providers (with several parallel AAI implementations across projects) is being merged. Service providers integrating with EOSC need to support the consolidated AAI; legacy integrations have a 24-month transition window.

    Third, the persistent-identifier requirements have been tightened. Core-node services must issue or accept persistent identifiers (DOI, Handle, ORCID, ROR, RAiD as appropriate) for all federated artefacts; metadata records without PIDs are no longer ingested into the EOSC Graph. This brings EOSC into line with what OpenAIRE Graph has long been recommending; in practice many services already comply.
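
    A repository can pre-check its own records against the "no PID, no ingest" rule before exposing them for harvesting. The sketch below is a hedged illustration: the record layout (a `pids` list of `scheme`/`value` pairs) and the function names are hypothetical and are not the EOSC Graph schema, and only DOI syntax and the ORCID check digit are tested. Handle, ROR, and RAiD are left out.

```python
import re

# DOI: syntax check only; it does not confirm that the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def is_valid_doi(value):
    return bool(DOI_PATTERN.match(value))


def is_valid_orcid(value):
    """Check the format and the ISO 7064 MOD 11-2 check character."""
    if not ORCID_PATTERN.match(value):
        return False
    digits = value.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


CHECKS = {"doi": is_valid_doi, "orcid": is_valid_orcid}


def has_valid_pid(record):
    """True if the record carries at least one PID that passes its check."""
    for pid in record.get("pids", []):
        check = CHECKS.get(pid.get("scheme", "").lower())
        if check and check(pid.get("value", "")):
            return True
    return False


def split_for_ingest(records):
    """Partition records into (accepted, rejected) by PID presence."""
    accepted = [r for r in records if has_valid_pid(r)]
    rejected = [r for r in records if not has_valid_pid(r)]
    return accepted, rejected
```

    Running the rejected list past a curator before harvest is cheaper than discovering the gap when records fail to appear in the federated graph.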

    What changed for researchers

    The visible changes for researchers are modest in 2026 and will accumulate through 2027-2028. The EOSC Portal remains the primary discovery surface; the EOSC Marketplace continues to list services with the new conformance tiers visible. The single sign-on becomes more reliable as the AAI consolidation reduces the friction across services.

    The deeper change is in the funder-mandate side. Several Horizon Europe calls in 2026 require EOSC-conformant data deposit (deposit in a core-node or federated-node repository, with FAIR metadata, with PID assignment). This raises the floor: research data from these projects must land somewhere that is part of the EOSC federation, with all the metadata-quality implications. The CASRAI institutional EOSC guide walks through the deposit options by discipline.

    The sustainability question

    The federation’s financial model is its hardest unresolved problem. Member-state contributions, EU central funding, and a small fee component from affiliated services are the three pillars. The 2026 budget is set; the 2027 and beyond budgets depend on the Multiannual Financial Framework negotiations that will roll through 2026 and 2027.

    The risk to the federation is a return to project-grant precarity: if member-state contributions falter, core-node operations become dependent on rolling EU-project funding again, and the long-term sustainability gain is illusory. The EOSC Association has been clear that member-state buy-in is the critical lever; the institutions and researchers who use EOSC infrastructure should be engaging their national funder representatives on the federation contribution case.

    What integrators should do in 2026

    For repository managers, the immediate priorities are: verify your service’s status (core node, federated node, affiliated, none); align your metadata to EOSC IF v2; ensure PID assignment is complete for federated artefacts; migrate to the consolidated AAI within the 24-month window.

    For institutional CRIS systems and CRIS vendors, the priorities are: ingest EOSC-Portal service metadata into the CRIS service-catalogue layer; support EOSC-conformant deposit workflows from the CRIS to federated repositories; surface EOSC-mandate compliance in the CRIS reporting layer. The CASRAI CRIS integration guide has been updated.

    For publishers, the touchpoint is data-availability and code-availability metadata. A paper depositing supporting data in an EOSC federated repository should declare so in its metadata; the EOSC Graph then ingests the linkage. Publishers should be updating their submission systems to capture EOSC-repository deposit identifiers.

    The international dimension

    EOSC has always sat in relation to the broader international open-science infrastructure: the African Open Science Platform, the LA Referencia network in Latin America, OpenAIRE-Nexus, the Asia-Pacific GRDi work. The federation governance explicitly includes a framework for international affiliation, and several non-EU national infrastructures have begun affiliation discussions. The medium-term direction — five to ten years out — is a loosely federated international open-science cloud with EOSC as one of several regional implementations.

    For institutions outside the EU but with significant EU collaboration, the practical implication is that EOSC conformance is becoming a useful target even where it is not mandated. Aligning with EOSC IF v2 and the related FAIR-data infrastructure makes EU collaboration smoother and positions for the broader international federation as it emerges.

  • DORA, CoARA, and the Hong Kong Principles: the responsible-assessment lineage

    The responsible-research-assessment lineage runs from DORA in 2012 to the Leiden Manifesto in 2015 to the Hong Kong Principles in 2020 to the CoARA Agreement in 2022 to UKRI’s 2024 R4RI mandate and to a steadily lengthening list of similar institutional mandates worldwide. The throughline is consistent: stop using journal impact factors and h-indices as proxies for research quality; assess the research itself, by people qualified to read it, with attention to a wider range of contributions and to the contexts that shape them.

    The throughline has been clear for over a decade. The implementation has been slow. This post traces what each step of the lineage actually committed signatories to, what has held the practical turn back, and what UKRI’s R4RI does that earlier mandates did not.

    DORA: the inflection

    The San Francisco Declaration on Research Assessment, drafted at the ASCB annual meeting in December 2012 and released in May 2013, was the first widely-signed statement that the journal impact factor should not be used as a proxy for individual-researcher quality. DORA’s core recommendation was simple: do not use journal-based metrics in hiring, promotion, or funding decisions; assess scientific content directly.

    The recommendation looked unobjectionable in 2013 and was endorsed within months by major funders (Wellcome, HHMI), publishers (PLOS, eLife, BMC), and societies. The signatory list has since grown past 25,000 institutions and individuals. The problem is that signing DORA and implementing it are not the same thing. Multiple audits over the 2017-2022 period found that most DORA signatories’ promotion-and-tenure committees still relied heavily on journal-prestige proxies, sometimes formally and more often informally.

    The Leiden Manifesto: the operating principles

    Published in Nature in April 2015 by Diana Hicks, Paul Wouters, and colleagues, the Leiden Manifesto for Research Metrics articulated ten principles for the use of bibliometrics: quantitative evaluation should support qualitative expert judgement rather than replace it; measure performance against the research missions of the institution, group, or researcher; protect excellence in locally relevant research; keep data collection and analytical processes open, transparent and simple; allow those evaluated to verify data and analysis; account for variation by field; base assessment of individual researchers on a qualitative judgement of their portfolio; avoid misplaced concreteness and false precision; recognise the systemic effects of assessment and indicators; scrutinise indicators regularly and update them.

    The Leiden Manifesto was more operationally useful than DORA precisely because it described how to use metrics responsibly, not just which metrics to avoid. The principle that quantitative evaluation supports rather than replaces qualitative judgement remains the most cited and the most violated.

    The Hong Kong Principles: linking integrity and assessment

    The Hong Kong Principles for assessing researchers, drafted at the 2019 World Conference on Research Integrity in Hong Kong and published in 2020 by Moher, Bouter, Kleinert and colleagues, took a different angle. The Hong Kong authors argued that current assessment systems actively encourage poor integrity practices (publication bias, salami slicing, p-hacking, gift authorship) because they reward publication count and JIF over rigour. The five Hong Kong principles ask assessors to value: responsible research practices, complete reporting, open science, a diverse range of research types and outputs, and a range of contributions to research.

    The Hong Kong contribution is the explicit link between assessment reform and research-integrity reform. As long as we assess researchers on counts and impact factors, we are paying them to publish more, not better. The Hong Kong frame is now the dominant interpretation among integrity researchers and is referenced explicitly in the CoARA Agreement.

    CoARA: the policy machine

    The Coalition for Advancing Research Assessment was launched in July 2022 by the European Commission, Science Europe, and a wide coalition of European universities, funders, and learned societies. The CoARA Agreement (the founding document) commits signatories to ten reform actions over a defined timetable, with the centrepiece being the move from a quantitative-indicator-led assessment to a portfolio-and-narrative-led assessment.

    What makes CoARA different from DORA, Leiden, and Hong Kong is that CoARA is institutional, has a secretariat, has working groups, and has a peer-pressure mechanism: signatory institutions must publish an Action Plan within one year of joining and report annually on progress. By early 2026 CoARA has over 700 signatory institutions across Europe and a growing footprint in Latin America, Africa, and Asia.

    The CoARA working groups are working through the practical implementation questions that DORA hand-waved: how do you build a hiring committee evaluation rubric that does not collapse back onto JIF; how do you train evaluators; how do you handle international comparisons when home and visiting institutions have different reform stages; how do you preserve fairness across career stages and disciplines.

    The narrative CV turn

    The practical artefact most visibly downstream of this lineage is the narrative CV. Replacing the long-list-of-publications CV with a structured narrative of contributions across several dimensions (research outputs, leadership, team-building, broader impact, and more) was first piloted by the Royal Society (2017), then formalised by the Royal Society of Biology and by UKRI as R4RI (Résumé for Research and Innovation).

    The narrative CV asks applicants to articulate, in a defined structure, what they have contributed to their team and field. The structure, which originated with the Royal Society’s Résumé for Researchers and has since been adopted by UKRI as R4RI and by several other funders, has four modules: how you have contributed to the generation of knowledge; how you have contributed to the development of individuals; how you have contributed to the wider research community; how you have contributed to broader research and innovation users and audiences. Each module is several paragraphs of prose with selected outputs cited as evidence.

    UKRI’s 2024 mandate that R4RI be the default CV format across UKRI funding schemes was the first wholesale move of a major funder to mandate a narrative format. The 2025 evaluations of UKRI’s first-cohort R4RI grants are tentatively positive: applicants report finding the format more flexible but more demanding; reviewers report richer assessment material but slower review; outcome diversity (career stage, discipline, institution) appears to have widened modestly. The CoARA working group on narrative formats is using the UKRI experience as its primary case.

    For authors preparing R4RI-format CVs, the CASRAI narrative CV guidance walks through the four modules with examples and common failure modes (over-quantification, under-selection of outputs, generic claims).

    What’s holding implementation back

    Three frictions persist. First, international comparability. A researcher trained in a CoARA-signatory institution applies for a postdoc in a non-signatory institution and has to translate their narrative CV into a publications-and-h-index format, often disadvantageously. Second, evaluator training. The skill of reading a narrative CV well is not innate; evaluators trained on JIF-based assessment default back to it under time pressure. Third, algorithmic ranking. As long as institutions are ranked by university league tables that count high-impact publications, individual hiring committees will be reluctant to fully de-emphasise those publications.

    The work-arounds in 2026 are pragmatic: CoARA’s working groups are producing evaluator training materials; narrative-CV templates are increasingly tool-supported (ORCID’s narrative-CV draft module, the ARDC RAiD-linked R4RI tooling); and at least three European university-ranking systems (the THE Impact Rankings, the U-Multirank framework, the CWTS Leiden Ranking’s responsible-metrics variant) are explicitly excluding JIF-based criteria.

    The 2026 outlook

    The most likely 2026-2027 development is convergence of the European CoARA framework with the North American equivalent that is emerging from NIH’s UNITE initiative and from the OSTP’s open-science memo. The signal that this convergence is real will be a joint funder declaration on assessment reform, probably late 2026 or 2027. The risk is that responsible-assessment policies multiply without converging, leaving researchers to navigate a different framework with each funder.

    For institutions writing or revising assessment policies now, the practical advice is to sign CoARA (or DORA at minimum), commit to a narrative-CV pilot, train your evaluators, and report transparently on outcomes. The responsible-assessment domain at CASRAI tracks signatory institutions’ action plans and the published evaluations.

    References

    DORA, San Francisco Declaration on Research Assessment (2013). Hicks et al., Bibliometrics: The Leiden Manifesto for research metrics (Nature, 2015). Moher et al., The Hong Kong Principles for assessing researchers (PLOS Biology, 2020). CoARA, Agreement on Reforming Research Assessment (2022). UKRI, Résumé for Research and Innovation (R4RI) guidance (2024 mandate).

  • Making sense of the EU AI Act for research administration

    The EU Artificial Intelligence Act entered into force in August 2024 with a staged implementation timeline that runs through 2027. In February 2025 the prohibited-AI-practices provisions and the AI-literacy obligation became binding; through 2025 the general-purpose-AI provisions came into effect; in 2026 the high-risk-AI obligations begin to apply; in 2027 the act becomes fully applicable. Research-administration offices across Europe (and at non-EU institutions handling EU data subjects or EU collaborators) have been working through the implications. This post is a practical orientation, not legal advice, on what the act requires of research administration in 2026.

    What the act actually covers

    The EU AI Act is risk-tiered. Prohibited practices (social scoring, real-time biometric identification in public spaces with narrow exceptions, exploitative manipulation) are out, full stop. High-risk AI systems — defined in Annex III to include AI used in education, employment, law enforcement, critical infrastructure, and several other domains — face substantial obligations around risk management, data governance, technical documentation, transparency, human oversight, accuracy, and post-market monitoring. Limited-risk AI (chatbots, emotion-recognition systems, AI-generated content) faces transparency obligations. Minimal-risk AI faces none specific to the act.

    The research-specific carve-outs are important but narrower than is sometimes claimed. The act excludes AI systems and models developed solely for the purpose of scientific research and development; it does not exclude AI systems used in the conduct of research that is not itself AI research. A clinical-trial protocol that uses an AI system for patient stratification is not exempt because it is research; the AI system is being deployed in a context (healthcare) covered by the act. The exemption is for AI as an object of study, not AI as a tool of study.

    Where research-administration touches the act

    Five touchpoints in practice.

    1. AI literacy obligation

    Article 4 requires providers and deployers of AI systems to take measures to ensure a sufficient level of AI literacy of their staff and others using AI systems on their behalf. This applies to research-administration staff using AI tools (proposal-screening assistants, plagiarism detection with AI components, AI-assisted compliance review). The required “sufficient level” is not specified in detail; the European AI Office and national competent authorities are expected to publish guidance. The CASRAI EU AI Act entry tracks the guidance as it emerges.

    Practically, institutions should be running AI-literacy training for research-administration staff in 2026. This need not be elaborate; an annual two-hour training covering what AI systems the institution uses, what their limitations are, what the disclosure obligations are, and where to escalate concerns is a defensible baseline.

    2. High-risk AI in education and employment

    Annex III includes AI systems used in education (admissions decisions, student assessment, allocation to programmes) and in employment (recruitment, performance evaluation, task allocation). University admissions offices using AI to triage applications fall within high-risk; research-administration offices using AI to score research proposals likely do not, but the boundary is being tested. Employment decisions about research staff — using AI to rank job applicants or to score performance for promotion — clearly fall within high-risk.

    For research administration, the practical question is whether any AI system in current or planned use crosses the threshold. The compliance checklist runs: identify all AI systems in use; categorise each against the act; for high-risk systems, conduct a fundamental-rights impact assessment; ensure human oversight is meaningful, not nominal; document the risk-management system; register in the EU database.

    3. GenAI transparency obligations

    Article 50 requires that AI-generated content be marked as such, with limited exceptions. For research administration, this affects AI-generated text in proposal review, AI-generated summaries of compliance documents, AI-generated translations of regulatory text. Where AI is used to generate content that will be read by a human as if it were human-produced, the act requires a marker.

    This dovetails with the publisher-led GenAI disclosure conventions for scholarly content. The CASRAI institutional GenAI disclosure guidance integrates the publisher requirements and the EU AI Act obligations into a single workflow.

    4. Data governance and GDPR alignment

    The AI Act intersects extensively with the GDPR. High-risk AI systems must use training, validation, and testing data sets that are relevant, sufficiently representative, and, to the best extent possible, free of errors and complete. For systems trained on personal data, the GDPR’s purpose-limitation and minimisation principles apply alongside the AI Act’s data-governance requirements. Research administration that procures or deploys AI systems should ensure the AI vendor can document training-data provenance and consent status for any personal data used.

    5. Research-exemption boundary cases

    The research exemption is being tested at the boundary. A university research group developing an AI system as their research output is exempt; the same group using the system in a clinical context with EU patients is not. A university operating a public-facing AI service developed in-house is a provider under the act and subject to the full provider obligations even if the development was research. The European AI Office has indicated it will publish boundary guidance through 2026; until it does, the conservative reading is that any AI use outside the development phase brings the act into play.

    The compliance checklist

    The practical 2026 checklist for a research-administration office:

    • Inventory all AI systems in use or planned use across research administration.
    • Categorise each system against the AI Act risk tiers.
    • For high-risk systems, conduct a fundamental-rights impact assessment.
    • For GenAI use, ensure transparency markers are applied to AI-generated content.
    • For employment-decision systems involving research staff, ensure human oversight is documented and meaningful.
    • Run AI-literacy training for relevant staff.
    • Verify that AI vendors can document training-data provenance and consent.
    • Align AI Act compliance with GDPR processes; do not run parallel programmes.
    • Track guidance from the European AI Office and national competent authority.
    • Document everything; the act’s audit posture is documentation-heavy.
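
    The first few checklist items (inventory, categorise, then derive per-system obligations) can be tracked in a very small data structure. The sketch below is illustrative only: the tier labels, the `AISystem` fields, and the action names are assumptions made for the example, not terms defined by the act, and the mapping from tier to obligation should be confirmed with legal advice.

```python
from dataclasses import dataclass, field

# Simplified tier labels for illustration; not the act's legal definitions.
HIGH_RISK = "high-risk"
MINIMAL = "minimal-risk"


@dataclass
class AISystem:
    name: str
    purpose: str
    tier: str
    generates_content: bool = False
    completed: set = field(default_factory=set)  # actions already recorded as done


def required_actions(system: AISystem) -> list:
    """Return the checklist actions that apply to one system."""
    actions = ["inventory entry", "risk-tier categorisation"]
    if system.tier == HIGH_RISK:
        actions += [
            "fundamental-rights impact assessment",
            "documented human oversight",
        ]
    if system.generates_content:
        actions.append("transparency marker on AI-generated content")
    return actions


def outstanding(system: AISystem) -> list:
    """Actions that apply to the system but are not yet recorded as done."""
    return [a for a in required_actions(system) if a not in system.completed]


# Hypothetical inventory entries.
inventory = [
    AISystem("cv-ranker", "rank job applicants", HIGH_RISK,
             completed={"inventory entry", "risk-tier categorisation"}),
    AISystem("policy-summariser", "summarise compliance documents", MINIMAL,
             generates_content=True, completed={"inventory entry"}),
]

for system in inventory:
    print(system.name, "->", outstanding(system))
```

    Keeping the completed actions on the record itself is one way to serve the documentation-heavy audit posture: the same structure that drives the to-do list is also the evidence trail.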

    Non-EU implications

    The act’s extraterritorial reach matters for non-EU institutions. If an institution outside the EU operates an AI system whose output is used in the EU, the act applies. A US university running AI-assisted admissions for an EU campus, a UK research administration office using AI to triage proposals from EU collaborators, a Canadian institution running a GenAI service available to EU users — all may fall within the act’s scope. Non-EU institutions with material EU engagement should run the same compliance checklist as EU institutions.

    What’s still uncertain

    Several material questions remain open through 2026 and will be resolved by Commission guidance, national-authority interpretation, or early case law. Where does the boundary of “research and development” sit? How is “sufficient level of AI literacy” measured? What documentation suffices for the fundamental-rights impact assessment? How does the act interact with existing sectoral regulation (clinical-trials regulation, education-sector law, employment law) in member states? The CASRAI compliance and regulatory domain is tracking these questions and publishing updates as guidance emerges.

    For now, the operating posture for research administration is: take the inventory; do the risk-tiering; document the high-risk systems; run the literacy training; treat the act as a serious ongoing compliance programme, not a one-off exercise. The penalties under the act are substantial and the enforcement architecture is being built. Institutions that started in 2024-2025 are well placed; those that have not started should begin now.

  • Research data management plans go machine-actionable: the maDMP transition

    The Data Management Plan has been an awkward artefact of the funded-research workflow since the 2010 NSF mandate. For a decade it was a few-page PDF, written at proposal stage, signed off by a librarian, and largely forgotten until the next proposal. In 2026 the picture is different. The machine-actionable DMP (maDMP) has matured from a Research Data Alliance working-group concept into operational infrastructure at major funders, repositories, and institutions. This post walks through what changed, what the tooling actually does, and what the corresponding software-management-plan turn means for funded projects in 2026.

    The maDMP idea

    The traditional DMP described what the project would do with its data: which formats, which repositories, what metadata, what retention. The information was useful but was locked in prose; downstream systems could not consume it, so the same information was re-entered in each new context (the repository deposit form, the funder report, the institutional CRIS).

    The maDMP idea, formalised by the RDA DMP Common Standard, is to express the DMP as structured machine-readable data (a JSON document conforming to the Common Standard schema) that downstream systems can consume directly. The same information that is human-readable in the PDF is also available as queryable structured data, with PIDs (ORCID for people, ROR for organisations, DOI for repositories, RAiD for the project).

    The Common Standard, finalised by the RDA Active DMPs working group, defines the JSON schema, the controlled vocabularies, and the core required fields. Version 1.1 was the working release through 2022-2024; the 1.2 update (2024) added software-management-plan extension fields and refined the contributor model to align with CRediT.
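
    To make "a JSON document conforming to the Common Standard schema" concrete, here is a minimal sketch of what such a record might look like when built in Python. The field names reflect our reading of the Common Standard and should be checked against the published schema; the `REQUIRED` list is an assumption for the example, and every identifier is a placeholder rather than a real record.

```python
import json

# Field names are assumed from the Common Standard; all values are placeholders.
madmp = {
    "dmp": {
        "title": "Example project data management plan",
        "created": "2026-01-15T09:00:00Z",
        "modified": "2026-01-15T09:00:00Z",
        "language": "eng",
        "ethical_issues_exist": "no",
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
        "contact": {
            "name": "A. Researcher",
            "mbox": "a.researcher@example.org",
            "contact_id": {
                "identifier": "https://orcid.org/0000-0000-0000-0000",
                "type": "orcid",
            },
        },
        "dataset": [
            {
                "title": "Survey responses, wave 1",
                "dataset_id": {"identifier": "placeholder-1", "type": "other"},
            }
        ],
    }
}

# Assumed set of required top-level fields, for illustration.
REQUIRED = ["title", "created", "modified", "language",
            "ethical_issues_exist", "dmp_id", "contact", "dataset"]


def missing_fields(doc: dict) -> list:
    """List the assumed-required fields that are absent from the plan."""
    plan = doc.get("dmp", {})
    return [name for name in REQUIRED if name not in plan]


serialised = json.dumps(madmp, indent=2)
print(serialised[:120])
print("missing:", missing_fields(madmp))
```

    A production system would validate against the official JSON Schema rather than a hand-written field list; the point here is only that the plan is data a program can inspect.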

    The tooling landscape

    By 2026 several mature tools implement the maDMP standard, with substantial integration across regions.

    DMPRoadmap and its descendants

    DMPRoadmap is the open-source maDMP platform jointly developed by the UK Digital Curation Centre and the University of California Curation Center. It powers the UK’s DMPonline service (used by all UKRI councils, NIHR, Wellcome, and most UK universities) and the US’s DMPTool (used by NSF, NIH, NASA, DOE, and a long list of US institutions). The 2024 unification of DMPRoadmap’s code base across the UK and US deployments was a significant administrative move; the platform is now a single shared upstream with regional customisations.

    Argos

    Argos is the European maDMP platform developed by OpenAIRE and EUDAT. Argos is the default DMP tool for Horizon Europe-funded projects and integrates tightly with the EOSC Zenodo deposit workflow, the OpenAIRE Graph, and the Funding and Tenders portal. Argos’s interesting design choice was to model the DMP as a layered set of templates (funder-specific layered on discipline-specific layered on institutional), which allows a single underlying maDMP to be presented through whatever lens an evaluator needs.
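
    The layering idea can be illustrated with Python’s standard `ChainMap`, where earlier layers take precedence over later ones. This is a sketch of the general pattern, not a description of how Argos implements it; the layer contents and key names are hypothetical.

```python
from collections import ChainMap

# Hypothetical template layers; keys and values are illustrative only.
institutional = {"retention_years": 10, "repository": "institutional repository"}
discipline = {"metadata_schema": "DDI", "repository": "disciplinary archive"}
funder = {"licence": "CC BY 4.0"}

# ChainMap resolves keys left to right, so the funder layer overrides the
# discipline layer, which in turn overrides the institutional layer.
effective = ChainMap(funder, discipline, institutional)

resolved = dict(effective)
print(resolved)
```

    Because the layers stay separate objects, the same underlying plan can be re-resolved with a different funder layer without touching the discipline or institutional defaults.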

    RDA-DMP-Common-Standard-native tools

    The RDA Common Standard is now natively supported by most major repository platforms (DataCite Fabrica, Zenodo, Figshare, Dryad), by several CRIS systems (Pure, Elements, Converis, DSpace-CRIS), and by funder portals. The interop story is real: a maDMP produced in DMPonline can be exported as Common Standard JSON, consumed by Argos for a Horizon Europe proposal, and round-tripped to a DataCite repository at the deposit stage.

    What an maDMP integration looks like in practice

    The integration patterns that have settled in 2026 follow this shape. At proposal stage the researcher uses DMPonline/DMPTool/Argos to draft a maDMP, pulling in ORCID, ROR, and grant metadata where available. The maDMP is exported as Common Standard JSON and attached to the proposal.

    At award stage the funder pulls the maDMP into its grant-management system; the project’s RAiD or grant DOI is back-linked to the maDMP record. The maDMP becomes the canonical project plan that the institution’s CRIS, repository, and reporting workflow all reference.

    At repository deposit the dataset metadata (title, authors, methods, related publications) is pre-populated from the maDMP fields. Saving the deposit triggers an update to the maDMP record itself, marking that planned dataset as deposited, with the resulting DOI.

    At reporting stage the funder’s annual report and the institution’s research-information report both query the same maDMP record. Re-keying of the same information across systems is materially reduced.

    The user-visible artefact is still a PDF (funders and reviewers still want one) but it is generated from the structured record on demand, not authored in prose.
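
    The deposit-stage round trip (pre-populate the deposit form from the plan, then write the resulting DOI back to the plan) can be sketched in a few lines. The function names, the deposit-form keys, and the `deposited` flag are hypothetical; real integrations go through the repository’s and the DMP tool’s own APIs.

```python
import copy


def prepopulate_deposit(madmp: dict, index: int) -> dict:
    """Build a draft deposit form from one planned dataset in the maDMP."""
    plan = madmp["dmp"]
    dataset = plan["dataset"][index]
    return {
        "title": dataset["title"],
        "description": dataset.get("description", ""),
        "creators": [plan["contact"]["name"]],
        "related_plan": plan["dmp_id"]["identifier"],
    }


def record_deposit(madmp: dict, index: int, doi: str) -> dict:
    """Return a copy of the maDMP with the planned dataset marked as deposited."""
    updated = copy.deepcopy(madmp)
    dataset = updated["dmp"]["dataset"][index]
    dataset["dataset_id"] = {"identifier": doi, "type": "doi"}
    dataset["deposited"] = True  # hypothetical flag, not a Common Standard field
    return updated


# Placeholder plan with one planned dataset.
plan = {
    "dmp": {
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
        "contact": {"name": "A. Researcher"},
        "dataset": [{"title": "Survey responses, wave 1"}],
    }
}

draft = prepopulate_deposit(plan, 0)
updated_plan = record_deposit(plan, 0, "https://doi.org/10.0000/example-dataset")
print(draft["title"], "->", updated_plan["dmp"]["dataset"][0]["dataset_id"]["identifier"])
```

    Returning an updated copy instead of mutating the plan in place keeps the previous version available, which suits a record that several systems reference.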

    Where it falls short

    Three frictions remain in 2026. First, discipline coverage: the Common Standard was drafted with bench-science and social-science data primarily in mind. Humanities collections, qualitative archives, and operational data (e.g., conservation biology fieldwork data) fit the schema imperfectly. The RDA working groups have been extending the controlled vocabularies but the long tail is long.

    Second, repository alignment: not every repository accepts maDMP-imported deposits cleanly. The big repositories do; the long tail of institutional repositories often have legacy metadata schemas that need mapping.

    Third, privacy and sensitivity: a maDMP for a clinical study contains potentially sensitive information about data access controls, IRB approvals, and consent terms. The Common Standard supports flagging fields as confidential, but the operational practice of differential disclosure to different downstream consumers (funder, repository, public) is still being worked out.
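
    One way to think about differential disclosure is as a per-consumer filter over labelled fields. The sketch below assumes a three-level sensitivity scheme and a fixed clearance per consumer; the labels, field names, and clearance assignments are all hypothetical and are not part of the Common Standard.

```python
# Hypothetical sensitivity label per field.
FIELD_SENSITIVITY = {
    "title": "public",
    "repository": "public",
    "access_controls": "restricted",
    "irb_approval": "restricted",
    "consent_terms": "confidential",
}
CLEARANCE = {"public": 0, "restricted": 1, "confidential": 2}
# Hypothetical clearance granted to each downstream consumer.
CONSUMER_LEVEL = {"public": "public", "repository": "restricted", "funder": "confidential"}


def view_for(record: dict, consumer: str) -> dict:
    """Return only the fields the named consumer is cleared to see."""
    limit = CLEARANCE[CONSUMER_LEVEL[consumer]]
    return {
        key: value
        for key, value in record.items()
        # Unlabelled fields are treated as confidential (fail closed).
        if CLEARANCE[FIELD_SENSITIVITY.get(key, "confidential")] <= limit
    }


record = {
    "title": "Clinical cohort study",
    "repository": "controlled-access archive",
    "irb_approval": "approved",
    "consent_terms": "secondary use with approval",
    "internal_note": "unlabelled field",
}

print(sorted(view_for(record, "public")))
print(sorted(view_for(record, "repository")))
```

    Treating unlabelled fields as confidential is a deliberate choice: a new field that nobody has classified yet should not leak to the public view by default.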

    The Software Management Plan turn

    The 2024-2025 development that genuinely shifted the field was the formal addition of Software Management Plans as a sibling artefact to DMPs. Funders had been asking for them informally for years; the 2024 explicit requirement at NWO (Netherlands), NIH (for selected programmes), DFG (Germany), and several UKRI councils brought the SMP into the mainstream.

    An SMP describes how the project’s software will be developed and stewarded: licence, repository, citation, dependencies, sustainability beyond the project life. The 2024 SMP guidance from the Software Sustainability Institute defined a five-section structure that has been broadly adopted: software descriptions, software development practice, software deposit and citation, sustainability and maintenance, and intellectual property and licensing.

    The maDMP Common Standard 1.2 added an SMP extension that allows the SMP to be expressed alongside the data plan in the same JSON document, with cross-references between data products and the software that produces them. By 2026 this is becoming the default: a project produces one structured maDMP that covers data and software, with both halves referenced from the same RAiD.

    FAIR data integration

    The maDMP transition has reinforced the practical operationalisation of FAIR. A maDMP that names a repository (with a DOI), a metadata schema, a controlled vocabulary, and a licence is structurally more FAIR-compliant than a prose DMP that mentions “a suitable repository.” The maDMP-to-deposit integrations make F (findable) and A (accessible) measurable; the I (interoperable) and R (reusable) elements need the metadata-schema and licence choices to actually be implemented.

    The RDA FAIR Data Maturity Model indicators are now being calculated automatically from maDMP records and repository deposits in several institutional CRIS systems. The 2026 picture is that institutions can measure FAIR-ness of their research outputs at the data-product level, not just claim it; this is a real improvement on the 2020 baseline.

    What to do if you are starting now

    For a researcher writing a proposal in early 2026:

    1. Use the funder-recommended maDMP tool (DMPonline, DMPTool, or Argos depending on funder).
    2. Identify your data products and their target repositories early; choose repositories that accept Common Standard imports.
    3. If your project produces software, draft an SMP alongside the DMP using the SSI five-section structure. The Common Standard 1.2 lets you keep them in one record.
    4. Provide ORCID and ROR identifiers everywhere they are asked for. Link the maDMP to the project’s RAiD if your funder supports it (Australian, UK, EU funders are most likely to).
    5. Treat the maDMP as a living document. Update it on data deposit, on milestone reporting, and on project closeout. The point of machine-actionability is that updating it is cheap.

    For institutions setting policy, the RDA federation page tracks the working-group outputs and adoption status. The maDMP domain at CASRAI carries the canonical Common Standard reference.

    References

    RDA Active DMPs Working Group, DMP Common Standard for machine-actionable DMPs (v1.1, 2022; v1.2, 2024). Miksa et al., Ten principles for machine-actionable data management plans (PLOS Computational Biology, 2019). Software Sustainability Institute, Checklist for a Software Management Plan (2024 revision). EOSC, EOSC FAIR Implementation Profiles (2023).

  • Three CRediT misuses we see in submitted papers

    CASRAI’s editorial network includes journal editors who handle CRediT statements daily, and we periodically aggregate the patterns of misuse they see. Three failures recur across disciplines, journal sizes, and submission systems. None are scandalous; all are correctable with attention. This post catalogues them with concrete examples and the editorial responses that work.

    Failure one: role inflation

    Role inflation is the most common CRediT failure by a wide margin. It is the practice of assigning every author every role, or near-every role, regardless of what they actually did. A typical inflated statement reads like a litany: Author A: Conceptualization, Methodology, Investigation, Formal analysis, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Funding acquisition, Project administration. Author B: Conceptualization, Methodology, Investigation, Data curation, Writing – review & editing. Author C: Conceptualization, Methodology, Writing – review & editing. Every author is conceptualisation-positive; every author methodology-positive; every author writing-positive.

    The pattern is recognisable and almost always wrong. Five authors did not all conceive the study. Five authors did not all design the method. Five authors did not all write the original draft. Role inflation reflects a misunderstanding of what CRediT is for: it treats the role assignment as a credit allocation (the more roles you have, the more credit you get), when CRediT is a description of contribution. Liz Allen and the other original CRediT designers have been explicit that the taxonomy is meant to record what each contributor actually did, not to maximise their visible role count.

    The editorial fix

    Editors increasingly push back at submission. The Lancet’s convention of requiring each author to write a prose contribution statement in their own words is unusually effective; it forces a moment of reflection on what the author actually did. Several other journals have adopted variations. The CASRAI CRediT authors guide includes a role-assignment worksheet that asks each author to write a one-sentence justification per role before the statement is finalised; the discipline of writing the justification surfaces most cases of role inflation before submission.

    Where inflation has already made it into a submission, the editorial response is to ask the corresponding author to revise. The framing that works is methodological: “We use CRediT to describe what each contributor actually did. Please review the role assignments and confirm that each role corresponds to a substantive contribution by that author.” This is rarely contentious; in our experience the corresponding author tightens the statement on review.

    Failure two: byline order substituting for qualifiers

    The degree-of-contribution qualifier was added to NISO Z39.104 specifically to resolve byline-order disputes. A paper with three co-first-authors should mark them all as Equal on the roles they share; a paper with a clear lead on one role and supporting contributors on others should use Lead and Supporting accordingly. The qualifier is structurally what byline order has long tried to encode implicitly.

    The misuse we see is statements that ignore the qualifier and rely on byline order or footnotes to communicate contribution magnitude. A typical example: a paper with five authors and a footnote saying “authors 1 and 2 contributed equally” but a CRediT statement that assigns roles without qualifiers, leaving the reader to infer what “equally” means across the roles. Is author 1’s Investigation equal to author 2’s Investigation? Is author 1’s Formal analysis equal to author 2’s Formal analysis? The footnote does not say; the unqualified CRediT statement does not say.

    The editorial fix

    Adopt the qualifier explicitly. If two authors contributed equally to a role, mark both Equal on that role. If one author was the lead and others supported, mark Lead and Supporting. Footnotes about equal contribution become redundant; the structured statement carries the information.

    For journals, the editorial implementation is to require the qualifier in the submission system. JATS tagging for CRediT supports the qualifier via the degree-contribution attribute on the role element; submission systems should expose this and require it. A few publishers have already moved here; we expect most to follow through 2026.
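
    For production teams, a qualified role might be serialised along the following lines. This sketch assumes the `degree-contribution` attribute that JATS 1.2 defines on `<role>` and the `vocab-*` attribute pattern used for CRediT tagging; confirm both against the JATS version and tagging guidance your production system targets. The helper name and the slug argument are our own.

```python
import xml.etree.ElementTree as ET

CREDIT_BASE = "https://credit.niso.org/"


def credit_role(term: str, slug: str, degree: str) -> ET.Element:
    """Build one JATS <role> element for a CRediT role with a qualifier."""
    if degree not in {"lead", "equal", "supporting"}:
        raise ValueError(f"unknown degree of contribution: {degree}")
    role = ET.Element("role", {
        "vocab": "credit",
        "vocab-identifier": CREDIT_BASE,
        "vocab-term": term,
        # Assumed URL pattern for the role's term identifier.
        "vocab-term-identifier": f"{CREDIT_BASE}contributor-roles/{slug}/",
        "degree-contribution": degree,
    })
    role.text = term
    return role


element = credit_role("Writing – original draft", "writing-original-draft", "lead")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

    Rejecting unknown qualifier values at build time mirrors what a submission system should do at entry time: the qualifier is only useful if it is drawn from a closed list.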

    Failure three: missing writing roles

    Every paper has someone who wrote the first draft. If a CRediT statement omits Writing – original draft, the editor will ask. This is the third recurring failure: statements that distribute Methodology, Investigation, Formal analysis, and Supervision but leave Writing – original draft unassigned.

    The pattern usually reflects a real ambiguity. In a paper with three co-equal authors who jointly drafted, who gets Writing – original draft? The answer is all three, marked Equal. In a paper where a postdoc drafted under supervision and a senior author heavily revised, who gets which writing role? Almost always: postdoc gets Writing – original draft (Lead); senior author gets Writing – review & editing (Lead). In a paper where a paid medical writer drafted, the medical writer is typically not an author per ICMJE (they are acknowledged separately), and the authors who substantively shaped the draft get Writing – original draft as appropriate.

    The editorial fix

    Editors should treat “who wrote the first draft” as a required question at submission. The BMJ asks this explicitly. The CASRAI worksheet asks it. If the statement does not name a Writing – original draft contributor, the editor’s standard response is a one-line query: “Please indicate which author or authors discharged the Writing – original draft role; the role is currently absent from the CRediT statement.” In our editor network this query gets a fast, accurate response and the role is added before review proceeds.

    Three lesser failures

    Beyond the big three, three lesser failures are worth noting. First, conflating Methodology and Formal analysis: the role definitions distinguish these (Methodology is the study design; Formal analysis is the statistical or analytical work on the resulting data) and assigning both to the same person without distinction loses information. Second, assigning Software to anyone who touched a computer: Software is meaningful programming work, not opening Excel; if the contributor wrote no code, did not script the analysis, did not configure REDCap, they probably did not discharge the Software role. Third, missing Funding acquisition: someone wrote the grant. If the CRediT statement does not name a Funding acquisition contributor and the paper is grant-funded, the role is missing.

    What CASRAI recommends

    Four practical recommendations. First, use the role-assignment worksheet at the drafting stage, not at submission; it catches most misuse early. Second, require the degree-of-contribution qualifier in your journal submission system. Third, treat missing Writing – original draft as a default editorial query. Fourth, when in doubt about role inflation, ask each author to write a one-sentence justification per role; the discipline reveals the over-assignment naturally.

    For the broader system, the most useful intervention is journal submission system support. Adoption at the policy level is now widespread, but the per-submission UX varies enormously. A submission system that prompts for qualifiers, validates that every role has a contributor, and asks per-author confirmation of role assignment catches most failures before they reach editorial review. We expect this UX to converge through 2026 as publishers update their Editorial Manager and ScholarOne configurations.
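
    The checks described in this post are simple enough to automate at submission. The sketch below is one possible shape for such a validator, not any publisher’s implementation: the input format, the query wording, and the "every author claims this role" heuristic (applied only with three or more authors) are assumptions for the example.

```python
QUALIFIERS = {"Lead", "Equal", "Supporting"}
DRAFT = "Writing – original draft"
FUNDING = "Funding acquisition"


def check_statement(statement: dict, grant_funded: bool = True) -> list:
    """Return editorial queries for a CRediT statement.

    `statement` maps author name -> {role name: qualifier or None}.
    """
    queries = []
    authors = list(statement)
    all_roles = {role for roles in statement.values() for role in roles}

    if DRAFT not in all_roles:
        queries.append(f"No author holds '{DRAFT}'.")
    if grant_funded and FUNDING not in all_roles:
        queries.append(f"Grant-funded paper with no '{FUNDING}' contributor.")

    for author, roles in statement.items():
        for role, qualifier in roles.items():
            if qualifier not in QUALIFIERS:
                queries.append(f"{author}: '{role}' has no degree-of-contribution qualifier.")

    # Heuristic only: a role claimed by everyone deserves a second look,
    # but it is not proof of inflation.
    if len(authors) >= 3:
        for role in sorted(all_roles):
            if all(role in statement[a] for a in authors):
                queries.append(f"Every author claims '{role}'; confirm each contribution.")
    return queries


example = {
    "Author A": {"Conceptualization": "Lead", "Writing – review & editing": "Lead"},
    "Author B": {"Conceptualization": "Supporting", "Investigation": None},
    "Author C": {"Conceptualization": "Supporting", "Formal analysis": "Lead"},
}

for query in check_statement(example):
    print(query)
```

    The output is a list of queries rather than a pass/fail verdict because each of these patterns can be legitimate; the aim is to prompt the corresponding author, not to reject the statement.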

  • FAIR data assessment frameworks: a buyer’s guide for institutions

    The FAIR Principles (Findable, Accessible, Interoperable, Reusable) were published in 2016 and have become the dominant framework for talking about research-data quality. The harder problem – measuring whether a particular dataset, repository, or institutional output is actually FAIR – has produced a small ecosystem of assessment frameworks. In 2026 there are five we recommend institutions consider, and they answer slightly different questions. This post is the practical buyer’s guide.

    The five frameworks that matter

    The frameworks differ along two axes: what they assess (a single dataset, a repository, an institution’s overall position) and how they assess (automated against metadata, structured self-assessment, third-party audit). A well-equipped institution uses two or three of them for different purposes.

    RDA FAIR Data Maturity Model

    The RDA FAIR Data Maturity Model, finalised by the RDA working group in 2020, is the canonical indicator framework. It defines 41 indicators across the four FAIR pillars, each assigned one of three priority levels (essential, important, useful). It does not prescribe the assessment method; it provides the rubric.

    The Maturity Model is the most-cited framework in funder documents and is the de-facto common reference for FAIR assessment. Its strength is interoperability: a tool that calculates against the RDA indicators produces results comparable across institutions and disciplines. Its weakness is that the indicators are abstract; turning them into operational checks requires a tool.
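
    As a rough illustration of what "turning indicators into operational checks" produces, the sketch below summarises a set of indicator results by priority. The indicator identifiers and pass/fail values are invented, and the summary format is our own; a real tool would derive each result from the dataset’s metadata.

```python
from collections import defaultdict

# Hypothetical indicator results: (indicator id, priority, satisfied?)
results = [
    ("F-01", "essential", True),
    ("F-02", "essential", True),
    ("A-01", "essential", False),
    ("I-01", "important", True),
    ("R-01", "important", False),
    ("R-02", "useful", True),
]


def summarise(indicator_results) -> dict:
    """Return {priority: (satisfied, total)} for a set of indicator results."""
    counts = defaultdict(lambda: [0, 0])
    for _indicator, priority, satisfied in indicator_results:
        counts[priority][1] += 1
        if satisfied:
            counts[priority][0] += 1
    return {priority: tuple(pair) for priority, pair in counts.items()}


summary = summarise(results)
for priority, (satisfied, total) in summary.items():
    print(f"{priority}: {satisfied}/{total}")
```

    Reporting satisfied-out-of-total per priority, instead of a single blended number, keeps a failed essential indicator visible even when the overall count looks healthy.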

    F-UJI

    F-UJI (FAIRsFAIR Research Data Object Assessment Service), developed by FAIRsFAIR and PANGAEA, is the most-used automated assessment tool. F-UJI takes a single dataset identifier (typically a DOI), retrieves its metadata, and runs a battery of automated checks against the RDA Maturity Model indicators. It produces a numeric FAIR score and a detailed report.

    F-UJI is genuinely useful for dataset-level assessment because it actually fetches and tests the metadata. It catches real failures (missing licence, missing schema declaration, dead landing-page links) that self-assessment tools miss. Its limits are also real: it can only check what is machine-discoverable, so a dataset can score well on F-UJI and still be unusable in practice if the documentation is poor. As of 2026 F-UJI is available as a hosted service and as a self-deployable container.

    FAIR-Aware

    FAIR-Aware, developed by DANS, is a structured self-assessment tool aimed at researchers who are about to deposit a dataset. It asks ten questions about the dataset’s intended preparation and produces guidance on which FAIR principles are being met and which need work. FAIR-Aware is pedagogical rather than evaluative: its purpose is to nudge depositors into thinking about FAIR before they deposit, not to score them afterwards.

    FAIR-Aware is the right tool when an institution is trying to improve deposit quality and researcher FAIR literacy. It is the wrong tool when the question is “how FAIR are our existing holdings.”

    CESSDA self-assessment

    The CESSDA FAIR self-assessment, oriented toward social-science data archives, is closer to a structured repository audit. It asks the repository to evidence its compliance against a published framework that maps to the RDA indicators. CESSDA is interesting because it is discipline-specific: it knows that social-science data has particular consent, sensitivity, and harmonisation patterns and asks questions sensitive to those.

    ARDC FAIR Framework

    The Australian Research Data Commons FAIR framework, developed for the Australian context with funder backing, includes a self-assessment, a checklist for repository operators, and a benchmark for institutional services. The ARDC framework’s strength is that it has been operationalised at scale across Australian universities; its principles translate well but its administrative artefacts are Australia-specific.

    The institutional decision tree

    A reasonable institutional approach in 2026 looks like this:

    1. For overall institutional position: use the RDA Maturity Model as the reference framework. Cite it in policy documents, training materials, and funder reports. It is the common language.
    2. For deposit-time researcher guidance: deploy FAIR-Aware (or your repository’s built-in equivalent) at the deposit interface. The goal is researcher behaviour change, not measurement.
    3. For periodic dataset-quality auditing: run F-UJI against a representative sample of your repository holdings on a quarterly cycle. Use the results to drive metadata quality improvements at the repository level.
    4. For repository certification: pursue CoreTrustSeal certification for any repository whose data underlies cited research outputs. CoreTrustSeal is more rigorous and more useful externally than the FAIR self-assessments.
    5. For discipline-specific work: layer the relevant disciplinary framework (CESSDA for social sciences, the FAIRsharing community standards for life sciences, etc.) on top of the generic frameworks.

    What the frameworks miss

    The frameworks all do well at the F (Findable) and A (Accessible) pillars because these map well to machine-checkable metadata. They do less well at I (Interoperable) and R (Reusable) because interoperability and reusability depend on context that automated tools cannot evaluate (is the data dictionary actually meaningful? do the controlled vocabularies match the relevant community standards? would another researcher in this field find the documentation sufficient?).

    The mitigation is to pair automated assessment with structured peer review of high-value datasets. F-UJI tells you the metadata is well-formed; a peer review tells you the data is actually useful. The institutional practice we have seen working is to run F-UJI quarterly and to commission peer-data-reviews of the institutionally-flagged “strategically important” datasets annually.

    FAIR assessment and CoreTrustSeal

    CoreTrustSeal certification is the closest thing to an external audit of a repository’s trustworthiness. It is more rigorous than FAIR self-assessment because it requires a substantive submission and an external review. It covers governance, sustainability, technical infrastructure, data quality, and discoverability. By 2026 most major institutional and disciplinary repositories are CoreTrustSeal-certified; the certification is increasingly required in funder data-management requirements.

    CoreTrustSeal and FAIR assessment are complementary. CoreTrustSeal asks “is this repository a trustworthy place to deposit data?”; FAIR assessment asks “are the datasets in this repository FAIR?” An institution should be able to answer yes to both.

    Integration with the institutional CRIS

    The 2024-2025 development that has changed institutional practice is the integration of FAIR assessment into the institutional CRIS. Pure, Elements, Converis, and DSpace-CRIS now ship modules that show FAIR scores for each dataset record, computed from the underlying repository deposit. The institutional dashboard can then aggregate (overall FAIR score by department, by year, by funder), spot drift, and flag low-scoring records for improvement.

    The pattern that works is: institutional repository (DSpace, Figshare, Dataverse) exposes dataset metadata; CRIS pulls the metadata daily; F-UJI runs against new and updated records; FAIR score is written back to the CRIS record; institutional dashboards consume the score. The total effort is moderate (a sprint of integration work) and the resulting visibility is genuinely useful.
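
    The pipeline above reduces to two operations: score the records that need it and write the score back, then aggregate for the dashboard. The sketch below shows that shape with the assessment step passed in as a function, so it makes no claim about any particular tool’s API; the record fields and the `fake_assess` stand-in are hypothetical.

```python
from statistics import mean
from typing import Callable


def refresh_scores(records: list, assess: Callable[[str], float]) -> list:
    """Score new or updated dataset records and write the score back."""
    for record in records:
        if record.get("needs_assessment", True):
            record["fair_score"] = assess(record["doi"])
            record["needs_assessment"] = False
    return records


def score_by(records: list, key: str) -> dict:
    """Average FAIR score per group (for example per department)."""
    groups = {}
    for record in records:
        if "fair_score" in record:
            groups.setdefault(record[key], []).append(record["fair_score"])
    return {group: round(mean(scores), 1) for group, scores in groups.items()}


def fake_assess(doi: str) -> float:
    """Stand-in for a real assessment call; returns fixed demo scores."""
    return {"10.0000/a": 80.0, "10.0000/b": 60.0, "10.0000/c": 40.0}[doi]


records = [
    {"doi": "10.0000/a", "department": "Biology"},
    {"doi": "10.0000/b", "department": "Biology"},
    {"doi": "10.0000/c", "department": "History"},
]

refresh_scores(records, fake_assess)
by_department = score_by(records, "department")
print(by_department)
```

    The `needs_assessment` flag is what keeps the daily run cheap: only new and updated records are re-scored, and everything else keeps its stored score.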

    The funder-mandate angle

    FAIR assessment is increasingly cited explicitly in funder mandates. Horizon Europe requires FAIR data management; NIH’s 2023 DMS policy uses FAIR language; UKRI references FAIR in its open-research statement. The mandates rarely specify how FAIR is to be assessed, which gives institutions latitude but also creates ambiguity at audit time.

    Our recommendation to institutions is to declare in your data policy which framework you use (the RDA Maturity Model is the safe choice as the common reference), which tooling you operate (F-UJI for automation, FAIR-Aware for researcher guidance, CoreTrustSeal for repository certification), and what your service-level commitment is to researchers depositing data. The reporting back to funders then has a documented basis.

    What to watch in 2026-2027

    The convergence work to watch is the FAIR Implementation Profiles (FIPs) approach, in which a community or institution declares its specific choices for each FAIR principle (which PID system, which metadata schema, which licence, which controlled vocabulary). FIPs are being piloted across EOSC and are likely to become the operational layer between the abstract FAIR principles and the per-dataset assessment. By 2027 we expect FAIR assessment tools to consume FIPs as configuration: “assess this dataset against the GO FAIR Life Sciences FIP” will be a meaningful operation.
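
    Consuming a profile "as configuration" might look something like the sketch below: the profile lists the allowed choices per facet, and the assessment checks a dataset’s metadata against them. The facet names, the profile contents, and the dataset record are hypothetical and do not follow any published FIP format.

```python
# Hypothetical profile: the choices a community declares per facet.
profile = {
    "pid_system": {"doi", "handle"},
    "metadata_schema": {"DataCite"},
    "licence": {"CC-BY-4.0", "CC0-1.0"},
}


def assess_against_profile(dataset: dict, declared: dict) -> dict:
    """Report, per declared facet, whether the dataset uses an allowed choice."""
    return {
        facet: dataset.get(facet) in allowed
        for facet, allowed in declared.items()
    }


dataset = {
    "pid_system": "doi",
    "metadata_schema": "Dublin Core",
    "licence": "CC-BY-4.0",
}

report = assess_against_profile(dataset, profile)
print(report)
```

    The same dataset can pass one community’s profile and fail another’s, which is the point: the profile makes the community’s choices explicit instead of leaving them implicit in the assessment tool.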

    References

    Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship (Scientific Data, 2016). RDA FAIR Data Maturity Model Working Group, final report (2020). Devaraju and Huber, F-UJI: An automated FAIR data assessment tool (FAIRsFAIR / PANGAEA, 2021). CoreTrustSeal Board, Trustworthy Data Repositories Requirements (current version). DANS, FAIR-Aware tool documentation.

  • PRISMA 2026: the next-generation systematic-review reporting standard

    The PRISMA statement has been the dominant reporting standard for systematic reviews and meta-analyses since 2009, with its most recent major revision in 2020. The 2026 update, drafted through 2024 and 2025 and finalised at the end of 2025, adds machine-readability, structured handling of AI-assisted screening, and explicit support for living systematic reviews. This post walks through what changed, why it matters, and what reviewers and journals should do to update their practices.

    What PRISMA 2020 left unresolved

    PRISMA 2020 added much-needed clarity to several persistent ambiguities — the role of registries and protocols, transparent reporting of search strategies, structured presentation of risk-of-bias assessment — but it left several gaps that became more pressing through 2021-2024.

    First, AI-assisted screening. By 2023, a substantial fraction of new systematic reviews used machine-learning tools for title-and-abstract screening (Abstrackr, Rayyan’s ML mode, Covidence’s automation, Distiller’s classification, bespoke models). PRISMA 2020 had no place to report this; reviewers either omitted it, mentioned it in passing, or invented their own reporting conventions. The result was a reproducibility gap: a reader could not tell whether a review had used AI to filter studies, what the AI’s parameters were, or how human checking was integrated.

    Second, living reviews. The conventional systematic review is a snapshot: search to a date, screen, extract, synthesise, publish. A living systematic review is continuously updated as new evidence emerges. PRISMA 2020’s reporting conventions assumed a snapshot model; reviewers running living reviews had to adapt the checklist by hand.

    Third, structured machine-readability. PRISMA 2020 specified what to report but not how to deposit it in structured form. The result was that systematic-review metadata lived as free text in PDFs, unreachable by tools that wanted to aggregate methodological features across reviews.

    What PRISMA 2026 changes

    The 2026 revision is layered: the 27-item core checklist remains, with three items extended and four new items added. The extensions are backward-compatible — a review that satisfies PRISMA 2020 also satisfies the unchanged items of PRISMA 2026 — and the new items are clearly flagged. The full statement is being published in the usual cluster of journals (BMJ, PLOS Medicine, Journal of Clinical Epidemiology, Systematic Reviews) with simultaneous open-access release.

    The AI-screening item

    The new item 8b requires reviewers who used AI or machine-learning tools in study identification, screening, or data extraction to report: the tool used (name and version), its training data or pre-training source, the threshold for AI-flagged inclusion versus human review, the human-checking strategy (full re-screen, sample re-screen, only AI-rejected items), and the integration into the overall workflow with quantitative reporting of agreement rates.

    This is non-trivial reporting and will catch many reviews unprepared. The recommendation from the working group is that AI-screening parameters should be set in the protocol (registered on PROSPERO or an equivalent registry) before screening begins, and that the reporting follow the protocol. A review that decides post hoc to use AI screening without protocol support is on weaker ground for both methodology and reporting.
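
    The "quantitative reporting of agreement rates" part of item 8b can be illustrated with a minimal sketch. It assumes the review holds paired include/exclude decisions from the AI tool and from a human screener on the same records, and it uses Cohen's kappa as one common agreement statistic. The `AIScreeningReport` class, its field names, and the tool name are hypothetical illustrations, not a schema issued by the working group.

```python
import math
from dataclasses import dataclass, asdict


@dataclass
class AIScreeningReport:
    """Hypothetical record of the item 8b fields; not an official PRISMA schema."""
    tool_name: str
    tool_version: str
    training_source: str
    inclusion_threshold: float
    human_check_strategy: str  # e.g. "full re-screen" or "sample re-screen"
    raw_agreement: float
    cohens_kappa: float


def agreement(ai: list[bool], human: list[bool]) -> tuple[float, float]:
    """Return (raw agreement, Cohen's kappa) for paired include/exclude calls."""
    if len(ai) != len(human) or not ai:
        raise ValueError("decision lists must be non-empty and of equal length")
    n = len(ai)
    observed = sum(a == h for a, h in zip(ai, human)) / n
    p_ai = sum(ai) / n        # share of records the AI marked "include"
    p_human = sum(human) / n  # share of records the human marked "include"
    # Agreement expected by chance if the two raters were independent.
    expected = p_ai * p_human + (1 - p_ai) * (1 - p_human)
    if expected == 1.0:
        # Both raters gave one identical constant answer: kappa is undefined.
        return observed, math.nan
    return observed, (observed - expected) / (1 - expected)


# Toy decisions (True = include) for ten records screened by both raters.
ai_calls = [True, True, False, False, True, False, False, False, True, False]
human_calls = [True, True, False, False, False, False, False, False, True, True]
raw, kappa = agreement(ai_calls, human_calls)

report = AIScreeningReport(
    tool_name="ExampleScreener",  # placeholder, not a real tool
    tool_version="0.0.0",
    training_source="not disclosed",
    inclusion_threshold=0.5,
    human_check_strategy="sample re-screen",
    raw_agreement=raw,
    cohens_kappa=kappa,
)
print(asdict(report))
```

    Reporting kappa alongside raw agreement matters because most records in a screening set are excluded, so two raters can agree often by chance alone.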

    The living-review checklist

    PRISMA 2026 adds a parallel reporting checklist for living systematic reviews: items covering the update frequency, the trigger for re-running the search, the handling of new evidence that changes pooled estimates, and the versioning of the published review. The checklist is meant to be applied at each update, with structured logging of what changed between versions.

    For journals publishing living reviews, the implication is that they need an editorial process that supports versioned publication. The BMJ, Cochrane Library, and several others have living-review streams; many others do not, and PRISMA 2026’s existence will push more journals toward supporting the format.

    The machine-readable flow diagram

    The PRISMA flow diagram has been the visual centrepiece of every systematic review since 2009. PRISMA 2026 introduces a structured JSON representation alongside the visual diagram, with the diagram regeneratable from the JSON. The JSON captures: records identified per source, records duplicate-removed, records screened, records excluded with reasons categorised, reports retrieved, reports assessed for eligibility, reports excluded with reasons categorised, studies included, reports of those studies included.

    The structured format means a reader (or a tool) can query the flow programmatically. The intended downstream uses include automated meta-research, evidence-synthesis platforms ingesting reviews at scale, and the construction of multi-review evidence maps from machine-readable inputs. The CASRAI reproducibility domain has begun cataloguing the JSON schema.
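
    As a rough illustration of what querying the flow programmatically could look like, the sketch below encodes the listed counts in a JSON-compatible structure and runs basic arithmetic consistency checks. The key names and the numbers are assumptions made for this example; they are not taken from the published schema.

```python
import json

# Hypothetical flow record; key names are illustrative, not the official schema.
flow = {
    "records_identified": {"MEDLINE": 1200, "Embase": 950, "registers": 50},
    "duplicates_removed": 400,
    "records_screened": 1800,
    "records_excluded": {"wrong population": 900, "wrong design": 700},
    "reports_retrieved": 190,
    "reports_assessed": 190,
    "reports_excluded": {"no usable outcome": 120, "duplicate report": 30},
    "studies_included": 35,
    "reports_included": 40,
}


def check_flow(f: dict) -> list[str]:
    """Return a list of arithmetic inconsistencies found in a flow record."""
    problems = []
    identified = sum(f["records_identified"].values())
    if identified - f["duplicates_removed"] != f["records_screened"]:
        problems.append("records screened != identified minus duplicates")
    remaining = f["records_screened"] - sum(f["records_excluded"].values())
    if f["reports_retrieved"] > remaining:
        problems.append("more reports retrieved than records left after screening")
    if f["reports_assessed"] > f["reports_retrieved"]:
        problems.append("more reports assessed than retrieved")
    included = f["reports_assessed"] - sum(f["reports_excluded"].values())
    if included != f["reports_included"]:
        problems.append("reports included != assessed minus excluded")
    if f["studies_included"] > f["reports_included"]:
        problems.append("more studies than reports included")
    return problems


serialised = json.dumps(flow, indent=2)      # what would be deposited
restored = json.loads(serialised)            # what a downstream tool would read
print(check_flow(restored) or "flow is internally consistent")
```

    Checks of this kind are the sort of thing an evidence-synthesis platform could run on ingest, before any diagram is regenerated from the file.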

    What journals should do

    For journals publishing systematic reviews, three updates are needed. First, update the submission template to ask for PRISMA 2026 compliance (the working group has issued model wording). Second, require deposit of the machine-readable flow diagram JSON alongside the PDF; the BMJ has pioneered this and the model is straightforward. Third, accept registered living-review submissions with a path to versioned publication, even if the current editorial workflow assumes single publication.

    What reviewers should do

    For systematic reviewers, the practical changes are: include PRISMA 2026 compliance in your protocol and pre-register it; if you use AI screening, plan the reporting against item 8b from the outset; produce the flow-diagram JSON during the review (most modern reference-management tools will export it) rather than reconstructing it at write-up; if your review is intended to be living, declare so in the protocol with the update strategy specified.

    EQUATOR Network alignment

    PRISMA 2026 has been developed in close coordination with the EQUATOR Network and is the first major EQUATOR-listed reporting guideline to include both AI-assisted research conduct and machine-readable structured outputs. The expectation is that other EQUATOR guidelines (CONSORT, STROBE, ARRIVE) will follow similar patterns in their next revisions.

    Areas of ongoing debate

    Two questions in the PRISMA 2026 development process were not closed and deserve continued attention. First, the threshold for AI use that triggers item 8b. The current language is “any use of AI or machine learning in study identification, screening, or data extraction.” Some reviewers argued for a higher threshold — only deep-learning-based tools with non-trivial filtering thresholds — while others argued for a lower one — any automation including deduplication. The published version errs toward broader disclosure.

    Second, the scope of the structured output. The flow-diagram JSON is the first machine-readable PRISMA item, but the same logic could apply to the risk-of-bias assessment, the data-extraction sheet, and the synthesis. The working group elected to start small with the flow diagram and expand in future revisions.

    For the CASRAI community, the takeaway is that systematic-review reporting is moving in the direction we have argued for: structured, machine-readable, integrated with the PID infrastructure (review DOIs, protocol DOIs, dataset DOIs), and explicit about modern tooling. The remaining gaps are tractable. PRISMA 2026 is a substantial step.

  • Data papers, software papers, and the limits of CRediT

    The 14 roles of CRediT were designed against the model of a conventional research article reporting empirical work: a study with a hypothesis, a method, data, analysis, and a written argument. Data papers and software papers fit this model awkwardly. A data paper describes a dataset; a software paper describes a piece of software. The intellectual contribution is the artefact itself, not the prose around it. The CRediT roles, applied to these papers, produce statements that are technically valid but substantively misleading. This post catalogues the friction and suggests where the taxonomy could be extended.

    What a data paper actually is

    A data paper, as the genre has developed in venues like Scientific Data, Earth System Science Data, GigaScience, and the data-paper streams of disciplinary journals, is a peer-reviewed description of a dataset: its provenance, its collection method, its quality, its access conditions, and its potential reuse. The dataset itself lives in a repository with its own DOI; the data paper provides the citable, peer-reviewed scholarly record that the dataset exists, that it was collected with rigour, and that it is fit for reuse.

    The intellectual labour behind a data paper is mostly not in the paper. It is in the years of fieldwork or instrument operation that produced the data, the protocols that ensured comparability across collection events, the curation work that turned raw observations into a structured deposit, the documentation that lets a stranger understand what the data mean. The paper is a summary record of that work.

    Where CRediT falls short for data papers

    Three friction points. First, Investigation and Data curation bear most of the load and they are not differentiated finely enough. A field ecologist who spent years collecting samples, a lab technician who processed them, a data manager who normalised the schema, and a metadata specialist who wrote the documentation are all plausibly Investigation or Data curation; the roles do not distinguish them. The result is that two papers with very different actual contributorship patterns can have identical-looking CRediT statements.

    Second, Resources overlaps with Investigation in a confusing way. A data paper describing a long-term ecological observatory has a Resources contribution (the observatory itself) that is distinct from the per-sample Investigation. CRediT does not currently cleanly separate “provided the infrastructure that produced the data” from “provided the samples that went into the data.”

    Third, Writing – original draft is often the smallest contribution, not the largest, and assigning it Lead can misrepresent the contribution structure. The person who wrote the paper is often a relatively junior team member, not the senior person whose intellectual contribution was the protocol and the multi-year campaign.

    Software papers and the JOSS model

    Software papers, exemplified by the Journal of Open Source Software (JOSS), face an analogous problem from a different direction. A JOSS paper is short — often under 1,000 words — and is paired with a peer-reviewed software repository. The intellectual contribution is the software: its design, its implementation, its tests, its documentation, its maintenance over time. The paper is a stub.

    JOSS itself uses CRediT for its papers and has done so since 2020. The community has converged on a set of mappings:

    • Conceptualization covers software design and architectural decisions.
    • Software covers implementation. This is the central role for most JOSS contributors.
    • Validation covers testing, both unit tests and validation against reference implementations.
    • Methodology covers the algorithmic content, where the software implements a non-trivial method.
    • Writing – original draft covers the paper itself. The README, the developer documentation, and the user docs are also writing work, but they are not the JOSS paper.
    • Supervision covers project leadership; Project administration covers maintenance and coordination.

    The friction in this mapping is that the Software role is overloaded. It conflates the initial implementation, ongoing maintenance, bug-fixing, refactoring, and tooling. A contributor who implemented the core algorithm and a contributor who maintains the CI/CD pipeline both get “Software” with no further distinction. For long-lived software with many contributors over years, the role assignment ends up giving everyone Software (lead/equal/supporting) and the differentiation lives in the GitHub commit history, not in CRediT.

    The FAIR4RS angle

    The FAIR4RS Principles for research software, finalised in 2022, set out what FAIR means for software: findable, accessible, interoperable, reusable. They explicitly acknowledge that software citation needs a richer model than data citation, because software has versions, dependencies, and ongoing development that data typically does not.

    FAIR4RS implies, though does not directly require, a richer contributorship taxonomy for software. The Software Citation Implementation Working Group has been working on this for several years. Their working position is that CRediT remains the right vocabulary for software paper contributorship, but that the software repository itself should carry its own contributor metadata using a complementary scheme (typically CITATION.cff with extended fields) that captures the per-version, per-component contributorship that CRediT cannot.

    The mapping problem

    For data papers and software papers, the operational reality is that two parallel records exist: the paper’s CRediT statement and the dataset or software repository’s contributor metadata. They overlap but do not align cleanly. The dataset DOI and software DOI live in DataCite; the paper DOI lives in Crossref; the relations between them are declared in the metadata but not always reciprocally.

    The CASRAI research outputs domain tracks the mapping conventions in current use. Our recommendation, for now, is that data papers and software papers should publish a CRediT statement covering the paper’s contributorship and should additionally publish a richer contributor metadata file with the dataset or software, using CRediT roles plus the discipline-specific extensions that have emerged.

    Possible extensions

    Three extensions would meaningfully improve the situation. First, sub-roles within Software: an extended taxonomy with implementation, testing, documentation, maintenance, and integration as sub-roles would give a software paper a more truthful contributorship statement. This work has been drafted by the FORCE11 software citation working group but not formally proposed as a CRediT extension.

    Second, distinguished Investigation roles for data papers: collection, processing, curation, documentation as sub-roles of Investigation and Data curation would let a data paper describe its contributorship more faithfully. The challenge here is keeping the taxonomy usable; an over-elaborate vocabulary loses adoption.

    Third, artefact-level role assignments: the current CRediT statement applies at the paper level. For a paper that describes a dataset and a software package, it might be more useful to have role assignments at the artefact level (paper, dataset, software each get their own statement) with cross-references. This would require schema work in Crossref, DataCite, and ORCID.

    What to do now

    For authors of data papers, the practical advice is: use CRediT for the paper; deposit a complementary contributors.json with the dataset that captures finer-grained roles; cross-reference the two in the related-identifier blocks. For authors of software papers, use CRediT for the paper and CITATION.cff for the repository, with the CFF carrying the rich per-component contributor data. The CASRAI data and software papers guide has worked examples.
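
    A complementary contributors.json could be sketched as follows. The file layout, the `sub_role` field, the contributor names, and the dataset DOI are all hypothetical; only the 14 CRediT role names and the lead/equal/supporting degrees come from the taxonomy itself. The validation step keeps the finer-grained file anchored to CRediT so the two records stay mappable.

```python
import json

# The 14 CRediT roles, used to validate the coarse-grained part of each entry.
CREDIT_ROLES = {
    "Conceptualization", "Data curation", "Formal analysis", "Funding acquisition",
    "Investigation", "Methodology", "Project administration", "Resources",
    "Software", "Supervision", "Validation", "Visualization",
    "Writing – original draft", "Writing – review & editing",
}
DEGREES = {"lead", "equal", "supporting"}


def validate(contributors: list[dict]) -> list[str]:
    """Return error messages for roles or degrees outside the CRediT vocabulary."""
    errors = []
    for person in contributors:
        for role in person["roles"]:
            if role["credit"] not in CREDIT_ROLES:
                errors.append(f"{person['name']}: unknown CRediT role {role['credit']!r}")
            if role["degree"] not in DEGREES:
                errors.append(f"{person['name']}: unknown degree {role['degree']!r}")
    return errors


# Placeholder contributors; "sub_role" is a hypothetical free-text extension.
contributors = [
    {"name": "Contributor A", "roles": [
        {"credit": "Investigation", "degree": "lead", "sub_role": "field collection"},
    ]},
    {"name": "Contributor B", "roles": [
        {"credit": "Data curation", "degree": "lead", "sub_role": "schema normalisation"},
        {"credit": "Writing – original draft", "degree": "supporting", "sub_role": None},
    ]},
]

errors = validate(contributors)
if errors:
    raise ValueError("; ".join(errors))

payload = json.dumps(
    {"dataset_doi": "10.5555/example-dataset",  # placeholder DOI
     "contributors": contributors},
    indent=2,
    ensure_ascii=False,
)
print(payload)
```

    Keeping the CRediT role as a required field and the sub-role as optional means a consumer that only understands CRediT can still read the file.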

    For the CRediT stewardship group, the recommendation is to prioritise the data-paper and software-paper mapping problem in the v2026.3 revision discussion. The friction is real, the workarounds are working but ugly, and the taxonomy will be strengthened by a thoughtful extension.

  • Paper mills and tortured phrases: the integrity crisis in 2026

    The scholarly-publishing integrity ecosystem ended 2025 with the highest retraction rate ever recorded and the clearest evidence yet that industrial-scale fraud is structurally embedded in the literature. The numbers are sobering: Retraction Watch’s database crossed 60,000 entries in 2025; Hindawi/Wiley alone retracted over 11,000 papers across 2023-2024 following paper-mill detection; the Problematic Paper Screener now flags new manuscripts at a rate that strains journals’ capacity to investigate. This post maps the current threats, the detection tooling that has matured, and the United2Act coordination work that is beginning to produce a coherent industry response.

    Paper mills: the supply side

    A paper mill is a commercial operation that fabricates manuscripts and sells authorship slots on them. The mills emerged at significant scale around 2010-2012, driven by promotion-and-tenure incentives in jurisdictions where publication count is a hard quantitative requirement (early-career clinical researchers in some countries face explicit per-promotion-step publication quotas). The mills industrialised what individual fabrication had done for decades.

    The 2022-2024 Hindawi crisis (Wiley’s acquired open-access portfolio was infiltrated at scale, leading to 11,000+ retractions and the closure of several journals) made the systemic nature of the problem visible. The Hindawi pattern was: mill-generated manuscripts submitted to special-issue calls in low-rigour journals, peer-reviewed by mill-affiliated reviewers in coordinated networks, published, and used for career advancement. The breakdown was multifactorial: high-volume special-issue calls without sufficient editorial oversight; reviewer networks whose coordination the journals could not detect; and a financial incentive structure that rewarded throughput.

    The 2024-2025 response was substantial. Wiley shut down the Hindawi brand, retracted at scale, and rebuilt its peer-review controls. Other publishers running similar special-issue programmes audited and tightened their own. The United2Act initiative (launched in 2023, with COPE and STM coordinating) produced industry-wide commitments to detection cooperation, transparent retraction practices, and improved reviewer verification.

    Tortured phrases: a detection lever

    The tortured phrases concept, coined by Guillaume Cabanac and Cyril Labbé in 2021, was a methodological breakthrough. A tortured phrase is a clumsy paraphrase of a standard technical term, typically introduced when automatic word substitution is used to evade plagiarism detection. Examples include “counterfeit consciousness” for “artificial intelligence,” “haphazard backwoods” for “random forests,” and “fake neural organization” for “artificial neural network.” Once recognised, tortured phrases are a reliable signal of mill involvement, because no human author working in their field would write “haphazard backwoods” when they meant random forests.

    Cabanac and Labbé’s Problematic Paper Screener (PPS) operationalises tortured-phrase detection at scale. The PPS continuously scans the published literature against a curated dictionary of tortured phrases, flagging papers that contain them. By 2026 the PPS has flagged over 14,000 papers; many have been retracted, more are under investigation, and a substantial subset will likely remain in the literature without action because the journals are unresponsive or defunct.

    The PPS is open infrastructure (the dictionary is public, the methodology is published, the flagged papers are listed). It has been criticised for false positives (some flagged papers turn out to have innocent explanations, e.g., automated translation from a non-English original) but the precision is high enough that an editor receiving a PPS flag should treat it as a serious signal warranting investigation.
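
    The dictionary-matching idea can be shown with a minimal sketch. It is not the PPS implementation: it uses only the three phrase pairs quoted above as a stand-in dictionary, and the function name `flag_tortured_phrases` is invented for this example. Matching is case-insensitive and tolerant of line breaks between words.

```python
import re

# Stand-in dictionary: tortured phrase -> the standard term it replaces.
TORTURED = {
    "counterfeit consciousness": "artificial intelligence",
    "haphazard backwoods": "random forests",
    "fake neural organization": "artificial neural network",
}


def _compile(phrase: str) -> re.Pattern:
    """Build a pattern that allows any whitespace run between the words."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = {phrase: _compile(phrase) for phrase in TORTURED}


def flag_tortured_phrases(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, tortured phrase, expected term) for every match, in order."""
    hits = []
    for phrase, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), phrase, TORTURED[phrase]))
    return sorted(hits)


sample = "We apply counterfeit consciousness and a Haphazard\nbackwoods classifier."
for offset, phrase, expected in flag_tortured_phrases(sample):
    print(f"offset {offset}: '{phrase}' (expected '{expected}')")
```

    A hit from a scan like this is a prompt for human investigation, not a verdict, for the false-positive reasons noted above.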

    Image manipulation

    The other major detection front is image manipulation, particularly in life-science papers where Western blots, microscopy images, and gel electrophoresis are routinely fabricated by duplication, splicing, or AI generation. Elisabeth Bik’s catalogue of image-duplication cases has been the canonical reference for over a decade. The 2022-2024 development was the deployment of automated image-similarity tools (Imagetwin, Proofig) by major publishers; by 2025 most large publishers run automated image screening on every submission.

    The 2025 escalation is AI-generated images. A diffusion-model-generated Western blot is more difficult to detect than a duplicated one because there is no source to find. The detection community has begun work on AI-generated-image detection, but the arms race is ongoing and no tool has settled as the standard. The current best practice is to require raw data deposition (the original blot scan, the unprocessed microscopy stack) alongside the published image, with image-manipulation tools running on both. Several Cell Press and EMBO journals now require this for all life-science submissions.

    Citation cartels

    Citation cartels are coordinated networks of authors who systematically cite each other to inflate their citation counts and journal impact factors. The classic cartel pattern is journal-level: a journal’s editorial board reciprocally cites other journals’ editorial boards, all benefiting from the inflated cross-citation. The author-level pattern is similar: a network of researchers in adjacent specialties cites each other across many papers.

    Detection is statistical: cartels show citation patterns that are sharply non-random in the citation graph. The 2023-2024 work by Albers Mohrman and others operationalised the detection at the journal-citation-network level; Clarivate has begun excluding cartel-implicated journals from the JCR. The author-level cartels are harder to act against, but the existence of the signal is becoming part of the institutional-integrity toolkit.
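
    To make "sharply non-random" concrete, here is a deliberately simple heuristic, not the published detection method: flag journal pairs in which each journal sends a large share of its outgoing citations to the other. The function name, the 30% threshold, and the toy counts are all assumptions for illustration; real detection would need field-normalised baselines.

```python
from itertools import combinations


def reciprocal_pairs(citations: dict[tuple[str, str], int],
                     min_share: float = 0.3) -> list[tuple[str, str, float, float]]:
    """Flag pairs (a, b) where a->b and b->a each exceed min_share of outgoing citations."""
    outgoing: dict[str, int] = {}
    for (source, _target), count in citations.items():
        outgoing[source] = outgoing.get(source, 0) + count

    def share(source: str, target: str) -> float:
        total = outgoing.get(source, 0)
        return citations.get((source, target), 0) / total if total else 0.0

    journals = sorted({journal for pair in citations for journal in pair})
    flagged = []
    for a, b in combinations(journals, 2):
        share_ab, share_ba = share(a, b), share(b, a)
        if share_ab >= min_share and share_ba >= min_share:
            flagged.append((a, b, share_ab, share_ba))
    return flagged


# Toy journal-to-journal citation counts: (citing, cited) -> number of citations.
toy = {
    ("J1", "J2"): 40, ("J1", "J3"): 40, ("J1", "J4"): 20,
    ("J2", "J1"): 45, ("J2", "J3"): 35, ("J2", "J4"): 20,
    ("J3", "J1"): 5, ("J3", "J2"): 5, ("J3", "J4"): 90,
    ("J4", "J1"): 45, ("J4", "J2"): 45, ("J4", "J3"): 10,
}
for a, b, ab, ba in reciprocal_pairs(toy):
    print(f"{a} <-> {b}: {ab:.0%} of {a}'s and {ba:.0%} of {b}'s outgoing citations")
```

    One-directional concentration (J3 citing J4 heavily in the toy data) is not flagged, because the cartel pattern described above is reciprocal.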

    The retraction infrastructure

    Retraction has historically been slow, opaque, and inconsistently practised. The 2024 NISO recommended practice on retraction (NISO RP-45-2024) and the 2024 Crossref retraction-metadata revisions have begun to change this. A retracted paper now carries structured machine-readable metadata about the retraction reason, the implicated parties, and the relationship to other papers; downstream services (PubMed, Google Scholar, Scopus, citation databases) consume the metadata and surface retraction notices alongside the paper.

    The remaining gap is the unretracted-but-suspect paper. A paper flagged by the Problematic Paper Screener but never investigated by the journal sits in the literature unmarked. The 2024 COPE-led discussion of expressions of concern as an interim status (the paper is under investigation but not yet retracted) is one direction. A more radical proposal, now being piloted by several preprint servers and one or two journals, is to surface the PPS flag directly on the article landing page even before the journal acts, with a clear distinction between “flagged by automated screener” and “retracted by publisher.”

    The United2Act response

    United2Act, launched in 2023 with COPE and STM coordinating, brought publishers, researchers, integrity offices, and regulators together to address paper mills. The 2024 United2Act communique committed signatories to: cooperate on detection (sharing reviewer-misconduct signals across publishers); standardise retraction practices; improve reviewer verification; coordinate with institutions on consequences for authors of fabricated papers.

    The 2025 work has been operational: the COPE/STM joint paper-mill database (publishers can submit suspect manuscript signatures and the database flags coincidences); reviewer-verification protocols (ORCID iD plus institutional email plus referee history); coordination with national integrity offices in jurisdictions where paper-mill commissioning is concentrated.
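
    Of the three reviewer-verification signals, the ORCID iD is the one with a purely syntactic first check: its final character is a check digit computed with ISO 7064 MOD 11-2. The sketch below validates only that structure. It does not confirm that the iD exists or belongs to the reviewer, which requires a lookup against ORCID itself, and the function names are illustrative.

```python
import re

# Four hyphen-separated blocks of four; the last character may be the check digit X.
ORCID_RE = re.compile(r"^([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{3}[0-9X])$")


def orcid_check_digit(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base15:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check digit of an ORCID iD (syntax only)."""
    match = ORCID_RE.match(orcid.strip())
    if not match:
        return False
    digits = "".join(match.groups())
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

    A failed check digit is a cheap early filter for mistyped or invented iDs before the more expensive email and referee-history checks run.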

    The honest assessment is that United2Act has bought the industry better coordination but has not solved the structural incentive problem. As long as researchers face quantitative publication requirements for promotion, the demand for fabricated authorship slots will exist. The longer-term fix is on the responsible-assessment side (see our responsible-assessment domain); the integrity-side work is harm reduction.

    COPE flowcharts: the per-case operational layer

    The COPE flowcharts, maintained and updated by the Committee on Publication Ethics, are the operational toolkit for editors handling suspected misconduct. The flowcharts cover (among many others) plagiarism in a submitted manuscript, plagiarism in a published article, redundant publication, fabricated data, undisclosed conflict of interest, undisclosed AI use, image manipulation, authorship disputes, paper-mill suspicion, and citation manipulation.

    An editor confronted with a suspect submission in 2026 should pull the relevant COPE flowchart, follow the documented procedure, and document the decision trail. The flowcharts are not a substitute for editorial judgement, but they are an audit-defensible baseline. The 2024-2025 COPE updates added flowcharts specifically for AI-assisted fabrication, paper-mill suspicion based on tortured-phrase detection, and image-manipulation findings from automated tools.

    What to do at the institutional level

    For an institutional research-integrity office in 2026, the practical priorities are: (1) monitor your own institution’s authors against the PPS and the Retraction Watch database; (2) integrate retraction-metadata feeds into your CRIS so you can detect when your authors’ papers are retracted elsewhere; (3) participate in United2Act or its national-level analogues; (4) commit publicly to following COPE flowcharts and document decisions; (5) work with your promotion-and-tenure committees to remove the pure-count incentives that fuel the demand side. The research-integrity domain at CASRAI maintains the institutional-integrity playbooks.
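
    Priorities (1) and (2) reduce, at their simplest, to joining the institution's publication records against a retraction feed on DOI. The sketch below assumes both sides are already available as lists of dictionaries; the field names, the example records, and the placeholder DOIs are hypothetical, and no real CRIS or Retraction Watch API is called.

```python
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip resolver prefixes so both sides compare equal."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def retracted_outputs(cris_records: list[dict], retraction_feed: list[dict]) -> list[dict]:
    """Return CRIS records whose DOI appears in the retraction feed."""
    retracted = {normalise_doi(item["doi"]): item for item in retraction_feed}
    matches = []
    for record in cris_records:
        doi = normalise_doi(record["doi"])
        hit = retracted.get(doi)
        if hit is not None:
            matches.append({"doi": doi,
                            "authors": record["authors"],
                            "reason": hit["reason"]})
    return matches


# Placeholder data; 10.5555 DOIs are used purely as examples.
cris = [
    {"doi": "https://doi.org/10.5555/ABC.001", "authors": ["Researcher A"]},
    {"doi": "10.5555/abc.002", "authors": ["Researcher B", "Researcher C"]},
]
feed = [
    {"doi": "doi:10.5555/abc.001", "reason": "paper mill"},
    {"doi": "10.5555/zzz.999", "reason": "image duplication"},
]
for match in retracted_outputs(cris, feed):
    print(match)
```

    DOI normalisation is the step most often skipped in practice; without it, the same paper recorded as a URL in one system and a bare DOI in the other silently fails to match.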

    References

    Cabanac, Labbé, Magazinov, Tortured phrases: A dubious writing style emerging in science (2021 preprint and follow-up papers). Bik et al., The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications (mBio, 2016). Else and Van Noorden, The fight against fake-paper factories that churn out sham science (Nature, 2021). COPE, Paper mills – research, action plans, and resources (2023, updated 2024). United2Act, Joint Communique on Paper Mills (2023).