Tag: agreement on reforming research assessment

  • Leiden Manifesto Checklist for Research Offices

    The Leiden Manifesto for Research Metrics sets out ten principles, published as a comment in Nature in 2015, for the responsible use of quantitative indicators in research evaluation. Research offices can convert each principle into a direct audit question, testing whether KPI dashboards, promotion criteria and grant-review rubrics rely on a single metric, ignore field norms, or substitute for qualitative judgement.

    The Leiden Manifesto for Research Metrics is a ten-principle framework for the responsible use of bibliometric and other quantitative indicators in evaluating research, published by Diana Hicks, Paul Wouters, Ludo Waltman, Sarah de Rijcke and Ismael Rafols in Nature on 22 April 2015. It was formulated at the 19th International Conference on Science and Technology Indicators, held in Leiden, the Netherlands, in September 2014, and has since been cited more than 4,000 times, according to Google Scholar’s tracking of the original paper.

    What is the Leiden Manifesto for Research Metrics?

    The Leiden Manifesto is a response to what its authors called “impact-factor obsession” — the tendency of universities, funders and promotion committees to substitute a single number for expert judgement. It does not ban metrics. It requires that quantitative indicators support, rather than replace, informed peer assessment of research quality.

    The manifesto’s home institution is the Centre for Science and Technology Studies (CWTS) at Leiden University, where co-author Paul Wouters served as director. CWTS also produces the CWTS Leiden Ranking, a separate bibliometrics-based university ranking — a distinction research offices should not conflate when citing the source.

    What are the ten principles of the Leiden Manifesto?

    Each principle addresses a specific failure mode observed in metric-driven research assessment. The table below states each principle exactly as published, alongside the practical audit question a research office should ask of its own KPI or promotion framework.

    # | Principle (Hicks et al., 2015) | Audit question for your office
    1 | Quantitative evaluation should support qualitative, expert assessment | Does any committee decision rest on a metric alone, with no narrative peer input?
    2 | Measure performance against the research missions of the institution, group or researcher | Are KPIs generic, or tailored to the unit’s stated mission (teaching-intensive, applied, translational)?
    3 | Protect excellence in locally relevant research | Does the framework penalise work published in non-English or regionally focused outlets?
    4 | Keep data collection and analytical processes open, transparent and simple | Can an academic reproduce their own score from publicly documented methodology?
    5 | Allow those evaluated to verify data and analysis | Is there a formal, timely route to challenge or correct metric data before a decision is made?
    6 | Account for variation by field in publication and citation practices | Are raw citation counts compared across disciplines without field normalisation?
    7 | Base assessment of individual researchers on a qualitative judgement of their portfolio | Do promotion criteria require a portfolio narrative, or just an h-index threshold?
    8 | Avoid misplaced concreteness and false precision | Are decimal-point differences in impact factor or citation rate treated as meaningful?
    9 | Recognise the systemic effects of assessment and indicators | Has the office assessed whether its KPIs create incentives to game submission counts or venues?
    10 | Scrutinise indicators regularly and update them | Is there a scheduled review cycle for the KPI framework itself, not just for scores against it?

    How can a research office audit its KPI and promotion framework against it?

    Running the manifesto as a live audit tool means working through each principle against real artefacts: the appraisal form, the promotion rubric, and the departmental dashboard.

    1. Mark every clause in the promotion/tenure criteria naming a specific metric (impact factor, h-index, citation count).
    2. Check each marked clause has a qualitative narrative requirement alongside it (Principles 1 and 7).
    3. Confirm KPI targets are set per unit mission, not copied institution-wide (Principle 2).
    4. Check non-English-language or applied outputs score on the same scale as high-impact-journal outputs (Principle 3).
    5. Verify each dashboard metric’s data source and calculation method is documented and accessible (Principles 4 and 5).
    6. Confirm citation indicators are field-normalised, not raw counts compared across disciplines (Principle 6).
    7. Look for false precision — ranking staff by two-decimal citation averages (Principle 8).
    8. Ask whether the KPI framework has driven any unintended behaviour, such as salami-slicing publications or discouraging risky research (Principle 9).
    9. Set a fixed review date for the framework itself, independent of individual appraisal cycles (Principle 10).
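
    The nine steps above can be recorded in a small script so that a failed step is traced back to the principle it tests. This is a minimal sketch, not an official tool: the `Check` class, the `CHECKS` table and the `failed_principles` helper are hypothetical names, the step summaries are paraphrases, and the example results are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    step: int          # step number in the checklist above
    principles: tuple  # Leiden Manifesto principles the step tests
    summary: str       # short paraphrase of the step


# Step 1 is preparatory (marking metric clauses), so it maps to no principle.
CHECKS = (
    Check(1, (), "Metric-naming clauses are marked"),
    Check(2, (1, 7), "Each marked clause has a qualitative narrative requirement"),
    Check(3, (2,), "KPI targets are set per unit mission"),
    Check(4, (3,), "Non-English and applied outputs score on the same scale"),
    Check(5, (4, 5), "Data sources and methods are documented and accessible"),
    Check(6, (6,), "Citation indicators are field-normalised"),
    Check(7, (8,), "No ranking on false-precision differences"),
    Check(8, (9,), "Unintended behaviours have been assessed"),
    Check(9, (10,), "A fixed review date exists for the framework"),
)


def failed_principles(results):
    """Return the sorted principle numbers whose audit step did not pass.

    `results` maps step number -> True (passed) or False (failed).
    A step with no recorded answer is treated as failed.
    """
    failed = set()
    for check in CHECKS:
        if not results.get(check.step, False):
            failed.update(check.principles)
    return sorted(failed)


# Hypothetical audit outcome: everything passes except steps 6 and 7.
example_results = {step: True for step in range(1, 10)}
example_results[6] = False
example_results[7] = False

print(failed_principles(example_results))  # [6, 8]
```

    Treating an unanswered step as failed is a deliberate choice in this sketch: an audit with gaps should not read as a clean result.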

    A framework that fails more than two or three of these checks is not aligned with the manifesto, regardless of how sophisticated its dashboard software looks. The most common failure in practice is Principle 6: comparing raw citation counts across, for example, a mathematics department and a cell biology department. Top-ranked mathematics journals carry impact factors around 3, while top-ranked cell biology journals carry impact factors around 30. The manifesto’s authors cite this field-scale gap directly as evidence that uncorrected cross-field comparison is meaningless.
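
    The field-scale gap described above is what field normalisation corrects for. The sketch below shows the simplest form, assuming a paper’s citations are divided by the average for its own field. The baseline figures, the paper records and the `normalised_score` function are all invented for illustration; real baselines would come from a bibliometric database and would usually also account for publication year.

```python
# Hypothetical field baselines: mean citations per paper in the same field.
# These numbers are invented for illustration only.
FIELD_BASELINE = {"mathematics": 4.0, "cell biology": 40.0}


def normalised_score(citations, field):
    """Citations relative to the field average; 1.0 means 'field average'."""
    baseline = FIELD_BASELINE[field]
    if baseline <= 0:
        raise ValueError(f"baseline for {field!r} must be positive")
    return citations / baseline


# Hypothetical papers: (label, field, raw citation count)
papers = [
    ("maths paper", "mathematics", 12),
    ("cell biology paper", "cell biology", 60),
]

for label, field, citations in papers:
    score = normalised_score(citations, field)
    print(f"{label}: raw={citations}, normalised={score:.1f}")
```

    On raw counts the cell biology paper looks five times stronger (60 against 12). Relative to its own field, the mathematics paper sits at three times the average and the cell biology paper at one and a half times, so the ordering reverses. Scores are shown to one decimal place on purpose, in line with Principle 8.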

    How does the Leiden Manifesto compare with DORA and CoARA?

    The Leiden Manifesto did not appear in isolation. The 2013 San Francisco Declaration on Research Assessment (DORA) preceded it, while the Coalition for Advancing Research Assessment (CoARA) has since built a sector-wide agreement on reforming assessment practice. Research offices are frequently asked which one to adopt.

    Framework | Published | Format | Primary focus
    Leiden Manifesto | 22 April 2015 (Nature comment) | 10 principles | Correct use of quantitative indicators across disciplines and settings
    DORA | 2013 (San Francisco Declaration) | General recommendations + signatory pledge | Eliminating journal impact factor as a proxy for article or researcher quality
    CoARA | 2022 (Agreement on Reforming Research Assessment) | Institutional commitment agreement | Sector-wide reform of hiring, promotion and funding assessment criteria

    DORA has been signed by more than 27,000 individuals and organisations, according to DORA’s own published tally as of March 2026, making it the higher-profile pledge. But when Loughborough University’s LIS-Bibliometrics committee chose a framework for its own policy in 2018, policy manager Elizabeth Gadd selected the Leiden Manifesto because it takes a “broader approach to the responsible use of all bibliometrics across a range of disciplines and settings” — not only journal-level metrics. Elsevier separately announced on 14 July 2020 that it would use the manifesto’s principles to guide its CiteScore methodology.

    In the UK, the independently commissioned Metric Tide review (2015), led by James Wilsdon for the then Higher Education Funding Council for England, reached compatible conclusions and recommended that metrics support, not replace, peer review within the research administration processes underpinning the Research Excellence Framework. A research office building a REF-adjacent KPI policy should treat the Metric Tide and the Leiden Manifesto as aligned, not competing, references.

    Common questions and what comes next for research offices

    Who wrote the Leiden Manifesto for Research Metrics?

    The manifesto was written by Diana Hicks, professor of public policy at Georgia Institute of Technology, and Paul Wouters, then director of CWTS at Leiden University, together with co-authors Ludo Waltman, Sarah de Rijcke and Ismael Rafols. It was published as a comment in Nature, volume 520, on 22 April 2015.

    Does the Leiden Manifesto ban the use of bibliometrics tools?

    No. The manifesto does not prohibit bibliometrics tools such as Web of Science, Scopus or Dimensions. It requires that any output from these tools — citation counts, h-indices, journal metrics — be interpreted alongside qualitative expert review and adjusted for field-specific citation norms before it informs a decision.

    Why does the importance of bibliometrics remain contested?

    Bibliometrics matter because they scale evaluation across thousands of researchers where individual peer review is impractical. The contested part is misuse: treating a single indicator as an objective proxy for quality, rather than one input alongside portfolio review, mission fit and field context, as the manifesto’s ten principles specify.

    How often should a research office review its KPI framework under the manifesto?

    Principle 10 asks evaluators to “scrutinise indicators regularly and update them,” but sets no fixed interval. Good institutional practice, reflected in library and research-office guidance built on the manifesto, is an annual technical review of data sources plus a full policy review on the same three-to-five-year cycle as promotion-criteria revisions.

    The Leiden Manifesto’s ten principles were written as durable evaluation ethics, not a one-time compliance exercise. As institutions layer AI-assisted analytics, altmetrics and funder-mandated open-data reporting onto existing KPI frameworks, the manifesto’s core requirement — that quantitative evaluation support, not replace, expert judgement — becomes harder to satisfy by default and more important to audit deliberately. Research offices that build the checklist above into their annual promotion-criteria review cycle, rather than treating the manifesto as background reading, are the ones actually applying it.

  • PlumX vs Altmetrics: Compare Coverage Gaps

    PlumX Metrics and Altmetric both track online attention to research outputs, but they are not interchangeable: PlumX organises data into five uncombined categories (Citations, Usage, Captures, Mentions, Social Media), while Altmetric.com aggregates sources into a single weighted Altmetric Attention Score. Choosing between PlumX and Altmetric for assessment therefore depends on whether an institution needs a granular breakdown or a single comparable number, and on disclosing what each tool does not cover.

    Altmetrics, as a category, is the collective term for non-citation indicators of research attention — social media mentions, news coverage, policy citations, blog posts, and readership counts — used alongside, not instead of, traditional bibliometrics.

    What is PlumX Metrics?

    PlumX Metrics is an altmetrics service developed by Plum Analytics, now owned by Elsevier and embedded directly into Scopus and SciVal article records. It does not produce a single composite score. Instead, it sorts attention data into five discrete categories: Citations, Usage, Captures, Mentions, and Social Media, displayed as a five-segment “Plum Print” whose circle sizes scale with activity in each bucket.

    The University of Waterloo Library notes that PlumX “deliberately does not aggregate their altmetric data sources into a single score” — a design choice that keeps categories separable but makes cross-article ranking harder than with a single number.

    What are Altmetrics and the Altmetric Attention Score?

    Altmetric.com, part of Digital Science, is the best-known commercial provider of the broader “altmetrics” concept. It compresses attention data from news outlets, blogs, policy documents, X/Twitter, and other sources into a single weighted number — the Altmetric Attention Score — visualised as a multicoloured “donut” where each segment represents a source type.

    This single-score design makes Altmetric easier to sort and benchmark at scale across large publication sets, which is why publisher platforms including Wiley and Springer Nature embed the Altmetric badge directly on article pages.

    PlumX vs Altmetrics: data sources and categories compared

    A 2024 study in Learned Publishing by Rasuli directly tested coverage differences between the two tools and found that neither is universally superior: Altmetric.com had the strongest coverage of blogs, news articles, and X/Twitter mentions, while PlumX showed better coverage of Mendeley reader counts. That finding is the clearest evidence that the two tools are complementary, not substitutable.

    Dimension | PlumX Metrics | Altmetric (Attention Score)
    Owner | Plum Analytics, part of Elsevier | Altmetric.com, part of Digital Science
    Category structure | Five uncombined categories: Citations, Usage, Captures, Mentions, Social Media | Single weighted score plus a source-level breakdown
    Composite score | None; a five-category “Plum Print” visual | Yes; the Altmetric Attention Score, one number
    Documented strength (Rasuli, 2024) | Mendeley reader counts | Blogs, news, and X/Twitter mentions
    Primary institutional integration | Scopus and SciVal article records | Altmetric Explorer; publisher platforms (Wiley, Springer Nature)

    Both tools handle citation data, but differently: PlumX’s Citations category explicitly folds in Scopus citation counts alongside patent, clinical, and policy citations, while Altmetric treats citation data as a separate, secondary layer rather than a core category.

    What coverage gaps should institutions disclose?

    Neither tool captures the full universe of research attention, and both have known blind spots that assessment reports should state explicitly rather than imply away:

    • Platform coverage is uneven. The Rasuli (2024) comparison shows each tool systematically under-represents sources the other captures better, so a low score on one platform does not mean low attention overall.
    • Absence of a score is not absence of impact. An article with no PlumX or Altmetric activity may simply lack a DOI-linked record, an institutional repository deposit, or public discussion in the tracked window — not lack of scholarly value.
    • Composite scores obscure source mix. A single Altmetric Attention Score can be driven almost entirely by one viral social post; disclosure should note the underlying source breakdown, not just the headline number.
    • Gaming and reproducibility risk. NISO’s Recommended Practice RP-25-2016, the output of its Alternative Assessment Metrics initiative, explicitly flags data-quality, persistent-identifier, and manipulation-resistance requirements that altmetrics providers and institutions using them should meet.
    • Metrics indicate attention, not quality. INORMS’s SCOPE framework for responsible research evaluation stresses that any metric — including altmetrics — should be interpreted only against the specific purpose it was chosen to serve, not treated as a proxy for research quality.

    Research administrators compiling assessment dossiers should state which tool was used, the date the score was pulled, and which categories were included — omitting this context is the most common disclosure failure institutions make when citing either platform.
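
    The three disclosure items named above (tool, retrieval date, categories included) can be captured as one structured record that travels with every score. This is a minimal sketch under stated assumptions: `MetricDisclosure` and its field names are hypothetical and not part of any PlumX or Altmetric API, and the example values are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class MetricDisclosure:
    tool: str                          # which platform produced the figures
    retrieved_on: date                 # date the data was pulled
    categories: Tuple[str, ...]        # categories or sources included
    known_gaps: Tuple[str, ...] = ()   # coverage gaps the reader should know

    def statement(self):
        """Render the disclosure as one sentence for a dossier footnote."""
        text = (
            f"{self.tool} data retrieved {self.retrieved_on.isoformat()}; "
            f"categories included: {', '.join(self.categories)}."
        )
        if self.known_gaps:
            text += " Known coverage gaps: " + "; ".join(self.known_gaps) + "."
        return text


# Hypothetical example record.
disclosure = MetricDisclosure(
    tool="PlumX Metrics",
    retrieved_on=date(2025, 1, 15),
    categories=("Captures", "Mentions"),
    known_gaps=("blogs, news and X/Twitter covered less fully than in Altmetric (Rasuli, 2024)",),
)

print(disclosure.statement())
```

    Making the record immutable (`frozen=True`) reflects the point of the disclosure: the context is fixed at the moment the score is pulled and should not drift afterwards.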

    Answer-first Q&A

    Is Altmetric reliable?

    Altmetric is reliable as an indicator of online attention, not as a quality measure. Because it harvests data from many external, non-standardised sources, coverage varies by discipline and platform, so scores should be interpreted alongside citation data rather than in isolation, per NISO’s altmetrics recommended practice.

    What is the difference between altmetrics and bibliometrics?

    Bibliometrics measure scholarly interest through formal citation counts in indexed literature, while altmetrics track online engagement — downloads, mentions, shares, and discussion — across academic and public channels. The two measure different things and are designed to complement, not replace, each other.

    Is PlumX Metrics free?

    The PlumX artifact widget is free for any digital object with a DOI and can be embedded on repository or publication pages at no cost. Full institutional dashboards and analytics through Scopus/SciVal, however, require a paid Elsevier subscription.

    What is the difference between Altmetric and PlumX?

    Altmetric compresses attention into one weighted Attention Score with a donut visual, while PlumX keeps five categories separate in a Plum Print graphic with no combined number. The practical difference is aggregation: one number for quick ranking versus five categories for granular review.

    Implications for research assessment

    As institutions build multi-metric assessment dashboards, the choice is rarely PlumX or Altmetric — most research-intensive universities license both, because Scopus-indexed institutions already have PlumX embedded and many also subscribe to Altmetric Explorer for its stronger media and policy tracking. What matters for defensible assessment practice is documenting scope: which categories were pulled, on what date, and which known coverage gaps apply.

    Frameworks such as INORMS’s SCOPE model give research administration teams a structure for that documentation, tying metric choice back to the specific evaluative purpose rather than treating either tool’s output as a self-evident ranking. Consult the CASRAI Dictionary for definitions of related terms such as citation, altmetrics, and bibliometrics when drafting assessment policy language.

  • eLife BioRxiv Model: Review After Posting Changes Peer Review

    eLife’s review of bioRxiv preprints works in reverse order to a conventional journal: the paper is posted publicly on bioRxiv first, and eLife’s editors and reviewers evaluate it only after it is already visible to the world. The result is published as a “Reviewed Preprint” rather than as an accept-or-reject verdict.

    A Reviewed Preprint is a bioRxiv or medRxiv manuscript that has been through eLife’s editorial and peer-review process and is published, alongside public reviews and an eLife Assessment, without a binary publication decision attached to it.

    What Is eLife’s Preprint-Only Review Model?

    eLife requires every submission to already exist as a preprint, typically on bioRxiv or medRxiv, before its editors will consider it. Editors — themselves active researchers — screen incoming preprints and select a subset for full review. In 2023, eLife formalised this into its Publish, Review, Curate model, removing the accept/reject gate entirely: any preprint that goes through full review is published as a Reviewed Preprint, regardless of how favourable the assessment turns out to be.

    This inverts the journal’s traditional role. Instead of deciding whether a paper reaches readers, eLife’s reviewers now decide how a paper readers can already see should be interpreted, through a public review and a standardised eLife Assessment describing the significance of the findings and the strength of the evidence.

    How Does eLife Review a Preprint Already on bioRxiv?

    The workflow eLife uses is consultative rather than adversarial, and it produces a single, consolidated assessment rather than several disconnected reviewer reports. In practice it runs through six stages:

    1. The author posts the manuscript to bioRxiv or medRxiv as a preprint.
    2. The author submits the same preprint to eLife for consideration.
    3. A reviewing editor screens the preprint and decides whether to send it for full review; many submissions are declined at this stage.
    4. Two or three external reviewers and the editor hold a consultative discussion to produce one consolidated set of comments rather than separate, sometimes-conflicting reports, with authorship and contribution details carried over from the original preprint.
    5. eLife publishes the preprint together with the public reviews and an eLife Assessment as a Reviewed Preprint.
    6. The author chooses whether, and when, to revise the work, resubmit it for further review, or declare it a Version of Record.
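
    The six stages above can be read as a small state machine, which makes the absence of a post-review rejection state explicit. This is an illustrative sketch of the workflow as described here, not eLife’s own system: the `Stage` names, the `ALLOWED` transition table and the `advance` function are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    POSTED = "posted on bioRxiv or medRxiv"
    SUBMITTED = "submitted to eLife"
    DECLINED = "declined at editorial screening"
    IN_REVIEW = "in consultative review"
    REVIEWED_PREPRINT = "published as Reviewed Preprint"
    VERSION_OF_RECORD = "declared Version of Record"


# Allowed transitions. Once a preprint is in review, the only way forward is
# publication as a Reviewed Preprint: there is no "rejected" state.
ALLOWED = {
    Stage.POSTED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.DECLINED, Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.REVIEWED_PREPRINT},
    # The author may resubmit a revision for further review or finalise.
    Stage.REVIEWED_PREPRINT: {Stage.IN_REVIEW, Stage.VERSION_OF_RECORD},
    Stage.DECLINED: set(),
    Stage.VERSION_OF_RECORD: set(),
}


def advance(current, target):
    """Move to `target` if the workflow allows it, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


# Walk one possible path through the workflow.
stage = Stage.POSTED
for nxt in (Stage.SUBMITTED, Stage.IN_REVIEW,
            Stage.REVIEWED_PREPRINT, Stage.VERSION_OF_RECORD):
    stage = advance(stage, nxt)
    print(stage.value)
```

    The only point at which a submission can leave the pipeline unpublished is editorial screening (`SUBMITTED` to `DECLINED`), which matches stage 3 of the list.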

    This builds on a service eLife had already run since May 2020, when it launched “Preprint Review” to bring peer review to manuscripts already on bioRxiv, and on a submission pathway available since 2017 that let authors upload to bioRxiv while submitting to eLife in parallel.

    How Does This Differ From Traditional Pre-Publication Peer Review?

    The core difference is sequencing: in a conventional journal, review happens before the public ever sees the manuscript, and the outcome of that review is a gatekeeping decision. In eLife’s model, the manuscript is already public, and review adds an evaluative layer on top of it rather than deciding whether it exists at all.

    Feature | eLife’s model | Traditional pre-publication review
    Timing | Publish first, review second | Review first, publish only if accepted
    Outcome | No accept/reject; all reviewed work is published as a Reviewed Preprint | Binary accept/reject decision
    Transparency | Reviews and eLife Assessment published openly | Reviewer identities and comments usually confidential
    Author control | Author decides when to revise or declare a Version of Record | Author must satisfy editor/reviewers to be published at all
    Unit of evaluation | Article-level assessment | Journal-level acceptance, often read as a proxy for quality

    The trade-off is real, not just structural. Because Clarivate’s Journal Impact Factor methodology requires an indexed journal to publish only papers that editors have formally validated as acceptable, eLife’s decision to publish every reviewed preprint — regardless of the assessment’s verdict — led Clarivate to discontinue eLife’s Journal Impact Factor from its 2025 Journal Citation Reports release, ending a metric that had stood at 6.4.

    Where Does bioRxiv Fit Among Preprint Servers?

    bioRxiv (pronounced “bio-archive”) is a free preprint server for the life sciences, operated by openRxiv, a nonprofit dedicated to advancing scientific communication. It sits within a wider ecosystem of subject-specific preprint servers, several of which are frequently confused with one another or with journal-run review platforms such as Research Square’s In Review.

    Server | Field | Screening model and operator
    bioRxiv | Life sciences | Basic screening only; operated by nonprofit openRxiv
    medRxiv | Health sciences / clinical | Additional screening for clinical risk; also run by openRxiv
    arXiv | Physics, maths, computer science | Moderated but not peer-reviewed; run by Cornell University
    Research Square | Multidisciplinary | Preprint posting plus optional “In Review” integrated peer review, tied to Springer Nature journals
    SSRN | Social sciences, economics, law | Basic screening; owned by Elsevier
    ChemRxiv | Chemistry | Basic screening; run by chemical societies

    The distinction that matters for the “biorxiv or arxiv” question is disciplinary scope, not rigour: arXiv predates bioRxiv by more than two decades and serves physical sciences, while bioRxiv (launched 2013) was purpose-built for biology. Neither performs peer review itself — that is precisely the gap eLife’s model was designed to fill for bioRxiv content.

    What Does This Mean for Research Administrators and Institutions?

    For research administration offices, the practical question is no longer whether a preprint has been reviewed, but whether assessment, promotion, and funding-reporting processes recognise a Reviewed Preprint as equivalent to a conventional accepted article. That question is not yet uniformly answered.

    • The US National Institutes of Health has permitted preprints to be cited in grant applications and biosketches since 2017, establishing precedent that funders can recognise unpublished-but-posted work.
    • eLife reports that a growing number of funders now explicitly recognise Reviewed Preprints, rather than only the eventual Version of Record, in research assessment.
    • Institutions signed to the San Francisco Declaration on Research Assessment (DORA) already commit to evaluating research on its own merits rather than journal-level metrics — directly compatible with article-level eLife Assessments, since Clarivate no longer supplies a journal Impact Factor to fall back on.
    • Research administrators handling REF-style exercises, tenure dossiers, or grant reports need local guidance on whether the Reviewed Preprint, the eLife Assessment, or the Version of Record is the citable unit — under the 2023 model, all three can exist for one piece of work, each with its own DOI in a single version log.

    A data point often missing from commentary on the model: a 2019 eLife study by Abdill and Blekhman tracking bioRxiv preprint outcomes found eLife published almost as many bioRxiv preprints (394) in 2018 as any other single journal — over a third of its 1,172 articles that year — years before the 2023 model made this the default route.

    Common Questions About eLife and bioRxiv

    Is eLife a preprint?

    No. eLife is a journal, not a preprint server. It reviews manuscripts that authors have already posted as preprints on bioRxiv or medRxiv and publishes the result as a Reviewed Preprint — the preprint plus public reviews and an eLife Assessment, distinct from the original unreviewed posting.

    What is bioRxiv used for?

    bioRxiv is used to share life-sciences research immediately, before or independent of journal peer review. Researchers post manuscripts to establish priority, gather early feedback, and make findings available while formal review — at eLife or elsewhere — is still under way, sometimes for months.

    Why did eLife lose its impact factor?

    Clarivate discontinued eLife’s Journal Impact Factor because eLife now publishes every peer-reviewed submission as a Reviewed Preprint regardless of the review outcome, rather than issuing conventional accept/reject decisions. Clarivate’s indexing rules require journals to publish only editorially validated papers, so eLife’s model fell outside that requirement from the 2025 Journal Citation Reports release.

    Is eLife a high-impact journal?

    eLife’s citation performance was historically strong — its last Journal Impact Factor was 6.4 — but it no longer carries a Clarivate-assigned Impact Factor. Its standing is now judged through article-level eLife Assessments and public reviews rather than a single journal-wide citation metric.

    As more funders and institutions formalise how they treat Reviewed Preprints, Public Reviews, and eLife Assessments in research assessment, eLife’s model looks less like an isolated experiment and more like an early test case for peer review as a layer added on top of open preprints, rather than a gate placed in front of them. Research offices that decide this now — before it becomes a routine dossier question — will have a real advantage over those that wait for a funder mandate to force the issue.

  • Research Assessment Reform: Why Collective Action Beats Solo Signatories

    Research assessment reform needs collective action because hiring, promotion and funding criteria are set independently by thousands of institutions — a single university dropping journal-based metrics gains nothing if every competing institution, funder and publisher still rewards them. Recent research-on-research literature frames this explicitly as a collective action problem: individual declarations such as DORA signal intent, but only coordinated, system-wide commitments — the model CoARA is built around — actually rewrite the incentives that determine careers.

    A collective action problem in research assessment is a situation where no single institution can achieve reform on its own without risking a competitive disadvantage, so change only happens when many actors move together under a shared, verifiable commitment.

    What Is the Collective Action Problem in Research Assessment Reform?

    A 2025 study in Minerva by sociologist Alexander Rushforth, “Research Assessment Reform as Collective Action Problem,” argues that research evaluation change cannot be reduced to individual institutional choice. Rushforth traces this through the Netherlands’ national “Recognition and Rewards” initiative, formally launched in 2019 to coordinate system-wide changes in assessment practice across the Dutch science system.

    The framing matters because it shifts the diagnosis. If assessment culture were simply a matter of institutional willpower, a DORA signature would be sufficient. If it is instead a coordination failure — where no actor can safely move first — then reform requires simultaneous, mutually reinforcing commitments from institutions, funders and publishers together.

    Why Doesn’t an Individual DORA Signature Change Hiring Criteria?

    The San Francisco Declaration on Research Assessment (DORA), drafted in December 2012 and published in May 2013, asks signatories to stop using journal-based metrics such as the Journal Impact Factor as a proxy for the quality of individual articles or researchers. Signing carries no binding enforcement mechanism, and DORA itself has long acknowledged that the harder work begins after signature: its 2019 guidance “You’ve signed DORA, now what?” explicitly frames hiring, promotion and funding criteria as the next, unfinished step.

    Two structural problems keep that step unfinished when institutions act alone:

    • First-mover risk. An institution that stops counting journal prestige in tenure review can be undercut in recruitment and rankings by peers who have not changed, because researcher CVs are still read against metric-based expectations elsewhere.
    • Interoperability failure. Where assessment criteria diverge sharply between institutions and countries, researcher mobility suffers — a candidate assessed holistically at one university may be filtered out by a metrics-based shortlist at the next.

    Neither problem is solved by any single signature. Both require peer institutions, funders and disciplinary societies to move on a broadly shared timetable.
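
    The first-mover logic above can be illustrated with a toy two-institution game. This is a sketch with invented payoffs, not a model taken from the research-on-research literature: the numbers in `PAYOFF` are hypothetical and chosen only so that reform pays off when peers also reform and costs the institution that moves alone.

```python
# PAYOFF[(my_choice, peer_choice)] -> my payoff.
# All values are hypothetical and purely illustrative.
PAYOFF = {
    ("reform", "reform"): 3,    # both change: shared benefit
    ("reform", "metrics"): 0,   # I change alone: first-mover penalty
    ("metrics", "reform"): 2,   # peer changes alone: I am unaffected
    ("metrics", "metrics"): 2,  # nobody changes: status quo
}
CHOICES = ("reform", "metrics")


def best_response(peer_choice):
    """Return the choice that maximises my payoff given the peer's choice."""
    return max(CHOICES, key=lambda mine: PAYOFF[(mine, peer_choice)])


for peer in CHOICES:
    print(f"peer chooses {peer!r} -> best response: {best_response(peer)!r}")
```

    Under these assumed payoffs, each institution’s best move simply mirrors its peer’s: reform if the peer reforms, keep metrics if the peer keeps metrics. Both outcomes are stable, so shifting from the status quo to the better one requires the actors to move together, which is the coordination argument made in this section.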

    How Does CoARA’s Coordinated Model Differ From Individual Declarations?

    The Coalition for Advancing Research Assessment (CoARA) was formed around the Agreement on Reforming Research Assessment, which the European Commission signed on 8 November 2022, the same day it endorsed DORA. Unlike a one-off declaration, CoARA requires member organisations to commit to a shared action plan with defined milestones, reported progress and working groups that develop common tools and criteria across institutions, moving assessment reform from individual pledge to managed, collective process.

    That coordination logic was reinforced on 4 December 2025, when CoARA and DORA released a joint statement on aligning their respective reform efforts rather than running parallel, uncoordinated campaigns. Science Europe’s April 2026 position statement, “Connecting Open Science and Research Assessment Reform,” makes the same point from the funder side: it treats open science and assessment reform as “mutually reinforcing and interdependent drivers of research cultures,” explicitly a multi-actor framing rather than an institution-by-institution one.

    Dimension | Individual DORA signature | Coordinated (CoARA-style) commitment
    Enforcement | None; declaration of intent only | Action plan with milestones and reporting
    Hiring/promotion criteria | Left to each institution’s own timetable | Shared working groups developing common criteria
    Competitive risk to first movers | High: one institution changes alone | Reduced: peers move on a shared timetable
    Researcher mobility | Fragmented across institutions/countries | Greater interoperability of criteria sought

    What Does the Dutch “Recognition and Rewards” Case Show?

    Rushforth’s analysis of Recognition and Rewards found that the initiative succeeded in uniting support from multiple influential national stakeholders — universities, funders and academic hospitals moving together — precisely because it was designed as a coordinated, system-wide commitment rather than a set of separate institutional pledges. It also documents genuine friction: critics raised concerns about the Netherlands “going it alone” internationally, illustrating that collective action problems exist at more than one level simultaneously — within a national system, and between that system and the rest of the world.

    The OECD’s April 2026 report “Reforming Research Assessment for Better Science” reaches a parallel conclusion at the international level, describing the current reform landscape as “a collective of organisations committed to reforming the assessment of research, researchers, and research organisations” — language that treats coordination, not individual compliance, as the operative unit of change.

    Frequently Asked Questions

    Does Signing DORA Actually Change University Hiring Practices?

    Not by itself. DORA’s own post-signature guidance states that hiring, promotion and funding decisions require separate, deliberate policy changes after signature. A signature is a public commitment; rewritten criteria documents, reviewed by hiring and promotion committees, are the actual evidence of change.

    What Is CoARA and How Does It Differ From DORA?

    CoARA is a coalition of research funders, institutions, and organisations built around the 2022 Agreement on Reforming Research Assessment. Unlike DORA’s single declaration, CoARA members commit to shared action plans, working groups and reported milestones — a coordination structure rather than a one-time pledge.

    Why Is Research Assessment Reform Described as a Collective Action Problem?

    Because no institution can safely change its own assessment criteria in isolation without risking a competitive disadvantage in recruitment and rankings. Research-on-research literature, including Rushforth’s 2025 Minerva study, argues reform requires simultaneous, coordinated commitments across many independent actors.

    Can One University Move Away From Metrics Without Being Disadvantaged?

    It can, but the Netherlands’ Recognition and Rewards case shows even a coordinated national effort faced criticism for “going it alone” relative to the rest of the world. A single institution acting without peer, funder and publisher alignment faces materially higher exposure to that same risk.

    What Should Institutions Actually Do Together?

    For research administration teams, the practical implication of the collective-action framing is direct: a DORA or CoARA signature belongs on a compliance checklist next to, not instead of, three coordination-dependent actions.

    1. Confirm hiring and promotion criteria documents have actually been rewritten, not merely a signature logged in a registry.
    2. Compare criteria against peer institutions in the same discipline and country to identify where first-mover risk is concentrated.
    3. Engage through CoARA working groups or equivalent sector bodies (ARMA, EARMA, INORMS) rather than drafting new criteria in isolation.

    Reform that stops at the signature stage produces a compliance artefact, not a changed incentive structure. The evidence from both the Dutch national case and the CoARA-DORA coordination model points the same way: assessment reform moves at the speed of the slowest coordinated group, not the fastest individual signatory. Institutions that treat their own criteria rewrite as contingent on parallel movement by peers, funders and publishers are following the pattern the research-on-research literature identifies as actually working — treating reform as a shared infrastructure problem, not a personal compliance decision.