Tag: research assessment

  • DORA and Responsible Research Assessment

    Responsible research assessment is the principle that researchers and their work should be evaluated on their actual content, quality and contribution, rather than on proxy metrics such as the prestige of the journals in which they publish. Over the past decade a series of declarations and coalitions has turned this principle into a coordinated reform movement, reshaping how universities, funders and publishers make decisions about hiring, promotion and grants.

    The core argument is straightforward: numbers derived from where research appears are a weak substitute for judging what the research actually achieves.

    Why proxies became a problem

    For years, journal-level indicators such as the Journal Impact Factor were used as shorthand for individual quality, despite being designed to compare journals rather than people. This created perverse incentives, encouraging researchers to chase prestigious venues, discouraging the publication of negative or replication results, and disadvantaging valuable outputs such as datasets, software and public engagement that no single metric captures. The flaws of journal metrics are detailed in our explainer on the Journal Impact Factor and its critique.

    The key frameworks

    Four landmark initiatives anchor the responsible assessment landscape.

    Framework        | Form                    | Central contribution
    -----------------|-------------------------|---------------------------------------------------------------------------------
    DORA             | Declaration             | Recommends not using journal metrics to assess individual articles or researchers
    Leiden Manifesto | Set of principles       | Ten principles for responsible use of quantitative metrics
    The Metric Tide  | Review report           | Framework for responsible metrics in research evaluation
    CoARA            | Coalition and agreement | Commitments to reform assessment, prioritising qualitative judgement

    DORA

    The San Francisco Declaration on Research Assessment recommends that journal-based metrics not be used as a surrogate for the quality of individual research articles or to assess an individual scientist’s contributions. It calls on institutions and funders to be explicit about the criteria used in decisions and to value all research outputs, not just publications.

    The Leiden Manifesto

    The Leiden Manifesto for research metrics sets out ten principles, including that quantitative evaluation should support rather than replace expert assessment, that performance should be measured against the research missions of the institution or researcher, and that indicators should be kept transparent and regularly scrutinised for their effects.

    The Metric Tide

    The Metric Tide was an independent review of the role of metrics in research assessment. It introduced a framework for responsible metrics, emphasising robustness, humility, transparency, diversity of measures and reflexivity, and warned against allowing indicators to drive behaviour in unintended ways.

    CoARA

    The Coalition for Advancing Research Assessment (CoARA) brings organisations together around a shared agreement to reform assessment practices. Its commitments include recognising the diversity of research contributions and careers, basing assessment primarily on qualitative judgement supported by responsible use of quantitative indicators, and moving away from inappropriate uses of journal- and publication-based metrics.

    From proxies to assessing the research itself

    The practical shift these frameworks encourage is towards reading and judging the work. This means convening expert panels, considering a broad range of outputs and activities, and recognising contributions such as mentorship, data sharing, open-science practice and team science. It aligns with the contributor-focused approach in our CRediT contributor roles guidance, which makes individual contributions to a project visible rather than collapsing them into a single author list.

    Narrative CVs

    One of the most visible products of this movement is the narrative CV. Instead of listing publications ranked by journal prestige, a narrative CV asks researchers to describe their contributions in prose: the significance of their work, their role in collaborations, their contributions to the research community, and their support for others. This format is designed to surface the kinds of value that metrics miss, and to let evaluators assess substance over volume. Narrative formats are a recurring theme across our responsible assessment coverage and reflect the same priorities promoted by CoARA and DORA. Definitions of the key terms are maintained in our standards dictionary.

    Frequently asked questions

    What is responsible research assessment?

    It is the practice of evaluating research and researchers on the actual quality, content and contribution of their work, supported by responsible use of metrics, rather than relying on journal-based proxies such as impact factors.

    What does DORA actually ask institutions to do?

    DORA asks institutions and funders to stop using journal metrics as a proxy for individual quality, to be transparent about assessment criteria, and to value the full range of research outputs rather than publications alone.

    What is a narrative CV?

    A narrative CV is a structured prose account of a researcher’s contributions and their significance, replacing a metrics-ranked publication list so that evaluators can judge substance, role and wider impact.

    How do the Leiden Manifesto and the Metric Tide relate to DORA?

    They are complementary. DORA targets the misuse of journal metrics, the Leiden Manifesto offers principles for using metrics responsibly, and the Metric Tide provides a framework for responsible metrics; CoARA gathers these commitments into a shared reform agreement.

  • Responsible metrics: the Leiden Manifesto and the Metric Tide in practice

    Metrics are seductive because they are simple. A single number — a journal’s impact factor, a researcher’s h-index, a citation count — promises to compress the messy, qualitative business of judging research into something fast, comparable and apparently objective. And metrics are dangerous for exactly the same reason: their simplicity hides what they leave out, and their apparent objectivity lends unearned authority to comparisons they cannot really support. The response to this tension has not been to abolish metrics but to use them responsibly — to let quantitative indicators inform expert judgement rather than replace it. Two landmark statements from 2015, the Leiden Manifesto and The Metric Tide, set out what responsible use looks like. This article examines both and how they translate into practice, drawing on the responsible assessment domain of the CASRAI Dictionary.

    The Leiden Manifesto

    The Leiden Manifesto for research metrics, published in 2015, offers ten principles for the responsible use of quantitative indicators. Several of its themes recur throughout the responsible-metrics movement and are worth drawing out. It insists that quantitative evaluation should support, not supplant, qualitative expert assessment — metrics inform judgement; they do not make it. It warns against measuring performance against inappropriate or generic benchmarks, urging that assessment account for the mission and context of the research. It calls for transparency in the data and methods behind any indicator, so that those being assessed can understand and scrutinise how they are judged. It highlights the importance of accounting for variation between fields, since citation behaviour differs enormously across disciplines and naive comparison across them is meaningless. And it cautions against the distortions metrics produce when they become targets — the well-known problem that an indicator, once it is what people are rewarded for, stops measuring what it was meant to.
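
    The point about field variation can be made concrete with a small sketch. The Python below is a minimal illustration, not a method the Manifesto prescribes: it assumes one common normalisation approach (dividing each paper's citation count by the mean for its field), and the paper records, field labels and the function name field_normalised are all invented for the example.

```python
from statistics import mean

# Invented toy records; the fields and citation counts are illustrative only.
papers = [
    {"id": "A", "field": "cell biology", "citations": 40},
    {"id": "B", "field": "cell biology", "citations": 20},
    {"id": "C", "field": "cell biology", "citations": 60},
    {"id": "D", "field": "mathematics", "citations": 6},
    {"id": "E", "field": "mathematics", "citations": 2},
    {"id": "F", "field": "mathematics", "citations": 4},
]


def field_normalised(records):
    """Return each paper's citations divided by its field's mean citations."""
    by_field = {}
    for record in records:
        by_field.setdefault(record["field"], []).append(record["citations"])
    # Assumes every field has a non-zero mean, which holds for the toy data.
    baselines = {name: mean(counts) for name, counts in by_field.items()}
    return {r["id"]: r["citations"] / baselines[r["field"]] for r in records}


scores = field_normalised(papers)
for paper_id, score in sorted(scores.items()):
    print(paper_id, score)
```

    In this toy data the mathematics paper with 6 citations and the cell biology paper with 60 both score 1.5: each sits 50% above its own field's average, although the raw counts differ tenfold. A real normalisation would also need to account for publication year and document type, and the result would still be an input to expert judgement rather than a replacement for it.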

    The Metric Tide

    Published the same year, The Metric Tide was an independent review of the role of metrics in research assessment, conducted in the United Kingdom. Its central contribution was the concept of responsible metrics, defined through a set of dimensions that have become a common reference point:

    • Robustness — basing indicators on the best available, accurate data.
    • Humility — recognising that quantitative evaluation should support, not supplant, expert assessment.
    • Transparency — keeping data collection and analytical processes open to scrutiny.
    • Diversity — accounting for variation by field and using a range of indicators to reflect the plurality of research.
    • Reflexivity — recognising and anticipating the systemic effects of indicators and updating them in response.

    The review was notably sceptical of reducing assessment to single numbers and emphasised that metrics work best as a complement to peer review, not a substitute for it. Its framing of responsible metrics as a set of dimensions to be designed for, rather than a checklist to be passed, has proved durable.

    What the two have in common

    Read together, the Leiden Manifesto and The Metric Tide converge on a consistent message. Metrics are useful but partial; they must be transparent so they can be questioned; they must respect disciplinary difference; they must be used with humility alongside expert judgement; and their users must stay alert to the behaviour they induce, because any metric that becomes a target will eventually be gamed or will distort the work it was meant to measure. Neither document is anti-metric. Both are against the misuse of metrics — against the false precision of a single number standing in for a considered judgement about the quality and significance of research.

    From principle to practice

    Translating these principles into institutional practice means concrete commitments: assessing research on its own merits rather than on the prestige of its publication venue, using a basket of indicators rather than any single one, being transparent about what is measured and how, contextualising comparisons by field and career stage, and keeping expert peer judgement at the centre with metrics in a supporting role. These commitments connect directly to the broader assessment-reform movement. The principle of not judging research by where it is published is the heart of the comparison in our DORA versus CoARA overview, while the specific hazards of the two most over-used single numbers are examined in our look at the journal impact factor versus the h-index. Responsible metrics is the methodological backbone these reform initiatives share.

    Metrics and the recognition of contribution

    One reason single-number metrics mislead is that they obscure who actually did the work and what they did. A citation count attaches to a paper, not to the distinct contributions of the people who made it. Structured contributorship through the CRediT taxonomy — whose full set of roles is described in our overview of the CRediT roles — offers a more granular and honest picture of contribution than any aggregate metric can, and is a natural complement to responsible assessment: it supports judging people on what they genuinely contributed rather than on a number that flattens it. The consistent vocabulary that lets assessment frameworks, indicators and contribution records be described and exchanged the same way across systems is maintained in the CASRAI Dictionary, helping ensure that responsible metrics rests on a shared and well-defined foundation.
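
    The contrast between a paper-level count and person-level contribution records can be sketched in a few lines. This is a hypothetical data model for illustration only: the Paper class, its add method and the researcher names are invented, and CREDIT_ROLES holds just a subset of the CRediT role names rather than the full taxonomy.

```python
from dataclasses import dataclass, field

# A subset of CRediT role names, used here for illustration.
CREDIT_ROLES = {
    "Conceptualization", "Data curation", "Formal analysis", "Investigation",
    "Methodology", "Software", "Supervision", "Validation", "Visualization",
}


@dataclass
class Paper:
    title: str
    citations: int  # one number for the whole paper
    contributions: dict = field(default_factory=dict)  # person -> set of roles

    def add(self, person, *roles):
        """Record roles for a contributor, rejecting names outside the list."""
        unknown = set(roles) - CREDIT_ROLES
        if unknown:
            raise ValueError(f"not in the illustrative role list: {sorted(unknown)}")
        self.contributions.setdefault(person, set()).update(roles)


paper = Paper(title="Hypothetical study", citations=120)
paper.add("Researcher A", "Conceptualization", "Supervision")
paper.add("Researcher B", "Software", "Data curation")
paper.add("Researcher C", "Investigation", "Formal analysis")

for person, roles in paper.contributions.items():
    print(person, sorted(roles))
```

    The citation count is identical for all three contributors, while the role record distinguishes who designed the study, who built the software and who ran the analysis. That difference is the information an aggregate metric flattens.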

  • Scopus vs Web of Science: Bibliographic Databases Compared

    Scopus is Elsevier’s large multidisciplinary abstract and citation database, and it is the principal alternative to Clarivate’s Web of Science for tracking scholarly literature and its citation relationships. Both index peer-reviewed publications and the citations between them, but they differ in coverage philosophy, the headline metrics they publish, and how each is used in research assessment.

    This article compares the two systems across the dimensions that matter for choosing and interpreting them, and offers a side-by-side table to summarise the differences.

    Coverage and selection

    Both databases are curated rather than exhaustive, applying editorial selection to the titles they index, but they make different trade-offs. Scopus is generally regarded as having a broader title list and wider coverage of disciplines, regions and document types, while Web of Science’s Core Collection is associated with a more tightly selective tradition rooted in the citation-index approach pioneered by Eugene Garfield. Neither covers the entire scholarly literature, and any analysis drawn from them is shaped by what each chooses to index. We unpack the Web of Science side in detail in our Web of Science explainer.

    The headline metrics: CiteScore and the Impact Factor

    Each platform has its own flagship journal-level metric. Scopus publishes CiteScore, a citations-per-document measure computed over a four-year window from Scopus data. Web of Science, through the Journal Citation Reports, publishes the Journal Impact Factor, computed over a two-year window from Web of Science data. Because the two metrics use different source databases and calculation windows, a journal’s CiteScore and Impact Factor are not directly comparable, and a title may rank differently depending on which system you consult.

    Both are journal-level indicators. Neither is a reliable measure of the quality of an individual article or researcher, and responsible-metrics frameworks consistently warn against that misuse.
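
    A short sketch shows why the calculation window alone is enough to make the two numbers diverge. It is a simplified illustration under stated assumptions, not either provider's actual implementation: it assumes a two-year window for the Impact-Factor-style ratio and a four-year window for the CiteScore-style ratio, ignores the rules about which document types count, and uses invented data for one hypothetical journal. The function name windowed_ratio is likewise made up for the example.

```python
def windowed_ratio(citations, documents, cited_years, citing_years):
    """Citations made in `citing_years` to items published in `cited_years`,
    divided by the number of items published in `cited_years`."""
    cites = sum(
        count
        for (citing_year, cited_year), count in citations.items()
        if citing_year in citing_years and cited_year in cited_years
    )
    docs = sum(documents[year] for year in cited_years)
    return cites / docs


# Invented toy data: documents published per year, and citation counts
# keyed by (year the citation was made, publication year of the cited item).
documents = {2020: 100, 2021: 100, 2022: 100, 2023: 100}
citations = {
    (2020, 2020): 20,
    (2021, 2020): 80, (2021, 2021): 20,
    (2022, 2020): 100, (2022, 2021): 80, (2022, 2022): 20,
    (2023, 2020): 100, (2023, 2021): 100, (2023, 2022): 80, (2023, 2023): 20,
}

# Two-year style: citations in 2023 to items from the two preceding years.
short_window = windowed_ratio(citations, documents, {2021, 2022}, {2023})

# Four-year style: citations in 2020-2023 to items from the same four years.
four_years = {2020, 2021, 2022, 2023}
long_window = windowed_ratio(citations, documents, four_years, four_years)

print(short_window, long_window)
```

    With identical underlying data the two ratios come out at 0.9 and 1.55. In practice the gap has a second cause as well, because each metric is computed from a different database with different coverage.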

    Side-by-side comparison

    Dimension               | Scopus                             | Web of Science
    ------------------------|------------------------------------|------------------------------------------
    Provider                | Elsevier                           | Clarivate
    Type                    | Abstract & citation database       | Citation-index platform (Core Collection)
    Coverage style          | Broad, multidisciplinary selection | Selective Core Collection
    Headline journal metric | CiteScore                          | Journal Impact Factor (via JCR)
    Metric source data      | Scopus citations                   | Web of Science citations
    Access                  | Subscription                       | Subscription

    Use in research assessment

    Both databases are widely used in research evaluation, university rankings and bibliometric studies, and many institutions subscribe to both because their differing coverage produces different — and complementary — views of the same literature. A bibliometric analysis can yield materially different results depending on which database supplies the underlying data, so methodological transparency about the source is essential.

    Crucially, citation databases describe attention and connectivity, not intrinsic merit. Responsible-metrics and narrative-CV approaches encourage assessors to use these tools as one input among many, alongside qualitative judgement and contributor-level information such as that captured by the CRediT contributor-roles taxonomy. Both systems also depend on persistent identifiers, especially the DOI, to disambiguate and link records accurately, and they sit within the broader landscape of research information systems.

    Which should you use?

    There is no universally correct answer. For the widest net across disciplines and document types, Scopus is often preferred; for the longer-established citation-index tradition and the Journal Impact Factor specifically, Web of Science is the source. For any serious analysis, using both and being explicit about coverage limitations is the most defensible approach. Definitions of the metrics named here are maintained in the CASRAI Dictionary.

    Frequently asked questions

    Is Scopus bigger than Web of Science?

    Scopus is generally described as having a broader title list and wider document coverage, while Web of Science’s Core Collection is more selective. The right database depends on whether breadth or selectivity matters more for your purpose.

    Can I compare a CiteScore directly with an Impact Factor?

    No. CiteScore and the Journal Impact Factor are computed from different source databases over different time windows, so the two numbers are not interchangeable and should not be compared head to head.

    Do universities subscribe to both?

    Many research institutions subscribe to both Scopus and Web of Science precisely because their differing coverage gives complementary perspectives on the literature and on journal performance.

    Are these databases suitable for evaluating individual researchers?

    Their journal-level metrics are not designed to assess individuals, and responsible-metrics guidance cautions strongly against using them that way. They are best treated as one input within a broader, qualitative assessment.

  • DORA, CoARA and narrative CVs: assessing research responsibly

    For a decade, “responsible research assessment” was mostly a matter of declarations: statements of principle that institutions signed and then struggled to operationalise. That has changed. Assessment reform has moved from declaration to practice, and anyone who now evaluates research or researchers, whether on a hiring panel, a promotion committee, or a grant board, is increasingly expected to do so by methods that the reform movement has made concrete. This article sets out how the three load-bearing pieces (DORA, CoARA, and the narrative CV) fit together, and what they ask of an evaluator. It draws on the responsible-assessment domain of the CASRAI Dictionary.

    DORA: the declaration that named the problem

    The Declaration on Research Assessment (DORA), issued in 2013, was the movement’s opening move. Its central target was the misuse of the journal impact factor as a proxy for the quality of individual papers and individual researchers. DORA’s argument was straightforward: a journal-level metric says nothing reliable about any single article published in that journal, and using it to judge a researcher’s work — for hiring, promotion, or funding — is a category error. DORA asked institutions, funders, and publishers to stop doing it, and to assess research on its own merits.

    DORA’s contribution was to name the problem clearly and to gather signatories — thousands of them — behind the principle. What it deliberately did not do was prescribe a detailed alternative. It was a declaration of what to stop, more than a manual for what to start. That left a gap, which the next decade’s work set out to fill.

    CoARA: from principle to coalition commitment

    The Coalition for Advancing Research Assessment (CoARA), launched in 2022, is the operational successor in spirit. Where DORA asked organisations to agree with a principle, CoARA asks members to commit to a reform agreement and to produce action plans for changing their own assessment practices. Its membership runs to hundreds of organisations — universities, funders, learned societies — across Europe and beyond.

    The shift from DORA to CoARA is the shift from “we endorse this” to “here is what we will change and by when.” CoARA’s commitments include recognising a diversity of research outputs and activities, basing assessment primarily on qualitative judgement supported by responsible use of metrics rather than the reverse, and abandoning inappropriate uses of journal- and publication-based metrics. It is, in effect, DORA’s principle turned into an implementation programme that members are accountable to.

    The narrative CV: the practical instrument

    If DORA named the problem and CoARA organised the commitment, the narrative CV is the instrument through which reform actually reaches an individual assessment. A narrative CV is a free-text format in which a researcher describes their contributions in prose, structured around a small set of prompts, rather than presenting an enumerated list of publications and metrics. The best-known implementation is UKRI’s Résumé for Research and Innovation (R4RI), which became standard across all UKRI funding from January 2024, building on the Royal Society’s earlier Résumé for Researchers. Wellcome, several other funders, and a number of institutions run their own variants.

    The narrative CV typically asks a researcher to describe their contributions across several dimensions — to the generation of knowledge, to the development of individuals, to the wider research community, and to broader society — rather than to list outputs by venue. The point is to make visible the contributions that a publication list renders invisible: mentorship, team building, peer review, open-science work, and the other forms of hidden labour that the Hidden REF initiative has campaigned to recognise. It is the mechanism by which a panel can assess a researcher as a contributor to research culture, not merely as a producer of papers.

    Responsible metrics, not no metrics

    A persistent misreading of this movement is that it is anti-metric. It is not. The principle, articulated in the Leiden Manifesto of 2015 and carried through CoARA, is responsible metrics: the principled use of quantitative indicators, always contextualised, always combined with qualitative expert judgement, never used as a substitute for reading the work. The objection is not to counting things; it is to letting a count — especially a journal-level one — stand in for judgement about an individual contribution. A responsible assessment may well use metrics; it simply refuses to let them do the assessing.

    How the three fit together

    The relationship is a progression from principle to practice. DORA supplies the foundational principle: do not mistake journal metrics for research quality. CoARA supplies the organised commitment and accountability: members agree to reform and publish how. The narrative CV supplies the concrete instrument: a format that forces an assessment to engage with what a researcher actually contributed. An evaluator working responsibly today is, in effect, applying DORA’s principle through CoARA-aligned practice using narrative-CV instruments.

    What responsible assessment asks of an evaluator

    Concretely, the movement asks an evaluator to read the work rather than its venue; to weigh a diversity of outputs — datasets, software, protocols, models — alongside articles, which presupposes a modern outputs taxonomy that recognises them; to use metrics only in support of judgement, never as a proxy for an individual’s worth; to recognise the hidden labour the narrative format is designed to surface; and to apply consistent qualitative criteria through a shared rubric, so that “narrative” does not become “unstructured and incomparable.”

    That last point is the live challenge. A narrative CV trades the false precision of metrics for the richer but less standardised evidence of prose, and prose is harder to compare across candidates. The answer is not to retreat to metrics but to develop shared rubrics so that narrative assessments are rigorous and fair rather than impressionistic.

    Where the dictionary fits

    Responsible assessment is awash with terms that every funder and institution defines slightly differently: narrative CV, contribution narrative, responsible metrics, hidden labour, team science. Without shared definitions, every reviewer reinvents their own rubric, which is exactly the inconsistency the movement is trying to escape. A shared, operational vocabulary for these concepts is what lets a narrative-CV reviewer at one institution mean the same thing as one at another. Providing that vocabulary, and pointing to DORA, CoARA, and UKRI for the normative content, is the convening role the CASRAI Dictionary is built for. For a side-by-side account of the two frameworks, see our DORA versus CoARA comparison.

    What to do now

    For evaluators: read the work, use metrics only responsibly and in support of judgement, and engage seriously with the contributions a narrative CV surfaces. For institutions and funders: align practice with CoARA commitments and adopt narrative-CV formats with shared, qualitative rubrics so that assessments are comparable and fair. For standards work: define the responsible-assessment vocabulary operationally, federating to DORA, CoARA, and the funder narrative-CV guidance.

  • The SCOPE framework for responsible research evaluation: a practical model for designing fair evaluations

    The movement to reform research assessment has produced a powerful set of principles. Declarations and manifestos have told the community what to stop doing: stop using journal-based metrics as a proxy for the quality of individual articles, stop reducing complex contributions to a single number, stop letting convenient indicators substitute for judgement. These principles are essential, but they leave a practical gap. An evaluator — a panel chair, a research manager, a committee designing a hiring process — who agrees with all of them still has to design and run an actual evaluation, and “don’t do the bad things” is not, by itself, a method. The SCOPE framework, developed within INORMS (the International Network of Research Management Societies), exists to fill that gap by offering a structured process for designing a responsible evaluation. This article explains it, drawing on the responsible assessment domain of the CASRAI Dictionary.

    From principles to process

    The distinctive contribution of SCOPE is that it is a how-to, not a what-not-to. Where DORA and the Leiden Manifesto articulate the values and warn against the failure modes of assessment, SCOPE provides a sequence of steps an evaluator can actually follow to build an evaluation that honours those values. It treats the design of an evaluation as a deliberate act requiring thought, rather than a default to be reached for unreflectively. The name SCOPE is an acronym for the stages of that process, and working through them in order is meant to prevent the most common error in assessment: choosing the measure first — usually whatever is easy to count — and only afterwards, if at all, asking whether it actually captures what matters.

    The five steps

    SCOPE guides an evaluator through five stages:

    • S — Start with what you value. Before anything is measured, articulate what the evaluation is genuinely meant to recognise and encourage. This puts values, not available data, in the driving seat, and forces clarity about the purpose of the exercise.
    • C — Context considerations. Take account of the specific context: who or what is being evaluated, the discipline, the career stage, the conditions, and the consequences the evaluation will have. An approach appropriate in one context may be unfair or meaningless in another.
    • O — Options for evaluating. Consider the range of possible ways to conduct the evaluation — qualitative and quantitative, expert judgement and indicators — rather than defaulting to the most familiar tool. This is where the evaluator deliberately weighs alternatives.
    • P — Probe deeply. Interrogate the chosen approach. What are its limitations, biases and unintended effects? Who might it disadvantage? What behaviour will it incentivise? Probing before committing is how harms are caught in advance.
    • E — Evaluate your evaluation. After the exercise, assess whether the evaluation actually worked — whether it served its purpose, was fair, and had the intended effects — and feed what is learned back into future practice.

    The order is the point. By beginning with values and context and treating measurement as a later, considered choice, SCOPE structurally resists the temptation to let convenient metrics define what counts.
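
    The ordering constraint described above can be sketched as a small checklist that refuses to record a later stage until the earlier ones are filled in. This is purely illustrative and not part of the SCOPE framework itself: the EvaluationDesign class, its record method and the example notes are hypothetical, and only the five stage names come from the framework.

```python
SCOPE_STAGES = [
    "Start with what you value",
    "Context considerations",
    "Options for evaluating",
    "Probe deeply",
    "Evaluate your evaluation",
]


class EvaluationDesign:
    """Collects notes for each stage, enforcing the SCOPE order."""

    def __init__(self):
        self.notes = {}

    def record(self, stage, note):
        index = SCOPE_STAGES.index(stage)  # raises ValueError for unknown stages
        missing = [s for s in SCOPE_STAGES[:index] if s not in self.notes]
        if missing:
            raise RuntimeError(f"complete earlier stages first: {missing}")
        self.notes[stage] = note


design = EvaluationDesign()
design.record("Start with what you value", "Recognise open, reusable outputs")
design.record("Context considerations", "Early-career applicants, book-based field")

try:
    # Jumping ahead to probing before any options have been weighed is rejected.
    design.record("Probe deeply", "Check who the chosen indicator disadvantages")
except RuntimeError as error:
    print(error)
```

    The design choice mirrors the framework's logic: a measure cannot be chosen, let alone probed, until values and context are on record, so the easy-to-count indicator never gets to define the exercise by default.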

    How SCOPE relates to DORA, CoARA and the Leiden Manifesto

    SCOPE does not compete with the major assessment-reform initiatives; it operationalises them. The San Francisco Declaration on Research Assessment (DORA) sets out commitments to stop misusing journal-based metrics and to assess research on its own merits; SCOPE gives an evaluator a way to design assessments that actually deliver on those commitments. The Leiden Manifesto offers principles for the responsible use of metrics — supporting rather than supplanting expert judgement, accounting for context, recognising the limits of indicators — and SCOPE’s steps are, in effect, a procedure for honouring those principles in a concrete exercise. The Coalition for Advancing Research Assessment (CoARA) commits its many signatory organisations to reforming how they assess research; SCOPE is precisely the kind of practical tool such organisations need to translate their commitments into the design of real evaluations. In short, the declarations supply the why and the constraints; SCOPE supplies a disciplined way to do the work within them.

    Why a process matters

    It is worth dwelling on why a process, rather than a set of rules, is the right form for this. Research is too varied for a single prescribed method to fit every case: what is fair when assessing a senior researcher differs from what is fair for an early-career one; what makes sense in a laboratory discipline differs from a field where books and long-form scholarship dominate. A rigid rule (“always use X”) would simply replace one bad default with another. A process like SCOPE instead equips the evaluator to make a good, context-sensitive decision each time, while guarding against the predictable failure modes. It respects the irreducible role of judgement in assessment while ensuring that judgement is exercised thoughtfully and transparently rather than by reflex.

    Describing contribution for fairer assessment

    Responsible evaluation depends on having good information about what people have actually contributed, described in a way that does not collapse into crude proxies. This is where structured contribution information supports the goals of frameworks like SCOPE. The CRediT taxonomy — with its full set of contribution roles — lets an evaluation recognise the specific roles a person played rather than inferring contribution from authorship position or counting papers. Richer, structured information about contribution gives evaluators better material to exercise the considered judgement SCOPE is designed to support, and complements the narrative approaches increasingly used in responsible assessment. The institutional work of putting such practices in place is part of the broader remit of research administration.

    A consistent foundation for evaluation

    For responsible evaluation to work across institutions and systems, the information it draws on must be described consistently — contributions, outputs, roles and the rest. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the evidence feeding an evaluation means the same thing wherever it comes from. SCOPE reminds us that good assessment is something you design, not something you default into; a shared vocabulary helps ensure the materials you design with are sound.