Limitations of Bibliometrics: DORA and CoARA

Bibliometrics, the statistical analysis of publication and citation data, cannot reliably stand in for research quality on its own: field-specific citation practices, author self-citation, and outright metric gaming all distort single-number scores such as the h-index or Journal Impact Factor. These distortions are the documented evidentiary basis for DORA and CoARA’s push to replace single-score evaluation with qualitative, multi-indicator assessment.

Bibliometrics is the quantitative study of academic literature, using citation counts, publication volume, and derived indices as a proxy for scholarly influence. The proxy breaks down whenever a single number is asked to carry the full weight of a quality judgement, which is precisely what hiring, promotion, tenure, and funding panels have asked of it for decades.

What is bibliometrics, and why does one score fall short?

Bibliometric indicators — citation counts, the h-index, the Journal Impact Factor (JIF), and derived composite scores — were built for large-scale, aggregate comparisons, not for judging an individual scholar’s contribution. Bergstrom, West and Wiseman’s 2008 analysis in the Journal of Neuroscience put it plainly: quantitative metrics are poor choices for assessing an individual’s research output compared with the “gold standard” of reading the work and consulting domain experts.

A single score compresses distinct, sometimes competing, dimensions of scholarly value (novelty, rigour, reproducibility, societal reach) into one figure. That compression, not citation data itself, is the structural weakness reform movements target.
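The compression can be made concrete with the h-index, defined as the largest h such that h papers each have at least h citations. The sketch below is a minimal illustration, not a reference implementation; the two citation profiles are invented for the example and do not describe real researchers.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts, invented for illustration only.
steady_author = [6, 5, 5, 4, 4, 1, 0]
breakthrough_author = [950, 210, 40, 4, 2, 0, 0]

print(h_index(steady_author), sum(steady_author))              # 4 25
print(h_index(breakthrough_author), sum(breakthrough_author))  # 4 1206
```

Both profiles score h = 4 even though their total citations differ by a factor of nearly fifty, and neither number says anything about rigour or reproducibility.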

How does field bias distort bibliometric comparisons?

Citation practices vary sharply by discipline, so raw citation counts cannot be compared across fields. Researchers in mathematics and the humanities publish and cite far less frequently than those in biomedicine. Books and conference proceedings, the dominant outputs in many humanities and computing sub-fields, are tracked inconsistently, or not at all, by Web of Science and Scopus.

Coverage gaps compound the bias. Indexing databases differ in subject breadth and depth, geographic and language coverage, and how far back their citation histories extend, so researchers publishing outside the Anglophone, journal-dominant core of a database are systematically under-counted. Belter’s 2015 review, archived in PMC, also notes that citation-based indicators need roughly two to three years after publication to stabilise enough to be considered reliable. That lag structurally penalises early-career researchers and recent work.

Why does self-citation inflate bibliometric scores?

Self-citation — an author citing their own prior work — is a normal and often legitimate part of building on a research programme. It becomes a distortion when it is used strategically to inflate an individual’s citation count or a journal’s Impact Factor beyond what independent uptake of the work would justify.

Clarivate’s Journal Citation Reports has, in past cycles, suppressed the calculated Impact Factor of titles found to display anomalous citation behaviour, including excessive journal self-citation and coordinated “citation stacking” arrangements between journals. That is a documented, database-level enforcement action against exactly this failure mode. At author level, unusually concentrated self-citation rates are one of the diagnostic flags bibliometricians use when auditing whether a headline citation figure reflects genuine external uptake or engineered inflation.
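An author-level check of this kind can be sketched as follows, taking the self-citation rate to be the share of citing papers that list the cited author among their own authors. The function name, the sample data, and the 0.3 review threshold are all assumptions made for illustration; they are not a rule used by Clarivate or any other database.

```python
def self_citation_rate(cited_author, citing_author_lists):
    """Share of citing papers whose author list includes the cited author."""
    if not citing_author_lists:
        return 0.0
    self_cites = sum(1 for authors in citing_author_lists if cited_author in authors)
    return self_cites / len(citing_author_lists)


# Hypothetical author lists of four papers citing "A. Researcher".
citing_papers = [
    {"A. Researcher", "B. Colleague"},
    {"C. Independent"},
    {"A. Researcher"},
    {"D. Independent", "E. Independent"},
]

rate = self_citation_rate("A. Researcher", citing_papers)

# Arbitrary illustrative threshold; a real audit would compare against field norms.
REVIEW_THRESHOLD = 0.3
needs_review = rate > REVIEW_THRESHOLD

print(rate, needs_review)  # 0.5 True
```

A flag like this only prompts a closer look. A high rate can be legitimate in a small specialist field where few other groups work on the topic.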

Does field-weighted citation impact solve the problem?

Field-weighted citation impact (FWCI) is a normalised metric, used in tools such as Scopus and SciVal, that compares a publication’s citation count with the average for its subject field, publication year, and document type, so that a score of 1.0 represents “as expected” performance for that context. It is a genuine improvement on raw citation counts because it reduces the field-bias problem described above.
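The normalisation idea can be sketched as a simple ratio of actual to expected citations. This is a simplified illustration only: the benchmark values are invented, the `fwci` helper is hypothetical, and the calculation used in commercial tools may handle citation windows and multi-field papers differently.

```python
def fwci(citations, expected_citations):
    """Ratio of a paper's citations to the benchmark average for comparable papers."""
    if expected_citations <= 0:
        raise ValueError("expected_citations must be positive")
    return citations / expected_citations


# Hypothetical benchmark averages keyed by (field, publication year, document type).
EXPECTED = {
    ("mathematics", 2020, "article"): 4.0,
    ("biomedicine", 2020, "article"): 20.0,
}

maths_paper = fwci(8, EXPECTED[("mathematics", 2020, "article")])
biomed_paper = fwci(20, EXPECTED[("biomedicine", 2020, "article")])

print(maths_paper, biomed_paper)  # 2.0 1.0
```

On raw counts the biomedicine paper looks stronger, with 20 citations against 8. After normalisation the mathematics paper has twice the expected citations for its context, while the biomedicine paper is exactly at the benchmark.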

FWCI does not, however, correct for self-citation gaming or database coverage gaps, and it remains a single number: it shows how a paper performed against a benchmark, not whether the research was rigorous or original. Reform frameworks treat field normalisation as a refinement of bibliometrics, not a licence to keep using any single indicator as a proxy for quality.

What evidence underlies DORA and CoARA’s reform case?

The San Francisco Declaration on Research Assessment (DORA), launched in 2012, explicitly recommends against using the Journal Impact Factor as a surrogate measure of the quality of individual research articles, and calls on institutions to assess research on its own merits using a range of qualitative and quantitative indicators. The Coalition for Advancing Research Assessment (CoARA), formed in 2022, builds on DORA’s diagnosis: its signatories commit to basing assessment primarily on qualitative, peer-reviewed judgement, supported by responsible — not exclusive — use of quantitative indicators, and to abandoning inappropriate use of journal- and publication-based metrics such as the JIF and h-index.

Both respond directly to the failure modes above. Field bias, self-citation gaming, database coverage gaps, and the two-to-three-year reliability lag are the documented evidence, not abstract principle, behind the push for reform.

Initiative | Launched | Core commitment
DORA (San Francisco Declaration on Research Assessment) | 2012 | Stop using the Journal Impact Factor as a proxy for individual article or researcher quality
Leiden Manifesto | 2015 (Hicks et al., Nature 520, 429–431) | Ten principles for the responsible, transparent use of quantitative indicators alongside expert judgement
CoARA (Coalition for Advancing Research Assessment) | 2022 | Base assessment primarily on qualitative peer review; abandon inappropriate JIF/h-index use in hiring, promotion and funding decisions

Answer-first questions on bibliometric limitations

What are the main limitations of bibliometrics in research assessment?

The main limitations are field bias (citation norms differ by discipline), database coverage gaps (books, non-English and non-journal outputs are under-tracked), self-citation inflation, and a two-to-three-year lag before citation counts stabilise. Together these mean a single score cannot substitute for expert, qualitative judgement of research quality.

Why is the h-index considered a poor measure of individual research quality?

The h-index rewards volume and career length over insight, cannot distinguish a principal author from one of many contributors to a large collaborative paper, and does not account for field-specific citation norms. Bergstrom, West and Wiseman (2008) concluded that reading the work and consulting experts remains the more reliable standard for individual evaluation.

What is the difference between DORA and CoARA?

DORA (2012) is a signable declaration focused primarily on eliminating Journal Impact Factor misuse. CoARA (2022) is a membership coalition of funders, universities and academies that goes further, committing signatories to a broader, peer-review-centred reform agenda across hiring, promotion, and institutional evaluation, with periodic reporting on progress.

What is a self-citation rate and why does it matter?

A self-citation rate is the proportion of an author’s or journal’s total citations that come from their own prior work rather than independent external uptake. Bibliometricians and citation-database auditors (including Clarivate’s Journal Citation Reports process) use unusually high self-citation rates as a flag for possible metric gaming rather than genuine scholarly influence.

What should research administrators do differently?

For research administrators and institutional leaders, the practical implication is not to discard citation data but to stop letting any single figure decide a hiring, promotion, or funding case on its own. That means:

  • Pairing field-normalised indicators such as FWCI with narrative, qualitative peer assessment, as CoARA commitments require.
  • Auditing self-citation and journal self-citation patterns before citing a headline figure in a case file.
  • Recognising a fuller range of outputs — datasets, software, policy influence — rather than journal articles alone.
  • Crediting individual contributions on multi-author papers explicitly, rather than inferring credit from author position or aggregate citation share.

On that last point, standardised contributor-role taxonomies address a related gap directly. CASRAI originated the CRediT contributor role taxonomy in 2014; the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, and it lets institutions record which named contributor performed which specific role on a paper — conceptualisation, data curation, writing — rather than relying on citation share or author-list position as a proxy for who did what.
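A contributor-role record of this kind might be stored and validated along the following lines. This is a sketch under stated assumptions: only a subset of the CRediT role names is listed, and the `roles_by_contributor` helper and the sample contributors are hypothetical rather than part of the standard.

```python
from collections import defaultdict

# Subset of CRediT role names, for illustration; the full taxonomy defines more.
CREDIT_ROLES = {
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Software",
    "Writing – original draft",
}


def roles_by_contributor(contributions):
    """Validate (contributor, role) pairs and group the roles under each contributor."""
    grouped = defaultdict(list)
    for contributor, role in contributions:
        if role not in CREDIT_ROLES:
            raise ValueError(f"Unknown role: {role}")
        grouped[contributor].append(role)
    return dict(grouped)


# Hypothetical contributions recorded for a single paper.
paper_contributions = [
    ("A. Researcher", "Conceptualization"),
    ("A. Researcher", "Writing – original draft"),
    ("B. Colleague", "Data curation"),
    ("B. Colleague", "Software"),
]

record = roles_by_contributor(paper_contributions)
print(record["B. Colleague"])  # ['Data curation', 'Software']
```

Recording roles explicitly means a case file can state what each person did, instead of leaving a panel to guess from author order.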

Where bibliometric reform goes next

The evidentiary case against single-number bibliometric scores is now well established: field bias, database coverage gaps, self-citation gaming, and a multi-year reliability lag are documented, auditable failure modes, not theoretical objections. DORA and CoARA translate that evidence into institutional commitments, and field-normalised metrics such as FWCI narrow — without eliminating — the field-bias problem.

The direction of travel for funders, universities and academies is toward layered assessment: responsibly used quantitative indicators, transparent contributor-role attribution, and peer judgement at the centre, rather than any one score standing alone.
