CRediT in JATS XML: a technical primer for production teams

A contributor-roles statement is only as useful as it is machine-readable. A typesetter can render ‘A.B. wrote the original draft; C.D. supervised’ as a tidy paragraph at the foot of an article, but if that information lives only in prose then no downstream system — a research information system, an indexer, a funder’s reporting tool — can act on it. The point of CRediT, the Contributor Roles Taxonomy, is to make contributions structured, and in scholarly publishing ‘structured’ means encoded in JATS XML. This primer is for the production teams who actually do that encoding: the people for whom ‘add CRediT’ on a project plan turns into concrete decisions about elements, attributes and controlled vocabularies. The authoritative tag-level guidance is set out in the CRediT in JATS reference and the broader JATS implementation notes.

Where contributor roles live in JATS

JATS (the Journal Article Tag Suite, the NISO Z39.96 standard) models people in the <contrib-group> element. Each named individual is a <contrib>, carrying their name, affiliations and identifiers. The element that carries a contributor’s function is <role>, nested inside the relevant <contrib>. A single contributor may hold several roles, so multiple <role> elements per <contrib> are expected and entirely valid — one person might legitimately be tagged for Conceptualization, Methodology and Writing – review & editing.
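As a rough illustration of that nesting, the sketch below parses a small, invented <contrib-group> with Python's standard library and lists the roles found inside each <contrib>. The contributor names and the helper name roles_by_contributor are assumptions made for the example, not part of JATS.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: names and roles are invented for illustration.
JATS_FRAGMENT = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Baker</surname><given-names>Alex</given-names></name>
    <role>Conceptualization</role>
    <role>Methodology</role>
    <role>Writing – review &amp; editing</role>
  </contrib>
  <contrib contrib-type="author">
    <name><surname>Dunn</surname><given-names>Chris</given-names></name>
    <role>Supervision</role>
  </contrib>
</contrib-group>
"""


def roles_by_contributor(xml_text):
    """Map 'Given Surname' to the role labels nested in that <contrib>."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for contrib in root.iter("contrib"):
        given = contrib.findtext("name/given-names", default="")
        surname = contrib.findtext("name/surname", default="")
        label = f"{given} {surname}".strip()
        # Only direct <role> children belong to this contributor.
        result[label] = [(r.text or "").strip() for r in contrib.findall("role")]
    return result


ROLES = roles_by_contributor(JATS_FRAGMENT)
for person, roles in ROLES.items():
    print(person, "->", "; ".join(roles))
```

The roles here are still plain text labels, which is the gap the rest of this primer addresses.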

The job of a production team is to make those <role> elements unambiguous. Free-text role labels are not enough, because ‘wrote the paper’ and ‘drafting’ and ‘Writing – original draft’ are the same role expressed three ways. CRediT solves this by giving each of its roles a stable definition and a canonical identifier, and JATS provides the attributes to point at them.

The JATS4R recommendation for encoding CRediT

JATS4R (JATS for Reuse) is the community group that publishes interoperability recommendations for ambiguous corners of the standard, and it has a specific recommendation for CRediT. The core of it is that a <role> element used for a CRediT contribution should declare the vocabulary it draws from and the specific term within it. In practice this means four attributes work together:

  • vocab — identifies the controlled vocabulary as CRediT;
  • vocab-identifier — gives the URI of the taxonomy itself, so a consuming system can resolve what vocabulary is being used;
  • vocab-term and vocab-term-identifier — give the exact term and its canonical URI, so the role resolves to one and only one CRediT definition.
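A minimal sketch of a fully attributed <role>, built with Python's standard library, might look like the following. The attribute values (vocab="credit", the https://credit.niso.org/ taxonomy URI and the term URI for Writing – original draft) reflect the commonly published CRediT identifiers but are treated here as assumptions; confirm them against the current JATS4R recommendation before adopting them. The helper name credit_role is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed taxonomy URI; verify against the current recommendation.
CREDIT_VOCAB_URI = "https://credit.niso.org/"


def credit_role(term, term_uri):
    """Build a <role> whose attributes and visible text name the same term."""
    role = ET.Element(
        "role",
        {
            "vocab": "credit",
            "vocab-identifier": CREDIT_VOCAB_URI,
            "vocab-term": term,
            "vocab-term-identifier": term_uri,
        },
    )
    role.text = term  # the human-readable label a reader sees
    return role


DRAFT_ROLE = credit_role(
    "Writing – original draft",
    # Assumed term URI; verify before use.
    "https://credit.niso.org/contributor-roles/writing-original-draft/",
)
print(ET.tostring(DRAFT_ROLE, encoding="unicode"))
```

Passing the term URI in explicitly, rather than deriving it from the label, keeps the sketch from guessing at how identifiers are formed.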

The human-readable label remains the text content of the <role> element, which is what a reader sees, while the attributes carry the machine meaning. The recommendation is explicit that the visible text and the term identifier must agree: do not tag a <role> as Data curation in its attributes while the visible text reads ‘Formal analysis’. JATS4R also advises using the official CRediT term strings verbatim rather than house variants, because verbatim strings are what validators and aggregators expect to match.
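The agreement rule lends itself to a simple automated check. The sketch below, which assumes the vocab-term attribute is the value to compare against, reports every <role> whose visible text differs from its attribute; the sample XML and the function name mismatched_roles are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second role reproduces the mismatch described above.
SAMPLE = """
<contrib>
  <role vocab="credit" vocab-term="Software">Software</role>
  <role vocab="credit" vocab-term="Data curation">Formal analysis</role>
</contrib>
"""


def mismatched_roles(xml_text):
    """Return (visible_text, vocab_term) pairs that disagree."""
    problems = []
    for role in ET.fromstring(xml_text.strip()).iter("role"):
        term = role.get("vocab-term")
        if term is None:
            continue  # untagged roles are outside the scope of this check
        visible = " ".join((role.text or "").split())  # normalise whitespace
        if visible != term:
            problems.append((visible, term))
    return problems


MISMATCHES = mismatched_roles(SAMPLE)
print(MISMATCHES)
```

An exact string comparison is deliberate here: a looser match would let house variants of the term slip through.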

Degrees of contribution

CRediT permits, but does not require, a statement of the degree of a contribution, for example marking one contributor as having led a given role. JATS expresses this through an additional attribute on the <role> element (degree-contribution) rather than by changing the term identifier. Production teams should treat degree as optional metadata that is encoded only when the manuscript actually supplies it; inventing a lead/equal distinction where the authors stated none is a data-quality error, not an enhancement. When degree information is present, keep it consistent across the article so that a reader and a parser draw the same conclusion.
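One way to enforce the "only when supplied" rule in a conversion script is sketched below. It assumes the degree is carried in a degree-contribution attribute on <role> and that the permitted values are lead, equal and supporting; both points should be checked against the JATS Tag Library and the JATS4R text. The helper name set_degree is hypothetical, and the identifier attributes are omitted from the sample roles for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed value set; confirm against the recommendation before relying on it.
ALLOWED_DEGREES = {"lead", "equal", "supporting"}


def set_degree(role, degree):
    """Record a degree of contribution only if the manuscript stated one."""
    if degree is None:
        return role  # authors stated nothing, so nothing is encoded
    if degree not in ALLOWED_DEGREES:
        raise ValueError(f"unexpected degree of contribution: {degree!r}")
    role.set("degree-contribution", degree)
    return role


ROLE_XML = '<role vocab="credit" vocab-term="Supervision">Supervision</role>'

LED = set_degree(ET.fromstring(ROLE_XML), "lead")
UNSTATED = set_degree(ET.fromstring(ROLE_XML), None)

print(ET.tostring(LED, encoding="unicode"))
print(ET.tostring(UNSTATED, encoding="unicode"))
```

The function never supplies a default degree, which is the point: absence in the manuscript stays absence in the XML.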

Common production pitfalls

Several mistakes recur often enough to be worth naming. The first is putting CRediT roles in the wrong place — bundling them into an unstructured author-contributions paragraph in the article body instead of, or in addition to, the structured <role> elements. The structured encoding is the one machines read; a prose paragraph is a courtesy to humans, not a substitute. The second is omitting vocab-identifier and vocab-term-identifier, which leaves the role as plain text that cannot be reliably disambiguated. The third is term drift: lightly edited labels such as ‘Writing (review and editing)’ that no longer match the canonical CRediT string and therefore fail automated checks.
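The second and third pitfalls can be caught mechanically. The following sketch flags roles with missing vocabulary attributes and labels that have drifted from the canonical strings. The term list is the fourteen CRediT role names as commonly published; treat it as an assumption and load the official list in production. The function name lint_roles and the sample XML are invented.

```python
import xml.etree.ElementTree as ET

# Assumed canonical labels; source these from the official taxonomy in practice.
CREDIT_TERMS = {
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
}
REQUIRED = ("vocab", "vocab-identifier", "vocab-term", "vocab-term-identifier")

SAMPLE = """
<contrib>
  <role vocab="credit" vocab-identifier="https://credit.niso.org/"
        vocab-term="Software"
        vocab-term-identifier="https://credit.niso.org/contributor-roles/software/">Software</role>
  <role vocab="credit" vocab-term="Supervision">Supervision</role>
  <role vocab="credit" vocab-identifier="https://credit.niso.org/"
        vocab-term="Writing – review &amp; editing"
        vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing (review and editing)</role>
</contrib>
"""


def lint_roles(xml_text):
    """Return (label, problem) pairs for incomplete or drifted roles."""
    findings = []
    for role in ET.fromstring(xml_text.strip()).iter("role"):
        label = " ".join((role.text or "").split())
        missing = [a for a in REQUIRED if not role.get(a)]
        if missing:
            findings.append((label, "missing " + ", ".join(missing)))
        if label not in CREDIT_TERMS:
            findings.append((label, "label is not a canonical CRediT term"))
    return findings


FINDINGS = lint_roles(SAMPLE)
for label, problem in FINDINGS:
    print(f"{label}: {problem}")
```

A check of this kind complements schema validation, which will accept a <role> with no vocabulary attributes at all.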

A subtler issue is association: every <role> must sit inside the correct <contrib>. In articles with long author lists it is easy for a role to be attached to the wrong person during conversion, especially when contributions are supplied as a separate table that a typesetter merges by hand. Validating that each role resolves to the intended contributor is as important as validating that the term identifiers are correct.
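An association check needs an independent record of who did what, such as the contribution matrix captured at submission. The sketch below compares that record with the roles found in each <contrib>. The matrix format, the use of display names as keys (a real pipeline would more likely key on ORCID or an internal ID) and the function name association_errors are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical matrix as supplied by the authors.
EXPECTED = {
    "Alex Baker": {"Conceptualization"},
    "Chris Dunn": {"Supervision"},
}

# Hypothetical converted XML in which Supervision landed on the wrong person.
CONVERTED = """
<contrib-group>
  <contrib>
    <name><surname>Baker</surname><given-names>Alex</given-names></name>
    <role vocab="credit" vocab-term="Conceptualization">Conceptualization</role>
    <role vocab="credit" vocab-term="Supervision">Supervision</role>
  </contrib>
  <contrib>
    <name><surname>Dunn</surname><given-names>Chris</given-names></name>
  </contrib>
</contrib-group>
"""


def association_errors(xml_text, expected):
    """Return (contributor, term, 'unexpected' | 'missing') triples.

    Only contributors present in the XML are examined; people listed in
    the matrix but absent from the XML are not reported by this sketch.
    """
    errors = []
    for contrib in ET.fromstring(xml_text.strip()).iter("contrib"):
        given = contrib.findtext("name/given-names", default="")
        surname = contrib.findtext("name/surname", default="")
        name = f"{given} {surname}".strip()
        found = {
            r.get("vocab-term") or (r.text or "").strip()
            for r in contrib.findall("role")
        }
        wanted = set(expected.get(name, ()))
        errors += [(name, t, "unexpected") for t in sorted(found - wanted)]
        errors += [(name, t, "missing") for t in sorted(wanted - found)]
    return errors


ERRORS = association_errors(CONVERTED, EXPECTED)
for entry in ERRORS:
    print(entry)
```

Note that every term in the converted XML is individually valid; only the comparison against the matrix reveals that one role is attached to the wrong contributor.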

Building it into the workflow

The practical recommendation is to capture CRediT as structured data as early as possible — ideally at submission, where many manuscript systems now collect a contribution matrix — and to carry that structure through conversion rather than reconstructing it from prose at the typesetting stage. Round-trip validation against the JATS4R recommendation should be part of the production QA step, alongside the schema validation a publisher already runs. Treating contributor roles as first-class structured metadata, governed by the definitions in the research information systems domain of the CASRAI Dictionary, is what allows contribution data to survive intact all the way to the version of record and beyond.
