Tools for CRediT: tenzing, the Contributor Role Ontology and machine-readable contributorship

The CRediT taxonomy has done a great deal to change how research contributions are described: instead of an opaque author list, a paper can carry a structured statement of who did what across fourteen defined roles. But a taxonomy is only as useful as the practices and tools that put it to work. Two problems recur. First, producing a CRediT statement by hand is tedious and surprisingly easy to get wrong: someone has to collect each co-author’s roles, often from a dozen people across fourteen possible roles, and assemble them into a correct, consistent statement. Second, a statement that ends up as a paragraph of prose in a published paper is largely invisible to the systems that could use it; it cannot easily be searched, aggregated, or propagated to a contributor’s profile. This article looks at the tools and standards that address both problems, drawing on the CRediT extensions domain of the CASRAI Dictionary.

tenzing: generating CRediT statements

One of the best-known practical tools is tenzing, a free application for generating contributorship statements, developed by Alex Holcombe, Marton Kovacs, Frederik Aust and Balazs Aczel and named after the mountaineer Tenzing Norgay. The idea is simple and effective. Rather than each author emailing their roles to a corresponding author who then assembles them by hand, contributors record their roles in a shared spreadsheet using the CRediT taxonomy, and tenzing turns that into a properly formatted contributorship statement ready to paste into a manuscript. It can also produce machine-readable output, so the same information can travel as structured data rather than only as prose.
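To make the spreadsheet-to-statement idea concrete, here is a minimal Python sketch. It is not tenzing's code, and the row layout, the example names and the build_statement helper are all assumptions made for illustration; only the fourteen role names come from CRediT itself.

```python
# The fourteen CRediT roles, in the order used to lay out the statement.
CREDIT_ROLES = [
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
]


def build_statement(rows):
    """Group contributor names under each CRediT role and format as prose.

    `rows` is a hypothetical spreadsheet export: one dict per contributor
    with a "name" and a list of "roles". This layout is an assumption,
    not tenzing's actual template.
    """
    by_role = {role: [] for role in CREDIT_ROLES}
    for row in rows:
        for role in row["roles"]:
            if role not in by_role:
                # Catch typos and non-CRediT labels before they reach print.
                raise ValueError(f"Not a CRediT role: {role!r}")
            by_role[role].append(row["name"])
    parts = [
        f"{role}: {', '.join(names)}."
        for role, names in by_role.items()
        if names
    ]
    return " ".join(parts)


# Made-up contributors for illustration.
rows = [
    {"name": "A. Researcher", "roles": ["Conceptualization", "Software"]},
    {"name": "B. Scholar", "roles": ["Conceptualization", "Data curation"]},
]
statement = build_statement(rows)
print(statement)
```

With these two rows the sketch prints "Conceptualization: A. Researcher, B. Scholar. Data curation: B. Scholar. Software: A. Researcher." Rejecting unknown role labels at this stage is the kind of check that is easy to skip when a statement is assembled by email.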

The value of a tool like tenzing is not merely convenience. By making it easy to capture each contributor’s roles at the point of writing, it encourages teams to have the explicit conversation about contribution that good practice requires, and it reduces the errors that creep in when statements are assembled informally. It lowers the effort of doing the right thing, which often determines whether the right thing gets done.

The Contributor Role Ontology

For CRediT to be genuinely machine-readable, the roles need a formal, computational expression: a definition that software can reason about, not just a list of human-readable labels. This is what the Contributor Role Ontology (CRO) provides. The CRO is an ontology that represents the CRediT roles (and extends beyond them) in a structured, formal way, giving each role a stable identifier and a place in a defined conceptual model. An ontology differs from a plain list: it expresses relationships and definitions in a form that systems can use to integrate, validate and exchange information reliably.

Why does this matter? When a contribution role has a formal identifier rather than only a text label, different systems can refer to exactly the same concept without ambiguity. “Data curation” written as free text might be abbreviated or translated inconsistently across platforms; the same role expressed through the CRO is an unambiguous, machine-resolvable entity. The ontology is what lets contributorship data move between systems without losing its meaning.
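The difference between a text label and an identifier can be sketched in a few lines of Python. The identifiers below use a made-up "example:" prefix and are placeholders, not real CRO identifiers; the alias table and the resolve_role helper are likewise hypothetical.

```python
# Placeholder identifiers: the "example:" prefix marks them as illustrative,
# not as real Contributor Role Ontology identifiers.
ROLE_IDS = {
    "data curation": "example:data-curation",
    "formal analysis": "example:formal-analysis",
}

# Hypothetical free-text variants that different platforms might emit.
ALIASES = {
    "data cur.": "data curation",
    "curation of data": "data curation",
}


def resolve_role(label):
    """Map a free-text role label to one stable identifier, or None."""
    # Normalise case and collapse runs of whitespace.
    key = " ".join(label.lower().split())
    key = ALIASES.get(key, key)
    return ROLE_IDS.get(key)


# Three spellings, one concept.
ids = {resolve_role(s) for s in ["Data curation", "DATA  CURATION", "Data cur."]}
print(ids)
print(resolve_role("Coffee making"))
```

All three spellings resolve to the same identifier, while an unrecognised label returns None instead of silently becoming a new "role". Systems that exchange the identifier rather than the label do not need the alias table at all.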

Tagging contributorship in JATS

Scholarly articles are typically encoded in JATS (the Journal Article Tag Suite), the XML standard publishers use to represent article content and metadata. JATS provides structured ways to record contributors, including the ability to associate contribution roles with each contributor and to point those roles at a controlled vocabulary such as CRediT. When a publisher tags contributorship properly in JATS — attaching each author’s CRediT roles within the article’s structured metadata rather than only writing them out in a prose statement — the contribution information becomes part of the machine-readable record of the article. The roles are then available to any system that processes the JATS, rather than being locked inside a sentence that only a human reader can parse.
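As an illustration of what such tagging might look like, the following Python sketch builds one contributor element with a CRediT role attached. It assumes the vocabulary attributes on the JATS role element (vocab, vocab-identifier, vocab-term, vocab-term-identifier) introduced in recent JATS versions, and it assumes the URL pattern shown for CRediT term identifiers; both should be checked against the JATS and CRediT documentation before use. The contrib_element helper and the contributor name are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed base URL for the CRediT vocabulary; verify before relying on it.
CREDIT_BASE = "https://credit.niso.org/"


def contrib_element(surname, given_names, roles):
    """Build a JATS-style <contrib> with one <role> per CRediT role.

    `roles` is a list of (term, slug) pairs; the slug-based URL pattern
    is an assumption made for this sketch.
    """
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names
    for term, slug in roles:
        role = ET.SubElement(contrib, "role", {
            "vocab": "credit",
            "vocab-identifier": CREDIT_BASE,
            "vocab-term": term,
            "vocab-term-identifier": f"{CREDIT_BASE}contributor-roles/{slug}/",
        })
        role.text = term  # human-readable label alongside the identifiers
    return contrib


contrib = contrib_element("Scholar", "B.", [("Data curation", "data-curation")])
xml_text = ET.tostring(contrib, encoding="unicode")
print(xml_text)
```

The human-readable label is still present as the element's text, so nothing is lost for readers; the attributes simply add a machine-resolvable pointer to the same role.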

Propagating contributions through the PID ecosystem

Machine-readable contributorship becomes powerful when it flows beyond the article into the wider persistent-identifier ecosystem. The key actors are familiar:

  • ORCID gives each contributor a persistent identifier, so a role can be attached to a specific, disambiguated person rather than to a name that might be shared or misspelt.
  • Crossref is the metadata infrastructure through which publishers register articles, and its schema can carry CRediT role information alongside the rest of an article’s metadata.

When contribution roles are captured in structured form and registered with the metadata, they can in principle propagate: a contribution recorded against an ORCID identifier can surface on that person’s profile, and role information registered with Crossref can be exposed to the many services that consume Crossref metadata. The vision is that a contribution recorded once, at the point of writing, follows the contributor and the work through the scholarly record — appearing on profiles, in discovery systems and in reporting — without anyone re-entering it.
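A small Python sketch shows why attaching roles to an identifier, not a name, makes this kind of propagation tractable. It validates the check digit of an ORCID iD (ISO 7064 MOD 11-2, as ORCID documents) and then groups role records by person. The record layout, the DOIs and the roles_by_orcid helper are hypothetical, and the sketch does not call any ORCID or Crossref API.

```python
from collections import defaultdict


def orcid_checksum_ok(orcid):
    """Check the final character of an ORCID iD using ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


def roles_by_orcid(records):
    """Group (orcid, doi, role) records into {orcid: {doi: {roles}}}."""
    profile = defaultdict(lambda: defaultdict(set))
    for orcid, doi, role in records:
        if not orcid_checksum_ok(orcid):
            raise ValueError(f"Invalid ORCID iD: {orcid!r}")
        profile[orcid][doi].add(role)
    return profile


# 0000-0002-1825-0097 is the iD of ORCID's fictitious example researcher;
# the DOIs are placeholders, not real articles.
records = [
    ("0000-0002-1825-0097", "10.5555/example-1", "Data curation"),
    ("0000-0002-1825-0097", "10.5555/example-1", "Software"),
    ("0000-0002-1825-0097", "10.5555/example-2", "Supervision"),
]
profile = roles_by_orcid(records)
for doi, roles in profile["0000-0002-1825-0097"].items():
    print(doi, sorted(roles))
```

Because every record is keyed by the same identifier, the contributor's roles across several works can be collected without any name matching, which is the step that fails when only a prose statement exists.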

From statement to data

The common thread is the shift from contributorship as text to contributorship as data. A prose statement is a fine thing for a human reader, but it is a dead end for everything downstream. The tools and standards described here — tenzing for generating statements, the CRO for formalising the roles, JATS for tagging them in the article, and ORCID and Crossref for carrying them through the identifier ecosystem — together turn a once-static statement into structured information that can be searched, aggregated, credited and reused. The full CRediT taxonomy, with its set of contribution roles, is the vocabulary at the centre of all of it; the tooling is what makes that vocabulary actionable in real systems rather than merely available on paper.

A shared vocabulary underneath

For contributorship data to move cleanly between a spreadsheet, an ontology, an article’s XML and a contributor’s profile, the roles must mean exactly the same thing at every step. That consistency is what the CASRAI Dictionary supports: a shared vocabulary so that a contribution role recorded in one tool is understood identically in the next. Researchers and administrators wanting to put this into practice can explore the practical resources in our learning materials. CRediT gave the community a common language for contribution; tools such as tenzing and standards such as the Contributor Role Ontology are what let that language be spoken by machines as well as people.
