Datasets and software
DataCite contributorType and the CRediT cross-walk
DataCite's vocabulary for contributor roles on datasets and software overlaps with CRediT but is distinct. Practical guidance on which to use where, with a published cross-walk.
Two vocabularies, one ecosystem
DataCite is a DOI registration agency for research data and software, and its metadata schema has long carried a contributorType field that names the role a person or organisation played in creating a dataset. The vocabulary predates CRediT and reflects the data-stewardship lineage: values include DataCurator, Researcher, ProjectLeader, Sponsor, and Supervisor. The full list lives in the DataCite metadata schema reference.
CRediT, defined as ANSI/NISO Z39.104-2022, names 14 roles oriented to scholarly articles. The two vocabularies overlap in spirit — both describe granular contributor roles — but they are not directly interchangeable. A 2024 joint guide from Crossref and DataCite documents the relationship and the recommended mapping; see the DataCite blog post on the joint metadata guide.
Practical rule of thumb
For the rebuilt CASRAI site the recommendation is straightforward: use CRediT for journal articles and book chapters; use DataCite contributorType for datasets and software; cross-reference them where the same person plays both kinds of role on different outputs. Avoid translating CRediT roles into contributorType values silently — the loss of precision shows up downstream.
Where contributorType lives in a DataCite deposit
Inside the contributors element of a DataCite XML deposit, each contributor declares a contributorType attribute drawn from the fixed vocabulary. The contributor also carries name, name identifier (typically an ORCID iD), and affiliation in the usual way.
<contributors>
  <contributor contributorType="DataCurator">
    <contributorName nameType="Personal">Zhang, San</contributorName>
    <givenName>San</givenName>
    <familyName>Zhang</familyName>
    <nameIdentifier nameIdentifierScheme="ORCID"
                    schemeURI="https://orcid.org/">0000-0001-2345-6789</nameIdentifier>
    <affiliation affiliationIdentifier="https://ror.org/04abcd123"
                 affiliationIdentifierScheme="ROR">University of Example</affiliation>
  </contributor>
  <contributor contributorType="ProjectLeader">
    <contributorName nameType="Personal">Liu, Mei</contributorName>
    <givenName>Mei</givenName>
    <familyName>Liu</familyName>
    <nameIdentifier nameIdentifierScheme="ORCID"
                    schemeURI="https://orcid.org/">0000-0002-3456-7890</nameIdentifier>
  </contributor>
</contributors>
Indicative CRediT to contributorType cross-walk
The mapping below summarises the joint guide. It is indicative, not authoritative — deposit-time decisions should consult the upstream guide directly. Where no clean mapping exists, the recommendation is to choose the closest DataCite value and keep the CRediT URI on the linked article, not on the dataset record.
CRediT role                        DataCite contributorType
---------------------------------  ----------------------------
Data curation                      DataCurator
Investigation                      Researcher
Project administration             ProjectManager
Supervision                        Supervisor
Funding acquisition                Sponsor
Methodology                        Researcher (closest)
Software                           (no direct match; see DataCite resourceTypeGeneral="Software")
Resources                          Other (with role narrative)
Formal analysis                    Researcher (closest)
Writing - original draft           (no direct match; narrative-only role)
Writing - review & editing         Editor (closest, depending on context)
Visualization                      Producer (closest)
Validation                         Researcher (closest)
Conceptualization                  Researcher (closest)
The rows without a direct match matter: not every CRediT role has a sensible DataCite analogue. For software in particular, DataCite recommends setting resourceTypeGeneral="Software" on the record's resourceType element alongside the relevant contributorType values, rather than forcing CRediT's vocabulary onto the deposit.
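For tooling that needs the table in machine-readable form, the indicative mapping can be held as a simple lookup. The following is a minimal sketch, not an authoritative implementation: the names Suggestion, CROSSWALK, and suggest are hypothetical, the closest flag simply mirrors the "(closest)" annotation in the table, and a value of None stands for the rows with no direct match.

```python
from typing import NamedTuple, Optional


class Suggestion(NamedTuple):
    """One row of the indicative cross-walk (hypothetical helper type)."""
    contributor_type: Optional[str]  # None where the table lists no direct match
    closest: bool                    # True where the table marks the value "(closest)"
    note: str = ""                   # free-text qualifier copied from the table


# Transcribed from the indicative table; consult the upstream guide before relying on it.
CROSSWALK = {
    "Data curation": Suggestion("DataCurator", False),
    "Investigation": Suggestion("Researcher", False),
    "Project administration": Suggestion("ProjectManager", False),
    "Supervision": Suggestion("Supervisor", False),
    "Funding acquisition": Suggestion("Sponsor", False),
    "Methodology": Suggestion("Researcher", True),
    "Software": Suggestion(None, False, "no direct match"),
    "Resources": Suggestion("Other", False, "with role narrative"),
    "Formal analysis": Suggestion("Researcher", True),
    "Writing - original draft": Suggestion(None, False, "narrative-only role"),
    "Writing - review & editing": Suggestion("Editor", True, "depending on context"),
    "Visualization": Suggestion("Producer", True),
    "Validation": Suggestion("Researcher", True),
    "Conceptualization": Suggestion("Researcher", True),
}


def suggest(credit_role: str) -> Suggestion:
    """Return the indicative suggestion for a CRediT role name.

    Raises KeyError for names that are not in the table, so an unknown
    role is never mapped silently.
    """
    return CROSSWALK[credit_role]


if __name__ == "__main__":
    for role in ("Data curation", "Methodology", "Software"):
        s = suggest(role)
        label = s.contributor_type or "(no direct match)"
        print(f"{role} -> {label}{' (closest)' if s.closest else ''}")
```

Keeping the closest flag and the note next to each value means a caller can surface the loss of precision to a curator instead of hiding it.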
Where this matters operationally
- Repositories minting DOIs for datasets and code. Use contributorType at deposit; do not invent CRediT-flavoured custom fields.
- Publishers issuing a paper plus an underlying dataset. Two deposits, two vocabularies, linked via relatedIdentifier in the DataCite record and the Crossref deposit's relation block.
- Institutional CRIS systems. Read both vocabularies; reconcile to a single internal model that preserves the distinction rather than collapsing it.
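A repository can catch CRediT role names that have leaked into a DataCite record with a small pre-deposit check. The sketch below is illustrative only. The function names are hypothetical. The CONTRIBUTOR_TYPES set is an assumption based on the DataCite 4.x vocabulary and may lag newer schema versions, so verify it against the schema reference. A real pipeline would normally validate against the published XSD instead.

```python
import xml.etree.ElementTree as ET

# Assumed contributorType vocabulary (DataCite 4.x); check the schema reference,
# since newer schema versions may add values.
CONTRIBUTOR_TYPES = frozenset({
    "ContactPerson", "DataCollector", "DataCurator", "DataManager",
    "Distributor", "Editor", "HostingInstitution", "Producer",
    "ProjectLeader", "ProjectManager", "ProjectMember",
    "RegistrationAgency", "RegistrationAuthority", "RelatedPerson",
    "Researcher", "ResearchGroup", "RightsHolder", "Sponsor",
    "Supervisor", "WorkPackageLeader", "Other",
})


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so namespaced and bare tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def invalid_contributor_types(xml_text: str) -> list[str]:
    """List contributorType values that fall outside the assumed vocabulary."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        if local_name(elem.tag) != "contributor":
            continue
        value = elem.get("contributorType")
        if value is None:
            problems.append("(missing)")
        elif value not in CONTRIBUTOR_TYPES:
            problems.append(value)
    return problems


# Hypothetical fragment: the second contributor carries a CRediT role name by mistake.
SAMPLE = """<contributors>
  <contributor contributorType="DataCurator">
    <contributorName nameType="Personal">Zhang, San</contributorName>
  </contributor>
  <contributor contributorType="Data curation">
    <contributorName nameType="Personal">Liu, Mei</contributorName>
  </contributor>
</contributors>"""

if __name__ == "__main__":
    print(invalid_contributor_types(SAMPLE))
```

Matching on the local tag name lets the same check run on a bare fragment like the one above and on a full record that declares the DataCite namespace.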
Related
- Crossref schema 5.5 — for the article that cites the dataset.
- ORCID integration — both vocabularies propagate via ORCID auto-update.
- CRediT for publishers — editorial-side guidance on which vocabulary to capture when.
- External: schema.datacite.org — the canonical schema reference.








