Definition · Plain-language
CRediT role: Data curation
In the CRediT taxonomy (ANSI/NISO Z39.104-2022), which originated with CASRAI, Data curation covers the management work of annotating, scrubbing and maintaining research data, including software code where it is needed to interpret the data, for initial use and later re-use.
What Data curation means in CRediT
The ANSI/NISO Z39.104-2022 definition is: "Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later re-use." This encompasses four main activities: producing metadata (describing what data exist and how they are structured); scrubbing data (cleaning, de-duplicating, correcting errors, harmonising formats); maintaining datasets over time (archiving, ensuring persistent access, responding to re-use queries); and curating the software code needed to interpret the data.
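As a rough illustration of what "scrubbing" and "producing metadata" can look like in practice, here is a minimal Python sketch. The record layout, field names (`sample_id`, `collected`, `site`) and the two accepted date formats are assumptions made up for this example; they do not come from the standard.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"sample_id": "S1", "collected": "2021-03-04", "site": " Leeds "},
    {"sample_id": "S1", "collected": "2021-03-04", "site": "Leeds"},
    {"sample_id": "S2", "collected": "04/03/2021", "site": "york"},
]

# Assumed input date formats: ISO, then day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def harmonise_date(value):
    """Return the date as an ISO 8601 string, trying each assumed format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")


def scrub(records):
    """Trim whitespace, harmonise formats and drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {
            "sample_id": rec["sample_id"].strip(),
            "collected": harmonise_date(rec["collected"].strip()),
            "site": rec["site"].strip().title(),
        }
        key = tuple(sorted(row.items()))
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(row)
    return cleaned


def describe(records):
    """Produce minimal descriptive metadata for a cleaned dataset."""
    return {
        "record_count": len(records),
        "fields": sorted(records[0]) if records else [],
    }


if __name__ == "__main__":
    cleaned = scrub(RAW_RECORDS)
    print(cleaned)
    print(describe(cleaned))
```

Real curation work usually also involves documenting each cleaning decision, so that later users can see what was changed and why.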
Data curation and FAIR data principles
The FAIR data principles (Findable, Accessible, Interoperable, Reusable) describe what well-curated data looks like, and data curation is the practical work that gets data there. Annotating data with rich metadata and assigning persistent identifiers makes it Findable; depositing it in a repository where it can be retrieved through standard protocols, under clearly stated access conditions, makes it Accessible; using community-standard formats and vocabularies makes it Interoperable; and documenting licensing and provenance makes it Reusable. Researchers who invest substantial time in making datasets FAIR should receive Data curation credit, a form of recognition historically missing from authorship conventions.
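One way a curator might keep track of this is a simple gap check on a dataset's metadata record. The sketch below is illustrative only: the field names and their grouping under each principle are assumptions chosen for the example, not an official FAIR checklist.

```python
# Assumed metadata fields grouped by FAIR principle (illustrative, not official).
FAIR_CHECKS = {
    "Findable": ("identifier", "title", "description"),
    "Accessible": ("access_url", "access_conditions"),
    "Interoperable": ("file_format", "vocabulary"),
    "Reusable": ("license", "provenance"),
}


def fair_gaps(metadata):
    """Return, per principle, the assumed fields that are missing or empty."""
    gaps = {}
    for principle, fields in FAIR_CHECKS.items():
        missing = []
        for field in fields:
            value = metadata.get(field)
            if value is None or not str(value).strip():
                missing.append(field)
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical record for a deposited dataset.
    record = {
        "identifier": "doi:10.1234/example",
        "title": "Example survey dataset",
        "description": "Cleaned responses from a hypothetical survey.",
        "access_url": "https://example.org/datasets/1",
        "file_format": "CSV",
        "license": "CC-BY-4.0",
    }
    print(fair_gaps(record))
```

A check like this only shows whether a field is filled in; judging whether the metadata is actually rich enough for re-use remains a human curation task.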
The recognition gap
Before CRediT, data managers, research data librarians, bioinformaticians who curated genomic datasets, and statisticians who harmonised multi-site trial data all performed essential work that disappeared into acknowledgements or was not credited at all. This recognition gap affected career progression: without publication credits, data curators could not demonstrate research contribution in grant applications or promotion cases. CRediT's Data curation role directly addresses this gap, making it possible for data professionals to accumulate a verifiable record of contribution to published research.
Key facts
At a glance
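- Role definition: "Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later re-use"
- Standard: ANSI/NISO Z39.104-2022, role 2 of 14 when the roles are listed alphabetically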
- URI: casrai.org/credit/roles/data-curation
- Includes: metadata production, data scrubbing, dataset maintenance, software curation
- FAIR: data curation activities make datasets Findable, Accessible, Interoperable, Reusable
- Recognition gap: previously invisible in author lists; CRediT makes it visible
Common misconceptions
What people often get wrong
Often heard: Data curation only applies to large datasets or bioinformatics.
Actually: Data curation applies to any research where annotating, cleaning or maintaining data is a distinct, substantive activity — from clinical trial data to qualitative interview transcripts to survey datasets.
Often heard: Data curation is the same as Data collection (Investigation).
Actually: Investigation covers collecting or gathering data (running experiments, recording observations). Data curation covers managing, annotating and maintaining data after collection — often a separate and substantial activity.
Often heard: Software developers always receive the Software role, not Data curation.
Actually: Where software code is required to interpret the data itself, maintaining that code is a data curation activity under ANSI/NISO Z39.104-2022. Researchers who maintain analysis pipelines for reproducibility may hold both Software and Data curation roles.
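To show how one contributor can hold several roles at once, here is a minimal Python sketch that turns role assignments into a contributor-statement string. The contributor names, the assignments and the output layout ("Name: role, role; ...") are hypothetical; journals differ in the format they expect.

```python
from collections import defaultdict

# Hypothetical contributors and role assignments, for illustration only.
CONTRIBUTIONS = [
    ("A. Rivera", "Software"),
    ("A. Rivera", "Data curation"),
    ("B. Chen", "Investigation"),
    ("C. Okafor", "Data curation"),
]


def credit_statement(contributions):
    """Group roles by contributor and format them as a single statement."""
    roles_by_person = defaultdict(list)
    for person, role in contributions:
        if role not in roles_by_person[person]:  # ignore repeated assignments
            roles_by_person[person].append(role)
    parts = [
        f"{person}: {', '.join(sorted(roles))}"
        for person, roles in roles_by_person.items()
    ]
    return "; ".join(parts) + "."


if __name__ == "__main__":
    print(credit_statement(CONTRIBUTIONS))
```

In this sketch the first contributor is listed with both Data curation and Software, matching the case of someone who maintains an analysis pipeline that is needed to interpret the data.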
Common questions
FAQ
Should a research data librarian who curated a dataset receive Data curation credit?
If the librarian's contribution was substantive (producing metadata, scrubbing data, depositing to a repository) and meets the target journal's authorship criteria, then yes. If they do not meet those criteria, name them in the acknowledgements section and describe what they contributed.
How does Data curation relate to CASRAI's work on research data management?
CASRAI's broader work on research information standards, which sits alongside models such as CERIF (maintained by euroCRIS) and institutional data management frameworks, is complementary to the CRediT Data curation role. The role operationalises recognition of data management work at the article level; CASRAI's institutional standards address data management infrastructure at the organisational level.