v2026.1714 entries · CC-BY 4.0
CASRAI

Explainer · Plain-language

Data Citation: Definition, Meaning & Examples | CASRAI

Data citation is the practice of formally citing datasets as first-class research outputs, in the same way that articles and books are cited. It gives data creators credit, makes the data findable and verifiable, and lets readers trace the evidence behind a study. A data citation points to a specific, persistently identified dataset rather than to a paper that merely discusses it.

CASRAI plain-language explainers — clear answers to recurring research-administration questions

Why data citation matters

Historically, datasets were buried in the methods section or held privately, so the effort of collecting, cleaning, and curating data went largely unrewarded. Data citation changes that by making datasets a recognised, creditable output. When a dataset is cited like any other reference, its creators accrue visible credit, and the citation creates a permanent, machine-readable link between a claim and the evidence that supports it.

This matters for research integrity and reproducibility. A reader who can follow a citation to the exact dataset used can scrutinise, verify, or reuse the underlying evidence. It also helps funders and institutions see the return on the investment they make in data collection, and it encourages a culture in which sharing well-documented data is a rewarded activity rather than an optional extra.

The FORCE11 Data Citation Principles

The Joint Declaration of Data Citation Principles, published by the FORCE11 Data Citation Synthesis Group in 2014, is the standard reference framework for data citation. It states eight principles: data citation is important and should be a legitimate scholarly practice; citations should facilitate credit and attribution; citations are evidence and should be included wherever a claim relies on data; datasets should carry a persistent, machine-actionable, globally unique identifier; citations should facilitate access to the data and its associated metadata; identifiers and metadata should persist even if the data do not; citations should support identification of the specific version or subset used; and citation methods should be interoperable across communities and not disrupt other practices. These principles have been endorsed by a wide range of publishers, repositories, and infrastructure providers, and they shape how journals and data repositories format and require data citations today.

What a data citation contains

A data citation follows a structure similar to a publication citation. It typically lists the creators or authors of the dataset, the publication or release year, the dataset title, the publisher or repository (for example Zenodo, Dryad, or a discipline-specific archive), a version where relevant, and a persistent identifier — most often a DataCite DOI that resolves to a landing page describing the dataset. Including the version or a specific subset is important: data can change over time, and a precise citation lets others retrieve exactly what was used. The persistent identifier is the anchor that keeps the citation working even if the data are moved or reorganised, because the identifier resolves to current metadata about the dataset.
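The elements listed above can be assembled into a reference string in code. The following is a minimal Python sketch, not a prescribed format: the `DataCitation` class, its field names, the ordering and punctuation, and the example dataset (including its DOI) are all hypothetical, and individual journals and repositories set their own layouts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the elements of a data citation."""
    creators: list[str]
    year: int
    title: str
    publisher: str                 # repository or archive holding the data
    doi: str                       # bare DOI, without a resolver prefix
    version: Optional[str] = None  # included only where relevant

    def format(self) -> str:
        # Assumed layout: Creators (Year). Title. Publisher. Version. DOI URL
        parts = [f"{'; '.join(self.creators)} ({self.year}).",
                 f"{self.title}.",
                 f"{self.publisher}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        # The DOI is written as a resolvable URL so readers can follow it.
        parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


# Invented example values, for illustration only.
example = DataCitation(
    creators=["Okafor, A.", "Lindqvist, M."],
    year=2023,
    title="Hypothetical river temperature readings",
    publisher="Zenodo",
    doi="10.1234/example.5678",
    version="2.1",
)
print(example.format())
```

Keeping the version as a separate optional field means a citation to an unversioned dataset simply omits that element instead of printing an empty placeholder.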

Data citation, availability statements and credit

Data citation works hand in hand with the data availability statement that many journals now require. The statement tells readers whether and how the data underlying a paper can be accessed, and a formal data citation in the reference list provides the precise, persistent pointer to that data. Together they close the loop between a published finding and its supporting evidence.

Data citation is also a credit mechanism. Because cited datasets can be tracked and counted, contributors to data collection and curation gain a recognised record of their work. This supports the broader move, reflected in declarations such as DORA and the principles of responsible research assessment, towards valuing a wider range of outputs than journal articles alone.

Key facts

At a glance

  • Definition: Formally citing datasets as first-class research outputs
  • Framework: FORCE11 Joint Declaration of Data Citation Principles (2014)
  • Principles: Eight, covering credit, evidence, persistence, and specificity
  • Identifier: Usually a DataCite DOI resolving to a dataset landing page
  • Contains: Creators, year, title, repository, version, persistent identifier
  • Supports: Reproducibility, credit, and the data availability statement

Common misconceptions

What people often get wrong

Often heard: Citing the paper that used a dataset is the same as citing the data.

Actually: No — a data citation points to the dataset itself, with its own persistent identifier, so the evidence can be located and reused directly rather than only through a narrative description in an article.

Often heard: A web link to the data is good enough as a citation.

Actually: No — plain URLs break over time. A proper data citation uses a persistent identifier such as a DataCite DOI, which keeps resolving to current metadata even if the data are moved.

Often heard: Datasets are not real scholarly outputs, so they need not be cited formally.

Actually: No — the FORCE11 principles establish data citation as a legitimate scholarly practice, giving data creators credit and treating data as evidence on a par with other cited works.
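The difference between a plain web link and a persistent identifier can be checked mechanically before a reference goes into a manuscript. The sketch below is one possible approach in Python, under stated assumptions: the function name `to_resolver_url` and the prefix list are hypothetical, and the regular expression is a commonly used approximation of DOI syntax, not a full validator. It does not contact the network, so it cannot confirm that a DOI is actually registered.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that authors commonly paste in front of a DOI (assumed list).
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def to_resolver_url(raw: str) -> str:
    """Return a https://doi.org/ URL for a DOI, or raise ValueError."""
    candidate = raw.strip()
    lowered = candidate.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            # Strip the prefix but keep the DOI's original casing.
            candidate = candidate[len(prefix):]
            break
    if not DOI_PATTERN.match(candidate):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    return f"https://doi.org/{candidate}"


# Invented DOI, for illustration only.
print(to_resolver_url("doi:10.1234/example.5678"))
```

A plain URL such as `https://example.org/data.csv` is rejected with a `ValueError`, which flags a reference that still needs a persistent identifier.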
