
Citing Data Properly: The Joint Declaration of Data Citation Principles

Datasets are first-class research outputs, yet they are still routinely mentioned in passing rather than formally cited. The Joint Declaration of Data Citation Principles, published by FORCE11 in 2014, set out eight principles to bring data into the scholarly record on equal footing with the literature. We explain how to cite a dataset, why DataCite DOIs belong in reference lists, and how data availability statements complete the chain.

By CASRAI Editorial Board

Published 21 Jun 2026 · 4 minute read

For decades, the data underpinning a study lived in a footnote, an appendix, or nowhere visible at all. A reader who wanted to inspect, reuse, or build on those data had little to go on. As research has become more data-intensive, that omission has grown harder to justify. The Joint Declaration of Data Citation Principles, published through FORCE11 in 2014, was a deliberate attempt to fix it by treating datasets as legitimate, citable research outputs in their own right.

Why data citation matters

Citing data is not merely good manners. It serves the same purposes as citing the literature: it credits the people who produced the work, it lets readers verify claims, and it builds a traceable record of how knowledge accumulates. When a dataset is cited formally, the citation can be counted, indexed, and linked, which means the often-considerable labour of collecting, cleaning, and documenting data becomes visible and rewardable. This connects directly to broader efforts in FAIR data, where the goal is for data to be findable, accessible, interoperable, and reusable.

The eight principles

The Declaration is built around eight principles that, taken together, describe what responsible data citation looks like:

Importance. Data should be considered legitimate, citable products of research, deserving the same status as publications.
Credit and attribution. Citations should give scholarly credit, and normative and legal attribution, to everyone who contributed to the data.
Evidence. Where a claim rests on data, the corresponding data should be cited.
Unique identification. Citations should include a persistent, machine-actionable, globally unique identifier.
Access. Citations should make it possible to reach the data themselves and their associated metadata and documentation.
Persistence. Identifiers and metadata should persist even beyond the lifespan of the data they describe.
Specificity and verifiability. Citations should allow a precise version and subset of the data to be identified.
Interoperability and flexibility. Citation methods should work across communities while accommodating disciplinary differences.

These principles are intentionally technology-neutral. They do not mandate a single repository or identifier scheme; they describe outcomes that any sound practice should achieve.

How to cite a dataset in practice

A well-formed data citation looks much like a reference to an article, but with a few additions. At minimum it should carry the creator or creators, the year of publication, the title of the dataset, the publisher or repository, a version where one exists, and a persistent identifier. In most cases that identifier is a DataCite DOI, resolvable to a landing page that describes the dataset and points to the files. A typical reference takes the shape: Creator(s) (Year): Title. Version. Publisher. Dataset. DOI.
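The reference shape described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `DataCitation` class, the `format_data_citation` function, and the example DOI `10.1234/example.5678` are hypothetical names invented for this example, and real reference styles differ in punctuation and ordering. The sketch also assumes the DOI is written as a resolvable `https://doi.org/` link and leaves off the final full stop so that it cannot be mistaken for part of the identifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Minimum elements of a data citation (hypothetical structure)."""
    creators: List[str]
    year: int
    title: str
    publisher: str            # publisher or repository
    doi: str                  # bare DOI, e.g. "10.1234/example.5678"
    version: Optional[str] = None


def format_data_citation(c: DataCitation) -> str:
    """Render: Creator(s) (Year): Title. Version. Publisher. Dataset. DOI"""
    parts = [f"{'; '.join(c.creators)} ({c.year}): {c.title}."]
    if c.version:
        # Include the version only where one exists.
        parts.append(f"Version {c.version}.")
    parts.append(f"{c.publisher}.")
    parts.append("Dataset.")
    # Assumption: express the DOI as a resolvable link.
    parts.append(f"https://doi.org/{c.doi}")
    return " ".join(parts)


# Made-up example values, not a real dataset.
example = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example Survey Data",
    publisher="Example Repository",
    doi="10.1234/example.5678",
    version="2.1",
)
print(format_data_citation(example))
```

Keeping the version as its own field, rather than folding it into the title, makes it harder to omit by accident when a dataset has several releases.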

Two details repay attention. First, versioning is not optional for datasets that change over time. Citing the specific version used means a future reader can reproduce exactly what was analysed, rather than a later, possibly different, release. Second, the identifier should appear in the reference list, not merely in the running text. Burying a dataset DOI in a sentence keeps it out of the indexing and counting systems that make citation meaningful in the first place.

DataCite DOIs and the reference list

DataCite was established precisely to assign DOIs to research data and to maintain the metadata that makes those DOIs useful. When a repository mints a DataCite DOI for a dataset, it registers structured metadata describing the creators, title, publication year, resource type, and related identifiers. That metadata is what allows discovery services and reference managers to handle data citations the way they handle article citations. Placing the DOI in the reference list, formatted to the relevant style, lets indexing infrastructure pick it up and attribute it correctly.
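To make the idea of structured metadata concrete, the sketch below models a record as a plain Python dictionary and checks that the descriptive fields named above are present. The key names (`creators`, `title`, `publication_year`, `resource_type`, `related_identifiers`) are simplified placeholders chosen for this illustration and are not the property names of the DataCite Metadata Schema; the `missing_fields` helper and all values are likewise hypothetical.

```python
from typing import Dict, List

# Fields this sketch treats as required. Key names are simplified
# placeholders, not the DataCite schema's actual property names.
REQUIRED_FIELDS = ("doi", "creators", "title", "publication_year", "resource_type")


def missing_fields(record: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Made-up record for illustration only.
record = {
    "doi": "10.1234/example.5678",
    "creators": ["Doe, J.", "Roe, R."],
    "title": "Example Survey Data",
    "publication_year": 2024,
    "resource_type": "Dataset",
    # Optional here: links to related outputs, such as the article
    # that the dataset supports.
    "related_identifiers": ["10.1234/example.article.1"],
}

print(missing_fields(record))
print(missing_fields({"doi": "10.1234/example.5678"}))
```

A check of this kind could run before deposit, so that an incomplete record is caught before an identifier is minted for it.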

Data availability statements close the loop

Many publishers now require a data availability statement, a short passage telling readers where the underlying data can be found and under what conditions. Done well, the statement names the repository and gives the persistent identifier, linking the prose of the article to the formal citation in the reference list. Done poorly, it says only that data are available on request, which research has repeatedly shown to be an unreliable route to access. A good availability statement and a properly formatted data citation are two halves of the same commitment: that the evidence behind a study can actually be found and reused.
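The contrast between a strong and a weak statement can be sketched as follows. Both functions are hypothetical helpers written for this example: `availability_statement` assembles a statement that names the repository and the persistent identifier, and `relies_on_request` is a rough heuristic, assumed here for illustration, that flags wording such as "available on request". Publishers set their own required wording, so the sentence template is an assumption too.

```python
import re


def availability_statement(repository: str, doi: str, licence: str = "") -> str:
    """Build a statement naming the repository and persistent identifier."""
    if not repository or not doi:
        raise ValueError("a repository name and a DOI are both required")
    text = (
        f"The data supporting this study are available in {repository} "
        f"at https://doi.org/{doi}"
    )
    if licence:
        # State the reuse conditions where they are known.
        text += f" under a {licence} licence"
    return text + "."


def relies_on_request(statement: str) -> bool:
    """Heuristic: does the statement fall back on 'on request' wording?"""
    pattern = r"\b(up)?on\s+(reasonable\s+)?request\b"
    return re.search(pattern, statement, re.IGNORECASE) is not None


# Made-up values for illustration only.
good = availability_statement("Example Repository", "10.1234/example.5678", "CC BY 4.0")
print(good)
print(relies_on_request(good))
print(relies_on_request("Data are available upon reasonable request."))
```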

Bringing it together

The Joint Declaration did not invent the idea that data deserve credit, but it gave the community a shared, citable reference point. The practical implications are modest and achievable: assign a persistent identifier, capture the version, put the citation in the reference list, and write a data availability statement that points to it. Standards bodies and metadata schemas, including the work catalogued in the CASRAI data dictionary and contributor frameworks such as CRediT, supply the surrounding vocabulary for describing who did what. The principles themselves are a reminder that data are not a by-product of research but, increasingly, one of its most valuable outputs.
