Editorial · CASRAI

Data Papers Explained: Making Datasets Citable

A data paper is a peer-reviewed publication that makes a dataset citable and FAIR. It does far more than a data availability statement.

By MCP Service
Published 3 Jul 2026 · 7 minute read

A data paper is a peer-reviewed journal article whose sole purpose is to describe a dataset — its collection methods, structure, quality controls and reuse potential — so the dataset itself becomes a citable, discoverable research output. This is fundamentally different from a data availability statement (DAS), which is only a short paragraph inside a conventional research article pointing to where supporting data can be found. Understanding the distinction matters for anyone trying to get formal academic credit for data curation work, rather than a passing mention buried in someone else’s paper.

A data paper is best defined this way: it is a searchable, citable metadata document, published as a standalone peer-reviewed article, whose primary content is the dataset’s provenance, structure and quality rather than a hypothesis or a set of conclusions.

What is a data paper?

A data paper describes a dataset in a peer-reviewed journal article of its own, rather than in an appendix to a conventional study. It concentrates on the “what, why and how” of the data itself (collection methodology, processing steps, structure and known limitations) rather than on testing a hypothesis.

The format is also known as a data article, data report, data brief or data note, but the function is consistent: it converts curation effort into an indexed, citable scholarly output that gives dataset creators formal academic credit.

How is a data paper different from a data availability statement?

A data availability statement is a short, mandatory paragraph within a conventional research article that tells readers where and how to access the data underpinning that paper’s findings. It exists to support transparency and reproducibility of one specific study — it is not a publication in its own right and it is not independently peer reviewed as a scholarly document.

A data paper, by contrast, is a full standalone publication. It undergoes its own peer review, receives its own DOI, and is indexed and cited independently of any related research article. The table below sets out the practical differences.

| Feature | Data paper | Data availability statement |
| --- | --- | --- |
| Nature | Standalone, peer-reviewed journal article | A short section inside another article |
| Peer review | Independently peer reviewed as a scholarly work | Not separately reviewed |
| Citability | Has its own DOI and citation record | Not citable as a discrete work |
| Purpose | Describe and credit a dataset in depth | Point readers to where data for one study lives |
| Typical length | Several pages, structured like a journal article | One to three sentences |

Since 2018, the International Committee of Medical Journal Editors (ICMJE) has required a data sharing statement in reports of clinical trials, and many funders, including UKRI, expect a data access statement in any grant output. Neither requirement is a substitute for a data paper: a DAS satisfies a transparency mandate, while a data paper is the route to scholarly recognition and independent citation of the dataset itself.

Which journals publish data papers?

Dedicated data journals have grown substantially since the mid-2010s. According to the Global Biodiversity Information Facility (GBIF), which tracks outlets accepting data papers, article processing charges and impact metrics vary widely by publisher.

  • Scientific Data (Nature Portfolio) — an open-access, online-only journal dedicated to descriptions of scientifically valuable datasets, with a 2024 Journal Impact Factor of 6.9 and an article processing charge of approximately EUR 1,790, per GBIF’s June 2026 tracked figures.
  • Data in Brief (Elsevier) — a multidisciplinary, open-access journal publishing short data articles that describe and give context to datasets, with a 2024 Journal Impact Factor of 1.4 and an article processing charge of approximately USD 1,010.
  • GigaByte (BGI and Oxford University Press) — a CC BY open-access journal for “big data” descriptions across the life, biomedical and environmental sciences, with a 2024 Journal Impact Factor of 1.2, a Scopus CiteScore of 3.2, and an article processing charge of approximately USD 350 — the lowest of the three.

Discipline-specific alternatives exist too: Earth System Science Data (Copernicus) carries a 2024 CiteScore of 20.6, and Biodiversity Data Journal (Pensoft) charges from around EUR 650. Choice of outlet should follow disciplinary norms, not price alone.

How do you publish a data paper?

Publishing a data paper follows a broadly consistent workflow across data journals:

  1. Deposit the dataset first. Upload the data to a recognised repository (for example Dryad, Zenodo or a domain-specific archive) so it receives a persistent identifier before the manuscript is submitted.
  2. Draft the manuscript around the metadata. Describe collection methods, instrumentation, processing pipelines, quality-control steps and known limitations — some tools, such as GBIF’s Integrated Publishing Toolkit, can auto-generate a manuscript draft directly from dataset metadata.
  3. Select a journal matched to the dataset’s discipline. Compare scope, licence terms, and article processing charge against outlets such as Scientific Data, Data in Brief or GigaByte.
  4. Submit for peer review. Reviewers assess the completeness and reusability of the description, not novel findings or conclusions.
  5. Publish and cross-link. On acceptance, the data paper’s DOI should be cross-referenced with the dataset’s own DOI in the repository record, so citation tools can connect the two.
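
The cross-linking in step 5 can be sketched as a pair of reciprocal related-identifier entries. This is a minimal illustration, assuming DataCite-style metadata in which the dataset record points to the paper with the relation type IsDescribedBy and the paper points back with Describes. The function names and both DOIs are hypothetical placeholders, and individual repositories and journals may expose these fields differently.

```python
import json


def related_identifier(doi: str, relation_type: str) -> dict:
    """Build one DataCite-style relatedIdentifier entry pointing at a DOI."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def cross_link(dataset_doi: str, paper_doi: str) -> dict:
    """Return the reciprocal links for the dataset record and the paper record."""
    return {
        # The dataset's repository record says which paper describes it.
        "dataset_record": [related_identifier(paper_doi, "IsDescribedBy")],
        # The data paper's record says which dataset it describes.
        "paper_record": [related_identifier(dataset_doi, "Describes")],
    }


# Placeholder DOIs for illustration only; substitute the real identifiers.
links = cross_link("10.5072/example-dataset", "10.5072/example-data-paper")
print(json.dumps(links, indent=2))
```

Recording the relationship in both directions means a reader who arrives at either DOI can reach the other, which is what lets citation tools connect the two.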

Why do data papers matter for FAIR data and citation?

The FAIR Guiding Principles — Findable, Accessible, Interoperable, Reusable — were formalised by Wilkinson and colleagues in a 2016 Scientific Data paper and now underpin funder and repository policy internationally. A data paper operationalises FAIR by attaching a structured, human- and machine-readable description to a dataset that would otherwise carry only minimal repository metadata.
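
To make "structured, machine-readable description" concrete, here is a minimal sketch of a completeness check over a draft description. The section names are assumptions chosen to mirror the elements this article lists (collection methods, processing steps, structure, quality controls, known limitations). They are not a published schema, and a real journal template will define its own required fields.

```python
# Hypothetical section names mirroring the elements a data paper describes.
REQUIRED_SECTIONS = (
    "collection_methods",
    "processing_steps",
    "structure",
    "quality_controls",
    "known_limitations",
)


def missing_sections(description: dict) -> list:
    """Return the required sections that are absent or blank, in a fixed order."""
    missing = []
    for name in REQUIRED_SECTIONS:
        value = description.get(name)
        # Treat non-string values and whitespace-only strings as missing.
        if not isinstance(value, str) or not value.strip():
            missing.append(name)
    return missing


# Illustrative draft with one blank section.
draft = {
    "collection_methods": "Field survey, 2024 season.",
    "processing_steps": "Deduplicated and unit-normalised.",
    "structure": "One CSV file, one row per observation.",
    "quality_controls": "Ten percent of rows double-entered and compared.",
    "known_limitations": "   ",
}
print(missing_sections(draft))  # prints ['known_limitations']
```

A check like this only confirms that each section exists. Whether the description is good enough for reuse is still a question for peer review.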

Dataset citation is governed by the Joint Declaration of Data Citation Principles, published by FORCE11 in 2014, which holds that data merits the same importance, persistence and formal citation treatment as literature. Registration agencies such as DataCite assign the DOIs that make this mechanically possible; a data paper gives readers the narrative context a bare DOI record cannot.
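
As an illustration of formal dataset citation, the sketch below assembles a reference string from the core fields a DOI record carries. It loosely follows the common "Creator (Year). Title. Publisher. Identifier" pattern. The class, the function and all example values are hypothetical, and a target journal's own reference style takes precedence.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    creators: list
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(rec: DatasetRecord) -> str:
    """Assemble a citation string with the DOI expressed as a resolvable link."""
    authors = "; ".join(rec.creators)
    return (
        f"{authors} ({rec.year}). {rec.title}. "
        f"{rec.publisher}. https://doi.org/{rec.doi}"
    )


# Placeholder record for illustration only.
record = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2026,
    title="Example Survey Dataset",
    publisher="Zenodo",
    doi="10.5072/example-dataset",
)
print(format_citation(record))
```

Writing the DOI as a full https://doi.org/ link keeps the citation resolvable wherever it is pasted.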

Frequently asked questions

What is a data paper?

A data paper is a peer-reviewed journal article whose primary purpose is describing a dataset’s collection, structure and quality, rather than reporting findings. It gives dataset creators an indexed, independently citable scholarly output.

How do you publish a data paper?

Deposit the dataset in a recognised repository, draft a manuscript describing its methodology, choose a journal such as Scientific Data, Data in Brief or GigaByte, then submit for peer review that assesses completeness rather than novel conclusions.

Do you have to pay to publish a data paper?

Most data journals are open access and charge an article processing charge, ranging from roughly USD 350 at GigaByte to around EUR 1,790 at Scientific Data. Some outlets, including several Pensoft and Copernicus titles, waive or reduce this fee.

Implications for institutions and funders

For research administrators, the data paper format offers a concrete way to evidence data-curation effort in tenure, promotion and grant-reporting processes, where a bare data availability statement provides none. Recording named contributions to data creation, curation and description alongside the CRediT contributor role taxonomy gives institutions a fuller, auditable account of who did the data work, distinct from who wrote up the findings.

Funders increasingly expect both: a data availability statement in the primary research article to satisfy transparency mandates, and — where a dataset has independent reuse value — a data paper to secure its long-term discoverability. Research administrators managing compliance across these overlapping requirements may find it useful to consult a dictionary of research administration terms when mapping funder policy language to practical author guidance.

Conclusion

A data paper and a data availability statement solve different problems: one creates a citable, peer-reviewed scholarly record of a dataset; the other simply discloses where supporting data for a specific study can be found. As funders tighten open-data expectations and repositories mature their DOI infrastructure, treating dataset description as a first-class, citable publication — not an afterthought bolted onto a results paper — will matter more, not less, for institutions seeking to demonstrate the full value of the research data they steward.
