A data paper is a peer-reviewed journal article whose sole purpose is to describe a dataset — its collection methods, structure, quality controls and reuse potential — so the dataset itself becomes a citable, discoverable research output. This is fundamentally different from a data availability statement (DAS), which is only a short paragraph inside a conventional research article pointing to where supporting data can be found. Understanding the distinction matters for anyone trying to get formal academic credit for data curation work, rather than a passing mention buried in someone else’s paper.
Put more precisely, a data paper is a searchable, citable metadata document, published as a standalone peer-reviewed article, whose primary content is the dataset’s provenance, structure and quality rather than a hypothesis or a set of conclusions.
- What is a data paper?
- How is a data paper different from a data availability statement?
- Which journals publish data papers?
- How do you publish a data paper?
- Why do data papers matter for FAIR data and citation?
- Frequently asked questions
- Implications for institutions and funders
- Conclusion
What is a data paper?
A data paper is a peer-reviewed document describing a dataset, published in a journal rather than as an appendix to a conventional study. It concentrates on the “what, why and how” of the data itself (collection methodology, processing steps, structure and known limitations) rather than on testing a hypothesis.
The format is also known as a data article, data report, data brief or data note, but the function is consistent: it converts curation effort into an indexed, citable scholarly output that gives dataset creators formal academic credit.
How is a data paper different from a data availability statement?
A data availability statement is a short paragraph, required by many journals and funders, within a conventional research article that tells readers where and how to access the data underpinning that paper’s findings. It exists to support transparency and reproducibility of one specific study; it is not a publication in its own right and is not peer reviewed as a separate scholarly work.
A data paper, by contrast, is a full standalone publication. It undergoes its own peer review, receives its own DOI, and is indexed and cited independently of any related research article. The table below sets out the practical differences.
| Feature | Data paper | Data availability statement |
|---|---|---|
| Nature | Standalone, peer-reviewed journal article | A short section inside another article |
| Peer review | Independently peer reviewed as a scholarly work | Not separately reviewed |
| Citability | Has its own DOI and citation record | Not citable as a discrete work |
| Purpose | Describe and credit a dataset in depth | Point readers to where data for one study lives |
| Typical length | Several pages, structured like a journal article | One to three sentences |
Since 2018, the International Committee of Medical Journal Editors (ICMJE) has required a data sharing statement in reports of clinical trials, and many funders, including UKRI, expect a data access statement in research articles arising from their grants. Neither requirement is a substitute for a data paper: a DAS satisfies a transparency mandate, while a data paper is the route to scholarly recognition and independent citation of the dataset itself.
Which journals publish data papers?
The number of dedicated data journals has grown substantially since the mid-2010s. According to the Global Biodiversity Information Facility (GBIF), which tracks outlets accepting data papers, article processing charges and impact metrics vary widely by publisher.
- Scientific Data (Nature Portfolio) — an open-access, online-only journal dedicated to descriptions of scientifically valuable datasets, with a 2024 Journal Impact Factor of 6.9 and an article processing charge of approximately EUR 1,790, per GBIF’s June 2026 tracked figures.
- Data in Brief (Elsevier) — a multidisciplinary, open-access journal publishing short data articles that describe and give context to datasets, with a 2024 Journal Impact Factor of 1.4 and an article processing charge of approximately USD 1,010.
- GigaByte (BGI and Oxford University Press) — a CC BY open-access journal for “big data” descriptions across the life, biomedical and environmental sciences, with a 2024 Journal Impact Factor of 1.2, a Scopus CiteScore of 3.2, and an article processing charge of approximately USD 350 — the lowest of the three.
Discipline-specific alternatives exist too: Earth System Science Data (Copernicus) carries a 2024 CiteScore of 20.6, and Biodiversity Data Journal (Pensoft) charges from around EUR 650. Choice of outlet should follow disciplinary norms, not price alone.
How do you publish a data paper?
Publishing a data paper follows a broadly consistent workflow across data journals:
- Deposit the dataset first. Upload the data to a recognised repository (for example Dryad, Zenodo or a domain-specific archive) so it receives a persistent identifier before the manuscript is submitted.
- Draft the manuscript around the metadata. Describe collection methods, instrumentation, processing pipelines, quality-control steps and known limitations — some tools, such as GBIF’s Integrated Publishing Toolkit, can auto-generate a manuscript draft directly from dataset metadata.
- Select a journal matched to the dataset’s discipline. Compare scope, licence terms, and article processing charge against outlets such as Scientific Data, Data in Brief or GigaByte.
- Submit for peer review. Reviewers assess the completeness and reusability of the description, not novel findings or conclusions.
- Publish and cross-link. On acceptance, the data paper’s DOI should be cross-referenced with the dataset’s own DOI in the repository record, so citation tools can connect the two.
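The cross-linking step can be sketched as a pair of reciprocal related-identifier entries. This is a minimal illustration, assuming DataCite-style metadata field names (`relatedIdentifier`, `relatedIdentifierType`, `relationType`). The DOIs are placeholders and the helper name `cross_link` is hypothetical; the actual deposit form or API depends on the repository and journal.

```python
def cross_link(dataset_doi: str, paper_doi: str) -> dict:
    """Build reciprocal related-identifier entries for a dataset and its data paper.

    Field names follow DataCite-style metadata; both DOIs are placeholders.
    """
    return {
        # Entry to add to the dataset's repository record.
        "dataset_record": {
            "relatedIdentifier": paper_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsDescribedBy",
        },
        # Reciprocal entry for the data paper's metadata.
        "paper_record": {
            "relatedIdentifier": dataset_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "Describes",
        },
    }


links = cross_link("10.5555/example-dataset", "10.5555/example-data-paper")
```

Keeping both directions of the link in one place makes it harder to update the repository record and forget the reciprocal entry on the paper side.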
Why do data papers matter for FAIR data and citation?
The FAIR Guiding Principles — Findable, Accessible, Interoperable, Reusable — were formalised by Wilkinson and colleagues in a 2016 Scientific Data paper and now underpin funder and repository policy internationally. A data paper operationalises FAIR by attaching a structured, human- and machine-readable description to a dataset that would otherwise carry only minimal repository metadata.
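One common way to make such a description machine-readable is schema.org `Dataset` markup serialised as JSON-LD. The sketch below is illustrative only: every value is a placeholder, the helper name `describe_dataset` is hypothetical, and individual repositories or journals may require a different schema.

```python
import json


def describe_dataset(name, description, doi, license_url, paper_doi):
    """Return a schema.org Dataset description as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        # Point back to the data paper that describes the dataset.
        "citation": f"https://doi.org/{paper_doi}",
    }


record = describe_dataset(
    name="Example survey dataset",  # placeholder values throughout
    description="Collection methods, structure and known limitations.",
    doi="10.5555/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    paper_doi="10.5555/example-data-paper",
)
print(json.dumps(record, indent=2))
```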
Dataset citation is governed by the Joint Declaration of Data Citation Principles, published by FORCE11 in 2014, which holds that data merits the same importance, persistence and formal citation treatment as literature. Registration agencies such as DataCite assign the DOIs that make this mechanically possible; a data paper gives readers the narrative context a bare DOI record cannot.
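As a rough illustration of what a formal dataset citation contains, the sketch below assembles one from its parts, loosely following the creator, year, title, publisher, identifier pattern that DataCite recommends. The function name `format_dataset_citation` and all values are hypothetical, and the target journal's own reference style takes precedence.

```python
def format_dataset_citation(creators, year, title, publisher, doi, version=None):
    """Format a dataset citation as: Creators (Year). Title. [Version.] Publisher. DOI URL."""
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        parts.append(f"Version {version}")
    parts.append(publisher)
    return ". ".join(parts) + f". https://doi.org/{doi}"


citation = format_dataset_citation(
    creators=["Doe, J.", "Roe, R."],  # placeholder authors
    year=2024,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.5555/example-dataset",
    version="1.0",
)
print(citation)
```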
Frequently asked questions
What is a data paper?
A data paper is a peer-reviewed journal article whose primary purpose is describing a dataset’s collection, structure and quality, rather than reporting findings. It gives dataset creators an indexed, independently citable scholarly output.
How do you publish a data paper?
Deposit the dataset in a recognised repository, draft a manuscript describing its methodology, choose a journal such as Scientific Data, Data in Brief or GigaByte, then submit for peer review that assesses completeness rather than novel conclusions.
Do you have to pay to publish a data paper?
Most data journals are open access and levy an article processing charge, ranging from roughly USD 350 at GigaByte to around EUR 1,790 at Scientific Data. Some outlets, including several Pensoft and Copernicus titles, waive or reduce this fee.
Implications for institutions and funders
For research administrators, the data paper format offers a concrete way to evidence data-curation effort in tenure, promotion and grant-reporting processes, where a bare data availability statement provides none. Recording named contributions to data creation, curation and description alongside the CRediT contributor role taxonomy gives institutions a fuller, auditable account of who did the data work, distinct from who wrote up the findings.
Funders increasingly expect both: a data availability statement in the primary research article to satisfy transparency mandates, and — where a dataset has independent reuse value — a data paper to secure its long-term discoverability. Research administrators managing compliance across these overlapping requirements may find it useful to consult a dictionary of research administration terms when mapping funder policy language to practical author guidance.
Conclusion
A data paper and a data availability statement solve different problems: one creates a citable, peer-reviewed scholarly record of a dataset; the other simply discloses where supporting data for a specific study can be found. As funders tighten open-data expectations and repositories mature their DOI infrastructure, treating dataset description as a first-class, citable publication — not an afterthought bolted onto a results paper — will matter more, not less, for institutions seeking to demonstrate the full value of the research data they steward.








