Explainer · Plain-language

What is a data repository?

A data repository is a managed online service that stores, preserves, and provides access to research data, giving each dataset a persistent identifier and metadata so it can be found, cited, and reused. Funders increasingly require data to be deposited in a recognised, trustworthy repository.

CASRAI plain-language explainers — clear answers to recurring research-administration questions

What a data repository does

A data repository accepts deposited datasets, stores them durably, and exposes them with descriptive, machine-readable metadata. It usually assigns a persistent identifier — most often a DOI via DataCite — so the data can be cited and linked. Good repositories also handle access control, versioning, licensing, and long-term preservation, making them the practical home for FAIR research data.
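To make the identifier-plus-metadata idea concrete, here is a minimal sketch in Python of the kind of record a repository might keep for one dataset, and how a citation can be built from it. The class name, field names, and the example DOI are all hypothetical placeholders chosen for illustration; a real repository's metadata schema will be richer and will differ in detail.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Illustrative subset of descriptive metadata for one deposited dataset."""
    doi: str               # persistent identifier (placeholder format below)
    creators: list[str]    # people or organisations credited for the data
    title: str
    publisher: str         # typically the repository that holds the data
    publication_year: int
    license: str           # reuse terms, e.g. "CC-BY-4.0"
    version: str = "1.0"


def doi_url(record: DatasetRecord) -> str:
    """Return the resolvable link form of the DOI."""
    return f"https://doi.org/{record.doi}"


def citation(record: DatasetRecord) -> str:
    """Format a simple citation: Creators (Year). Title. Publisher. DOI link."""
    names = "; ".join(record.creators)
    return (f"{names} ({record.publication_year}). {record.title}. "
            f"{record.publisher}. {doi_url(record)}")


example = DatasetRecord(
    doi="10.5072/example.1234",  # made-up placeholder, not a real dataset
    creators=["Doe, Jane", "Roe, Richard"],
    title="River temperature measurements",
    publisher="Example Repository",
    publication_year=2024,
    license="CC-BY-4.0",
)
print(citation(example))
```

Because the DOI stays the same even if the repository reorganises its website, a citation built this way keeps pointing at the dataset.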

Domain-specific vs generalist

Domain repositories serve a particular community and enforce discipline-specific standards. Examples include sequence archives in genomics (such as GenBank and the European Nucleotide Archive), social science archives (such as ICPSR and the UK Data Service), and the Cambridge Structural Database in crystallography. Generalist repositories such as Zenodo, Dryad, and Figshare accept data of any kind and are useful when no suitable domain repository exists. Funder guidance commonly prefers a recognised domain repository where one is available.
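The "domain first, generalist as fallback" preference can be written as a short decision rule. This is only a sketch: the Repository class, the discipline labels, and the "Example Sequence Archive" entry are hypothetical, and real funder guidance may add further criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    name: str
    discipline: Optional[str]  # None marks a generalist repository


def choose_repository(candidates, discipline):
    """Prefer a domain repository for the discipline; else use a generalist."""
    # First pass: look for a repository dedicated to this discipline.
    for repo in candidates:
        if repo.discipline == discipline:
            return repo
    # Second pass: fall back to the first generalist in the list.
    for repo in candidates:
        if repo.discipline is None:
            return repo
    return None  # nothing suitable among the candidates


candidates = [
    Repository("Example Sequence Archive", "genomics"),  # hypothetical entry
    Repository("Zenodo", None),
    Repository("Dryad", None),
]

print(choose_repository(candidates, "genomics").name)
print(choose_repository(candidates, "linguistics").name)
```

With this candidate list, a genomics dataset is routed to the domain archive, while a linguistics dataset, which has no matching domain entry here, falls back to the first generalist.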

Trusted repositories and CoreTrustSeal

Not all repositories are equally reliable. "Trustworthy" or "certified" repositories meet recognised standards for governance, preservation, and security. CoreTrustSeal is a widely used core-level certification for trustworthy data repositories; other frameworks include the nestor Seal and ISO 16363. Choosing a certified or recognised repository helps satisfy funder and journal data-availability requirements.

How repositories support FAIR data

Depositing in a good repository is the main way to make data FAIR. The repository provides the persistent identifier and metadata (Findable), a standard access protocol (Accessible), support for standard formats and vocabularies (Interoperable), and clear licensing and provenance (Reusable). This is why data management plans and funder policies centre on repository deposit.
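The mapping from repository features to the four FAIR principles can be sketched as a simple checklist. The field names below are assumptions made for this illustration, and the function is a rough self-check, not a formal FAIR assessment.

```python
# Which (hypothetical) record fields support which FAIR principle.
FAIR_CHECKS = {
    "Findable": ["identifier", "metadata"],
    "Accessible": ["access_protocol"],
    "Interoperable": ["format", "vocabulary"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record: dict) -> dict:
    """Return, per principle, the fields that are missing or empty."""
    return {
        principle: [field for field in fields if not record.get(field)]
        for principle, fields in FAIR_CHECKS.items()
    }


deposit = {
    "identifier": "10.5072/example.1234",  # made-up placeholder DOI
    "metadata": {"title": "River temperature measurements"},
    "access_protocol": "https",
    "format": "csv",
    "vocabulary": "example-vocabulary",    # hypothetical controlled vocabulary
    "provenance": "Collected by field sensors, 2023",
    # "license" deliberately left out to show a gap
}

print(fair_gaps(deposit))
```

In this example only the Reusable entry reports a gap, because no licence is recorded. A repository's deposit form typically prompts for exactly this kind of missing information.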

Key facts

At a glance

Purpose: store, preserve, and share citable research data
Identifier: usually a DOI (via DataCite) per dataset
Types: domain-specific and generalist
Generalists: Zenodo, Dryad, Figshare, and others
Trust mark: CoreTrustSeal (and nestor Seal, ISO 16363)
Role: the practical mechanism for FAIR, open data

Common misconceptions

What people often get wrong

Often heard: Any cloud storage or personal website counts as a data repository.

Actually: No — a data repository provides curation, persistent identifiers, metadata, licensing, and long-term preservation. Generic file-sharing or a personal site offers none of these guarantees and does not make data FAIR or reliably citable.

Often heard: All repositories are equally trustworthy.

Actually: No — trustworthiness varies. Certification schemes such as CoreTrustSeal exist precisely to distinguish repositories that meet recognised standards for governance and preservation from those that do not.

Going deeper