Research Data Repository: Generalist vs Subject

Choose a discipline-specific repository whenever one exists for your data type, and fall back to a generalist repository such as Zenodo, Figshare or Dryad only when no subject-specific option is available. A research data repository is a system that assigns persistent identifiers, retains data over the long term, and exposes machine-readable metadata so datasets can be found, cited and reused. The right choice depends on discoverability within your field, what your funder actually mandates, and who is committed to curating the data after the grant ends.

What is a research data repository?
Generalist vs discipline-specific: what’s actually different?
Does your funder require a specific repository type?
Which option wins on long-term curation and sustainability?
How do you actually decide? A five-step framework
Answer-first Q&A
What this means for your data management plan

What is a research data repository?

A research data repository is a curated system for depositing, preserving and exposing datasets independently of the article they support. Unlike a general-purpose cloud drive, a qualifying repository issues a persistent identifier (typically a DOI), retains fixity and version history, and publishes structured metadata that search engines and indexing services can crawl.
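
To make "retains fixity" concrete, here is a minimal sketch of a checksum comparison that a depositor can run locally. It assumes SHA-256 as the algorithm; the function names `sha256_of` and `fixity_ok` are illustrative and do not come from any particular repository's tooling, and individual repositories may record a different digest type.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at deposit time."""
    return sha256_of(path) == recorded_digest.strip().lower()
```

Recording the digest alongside the deposit, and re-running the comparison after download, is one simple way to confirm that the file a re-user receives is the file that was archived.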

Two broad categories exist. Generalist repositories — Zenodo, Figshare, Dryad, the Open Science Framework, Harvard Dataverse — accept any discipline and any file type. Discipline-specific repositories — the Protein Data Bank, OpenNeuro, ICPSR, the UK Data Service’s ReShare — are built around domain metadata schemas, controlled vocabularies and, often, expert curators who understand the data.

Generalist vs discipline-specific: what’s actually different?

The two repository types are not interchangeable, even though both can technically hold the same file. They differ in who finds the data, how deeply it is described, and how funders treat the deposit for compliance purposes.

Factor	Generalist repository	Discipline-specific repository
Discoverability	Indexed broadly; weaker within a subject community	High within the field via domain search portals and cross-references
Metadata depth	Generic (title, creator, subject, DOI)	Domain-specific schemas (e.g. genomic, crystallographic, survey metadata)
Curation	Largely automated; minimal review	Often expert-reviewed before publication
Funder acceptance	Accepted as a fallback by nearly all funders and journals	Frequently the stated first preference where one exists
Typical cost to depositor	Free (Zenodo, OSF) or freemium (Figshare)	Varies — free (ICPSR, OpenNeuro) to fee-charging (some subject archives)
Best for	Interdisciplinary, mixed-format, or “no domain home” datasets	Data types the community already expects to find in one place
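
The "generic" metadata row can be illustrated with a small completeness check. This is a sketch only: the field list is loosely modelled on the mandatory properties of the DataCite Metadata Schema, the dictionary keys are simplified names chosen for this example, and the DOI shown is a placeholder. A discipline-specific schema would add many domain fields on top of these.

```python
from typing import List

# Simplified, illustrative keys; real schemas use their own property names.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> List[str]:
    """List required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder record for illustration; none of these values are real.
example_record = {
    "identifier": "10.1234/example.0001",
    "creators": ["Researcher, A."],
    "title": "Survey responses, wave 1",
    "publisher": "Example Generalist Repository",
    "publication_year": 2024,
    "resource_type": "Dataset",
}

gaps = missing_fields(example_record)  # empty list when the record is complete
```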

The registries FAIRsharing and re3data.org (the latter operated as a DataCite service) list several thousand repositories across disciplines. They are the standard starting point for checking whether a subject-specific option exists before defaulting to a generalist platform.

Does your funder require a specific repository type?

Funder and journal policy is usually the deciding factor, not personal preference. Most major funders now state an explicit hierarchy: use a recognised discipline repository first, and use a generalist repository — provided it is FAIR-aligned — only where none exists.

Funder / body	Repository requirement
Horizon Europe	Model Grant Agreement Article 17 requires deposit in a research data repository, following the principle “as open as possible, as closed as necessary”
UKRI	Open access policy (in force since 1 April 2022) requires in-scope research articles to include a data access statement giving access details for the underlying data, which are expected to be findable, accessible, interoperable and reusable
NIH	Data Management and Sharing Policy, effective 25 January 2023, requires a data management and sharing plan and encourages use of an established public repository appropriate to the data type
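ICMJE journals	Data sharing statement required in clinical trial manuscripts submitted on or after 1 July 2018; trials that began enrolment on or after 1 January 2019 must also include a data sharing plan in their registration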

Where a policy is silent on repository type, DataCite’s Repository Finder tool cross-references FAIRsharing and re3data metadata to surface certified, FAIR-aligned repositories for a given data type — a step that is worth doing before defaulting to whichever repository a colleague used last time.

Which option wins on long-term curation and sustainability?

This is the trade-off least discussed in generic repository guidance, and it matters more than discoverability once a dataset is more than a few years old. Discipline-specific repositories often provide deeper curation at deposit time, but many depend on renewable grant funding, which creates a real risk of the archive itself losing support, freezing new deposits, or migrating without notice.

Generalist repositories carry a different risk profile. Zenodo is operated by CERN with backing from OpenAIRE and the European Commission; Figshare is commercially operated by Digital Science; the Open Science Framework is run by the non-profit Center for Open Science. None of these guarantees permanence, but their institutional backing is typically more diversified than a single-grant-funded domain archive.

Ask whether the discipline repository has a named institutional or consortium backer, not just a project grant.
Check whether the repository is a CoreTrustSeal-certified trustworthy digital repository — certification signals an audited preservation commitment.
If the domain archive’s funding horizon is unclear, consider a dual deposit: the primary copy in the discipline repository for discoverability, and a mirrored copy in a generalist repository as a preservation backstop, with each record's metadata pointing to the other.
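
For the dual-deposit option, the two records are most useful when their metadata point at each other. The sketch below builds reciprocal related-identifier entries using DataCite-style property names and the `IsIdenticalTo` relation type. The helper `cross_link` and both DOIs are hypothetical, and whether a given repository lets depositors set these fields is something to confirm in its own documentation.

```python
from typing import Dict, List


def cross_link(primary_doi: str, mirror_doi: str) -> Dict[str, List[dict]]:
    """Build reciprocal related-identifier entries for a dual deposit."""

    def points_to(doi: str) -> dict:
        # One related-identifier entry declaring the other copy as identical.
        return {
            "relatedIdentifier": doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsIdenticalTo",
        }

    return {
        primary_doi: [points_to(mirror_doi)],
        mirror_doi: [points_to(primary_doi)],
    }


# Placeholder DOIs for illustration only.
links = cross_link("10.1234/domain.0001", "10.5678/generalist.0001")
```

Each key in `links` is the record being edited, and its value is the list of entries to add to that record's metadata.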

How do you actually decide? A five-step framework

Use this sequence rather than defaulting to whichever repository is fastest to sign up for:

1. Check the funder mandate first. If your grant agreement or journal’s data sharing policy names a required or preferred repository type, that overrides personal choice.
2. Search FAIRsharing and re3data for a certified discipline-specific option matching your data type, format and jurisdiction.
3. Assess curation depth needed. Complex, reusable data (genomic sequences, clinical trial data, crystal structures) benefits from expert domain curation; simple supplementary files often do not need it.
4. Weigh sustainability. Prefer CoreTrustSeal-certified or institutionally-backed repositories over unaffiliated project archives, especially for data with a multi-decade reuse horizon.
5. Default to a generalist repository only when no suitable, FAIR-aligned discipline repository exists, and record the choice and rationale in your data management plan.
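
The sequence above can be sketched as a small function that returns both a choice and a rationale suitable for pasting into a data management plan. This is an illustration under stated assumptions, not a policy engine: the `Candidate` fields, the `choose_repository` name and the repository names in the usage example are all hypothetical, the flags would be filled in by hand from registry searches, and the curation-depth judgement in the third step is left to the researcher rather than modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    discipline_specific: bool
    fair_aligned: bool
    sustainable: bool  # certified or institutionally backed, as judged by the depositor


def choose_repository(
    candidates: List[Candidate], mandated: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Return (repository name, rationale) following the mandate-first sequence."""
    # A funder or journal mandate overrides everything else.
    if mandated:
        return mandated, "named in funder or journal policy"

    suitable = [c for c in candidates if c.fair_aligned]

    # Try discipline-specific options first, then fall back to generalist ones.
    for wants_domain, reason in (
        (True, "FAIR-aligned discipline-specific repository available"),
        (False, "no suitable discipline-specific repository found"),
    ):
        pool = [c for c in suitable if c.discipline_specific is wants_domain]
        # Stable sort that places sustainable candidates ahead of the rest.
        pool.sort(key=lambda c: not c.sustainable)
        if pool:
            pick = pool[0]
            note = "" if pick.sustainable else "; funding horizon unclear, consider a dual deposit"
            return pick.name, reason + note

    return None, "no FAIR-aligned repository found; widen the registry search"


# Hypothetical candidates for illustration only.
options = [
    Candidate("Generalist X", False, True, True),
    Candidate("Domain Archive Y", True, True, False),
    Candidate("Domain Archive Z", True, True, True),
]
choice, rationale = choose_repository(options)
```

Returning the rationale alongside the name keeps the reasoning attached to the decision, which is the part the plan needs to record.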

Answer-first Q&A

What is a data repository in research?

A data repository is a system or service where researchers deposit datasets to obtain a persistent identifier, structured metadata, and long-term hosting. It exists separately from a journal article so that data can be found, cited and reused independently of the publication it supports.

What is an example of a data repository?

Zenodo and Figshare are widely used generalist examples; the UK Data Service’s ReShare and the Protein Data Bank are widely used discipline-specific examples. Each assigns a DOI, retains version history, and exposes metadata for discovery by search engines and domain indexes.

What is a research repository?

“Research repository” is often used loosely to mean either a data repository (datasets) or an institutional repository (publications, theses). In a data management context, it specifically refers to a certified system for archiving and publishing the datasets underlying research outputs.

What this means for your data management plan

A data management plan should name the intended repository before data collection begins, not after submission. Reviewers at UKRI, NIH and Horizon Europe increasingly check whether the named repository matches the funder’s stated hierarchy — generalist repositories named without justification, when a recognised discipline archive exists, are a common cause of DMP revision requests.

The practical position for most research teams is not “generalist or discipline-specific” as a permanent allegiance, but a per-dataset decision applied consistently: check the mandate, search the registries, weigh curation against sustainability, and document the reasoning. That documented reasoning — more than the repository name itself — is what demonstrates genuine engagement with FAIR data principles to funders, reviewers and future re-users.