Open and FAIR data has to live somewhere, and the choice of where is not a clerical detail. A dataset deposited on a personal web page, a lab server, or a service that may not exist in five years is, for the purposes of long-term reuse, lost. The question of where research data should live is the question of trusted repositories, and the European answer to coordinating them is the EOSC. This article maps that landscape, drawing on the vocabulary of the data-infrastructure domain: what makes a repository trustworthy, which kinds of repository exist, how the EOSC federates them, and who keeps the data usable.
What makes a repository trustworthy
Not every place that can store a file is fit to be the home of the scholarly record. A trusted digital repository is one assessed against a recognised trust framework, demonstrating that it has the organisational and technical capability to preserve and provide access to data over the long term. Trust here is not a vibe; it is a set of demonstrable properties — a sustainability plan, preservation procedures, persistent identifiers, clear access conditions, and the organisational continuity to outlast any individual project or grant.
The most widely recognised certification of these properties is CoreTrustSeal, a community-governed assessment that a repository meets the core requirements of trustworthy data stewardship. A CoreTrustSeal certification is a concrete signal a funder or researcher can rely on: it means an independent process has checked that the repository can actually do what “long-term preservation” implies. When a funder mandate says data must go to a trusted repository, CoreTrustSeal is the most common way the word “trusted” is given operational meaning.
The repository taxonomy: generalist and domain
Trusted repositories come in two broad kinds, and choosing well between them is one of the most consequential data-management decisions a researcher makes.
- A generalist repository accepts data from any discipline. Zenodo, Figshare, and Dryad are the familiar examples: they mint a DOI, accept almost any data type, and provide a reliable, citable home when no specialist option exists. They are the right default for the long tail of research data that has no natural disciplinary home.
- A domain repository is discipline-specific, built around the data types, standards, and community of a particular field. GenBank for nucleotide sequence data is the archetype; there are equivalents across crystallography, astronomy, social science, proteomics, and more. A domain repository adds what a generalist cannot: discipline-specific metadata standards, validation, and a community of expert users who will actually find and reuse the data.
The practical rule that funders increasingly articulate is: deposit in the appropriate domain repository where one exists, and fall back to a trusted generalist repository where it does not. A sequence belongs in GenBank, not in a generic store; a one-off dataset with no community home belongs in a generalist repository with a DOI rather than on a server that will be decommissioned.
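The rule above is simple enough to write down as a lookup. The following is a minimal sketch in Python, not any funder's official procedure: the `Repository` class, the `choose_repository` function, the data-type labels, and the `trusted` flags are all hypothetical, and the flag values are illustrative placeholders rather than statements about any repository's certification status.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Repository:
    name: str
    kind: str                 # "domain" or "generalist"
    data_types: frozenset     # hypothetical labels; empty for a generalist
    trusted: bool             # illustrative flag, set by whoever curates the registry


def choose_repository(data_type: str,
                      registry: Iterable[Repository]) -> Optional[Repository]:
    """Prefer a trusted domain repository; otherwise a trusted generalist one."""
    trusted = [repo for repo in registry if repo.trusted]
    # First choice: a domain repository that handles this data type.
    for repo in trusted:
        if repo.kind == "domain" and data_type in repo.data_types:
            return repo
    # Fallback: any trusted generalist repository.
    for repo in trusted:
        if repo.kind == "generalist":
            return repo
    # Nothing suitable: untrusted storage is never returned.
    return None


# Illustrative registry; names come from the article, flags are assumptions.
REGISTRY = [
    Repository("Lab file server", "generalist", frozenset(), trusted=False),
    Repository("GenBank", "domain", frozenset({"nucleotide_sequence"}), trusted=True),
    Repository("Zenodo", "generalist", frozenset(), trusted=True),
]

sequence_home = choose_repository("nucleotide_sequence", REGISTRY)
one_off_home = choose_repository("one_off_dataset", REGISTRY)
```

With this registry, a nucleotide sequence is routed to the domain repository and a dataset with no community home falls back to the generalist one. The lab server is never selected, because the function filters on the trust flag before it considers anything else.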
The EOSC: coordinating the federation
Individual trusted repositories are necessary but not sufficient. A researcher also needs to find the right one, move data and compute between services, and trust that the pieces interoperate. In Europe, the coordinating layer for this is the European Open Science Cloud (EOSC) — a federation of research-data services rather than a single monolithic platform.
The EOSC’s model is federation: an EOSC node is a service provider connected to the federation, and an EOSC service is something offered through its catalogue — a repository, a compute resource, a data-management tool. The aspiration is that a researcher can discover trusted repositories, deposit data, and compose data with compute across institutional and national boundaries, through a coordinated catalogue rather than a patchwork of disconnected services. The EOSC is, in effect, the European attempt to make “where should this data live?” answerable through one front door onto many trustworthy providers. It is not the only such effort — the African Open Science Platform pursues a comparable continental federation — but it is the most developed.
The human layer: stewards and custodians
Infrastructure does not curate itself, and an honest account of where data should live has to name the people. A data steward is the professional responsible for data quality, governance, and ongoing curation — the role that makes the difference between data that is merely deposited and data that is genuinely reusable. A data custodian holds legal or operational responsibility for the data. Around them sit the structured agreements that govern sharing: a data sharing agreement setting the conditions under which data move between parties, an embargo period deferring public access after deposit, and access controls distinguishing open, restricted, and metadata-only data.
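How an embargo period and the three access levels interact can be made concrete with a small sketch. Everything here is an assumption for illustration: the `AccessLevel`, `Deposit`, and `can_download` names are hypothetical, the identifier is a placeholder, and the policy that an embargo blocks all downloads until it lapses is a simplification that a real data sharing agreement might relax.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    METADATA_ONLY = "metadata-only"


@dataclass(frozen=True)
class Deposit:
    identifier: str                       # placeholder, not a real identifier
    access: AccessLevel
    embargo_until: Optional[date] = None  # None means no embargo


def can_download(deposit: Deposit, today: date,
                 has_sharing_agreement: bool = False) -> bool:
    """Decide whether the data files (not the metadata) may be downloaded."""
    # Assumption: an active embargo defers all downloads.
    if deposit.embargo_until is not None and today < deposit.embargo_until:
        return False
    if deposit.access is AccessLevel.OPEN:
        return True
    if deposit.access is AccessLevel.RESTRICTED:
        # Restricted data moves only under a data sharing agreement.
        return has_sharing_agreement
    # Metadata-only: the record is discoverable, the files are not served.
    return False


embargoed = Deposit("example-0001", AccessLevel.OPEN, embargo_until=date(2030, 1, 1))
restricted = Deposit("example-0002", AccessLevel.RESTRICTED)
```

In this sketch the embargoed deposit becomes downloadable on the day the embargo ends, and the restricted deposit is served only to a requester who holds an agreement. Deciding which of these settings applies to a given dataset is a steward's or custodian's judgement; the code only records the decision.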
A trusted repository with no data steward behind the data is a safe building with empty rooms. Preservation is an organisational commitment carried out by people, not a property that storage acquires on its own.
Why this connects to FAIR and to identifiers
Where data lives is what makes the FAIR principles operational. Findability depends on the repository minting a persistent identifier and exposing good metadata; accessibility depends on stable resolution and clear access conditions; interoperability and reusability depend on the standards a domain repository enforces. A trusted repository is, in practice, the machine that turns the FAIR aspiration into a deposited reality — which is why the choice of repository, and the trust signal of CoreTrustSeal, matters as much as the decision to share at all. The repository is also where the data’s persistent identifier enters the broader graph that links it to the project, the people, and the funding.
Where shared vocabulary fits
The terms in this domain are used loosely in funder mandates and policies — “trusted”, “appropriate”, “long-term” all mean different things to different bodies, and “generalist” versus “domain” is often left implicit. A shared, federated vocabulary that defines these precisely, pointing to CoreTrustSeal for the trust framework and to the EOSC for the federation model, is what lets a data-sharing requirement be stated unambiguously and checked. Supplying that definitional layer is the role the CASRAI dictionary is designed to play.
What to do now
For researchers: deposit in the appropriate domain repository where one exists, otherwise in a CoreTrustSeal-certified generalist repository, and never on a personal or project server for the long term. For institutions: invest in data stewards, not just storage. For funders and standards bodies: give “trusted repository” operational meaning through certification and shared vocabulary, and support the federations that make trustworthy services findable.