Dictionary domain
Track B
Research data infrastructure
Trusted repositories, EOSC, biobanks, data trusts, federated infrastructure.
For implementers
Operational deployment checklist for Research data infrastructure: prerequisites, five deploy steps, integration notes for Pure, Symplectic Elements, Worktribe, DSpace, and more, plus the pitfalls that recur in the field.
Terms in this domain
43 terms
Aggregator service
A service that harvests, harmonises, and re-exposes metadata and (sometimes) content from many upstream sources, providing a unified search, browse, or query interface across the aggregated corpus; canonical examples include OpenAIRE, BASE, CORE, and OpenAlex.
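As a rough illustration of the "harvest, harmonise, re-expose" pattern, the sketch below maps records from two upstream sources onto one shared schema and deduplicates them by DOI. The source shapes, field names, and the 10.1234 DOI prefix are hypothetical; real aggregators such as those named above use their own, far richer pipelines.

```python
from typing import Iterable

# Hypothetical upstream record shapes; each real source has its own schema.
SOURCE_A = [{"dc:title": "Ocean temps", "dc:identifier": "doi:10.1234/ABC"}]
SOURCE_B = [
    {"title": "Ocean temps (copy)", "doi": "https://doi.org/10.1234/abc"},
    {"title": "Soil cores", "doi": "10.1234/xyz"},
]

def normalise_doi(raw: str) -> str:
    """Reduce common DOI spellings to a bare, lower-case DOI."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def harmonise(record: dict, source: str) -> dict:
    """Map a source-specific record onto one shared (illustrative) schema."""
    if source == "a":
        title, doi = record["dc:title"], record["dc:identifier"]
    else:
        title, doi = record["title"], record["doi"]
    return {"title": title, "doi": normalise_doi(doi), "source": source}

def aggregate(batches: Iterable[tuple[str, list[dict]]]) -> dict[str, dict]:
    """Build a unified index keyed by DOI; the first record seen wins."""
    index: dict[str, dict] = {}
    for source, records in batches:
        for record in records:
            unified = harmonise(record, source)
            index.setdefault(unified["doi"], unified)
    return index

index = aggregate([("a", SOURCE_A), ("b", SOURCE_B)])
print(sorted(index))
```

Keying the index on a normalised identifier is one simple deduplication choice; records without a DOI would need a different matching strategy.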
Data safe haven
A secure data-handling environment that allows controlled, audited access to sensitive datasets for approved research, applying technical, physical, and procedural safeguards; effectively a synonym for trusted research environment (TRE) in much current usage, though the term has older roots in NHS information governance.
Five Safes framework
A framework for the safe use of sensitive data in research, articulated by the UK Office for National Statistics, that organises controls under five dimensions: Safe People, Safe Projects, Safe Settings, Safe Data, and Safe Outputs.
Trusted research environment
A secure computing environment — typically delivered as a remote-access workspace with controlled inbound/outbound data flows — that allows accredited researchers to analyse sensitive data in situ without exporting the data, supporting privacy-preserving secondary research use.
Sensitive-data repository
A repository specifically designed to hold sensitive research data — typically personal data, health data, criminal-justice data, commercially-confidential data, or culturally-sensitive Indigenous data — with enhanced access controls, audit logging, contractual access conditions, and (often) a secure analysis environment.
Dataset landing page
The human-readable web page that a dataset's persistent identifier (typically a DataCite DOI) resolves to, presenting the dataset's title, creators, description, identifiers, dates, version history, related works, access conditions, and a link to download or request the data.
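To make the identifier-to-page relationship concrete, here is a minimal sketch that turns a bare DOI into the doi.org resolver URL, which in turn redirects to the landing page. The function name and the example DOI (prefix 10.1234) are made up for illustration, and the validity check is deliberately crude.

```python
from urllib.parse import quote

def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a bare DOI (illustrative helper)."""
    doi = doi.strip()
    # Very rough sanity check: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path; keep "/" as-is.
    return "https://doi.org/" + quote(doi, safe="/")

print(doi_to_url("10.1234/example-dataset"))
```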
Joint Declaration of Data Citation Principles
The 2014 statement produced by Force11's Data Citation Synthesis Group, signed by a wide community of publishers, funders, repositories, and infrastructure providers, that articulates eight principles for the citation of research data in scholarly communication.
Data citation principle
Any of the eight principles articulated in the Joint Declaration of Data Citation Principles (Force11, 2014) covering importance, credit and attribution, evidence, unique identification, access, persistence, specificity and verifiability, and interoperability and flexibility of data citations in scholarly communication.
Data publication platform
A platform that supports the publication of research data as a citable artefact — assigning a persistent identifier, presenting a landing page, and applying review, curation, or peer-review processes — distinct from purely depositional storage.
Domain repository
Synonym for discipline-specific repository: a repository whose scope is a particular research domain (or a sub-area of one), with curation practices and metadata tailored to that domain.
Generalist repository
A repository that accepts research outputs from any discipline, applying domain-agnostic curation and discovery, and serving as a deposit destination for outputs that have no natural discipline-specific home or whose authors prefer a single multidisciplinary venue.
Discipline-specific repository
A repository whose scope is bounded to a particular research discipline or sub-discipline, with curation practices, metadata schemas, and community standards tailored to that domain's data types, terminologies, and norms.
FAIRsharing (concept)
A curated, community-driven registry of databases, standards (metadata, identifiers, formats, terminologies), and data policies relevant to research data, maintained at the University of Oxford with linkage to funders, journals, and standards organisations.
Re3data (concept)
Registry of Research Data Repositories: a global registry, operated by DataCite and partner institutions, that lists research data repositories worldwide with descriptive metadata about their disciplines, content types, access conditions, and policies, helping researchers locate suitable repositories for deposit and discovery.
UK Data Service (concept)
A UK ESRC-funded data infrastructure that holds, curates, and provides access to social, economic, and population data resources for research, learning, and policy, comprising the UK Data Archive at the University of Essex and partner institutions.
ICPSR (concept)
Inter-university Consortium for Political and Social Research: a consortium-membership-funded data archive based at the University of Michigan that holds and curates over 10,000 social-science research datasets, providing access to member institutions worldwide.
Harvard Dataverse (concept)
A free research-data repository operated by Harvard University on the open-source Dataverse software platform, accepting datasets from researchers worldwide, minting DataCite DOIs, and serving as the flagship instance of the global Dataverse network.
Dryad (concept)
A non-profit generalist research data repository operated by Dryad Data Inc. (in partnership with the California Digital Library) that publishes peer-reviewed-paper-linked datasets, mints DataCite DOIs, and applies curation review before publication.
Figshare (concept)
A commercial generalist research repository operated by Digital Science that accepts datasets, figures, presentations, papers, software, and other research artefacts, minting DataCite DOIs and offering institutional-branded instances ('Figshare for Institutions') alongside the public service.
Zenodo (concept)
A free generalist research repository operated by CERN and developed under OpenAIRE that accepts deposits of datasets, software, publications, presentations, posters, and other research artefacts, minting DataCite DOIs and providing free preservation up to a per-record size limit.
GitHub mirror
A copy of a Git repository (or set of repositories) hosted on GitHub that tracks an upstream source repository elsewhere, typically maintained for redundancy, visibility, or community-engagement reasons rather than as the canonical primary copy.
Software Heritage archive
A non-profit international initiative based at Inria that systematically crawls, archives, and preserves the world's publicly available source code, including its full version-control history, and issues persistent identifiers (Software Hash Identifiers, SWHIDs) to every archived artefact.
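For file contents, a SWHID is built from the same hash Git uses for blobs, so it can be computed locally without contacting the archive. The sketch below covers content objects only (the `cnt` type); directories, revisions, releases, and snapshots are hashed differently and are not shown. The function name is illustrative.

```python
import hashlib

def content_swhid(data: bytes) -> str:
    """Compute the SWHID of a file's contents (object type 'cnt')."""
    # Git-style blob hash: SHA-1 over the header "blob <length>\0" plus the bytes.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"

print(content_swhid(b""))
# swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier is derived from the bytes themselves, two identical files always share one SWHID regardless of where they were archived.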
Code repository
A version-controlled storage location for source code, typically operated on top of a distributed version-control system such as Git, exposing the code's full revision history, branches, tags, and (often) collaboration features such as issues, pull requests, and code review.
Tissue bank
A specific kind of biobank focused on the collection, processing, storage, and distribution of human tissue samples (typically solid tissue specimens from surgical or post-mortem sources), governed under tissue-banking regulation in the relevant jurisdiction.
Sample repository
A repository for physical research samples — geological, environmental, biological, or material — that catalogues, stores, and provides access to samples for downstream analysis, often issuing persistent identifiers (IGSN, DataCite DOI) for citation and provenance tracking.
Biorepository
A facility or organisation that collects, processes, stores, and distributes biological materials and their associated data for research, encompassing both human and non-human samples. Some communities distinguish the term from 'biobank', using it to denote a broader scope or collections tied to specific research projects.
Biobank
An organised collection of biological samples (typically human samples such as blood, tissue, DNA, urine) together with their associated clinical, demographic, and lifestyle data, governed for use in biomedical research.
National data infrastructure
A coordinated, nationally-scoped programme and set of services for the storage, sharing, and reuse of research data within a country, typically combining funding policy, technical infrastructure (repositories, compute, federation), training, and governance.
Data hub
A central node in a data ecosystem that aggregates, harmonises, and brokers access to data from multiple upstream sources, exposing the harmonised data to downstream consumers via curated APIs, query interfaces, or download endpoints.
Federated data infrastructure
A data infrastructure in which data, services, and access controls remain distributed across multiple independent nodes (typically operated by different organisations) but are made discoverable, queryable, and usable as a unified resource through shared protocols, vocabularies, and identity federation.
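A toy sketch of the idea, assuming in-memory nodes rather than real networked services: each node keeps its own records and its own access list, and a thin federation layer fans a query out and merges whatever each node agrees to return. The `Node` class, the user names, and the records are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One independently operated node with its own data and access policy."""
    name: str
    records: list[dict]
    allowed_users: set[str] = field(default_factory=set)

    def search(self, user: str, keyword: str) -> list[dict]:
        # Access control is enforced locally by the node, not by the federation.
        if user not in self.allowed_users:
            return []
        return [r for r in self.records if keyword in r["title"].lower()]

def federated_search(nodes: list[Node], user: str, keyword: str) -> list[dict]:
    """Fan the query out to every node and merge results, tagging their origin."""
    results = []
    for node in nodes:
        for record in node.search(user, keyword.lower()):
            results.append({**record, "node": node.name})
    return results

nodes = [
    Node("node-a", [{"title": "Cohort study 2021"}], {"alice"}),
    Node("node-b", [{"title": "Cohort genomics"}, {"title": "Soil survey"}], {"alice", "bob"}),
]
hits = federated_search(nodes, "bob", "Cohort")
print(hits)
```

In a real deployment the shared protocol, vocabulary mapping, and identity federation would replace the direct method calls and the hard-coded user sets used here.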
Data warehouse
A central repository of structured data, integrated from multiple operational sources, modelled for analytical querying (typically with a star or snowflake schema), and optimised for read-heavy workloads supporting reporting and decision-making.
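A star schema can be sketched in a few lines with Python's built-in sqlite3 module: one fact table holds the measures, and small dimension tables describe them. The table names, columns, and figures below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who" and "when" of each fact.
CREATE TABLE dim_institution (institution_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_year (year_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the measure plus a foreign key to each dimension.
CREATE TABLE fact_deposits (
    institution_id INTEGER REFERENCES dim_institution(institution_id),
    year_id INTEGER REFERENCES dim_year(year_id),
    dataset_count INTEGER
);
INSERT INTO dim_institution VALUES (1, 'Univ A'), (2, 'Univ B');
INSERT INTO dim_year VALUES (1, 2023), (2, 2024);
INSERT INTO fact_deposits VALUES (1, 1, 40), (1, 2, 55), (2, 2, 30);
""")

# A typical analytical query: aggregate the fact table, grouped by a dimension.
rows = conn.execute("""
    SELECT i.name, SUM(f.dataset_count)
    FROM fact_deposits AS f
    JOIN dim_institution AS i USING (institution_id)
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()
print(rows)
# [('Univ A', 95), ('Univ B', 30)]
```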
Data lake
A storage repository that holds large volumes of structured, semi-structured, and unstructured data in their native formats, avoiding schema-on-write requirements so that data can be ingested cheaply and structured only at the time of read or analysis (schema-on-read).
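The sketch below illustrates structuring data only at read time: raw objects are stored exactly as they arrived, and a reader imposes the shape it needs when it runs. The raw objects and field names are hypothetical, and a Python list stands in for the object store.

```python
import json

# Raw objects kept exactly as ingested; nothing is validated on write.
RAW_OBJECTS = [
    '{"id": 1, "temp_c": "21.5", "site": "A"}',
    '{"id": 2, "temperature": 19, "extra": {"note": "different shape"}}',
    'not json at all',
]

def read_with_schema(raw_lines):
    """Apply a schema at read time: yield (id, temperature) for usable rows."""
    for line in raw_lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # still stored in the lake; just skipped by this reader
        # Accept either of the two field names seen upstream.
        temp = obj.get("temp_c", obj.get("temperature"))
        if "id" in obj and temp is not None:
            yield int(obj["id"]), float(temp)

rows = list(read_with_schema(RAW_OBJECTS))
print(rows)
# [(1, 21.5), (2, 19.0)]
```

A different analysis could read the same raw objects with a different schema, which is the flexibility this approach trades against up-front data quality guarantees.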
Data commons
A shared data resource — often combined with shared computing and analysis tools — governed by a community under defined access and contribution rules, designed to enable many users to use and add to the resource for collective benefit.
Data trust
A legal and organisational structure in which a fiduciary intermediary holds, governs, and brokers access to a body of data on behalf of its contributors and beneficiaries, applying agreed terms of access, use, and accountability.
World Data System certification
Historic certification programme of ICSU's World Data System (WDS) under which scientific data centres in geosciences and related fields were certified as trustworthy; merged with the Data Seal of Approval in 2017 to form CoreTrustSeal.
CoreTrustSeal
A community-based, non-profit certification scheme for trustworthy data repositories, operated by the CoreTrustSeal Foundation, awarded against 16 published requirements covering organisational infrastructure, digital object management, and technical infrastructure.
Trusted digital repository
A digital repository whose mission, governance, technical infrastructure, and procedures have been independently assessed against a recognised standard (e.g. CoreTrustSeal, nestor seal, ISO 16363) and judged trustworthy to preserve digital content over the long term.
Subject repository
A repository whose contents are connected purely by their discipline, rather than by other factors such as institutional affiliation (see Institutional repository).
Researcher webpage
A webpage featuring a researcher's profile, which may also provide links to their publications.
Repository
A repository preserves, manages, and provides access to many types of digital materials in a variety of formats.
Open archive
A repository that is compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and therefore facilitates the sharing of metadata for a variety of purposes, most notably the compilation tasks performed by aggregator databases.
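To show what harvesting looks like in practice, the sketch below builds an OAI-PMH `ListRecords` request URL and extracts identifiers and titles from a response in the `oai_dc` format. The base URL and the sample response are invented, no network call is made, and paging with `resumptionToken` is not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

# Hand-written sample standing in for a real repository response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest_titles(xml_text: str) -> list[tuple[str, str]]:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs

print(list_records_url("https://repo.example.org/oai"))
print(harvest_titles(SAMPLE_RESPONSE))
```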
Institutional webpage
A webpage that is associated with the institution at which the author is employed.
Institutional repository
An online, digital collection of research outputs (see Repository) that are connected by their affiliation with a specific institution. Institutional repositories are most commonly associated with universities and other academic organisations, so the contents of a single institutional repository may cover a range of disciplines. An institutional repository is often managed as part of a wider suite of services supporting scholarly communication, Open Access, and Open Education.








