v2026.1714 entries · CC-BY 4.0

Implementation checklist · Track B

Implementing the Research data infrastructure vocabulary

For repository directors, RDM service leads, and CRIS administrators integrating institutional repositories with trusted external repositories and federated infrastructure.

When to apply: when choosing or certifying a data repository, when wiring deposit handoffs between an institutional CRIS and disciplinary repositories, or when surfacing EOSC- or FAIR-aligned trust signals on a public dataset record.

Before you start

Prerequisites

What needs to be in place before you operationalise Research data infrastructure terminology in your CRIS or repository.

  • An institutional repository (DSpace, EPrints, Dataverse, Invenio) or a CRIS with a repository-tools layer
  • A clear separation between the institutional repository of record and disciplinary repositories (avoid duplicate master records)
  • Familiarity with CoreTrustSeal, nestor, ISO 16363, or comparable trust frameworks
  • A data classification scheme (open, restricted, controlled, embargoed) that aligns with your access-control layer
  • Knowledge of which disciplinary repositories your researchers actually use (GenBank, PDB, GEO, ICPSR, UK Data Service, etc.)

Deployment

Five steps to deploy

Each step is small enough to land in a single sprint or a single sitting with the relevant CRIS administrator. Follow in order.

  1. Define your repository-of-record policy

    For every domain your researchers publish in, document which external repository is canonical (e.g. genomics → GenBank, social sciences → UK Data Service or ICPSR). The institutional repository carries the metadata record and a link, not duplicate primary files.

  2. Capture trust-framework status as structured metadata

    On each repository record, store coretrustseal_certified (boolean plus certification date), nestor_seal status, the ISO 16363 audit date, and an EOSC / EOSC Future trust-tier value. These fields drive funder-mandate compliance reporting.

  3. Wire deposit handoffs

    For each canonical external repository, set up a deposit-or-link workflow: structured forms, SWORD or REST API push where supported (Dataverse, Zenodo, Figshare, DataCite), or a documented manual hand-off with a tracked link.

  4. Surface access-control state on every record

    Distinguish open, controlled, embargoed, and restricted at the record level. Controlled-access records need a documented access-request mechanism and an audit trail; institutional repositories often lack this and must integrate with a data-access-committee tool.

  5. Cross-walk with the federated layer

    Register the institutional repository with OpenAIRE, DataCite, and the relevant national aggregator (UK SHERPA, NL NARCIS, US Data.gov-adjacent). Confirm that the record schema validates upstream; most failures here come from a missing license or missing related-identifier fields.
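
Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not a reference schema: the policy table, the TrustStatus class, and the coretrustseal_date, iso16363_audit_date, and eosc_trust_tier field names are assumptions made for this example (only coretrustseal_certified and nestor_seal appear in the checklist text), and the dates and tier value below are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical repository-of-record policy: research domain -> canonical
# external repository. A real policy would cover every publishing domain.
REPOSITORY_OF_RECORD = {
    "genomics": "GenBank",
    "social sciences": "UK Data Service",
}


@dataclass
class TrustStatus:
    """Trust-framework status for one repository (illustrative field names)."""
    coretrustseal_certified: bool = False
    coretrustseal_date: Optional[date] = None
    nestor_seal: bool = False
    iso16363_audit_date: Optional[date] = None
    eosc_trust_tier: Optional[str] = None

    def has_any_certification(self) -> bool:
        # True if at least one trust signal is recorded as structured data.
        return (
            self.coretrustseal_certified
            or self.nestor_seal
            or self.iso16363_audit_date is not None
        )


def canonical_repository(domain: str) -> Optional[str]:
    """Look up the canonical repository; None means the policy has a gap."""
    return REPOSITORY_OF_RECORD.get(domain.strip().lower())


if __name__ == "__main__":
    print(canonical_repository("Genomics"))
    status = TrustStatus(coretrustseal_certified=True,
                         coretrustseal_date=date(2024, 1, 1))  # placeholder date
    print(status.has_any_certification())
```

Returning None for an unlisted domain, rather than falling back silently to the institutional repository, keeps policy gaps visible so they can be documented as step 1 asks.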

Worked example

Sample workflow

A realistic walk-through of a single record passing through the Research data infrastructure pipeline once the checklist is in production.

A research data steward receives a deposit request for a 40-GB genomics dataset linked to a forthcoming paper. The dataset is large and domain-specific, so the canonical-repository rule sends the primary files to GenBank rather than the institutional repository. The depositor uploads to GenBank via the lab's usual flow; on accession-number return, the steward creates an institutional metadata record that carries the GenBank accession as a structured relatedIdentifier, the dataset license, the linked-publication DOI, and a CoreTrustSeal flag inherited from GenBank's certification. Because the data classification is "controlled" (patient sequence data), an access-control statement and the data-access-committee URL are attached. The institutional record mints a DataCite DOI, gets harvested into the national aggregator and OpenAIRE, and surfaces on the contributors' ORCID profiles — all without duplicating the 40 GB of primary files.
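
The institutional metadata record in this walk-through might look roughly like the sketch below. The relatedIdentifiers structure borrows DataCite-style property names (relatedIdentifier, relatedIdentifierType, relationType); everything else, including the build_institutional_record helper, the access_level and access_request_url keys, and all URLs, DOIs, and license labels, is a placeholder assumed for illustration rather than a prescribed schema.

```python
import json
from typing import Optional


def build_institutional_record(
    title: str,
    accession_url: str,
    publication_doi: str,
    license_id: str,
    access_level: str,
    access_request_url: Optional[str] = None,
) -> dict:
    """Assemble a metadata-only record that links out to the primary files."""
    record = {
        "title": title,
        "license": license_id,
        "access_level": access_level,
        "relatedIdentifiers": [
            {   # link to the primary files held by the external repository
                "relatedIdentifier": accession_url,
                "relatedIdentifierType": "URL",
                "relationType": "IsMetadataFor",
            },
            {   # link to the paper the dataset supports
                "relatedIdentifier": publication_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsSupplementTo",
            },
        ],
    }
    if access_level == "controlled":
        # Controlled-access records must carry a request pathway.
        if not access_request_url:
            raise ValueError("controlled record needs an access_request_url")
        record["access_request_url"] = access_request_url
    return record


if __name__ == "__main__":
    # All values below are placeholders, not real identifiers.
    example = build_institutional_record(
        title="Example genomics dataset",
        accession_url="https://repository.example.org/accession/PLACEHOLDER",
        publication_doi="10.1234/example-paper",
        license_id="example-data-use-agreement",
        access_level="controlled",
        access_request_url="https://dac.example.org/request",
    )
    print(json.dumps(example, indent=2))
```

The record carries links and descriptive fields only; no file payload is attached, which mirrors the point that the primary files are not duplicated.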

Integration points

CRIS and repository systems

Vendor-specific notes on where this vocabulary fits in real research-information systems. Names appear here only where there is public field evidence — they are not vendor partnerships.

Dataverse

CoreTrustSeal-friendly by design; supports DataCite DOI minting, structured access requests, and SWORD deposit from an upstream CRIS. A strong fit for institutional metadata of record when primary files live elsewhere.

Invenio / Zenodo

CERN-developed; Zenodo is the canonical instance for general-purpose research outputs. InvenioRDM is reusable for institutional installs and is increasingly adopted as a CoreTrustSeal-aligned platform.

DSpace 8.x

Dataset deposit works, but treat DSpace as the metadata-of-record layer, with primary files linked out to disciplinary repositories for anything above ~10 GB.

Pure + Pure Portal

CRIS-side capture; rely on Pure for project-to-dataset linkage and on a separate repository for the dataset bitstreams. The Pure REST API supports relatedIdentifier injection.

OpenAIRE Research Graph

Federation target rather than a repository — register your repository for harvesting, validate the OpenAIRE Guidelines 4.0 metadata profile against your output, and check the resulting graph for missing license / PID fields.
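
Before registering for harvesting, a local pre-flight pass can catch the missing license, PID, and related-identifier fields mentioned above. The sketch below is not an implementation of the OpenAIRE Guidelines validator; the REQUIRED_FIELDS tuple and the record keys are assumptions chosen for this example, and the real profile checks considerably more.

```python
from typing import Dict, Iterable, List

# Assumed minimal field set for this illustration only.
REQUIRED_FIELDS = ("identifier", "license", "relatedIdentifiers")


def missing_fields(record: dict) -> List[str]:
    """Return required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def preflight(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each failing record to its list of missing fields."""
    report: Dict[str, List[str]] = {}
    for index, record in enumerate(records):
        missing = missing_fields(record)
        if missing:
            # Fall back to the position when the record has no identifier.
            key = record.get("identifier") or f"<record #{index}>"
            report[key] = missing
    return report


if __name__ == "__main__":
    sample = [
        {"identifier": "10.1234/a", "license": "CC-BY-4.0",
         "relatedIdentifiers": [{"relatedIdentifier": "10.1234/p"}]},
        {"identifier": "10.1234/b", "license": "", "relatedIdentifiers": []},
    ]
    for key, missing in preflight(sample).items():
        print(key, "is missing:", ", ".join(missing))
```

An empty report only means the assumed fields are present; the record still has to pass the upstream validator.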

What goes wrong in the field

Common pitfalls

The patterns that show up repeatedly when this checklist is skipped or misapplied. Address these before they become entrenched.

  • Storing 40+ GB of primary files in an institutional repository when a discipline-canonical repository should hold them
  • Failing to capture CoreTrustSeal or equivalent trust-tier status as structured metadata, so funder-mandate reporting becomes manual
  • Conflating "open access" with "open data" — the dataset license and the publication license are independent
  • Forgetting to register controlled-access datasets with a data-access committee, leaving the access pathway as an email address
  • Skipping the upstream-validation step against OpenAIRE Guidelines and discovering the record was silently dropped from the federation
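
The controlled-access pitfall is easy to test for mechanically. The following sketch flags controlled records whose access pathway is missing or is only an email address, and records with no audit-trail reference. The AccessLevel enum mirrors the four states named in this checklist; the access_level, access_request_url, and access_audit_log keys are hypothetical names assumed for this example.

```python
from enum import Enum
from typing import List


class AccessLevel(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"
    EMBARGOED = "embargoed"
    RESTRICTED = "restricted"


def access_pathway_problems(record: dict) -> List[str]:
    """List access-pathway problems for one record (empty list means none)."""
    # Enum lookup raises ValueError for an unknown access level.
    level = AccessLevel(record["access_level"])
    problems: List[str] = []
    if level is not AccessLevel.CONTROLLED:
        return problems

    pathway = (record.get("access_request_url") or "").strip()
    if not pathway:
        problems.append("no access-request mechanism recorded")
    elif pathway.lower().startswith("mailto:") or "://" not in pathway:
        # A bare email address is the pitfall described above.
        problems.append("access pathway is not a web-based request mechanism")

    if not record.get("access_audit_log"):
        problems.append("no audit trail reference recorded")
    return problems


if __name__ == "__main__":
    print(access_pathway_problems({
        "access_level": "controlled",
        "access_request_url": "mailto:dac@example.org",
    }))
```

Only the controlled state is checked here because it is the one the checklist ties to a request mechanism and an audit trail; rules for embargoed or restricted records would be local policy decisions.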

Frequently asked

Implementation FAQ

Who maintains this checklist?
The Research data infrastructure working group maintains the checklist alongside the dictionary terms in the same domain. It is reviewed each release cycle (March and September) and updated when a working-group consultation, a vendor product change, or a federation-partner schema update materially changes the operational guidance.
What if my CRIS or repository is not listed?
The integration points name the systems with which CASRAI has direct field experience: Pure, Symplectic Elements, Worktribe, Converis, DSpace and DSpace-CRIS, EPrints, VIVO, Dataverse, and InvenioRDM. The CERIF mapping in the checklist is vendor-neutral and applies equally to other CRIS or repository products. If your system supports the underlying entities (Person, Project, Output, Funding, plus the domain-specific extensions), the steps transfer.
How do I validate my implementation?
There are three validation surfaces. First, the deposit form should refuse a record missing required fields rather than warn and accept. Second, the resulting metadata should round-trip through the federation layer your institution uses (OpenAIRE Guidelines 4.0 for European federation, DataCite Commons for DOI-anchored discovery, Crossref for article-anchored discovery) without upstream errors. Third, walk a real-world record through the sample-workflow path on this page and confirm the structured fields capture what the prose describes.
Where do I report errors in the checklist?
Open a comment via the dictionary-feedback flow at /dictionary/contribute. Editorial corrections — wrong vendor module names, deprecated standards, broken integration paths — are queued into the next release cycle. Substantive disagreements on the operational guidance are routed to the working group for review and may motivate a checklist revision.
Is this checklist enough to certify my implementation?
No. The checklist gives you the operational baseline; certification against federation profiles (CoreTrustSeal, OpenAIRE-compliant, COAR-aligned) is a separate process with its own audit. Treat the checklist as the engineering scaffolding and the certification as the institutional sign-off that the scaffolding is being used.

Adopted by research universities worldwide

  • University of Cambridge
  • Columbia University
  • University of Edinburgh
  • Harvard University
  • Massachusetts Institute of Technology
  • University of Oxford
  • Princeton University
  • Stanford School of Medicine
  • University College London

View CASRAI adoption →