What is a research data management glossary?

A research data management (RDM) glossary is a curated, defined set of the terms used to describe how research data is planned for, collected, documented, stored, shared, preserved, and cited across the research lifecycle. The CASRAI RDM Glossary draws nearly 200 of these terms from the CASRAI Dictionary, covering data infrastructure, machine-actionable DMPs, persistent identifiers, research-information systems, and reproducibility.

Who maintains the CASRAI RDM Glossary?

The glossary is maintained by CASRAI as part of the broader CASRAI Dictionary. Community working groups draft, review, and ratify entries on a rolling, versioned release cadence. The RDM terms are stewarded primarily by the data-infrastructure, machine-actionable DMP, persistent-identifier, research-information-systems, and reproducibility working groups.

Is the RDM Glossary free to use under CC-BY?

Yes. Every entry in the CASRAI RDM Glossary is published under the Creative Commons Attribution 4.0 International licence (CC-BY 4.0), with no paywall and no registration. You may reuse, redistribute, translate, and bundle the definitions commercially or non-commercially, provided you give appropriate credit to CASRAI.

How do I cite a term from the RDM Glossary?

Each term has a stable URI under https://casrai.org/dictionary/term/; cite that URI together with the dictionary release version. Each term page provides a citation widget that produces APA, BibTeX, RIS, and Chicago forms. To cite the dictionary release as a whole, use the Zenodo DOI listed on the cite page.

How does this glossary relate to FAIR and DMPs?

The FAIR principles (Findable, Accessible, Interoperable, Reusable) and data management plans (DMPs) are the operational backbone of research data management, so they are central to this glossary. FAIR-related terms (data citation, persistent identifiers, repositories) and DMP terms (living DMP, machine-actionable DMP, RDA DMP Common Standard) each form a dedicated sub-theme below.


Research Data Management Glossary

A research data management glossary is a curated set of defined terms for how research data is planned, stored, shared, preserved, and cited. This RDM glossary is the CASRAI-curated set: 197 research data management terms with stable URIs, grouped by sub-theme and linked to their full definitions in the CASRAI Dictionary.


What is in this glossary

RDM terms: 197
Sub-themes: 6
Source domains: 5
Licence: CC-BY

· Part of the CASRAI Dictionary (714 terms, CC-BY 4.0)
· Stable URI per entry
· Schema.org DefinedTermSet markup
· Aligned with FAIR, RDA, and DataCite vocabularies

About this glossary

What research data management terms cover

Research data management (RDM) is the set of practices for handling the data produced by research throughout its lifecycle — from the planning captured in a data management plan, through collection and documentation, to storage, sharing, preservation, and citation. A shared vocabulary matters here more than almost anywhere else in research administration: funders mandate DMPs, repositories enforce metadata schemas, and persistent identifiers stitch outputs back to the people and projects that produced them. When everyone references the same definition for a trusted research environment, a machine-actionable DMP, or a DOI, systems exchange records without manual mapping and reporting becomes auditable across institutions.

This RDM glossary curates 197 terms from five thematic domains of the CASRAI Dictionary — research data infrastructure, machine-actionable data management plans, the persistent identifier ecosystem, research-information systems, and reproducibility and computational research. Each entry below links to its full definition, picklists, related terms, and structured data. The glossary aligns with the FAIR principles, the RDA DMP Common Standard, and the DataCite and CrossRef metadata schemas, and is published under CC-BY 4.0 for unrestricted reuse — including by controlled-vocabulary services such as the UN FAO AGROVOC thesaurus that reference CASRAI definitions.
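As a rough illustration of the structured data mentioned above, the sketch below builds a Schema.org DefinedTermSet with two DefinedTerm entries and serialises it as JSON-LD. The base URI, set identifier, slugs, and helper names are hypothetical placeholders chosen for the example; they are not the CASRAI Dictionary's actual URIs or markup. The two definitions are copied from entries in this glossary.

```python
import json

# Hypothetical identifiers for illustration only (not real CASRAI URIs).
BASE_URI = "https://example.org/dictionary/term/"
SET_ID = "https://example.org/glossary/rdm"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"


def defined_term(slug: str, name: str, description: str) -> dict:
    """Build one Schema.org DefinedTerm node with its own stable @id."""
    return {
        "@type": "DefinedTerm",
        "@id": BASE_URI + slug,
        "name": name,
        "description": description,
        "inDefinedTermSet": SET_ID,
    }


def defined_term_set(name: str, terms: list) -> dict:
    """Wrap DefinedTerm nodes in a DefinedTermSet carrying the licence."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "@id": SET_ID,
        "name": name,
        "license": CC_BY_4,
        "hasDefinedTerm": terms,
    }


glossary = defined_term_set(
    "Research Data Management Glossary",
    [
        defined_term(
            "repository",
            "Repository",
            "Repositories preserve, manage, and provide access to many "
            "types of digital materials in a variety of formats.",
        ),
        defined_term(
            "institutional-webpage",
            "Institutional webpage",
            "A webpage that is associated with the institution at which "
            "the author is employed.",
        ),
    ],
)

print(json.dumps(glossary, indent=2))
```

The printed JSON could be embedded in a page inside a script element of type application/ld+json; whether the live site structures its markup this way is not confirmed here.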

The discipline this glossary models — one operational definition per term, a stable URI, primary-source verification, and Schema.org DefinedTermSet markup — travels well beyond research administration. Organisations that publish reviewed reference content increasingly pair the same structure with named, role-attributed authorship to make their expertise machine-verifiable. A worked example from healthcare commerce: LAC Healthcare Solutions’ medical-equipment standards glossary documents FDA, ISO, IEC, and ASTM procurement requirements as a DefinedTermSet written from each issuing authority’s primary text, and declares its editorial contributor roles using the CRediT taxonomy’s canonical role URIs, tying the named author to a single cross-site entity via sameAs. For a standards body, that is the adoption pattern working exactly as intended: the controlled-vocabulary and attribution models leaving academia and structuring commercial reference content the same way they structure a CRIS record or a data-management plan.

For institution-level adoption guidance see /for-institutions; for reproducibility standards see /standards/reproducibility.

Six sub-themes

How the glossary is organised

Data management and governance: 16
Metadata and exchange standards: 20
Repositories and infrastructure: 72
Data management plans (DMPs): 24
Persistent identifiers: 40
FAIR, reproducibility, and sharing: 25

Data management and governance

Core concepts for stewarding research data across its lifecycle, from secure handling environments to governance constructs.

16 terms

Aggregator service: A service that harvests, harmonises, and re-exposes metadata and (sometimes) content from many upstream sources, providing a unified search, browse, or query interface across the aggregated corpus; canonical examples include OpenAIRE, BASE, CORE, and OpenAlex.
Data commons: A shared data resource — often combined with shared computing and analysis tools — governed by a community under defined access and contribution rules, designed to enable many users to use and add to the resource for collective benefit.
Data hub: A central node in a data ecosystem that aggregates, harmonises, and brokers access to data from multiple upstream sources, exposing the harmonised data to downstream consumers via curated APIs, query interfaces, or download endpoints.
Data lake: A storage repository that holds large volumes of structured, semi-structured, and unstructured data in their native formats, avoiding schema-on-write requirements so that data can be ingested cheaply and structured only at the time of read or analysis (schema-on-read).
Data safe haven: A secure data-handling environment that allows controlled, audited access to sensitive datasets for approved research, applying technical, physical, and procedural safeguards; effectively a synonym for trusted research environment (TRE) in much current usage, though the term has older roots in NHS information governance.
Data steward role (in DMP): A named individual or role accountable for the day-to-day execution of a DMP's commitments, typically distinct from the principal investigator and reported in the DMP's contributor metadata.
Data trust: A legal and organisational structure in which a fiduciary intermediary holds, governs, and brokers access to a body of data on behalf of its contributors and beneficiaries, applying agreed terms of access, use, and accountability.
Data warehouse: A central repository of structured data, integrated from multiple operational sources, modelled for analytical querying (typically with a star or snowflake schema), and optimised for read-heavy workloads supporting reporting and decision-making.
Federated data infrastructure: A data infrastructure in which data, services, and access controls remain distributed across multiple independent nodes (typically operated by different organisations) but are made discoverable, queryable, and usable as a unified resource through shared protocols, vocabularies, and identity-federation.
Five Safes framework: A framework for the safe use of sensitive data in research, articulated by the UK Office for National Statistics, that organises controls under five dimensions: Safe People, Safe Projects, Safe Settings, Safe Data, and Safe Outputs.
National data infrastructure: A coordinated, nationally-scoped programme and set of services for the storage, sharing, and reuse of research data within a country, typically combining funding policy, technical infrastructure (repositories, compute, federation), training, and governance.
Preservation commitment (in DMP): A statement in a DMP specifying which datasets will be preserved beyond project end, in which repository, for how long, and under what conditions.
Retention period (in DMP): The minimum duration for which a specified dataset will be retained after project end, expressed as a calendar period and recorded as part of the DMP's preservation commitments.
Sensitive-data handling (in DMP): The section of a DMP that documents the categories of sensitive data involved (personal, special category, commercially confidential, indigenous, security-restricted), the legal basis for processing, and the technical and organisational measures applied.
Sensitive-data repository: A repository specifically designed to hold sensitive research data — typically personal data, health data, criminal-justice data, commercially-confidential data, or culturally-sensitive Indigenous data — with enhanced access controls, audit logging, contractual access conditions, and (often) a secure analysis environment.
Trusted research environment: A secure computing environment — typically delivered as a remote-access workspace with controlled inbound/outbound data flows — that allows accredited researchers to analyse sensitive data in situ without exporting the data, supporting privacy-preserving secondary research use.
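The retention period entry above describes a minimum calendar period counted from project end. A minimal sketch of that date arithmetic follows, assuming the period is stated in whole years. The function names are hypothetical, and treating a 29 February project end as 1 March in a non-leap target year is an assumption made here (it errs towards retaining longer), not a rule taken from the glossary.

```python
from datetime import date


def retention_end(project_end: date, years: int) -> date:
    """Return the first date on which the minimum retention period has elapsed."""
    target_year = project_end.year + years
    try:
        return project_end.replace(year=target_year)
    except ValueError:
        # project_end is 29 February and target_year is not a leap year.
        # Assumption: roll forward to 1 March rather than back to 28 February.
        return date(target_year, 3, 1)


def retention_elapsed(project_end: date, years: int, today: date) -> bool:
    """True once the minimum retention period is over on the given day."""
    return today >= retention_end(project_end, years)


# Example: a project ending 30 June 2024 with a 10-year retention period.
print(retention_end(date(2024, 6, 30), 10))
print(retention_elapsed(date(2024, 6, 30), 10, today=date(2030, 1, 1)))
```

A real DMP tool would also need to handle periods expressed in months or tied to events other than project end; those cases are left out of this sketch.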

Metadata and exchange standards

The schemas, protocols, and serialisations that let research-information systems exchange records without manual mapping.

20 terms

CERIF: Common European Research Information Format: an EU-recommended data model and exchange schema for research information, developed and maintained by euroCRIS, that defines core entities (Person, Project, Publication, OrgUnit, Funding, Equipment) and the relationships among them with explicit role-and-time semantics.
Crossref deposit XML: The XML schema family maintained by Crossref that publishers and content-registration members use to submit metadata when minting Crossref DOIs, with distinct sub-schemas for journals, books, conference proceedings, preprints, peer reviews, grants, and standards.
DataCite metadata schema: The XML/JSON schema (current version 4.x) maintained by the DataCite Metadata Working Group that defines the mandatory, recommended, and optional metadata properties to be supplied when registering a DOI through DataCite.
DMP machine-readable expression: The structured serialisation of a DMP's content in a format (JSON, JSON-LD, XML) that conforming software can parse, validate, and act upon without human re-interpretation.
DMP RDA-JSON-LD: A JSON-LD context and ontology rendering of the RDA DMP Common Standard, enabling DMP graphs to be expressed as linked data and merged with other research-information graphs.
Identifier crosswalk: A mapping or correspondence table between identifiers in different schemes that refer to the same real-world entity, allowing a system holding one scheme's identifier to find the equivalent identifier in another scheme.
Identifier scheme: A named, formally-defined system for constructing, issuing, and resolving identifiers, typically defined by a syntax, an authority structure (who can mint), a metadata schema, and a resolution policy.
Identifier syntax: The formal grammar that constrains the textual form of identifiers in a given scheme — what character sets, lengths, prefixes, separators, and check digits are allowed — typically expressed as a regular expression or BNF grammar in the scheme's specification.
JATS XML: Journal Article Tag Suite (ANSI/NISO Z39.96): an XML vocabulary for describing the textual content and metadata of scholarly journal articles, comprising tag sets for archival (Green), publishing (Blue), and authoring (Pumpkin) use cases.
Linked Data Fragments: A family of Web interfaces for publishing RDF data — ranging from data dumps through SPARQL endpoints to lightweight Triple Pattern Fragments — designed to allow clients to query large RDF datasets without overloading the server.
METS: Metadata Encoding and Transmission Standard: a Library of Congress-maintained XML schema for encoding descriptive, administrative, and structural metadata for digital library objects, providing a wrapper around content files and metadata sections.
MODS: Metadata Object Description Schema: a Library of Congress XML schema for descriptive bibliographic metadata, designed as a richer alternative to Dublin Core and a more flexible alternative to MARC for library, archive, and repository contexts.
OAI-ORE: Open Archives Initiative Object Reuse and Exchange: a set of specifications, complementary to OAI-PMH, that define how to describe and exchange aggregations of Web resources (a 'compound digital object') using standard Web Architecture and Linked Data principles.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting: a low-barrier HTTP/XML protocol that allows a 'data provider' system (typically a repository or CRIS) to expose its metadata records for incremental harvesting by 'service providers' (aggregators, search services, national portals).
PID metadata: The descriptive metadata registered with a PID provider at the time of identifier minting, and updated thereafter, that describes the identified entity — its title, creators, dates, types, related identifiers, and so on — and that is exposed by the provider's APIs alongside the identifier itself.
RDA DMP Common Standard: A community-developed application profile, maintained by the Research Data Alliance DMP Common Standards Working Group, defining the core entities, properties, and JSON schema for expressing machine-actionable DMPs.
RDF triple store: A database management system specialised for storing and querying RDF (Resource Description Framework) statements — subject-predicate-object triples (or named-graph quads) — typically with a SPARQL query engine on top.
ResourceSync: ANSI/NISO Z39.99: a framework for synchronising resources between a source server and one or more destination servers using Web-based sitemaps with extensions for change lists, incremental updates, and push notifications.
RIOXX: Research Outputs Information Exchange XML (RIOXX): a UK-originated application profile for institutional-repository metadata, layered on Dublin Core, designed to capture funder, project, and licence information for OA-compliance reporting, currently maintained at v3.0.
SPARQL endpoint: An HTTP endpoint that accepts SPARQL queries (the W3C-standard query language for RDF) against an RDF dataset and returns results in standard formats (XML, JSON, CSV, TSV), implementing the SPARQL 1.1 Protocol.
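The identifier syntax entry above notes that a scheme's grammar is typically expressed as a regular expression, often with a check digit. The sketch below illustrates this for two schemes. The ORCID iD check uses the ISO 7064 MOD 11-2 checksum that ORCID documents for its final character. The DOI pattern is a commonly used approximation for modern DOIs and is an assumption here, not the full DOI grammar. Function names are hypothetical.

```python
import re

# Four hyphen-separated groups of four; the last character may be a digit or X.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

# Approximate DOI shape: "10.", a 4-9 digit registrant prefix, "/", a suffix.
# This is a heuristic, not the complete DOI syntax.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check both the textual form and the check character of an ORCID iD."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def is_plausible_doi(doi: str) -> bool:
    """Check only that a bare DOI string has a plausible shape."""
    return bool(DOI_RE.match(doi))


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
# A made-up DOI string, used only to exercise the pattern.
print(is_plausible_doi("10.1234/example.dataset-01"))
```

Syntax validation of this kind only shows that a string is well formed; confirming that the identifier is actually registered requires resolving it through the scheme's resolver or API.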

Repositories and infrastructure

Where research data, code, and samples are deposited, preserved, and certified for long-term trust.

72 terms

Affiliation (CRIS): In CRIS terminology, a time-bounded relationship between a Person record and an Organization Unit, capturing the role (employee, visitor, honorary, student), start and end dates, employment fraction, and other contextual attributes of the association.
ARRIVE 2.0: The 2020 revision of the Animal Research: Reporting of In Vivo Experiments guidelines, a checklist of items that should be reported in any publication describing animal research, in order to enable assessment and replication.
Authentication of key resources: The verification, by methods appropriate to the resource type, of the identity and integrity of biological and chemical materials used in research, including cell lines, antibodies, animal models, and specialty chemicals.
Biobank: An organised collection of biological samples (typically human samples such as blood, tissue, DNA, urine) together with their associated clinical, demographic, and lifestyle data, governed for use in biomedical research.
Biorepository: A facility or organisation that collects, processes, stores, and distributes biological materials and their associated data for research, encompassing both human and non-human samples; some communities use the term in contrast to 'biobank' to denote a broader scope or collections tied to specific research projects.
Code repository: A version-controlled storage location for source code, typically operated on top of a distributed version-control system such as Git, exposing the code's full revision history, branches, tags, and (often) collaboration features such as issues, pull requests, and code review.
Code review (research software): A structured review of research software by one or more peers, focused on correctness, clarity, documentation, testing, and fitness for purpose, conducted before publication or as part of community-curated software repositories.
CONSORT 2010: The 2010 edition of the Consolidated Standards of Reporting Trials, a 25-item checklist and participant-flow diagram covering items that should be reported in any randomised controlled trial publication.
CoreTrustSeal: A community-based, non-profit certification scheme for trustworthy data repositories, operated by the CoreTrustSeal Foundation, awarded against 16 published requirements covering organisational infrastructure, digital object management, and technical infrastructure.
CRIS: Current Research Information System: an enterprise-class software system that aggregates, stores, and publishes information about a research organisation's activities — its researchers, publications, projects, funding, equipment, collaborations, and outputs — and exposes that information to internal management and external reporting consumers.
CRIS interoperability: The capacity of CRIS systems to exchange data with each other and with adjacent systems (repositories, funders, publishers, aggregators) through shared data models, schemas, protocols, and persistent identifiers — most prominently CERIF, OAI-PMH, OpenAIRE Guidelines, and PID-based joins.
Crowdsourced replication: A coordinated effort in which many independent laboratories or teams attempt to replicate the same set of studies under pre-specified protocols, in order to estimate field-wide replicability.
Data publication platform: A platform that supports the publication of research data as a citable artefact — assigning a persistent identifier, presenting a landing page, and applying review, curation, or peer-review processes — distinct from purely depositional storage.
Dataset landing page: The human-readable web page that a dataset's persistent identifier (typically a DataCite DOI) resolves to, presenting the dataset's title, creators, description, identifiers, dates, version history, related works, access conditions, and a link to download or request the data.
Discipline-specific repository: A repository whose scope is bounded to a particular research discipline or sub-discipline, with curation practices, metadata schemas, and community standards tailored to that domain's data types, terminologies, and norms.
Domain repository: Synonym for discipline-specific repository: a repository whose scope is a particular research domain (or domain-sub-area), with curation practices and metadata tailored to that domain.
Dryad (concept): A non-profit generalist research data repository operated by Dryad Data Inc. (in partnership with the California Digital Library) that publishes peer-reviewed-paper-linked datasets, mints DataCite DOIs, and applies curation review before publication.
DSpace-CRIS: An open-source extension of the DSpace repository platform, developed by 4Science and the DSpace community, that adds CRIS-style entity management for researchers, projects, organisational units, journals, and other research-information entities alongside the existing repository content.
Elements (Symplectic): A commercial CRIS product developed by Symplectic (part of Digital Science) that automates the discovery, capture, and management of research outputs and activities for individual researchers and institutional reporting workflows, with strong emphasis on researcher-facing workflows.
EOSC: European Open Science Cloud: an EU-led initiative and emerging federation of research data infrastructures intended to provide European researchers with seamless, cross-border, cross-discipline access to data, services, and computational resources under FAIR and open-science principles.
EOSC Federation: The architectural model adopted by EOSC for federating heterogeneous national, thematic, and pan-European research-data infrastructures into a single user-facing layer, with shared identity and access management, monitoring, accounting, helpdesk, and service onboarding.
EOSC Marketplace: The catalogue of FAIR research services accessible to European researchers through the European Open Science Cloud, where service providers register their offerings and researchers can discover, order, and (where applicable) access services with EOSC-federated authentication.
FAIRsharing (concept): A curated, community-driven registry of databases, standards (metadata, identifiers, formats, terminologies), and data policies relevant to research data, maintained at the University of Oxford with linkage to funders, journals, and standards organisations.
Figshare (concept): A commercial generalist research repository operated by Digital Science that accepts datasets, figures, presentations, papers, software, and other research artefacts, minting DataCite DOIs and offering institutional-branded instances ('Figshare for Institutions') alongside the public service.
Forking paths: The phenomenon by which the cumulative effect of many small, data-contingent analytical choices inflates false-positive rates even when each individual choice appears defensible.
Funding entity: In CRIS terminology, an entity representing a specific award or funding instance — its funder, award number, amount, currency, start and end dates, and the project and people it supports — distinct from the abstract Funder organisation and from the project itself.
Garden of forking paths: The Gelman-Loken metaphor for the implicit, data-dependent multiplicity of analytical choices made in the course of an empirical study, even by analysts not engaged in explicit p-hacking.
Generalisability: The extent to which a study's findings extend to populations, settings, or conditions other than those directly sampled.
Generalist repository: A repository that accepts research outputs from any discipline, applying domain-agnostic curation and discovery, and serving as a deposit destination for outputs that have no natural discipline-specific home or whose authors prefer a single multidisciplinary venue.
GitHub mirror: A copy of a Git repository (or set of repositories) hosted on GitHub that tracks an upstream source repository elsewhere, typically maintained for redundancy, visibility, or community-engagement reasons rather than as the canonical primary copy.
HARKing (Hypothesising After Results are Known): Presenting a post-hoc hypothesis, formulated after data analysis, as if it had been the a priori hypothesis under test.
Harvard Dataverse (concept): A free research-data repository operated by Harvard University on the open-source Dataverse software platform, accepting datasets from researchers worldwide, minting DataCite DOIs, and serving as the flagship instance of the global Dataverse network.
ICPSR (concept): Inter-university Consortium for Political and Social Research: a consortium-membership-funded data archive based at the University of Michigan that holds and curates over 10,000 social-science research datasets, providing access to member institutions worldwide.
Institutional repository: An online, digital collection of research outputs (see Repository) that are connected by their affiliation with a specific institution. Institutional repositories are most commonly associated with universities and other academic organisations, so the contents of a single institutional repository may cover a range of disciplines. An institutional repository is often managed as part of a wider suite of services supporting scholarly communication, Open Access, and Open Education.
Institutional webpage: A webpage that is associated with the institution at which the author is employed.
Many-analysts study: A study design in which a single dataset and research question are given to multiple independent analysts or teams who proceed without coordination, and the distribution of their conclusions is then compared.
Multiverse analysis: An analytical approach in which all reasonable combinations of data-processing and modelling choices are executed, producing a distribution of results that displays the impact of researcher degrees of freedom on the conclusion.
Nextflow (concept): A workflow orchestration system based on dataflow programming with a Groovy-based domain-specific language, designed for scalable, container-native, multi-platform execution of computational pipelines.
NIH Rigor and Reproducibility policy: The set of US National Institutes of Health policies, effective from 2016, requiring applicants and grantees to address scientific premise, scientific rigour, biological variables (including sex as a biological variable), and authentication of key biological and chemical resources in grant applications.
Open archive: A repository that is compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and therefore facilitates the sharing of metadata for a variety of purposes, most notably the compilation tasks performed by aggregator databases.
OpenAIRE EXPLORE: The end-user-facing discovery service of OpenAIRE that allows researchers, funders, and the public to search and browse the OpenAIRE Graph by publications, datasets, software, projects, organisations, and funders.
OpenAIRE Graph: An open scholarly knowledge graph maintained by OpenAIRE that aggregates and links publications, datasets, software, projects, organisations, and people harvested from thousands of repositories, journals, and CRIS systems across Europe and globally, with deduplication, enrichment, and link inference applied.
OpenAIRE Nexus: An OpenAIRE-led Horizon-Europe-funded initiative (2021-2024) that bundles OpenAIRE services and tools into a coherent service portfolio for delivery through the European Open Science Cloud (EOSC), targeting researchers, content providers, research communities, funders, and policy makers.
Organization unit: In CRIS terminology, an entity representing a structural component of a research organisation — a faculty, school, department, institute, centre, lab, or research group — with its own identifier, name, parent and child relationships, type, start and end dates, and links to People, Projects, and Outputs.
P-hacking: The practice of selectively reporting or adjusting analytical choices in order to obtain a statistically significant p-value, typically below the conventional 0.05 threshold.
Person record (CRIS): In CRIS terminology, the entity representing an individual person involved in research — researcher, RA, PhD candidate, or other contributor — with metadata including names (preferred and historical), identifiers (ORCID iD, ISNI, local HR ID), employments, qualifications, and links to outputs, projects, and organisational units.
PRISMA 2020: The 2020 update of the Preferred Reporting Items for Systematic reviews and Meta-Analyses, a 27-item checklist with accompanying flow diagram for reporting systematic reviews.
Project (CRIS): In CRIS terminology, an entity representing a discrete research project with a defined scope, time period, participating people and organisations, funding source(s), and intended or actual outputs — typically the central organising entity for activity-level reporting.
Pure: A commercial CRIS product, originally developed by Atira A/S in Denmark and now owned by Elsevier, used by universities and research organisations to manage publications, projects, people, organisational units, awards, equipment, and external engagement, and to expose this information through a configurable public 'research portal'.
Re3data (concept): Registry of Research Data Repositories: a global registry, operated by DataCite and partner institutions, that lists research data repositories worldwide with descriptive metadata about their disciplines, content types, access conditions, and policies, helping researchers locate suitable repositories for deposit and discovery.
Repository: Repositories preserve, manage, and provide access to many types of digital materials in a variety of formats.
Research activity (CRIS): In CRIS terminology, an entity representing a research undertaking — typically a project, programme, or organised research effort — with its own start and end dates, participants, funding sources, outputs, and host organisation, around which CRIS data accretes over the activity's lifetime.
Research entity (CRIS): A first-class object in a CRIS data model — typically Person, Project, Publication, Organisation Unit, Funding, Equipment, or Activity — that has its own identifier, metadata schema, and relationships to other entities, and that can be managed and reported on independently.
Research output (CRIS): In CRIS terminology, an entity representing a discrete product of research — a publication, dataset, software release, patent, performance, exhibition, or other recognised output — recorded with its own identifier, type, date, contributors, and relationships to people, organisations, and projects.
Researcher degrees of freedom: The decisions an analyst makes during a study (inclusion criteria, outcome definition, model specification, covariate set) any of which, if made differently, would yield a different result.
Researcher webpage: A webpage featuring a researcher's profile, which may also provide links to their publications.
RIM: Research Information Management: the organisational practice and the supporting systems and processes by which a university or research organisation collects, manages, and uses information about its research activities, encompassing both the technical CRIS layer and the people and policies around it.
Robustness: The stability of a study's conclusions under reasonable variations in analytical choices, model specification, sample inclusion, or measurement, on the same data.
Robustness check: An additional analysis, supplementary to the headline result, that varies one or more analytical choices in order to demonstrate that the main conclusion is not artefactual to those choices.
Sample repository: A repository for physical research samples — geological, environmental, biological, or material — that catalogues, stores, and provides access to samples for downstream analysis, often issuing persistent identifiers (IGSN, DataCite DOI) for citation and provenance tracking.
Scientific rigour: The strict application of the scientific method to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation, and reporting of results.
Snakemake (concept): A Python-based workflow management system that expresses computational pipelines as rules with explicit inputs, outputs, and shell or script bodies, and infers a directed acyclic graph (DAG) of jobs from those rules.
Software Heritage archive: A non-profit international initiative based at Inria that systematically crawls, archives, and preserves the world's publicly available source code, including its full version-control history, and issues persistent identifiers (Software Hash Identifiers, SWHIDs) to every archived artefact.
Specification curve: An analytic and visual technique that plots the estimated effect across a large set of theoretically defensible model specifications, ordered by effect size, to convey the sensitivity of the result to analytical choices.
STROBE: The Strengthening the Reporting of Observational Studies in Epidemiology guidelines, a 22-item checklist covering items that should be reported in cohort, case-control, and cross-sectional studies.
Subject repository: A repository whose contents are related by discipline rather than by other factors such as institutional affiliation (see Institutional Repository).
Tissue bank: A specific kind of biobank focused on the collection, processing, storage, and distribution of human tissue samples (typically solid tissue specimens from surgical or post-mortem sources), governed under tissue-banking regulation in the relevant jurisdiction.
Trusted digital repository: A digital repository whose mission, governance, technical infrastructure, and procedures have been independently assessed against a recognised standard (e.g. CoreTrustSeal, nestor seal, ISO 16363) and judged trustworthy to preserve digital content over the long term.
UK Data Service (concept): A UK ESRC-funded data infrastructure that holds, curates, and provides access to social, economic, and population data resources for research, learning, and policy, comprising the UK Data Archive at the University of Essex and partner institutions.
VIVO: An open-source semantic-web application and ontology developed by the VIVO community (initially at Cornell University, now under DuraSpace/LYRASIS) that publishes information about researchers, departments, publications, grants, and courses as linked open data and as a navigable web interface.
World Data System certification: The former certification programme of ICSU's World Data System (WDS), under which scientific data centres in the geosciences and related fields were certified as trustworthy; it merged with the Data Seal of Approval in 2017 to form CoreTrustSeal.
Zenodo (concept): A free generalist research repository operated by CERN and developed under OpenAIRE that accepts deposits of datasets, software, publications, presentations, posters, and other research artefacts, minting DataCite DOIs and providing free preservation up to a per-record size limit.
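
The Snakemake entry above describes inferring a directed acyclic graph of jobs from rules that declare their inputs and outputs. The following is a minimal sketch of that idea in plain Python. It is not Snakemake's own code or API; the rule names and file names are hypothetical, and it assumes each file is produced by at most one rule.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules, each declaring the files it reads and writes.
RULES = {
    "download": {"input": [], "output": ["raw.csv"]},
    "clean": {"input": ["raw.csv"], "output": ["clean.csv"]},
    "plot": {"input": ["clean.csv"], "output": ["figure.png"]},
}


def infer_dag(rules):
    """Map each rule to the set of rules whose outputs it consumes."""
    # Index every output file by the rule that produces it.
    producer = {
        path: name for name, rule in rules.items() for path in rule["output"]
    }
    # A rule depends on the producer of each of its input files, if any.
    return {
        name: {producer[path] for path in rule["input"] if path in producer}
        for name, rule in rules.items()
    }


def execution_order(rules):
    """Return one valid order in which the rules could run."""
    return list(TopologicalSorter(infer_dag(rules)).static_order())


print(execution_order(RULES))
```

Because the dependencies are derived from file names rather than written out by hand, adding a rule that consumes `clean.csv` would place it after `clean` without any change to the existing rules.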

Data management plans (DMPs)

The living documents and machine-actionable expressions that declare how research data will be collected, stored, shared, and preserved.

24 terms

Active DMP: A DMP that is actively maintained, updated, and queried during project execution, typically in machine-actionable form, in contrast to a one-off document filed at proposal stage.
Argos (OpenAIRE): OpenAIRE's open-source DMP authoring service, designed from the outset around the RDA DMP Common Standard and integrated with European Open Science Cloud (EOSC) and OpenAIRE Research Graph services.
Cost element (in DMP): A line item in a DMP describing a financial commitment associated with data management, such as repository deposit fees, long-term storage, data steward time, or anonymisation services.
Data Management Plan (DMP): A formal document that describes how research data will be collected, processed, described, stored, shared, preserved, and (where appropriate) destroyed across the lifecycle of a research project.
DataDMP: A DMP authoring and management platform developed in Germany that implements the RDA DMP Common Standard and emphasises integration with institutional research-data infrastructures.
DMP active phase: The phase of the DMP lifecycle during project execution, when the plan is iteratively updated as data are actually generated, processed, and deposited.
DMP assessment: The structured rating of a DMP against a published rubric to produce a comparable score across plans, used in funder evaluation, institutional benchmarking, and capacity-building.
DMP closeout phase: The final phase of the DMP lifecycle, at or after project end, when the plan is reconciled against actual outputs, preservation commitments are confirmed, and the DMP is archived as part of the project record.
DMP compliance check: An automated or rules-based verification that a DMP satisfies the structural and policy requirements of a specific funder, institution, or standard, typically returning a binary or itemised pass/fail outcome.
DMP component: A discrete, reusable section of a DMP corresponding to a logical entity (project, dataset, contributor, host, cost, security_and_privacy) as modelled in the RDA DMP Common Standard.
DMP creation phase: The phase of the DMP lifecycle covering initial drafting, typically at grant-proposal stage, when data types, volumes, and intended sharing arrangements are projected rather than known.
DMP lifecycle: The set of phases through which a Data Management Plan passes from initial drafting at proposal stage, through active project execution, to project closeout and post-project preservation.
DMP narrative: The human-readable prose portion of a Data Management Plan, typically organised under the headings of the applicable funder or institutional template.
DMP review: A formal or informal evaluation of a DMP by a peer, data steward, librarian, or funder reviewer against quality criteria such as completeness, plausibility, and policy alignment.
DMP template: A funder-, institution-, or community-specific structured set of questions and guidance used to elicit the content of a Data Management Plan from researchers.
DMPonline (DCC product): The UK Digital Curation Centre's hosted instance of the DMPRoadmap platform, providing DMP authoring for UK and international institutions against funder-specific templates.
DMPRoadmap (DMP Tool): An open-source Ruby on Rails platform for creating, reviewing, and exporting Data Management Plans, jointly developed by the UK Digital Curation Centre and the University of California Curation Center, and deployed under different brands (notably DMPonline and DMPTool).
ezDMP: A funder-focused DMP creation service that guides researchers through the requirements of a specific funder (originally those of US NSF directorates) and produces both narrative and structured outputs.
Living DMP: A DMP that is versioned, citable, and intended to evolve over the life of the project, with each significant change captured as a new version of the plan.
Machine-actionable DMP (maDMP): A Data Management Plan expressed in a structured, machine-readable format (typically JSON conforming to the RDA DMP Common Standard) that enables automated exchange, validation, and updating between systems such as DMP tools, repositories, CRIS/RIMS, and funder portals.
Output management plan (OMP): A broader successor concept to the DMP that covers all categories of research output (data, software, samples, protocols, models, publications) within a single management plan.
Sharing commitment (in DMP): A statement in a DMP specifying which datasets will be shared, with whom, under what licence, with what timing relative to project end, and through which infrastructure.
Software management plan (SMP): A structured plan covering how research software will be developed, documented, licensed, tested, released, and maintained over a project's lifetime, increasingly required alongside or as an extension of a DMP.
Static DMP: A DMP that is produced at a single point in time (typically grant submission) and not subsequently updated, regardless of whether the project's data realities evolve.
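
The entries for machine-actionable DMP, DMP component, and DMP compliance check can be illustrated together. The sketch below parses a small JSON plan and returns an itemised pass/fail result per component. The JSON is a hypothetical, heavily simplified plan that borrows the component names listed above; it is not a complete or validated instance of the RDA DMP Common Standard, and the list of required components is an assumed policy, not any funder's actual rule.

```python
import json

# Hypothetical, simplified maDMP. Field names follow the component names in
# the glossary entries above, not the full RDA DMP Common Standard schema.
MADMP_JSON = """
{
  "dmp": {
    "title": "Example plan",
    "project": [{"title": "Example project"}],
    "contributor": [{"name": "A. Researcher", "role": ["data steward"]}],
    "dataset": [
      {
        "title": "Survey responses",
        "security_and_privacy": [{"title": "Pseudonymised before deposit"}],
        "distribution": [{"host": {"title": "Example repository"}}]
      }
    ],
    "cost": []
  }
}
"""

# Assumed policy for illustration: each of these components must be present
# and non-empty for the plan to pass.
REQUIRED_COMPONENTS = ("project", "contributor", "dataset", "cost")


def compliance_check(madmp_text, required=REQUIRED_COMPONENTS):
    """Return an itemised pass/fail result, one entry per required component."""
    dmp = json.loads(madmp_text).get("dmp", {})
    # A missing key and an empty list both count as a failure.
    return {name: bool(dmp.get(name)) for name in required}


RESULT = compliance_check(MADMP_JSON)
print(RESULT)
print("compliant:", all(RESULT.values()))
```

Here the plan fails only on `cost`, because the list is empty. A real check would validate against the published schema of the standard rather than a hand-written list of keys.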

Persistent identifiers

The PID ecosystem that disambiguates researchers, organisations, outputs, and projects across every system that touches them.

40 terms

ARK: Archival Resource Key, a persistent identifier scheme for information objects of any type, in the form ark:/NAAN/Name[Qualifier], where NAAN is a Name Assigning Authority Number and Name is the local identifier; resolvable through any cooperating ARK resolver.
ARK inflection rules: A convention of the ARK identifier scheme whereby appending a single '?' to an ARK URL yields the object's descriptive metadata and appending '??' yields a 'commitment statement' describing the issuing institution's persistence policy for that ARK.
Crossref DOI: A DOI registered through Crossref, the DOI Registration Agency for scholarly publications (journals, books, conference proceedings, preprints, peer reviews, grants), accompanied by metadata deposited in Crossref's XML schema.
Curated org record (ROR): An entry in the ROR registry that has been reviewed and approved by ROR curators against the published inclusion criteria, carrying a stable ROR ID and metadata including names, types, country, geographic location, parent/child/related/successor/predecessor relationships, and crosswalks.
DataCite consortium: A national or regional grouping of DataCite member organisations led by a 'Consortium Lead' that holds the master agreement with DataCite, allowing member institutions to mint DOIs under a shared fee structure and shared support model.
DataCite DOI: A DOI registered through DataCite, the DOI Registration Agency that serves research data, software, samples, dissertations, instruments, and other non-article research outputs, accompanied by metadata in the DataCite Metadata Schema.
DOI: Digital Object Identifier (ISO 26324), a persistent identifier for an entity (typically a research output) consisting of a prefix assigned to a registrant by a DOI Registration Agency and a suffix assigned by the registrant, resolvable as an HTTPS URI under https://doi.org/.
DOI prefix: The leading portion of a DOI before the first forward slash, of the form 10.NNNN where 10 is the directory indicator for DOI under the Handle System and NNNN is a numeric (or alphanumeric) string assigned by the DOI Registration Agency to a specific registrant.
DOI suffix: The portion of a DOI after the first forward slash, assigned by the registrant within their issued DOI prefix and identifying the specific object; it may contain any printable Unicode characters, and DOIs are compared case-insensitively.
DOI tombstone: A tombstone page served at the resolved URL of a DOI after the underlying resource has been withdrawn, providing withdrawal information and metadata while ensuring the DOI itself continues to resolve.
Funder ID: An identifier from the Crossref Funder Registry (formerly FundRef), a curated, open registry of funder names and identifiers used by publishers to tag deposited works with the funders that supported them.
GRID: Global Research Identifier Database, a legacy identifier and registry of research organisations originally operated by Digital Science, frozen to new records in 2021 and superseded by ROR, which seeded its registry from a deduplicated GRID snapshot.
GUID: Globally Unique Identifier, a generic term for an identifier that is intended to be unique across all systems and time, most commonly implemented as a 128-bit UUID but used informally for any opaque, globally scoped identifier.
Handle: An identifier in the CNRI Handle System (RFC 3650-3652), of the form Prefix/Suffix (e.g. 20.500.12345/abcd), resolved by a distributed system of Handle servers that map the identifier to one or more current URLs or other typed data values.
IGSN: International Geo Sample Number, a globally unique persistent identifier for physical samples (geological, environmental, biological) that supports tracking and citation of the sample through subsequent analyses, publications, and derived data.
ISNI: International Standard Name Identifier (ISO 27729), a 16-digit identifier for the public identity of a person or organisation involved in the creation, production, management, or distribution of content, administered by the ISNI International Agency.
ORCID API: The two-tier REST application programming interface (Public API and Member API) operated by ORCID that allows systems to read public ORCID record data and, with researcher authorisation, to read restricted data or write trusted-party assertions to records.
ORCID consortium: A national or regional grouping of ORCID member organisations that share a single membership fee structure and a lead organisation, in order to coordinate ORCID adoption, training, and policy advocacy within a country or region.
ORCID education: An affiliation item in an ORCID record asserting that the iD holder studied at a named organisation, including degree or qualification, department, start and end dates, and the organisation's disambiguated identifier.
ORCID employment: An affiliation item in an ORCID record asserting that the iD holder is or was employed by a named organisation, with start date, optional end date, department, role title, and the organisation's disambiguated identifier (typically a ROR ID).
ORCID iD: A 16-digit persistent identifier, expressed as four hyphen-separated blocks (e.g. 0000-0002-1825-0097) and resolvable as an HTTPS URI under https://orcid.org/, that uniquely identifies an individual researcher across publications, datasets, grants, employments, and peer-review activity.
ORCID record: The structured profile maintained at orcid.org for an individual ORCID iD, containing assertions about the person's names, employments, educations, funding, works, peer reviews, and service activities, each with a visibility setting and a source attribution.
ORCID record permissions: The three-level visibility setting attached to each item in an ORCID record — public, trusted parties only (limited), or private — which the record holder applies individually to names, employments, works, fundings, and other assertions.
ORCID work assertion: A claim, recorded in an ORCID record, that a particular research output (journal article, book chapter, dataset, software, etc.) is associated with the iD holder, with metadata fields including title, type, publication year, external identifiers (DOI, ISBN, PMID), and contributor role.
Persistent URL: An HTTP(S) URL that an issuing organisation commits to maintain unchanged over time so that links continue to resolve correctly even as the underlying resource is moved, renamed, or migrated between systems.
PID consortium: A grouping of PID-provider member organisations, typically at national or regional scale, formed to share infrastructure, contracts, and support around one or more persistent identifier schemes such as DOI, ORCID, or ROR.
PID graph: A graph data structure in which persistent identifiers (ORCID iDs, DOIs, ROR IDs, RAiDs, IGSNs, etc.) are nodes and the metadata relationships among them (creator-of, affiliated-with, funded-by, derived-from) are edges, allowing federated queries across multiple PID-provider registries.
PID minting: The act of generating a new persistent identifier in a registered scheme and registering it, with associated metadata, at the appropriate PID provider so that it becomes resolvable and discoverable.
PID provider: An organisation that issues persistent identifiers from one or more PID schemes, operates (or contracts) the resolution infrastructure for those identifiers, and makes long-term commitments about the maintenance of the identifiers and their metadata.
PID resolution: The process by which a persistent identifier is looked up through its scheme's resolution infrastructure and returned either as an HTTP redirect to the current resource location or as metadata about the resource, depending on the request and the scheme's policy.
PIDINST: A persistent identifier for a research instrument, minted under a DataCite DOI or Handle, conforming to the PIDINST metadata schema developed by an RDA Working Group, that enables citation of and provenance back to the instrument that produced data.
PURL: Persistent Uniform Resource Locator, a URL maintained by a PURL service that redirects (typically via HTTP 302) to the current location of the named resource, allowing the persistent URL to remain stable as the underlying resource location changes.
RAiD: Research Activity Identifier, an ISO-standardised persistent identifier (ISO 23527) for a research project or activity, providing a stable handle around which related people, organisations, outputs, instruments, and funding can be linked over the activity's lifetime.
Resolution service: A networked service that, given a persistent identifier, returns the current location of the named resource (typically by HTTP redirect) or returns its metadata, allowing the identifier itself to remain stable while the resource's location changes.
ROR Curation: The community-driven process by which the Research Organization Registry receives, reviews, and acts on requests to add new organisations, update existing records, merge duplicates, or split records, governed by a published curation policy and managed by ROR's curation team.
ROR ID: A persistent identifier for research organisations issued by the Research Organization Registry (ROR), expressed as an HTTPS URI of the form https://ror.org/0xxxxxxxx where the final nine-character path component consists of a leading 0, a six-character base32-encoded random value, and a two-digit checksum.
Tombstone page: A landing page served at a persistent identifier's resolved URL after the underlying resource has been withdrawn, retracted, or made permanently unavailable, providing metadata describing the former resource, the reason for its absence, and (where applicable) a successor identifier.
URN: Uniform Resource Name (RFC 8141), a URI of the form urn:NID:NSS where NID is a registered Namespace Identifier and NSS is the namespace-specific string, intended to denote a resource persistently and independently of any particular resolution mechanism.
UUID: Universally Unique Identifier (RFC 4122 / ISO/IEC 9834-8), a 128-bit value rendered as 32 hexadecimal digits in 8-4-4-4-12 grouping, generated such that the probability of collision across independent generators is negligible.
w3id.org PURL: A persistent URL hosted on the w3id.org domain by the W3C Permanent Identifier Community Group, providing a community-maintained redirect under https://w3id.org/<namespace> to ontologies, vocabularies, and standards documents that may move between hosting providers over time.
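
The DOI, DOI prefix, and DOI suffix entries above describe a structure that is easy to handle in code. The following is a minimal sketch, not a validator: the function names and the example DOI are hypothetical, and it assumes that case-insensitive comparison applies to ASCII letters only.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first forward slash."""
    prefix, sep, suffix = doi.partition("/")
    # The prefix must start with the directory indicator "10." and the
    # suffix must be non-empty; anything else is rejected.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_uri(doi):
    """Express a DOI as an HTTPS URI under https://doi.org/."""
    split_doi(doi)  # raises ValueError if the structure is wrong
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


def same_doi(a, b):
    """Compare two DOIs case-insensitively (assumption: ASCII letters only)."""
    def fold(text):
        return "".join(c.lower() if c.isascii() else c for c in text)
    return fold(a) == fold(b)


# Hypothetical DOI used purely for illustration.
EXAMPLE = "10.1234/Example/Suffix-01"
print(split_doi(EXAMPLE))
print(doi_to_uri(EXAMPLE))
print(same_doi(EXAMPLE, EXAMPLE.upper()))
```

Splitting at the first slash only matters because a suffix may itself contain slashes, as in the example.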

Keep exploring

Related areas of the dictionary

The RDM glossary sits inside the broader CASRAI Dictionary. These adjacent surfaces extend the same controlled vocabulary.

The CASRAI Dictionary (/dictionary): All 714 entries across 20 thematic domains, the parent set this glossary draws from.
Research data infrastructure (/dictionary/domain): Repositories, safe havens, certification schemes, and national data infrastructure.
Machine-actionable DMPs (/dictionary/domain): The maDMP vocabulary: living DMPs, the RDA DMP Common Standard, and DMP tooling.
For institutions (/for-institutions): How research offices and libraries adopt the dictionary in their RDM workflows.
Reproducibility standards (/standards): TOP Guidelines, PRISMA, CONSORT, and the computational-reproducibility vocabulary.
Full A–Z index (/dictionary/a-z): Every one of the 714 CASRAI terms, alphabetised, for complete crawl coverage.

This page is the canonical home of the CASRAI RDM glossary, referenced by controlled-vocabulary services including the UN FAO AGROVOC thesaurus. All definitions are CC-BY 4.0.

Frequently asked

RDM Glossary FAQ

What is a research data management glossary?: A research data management (RDM) glossary is a curated, defined set of the terms used to describe how research data is planned for, collected, documented, stored, shared, preserved, and cited across the research lifecycle. The CASRAI RDM Glossary draws nearly 200 of these terms from the CASRAI Dictionary, covering data infrastructure, machine-actionable DMPs, persistent identifiers, research-information systems, and reproducibility.
Who maintains the CASRAI RDM Glossary?: The glossary is maintained by CASRAI as part of the broader CASRAI Dictionary, stewarded by community working groups that draft, review, and ratify entries on a rolling versioned release cadence. The RDM terms are stewarded primarily by the data-infrastructure, machine-actionable DMP, persistent-identifier, research-information-systems, and reproducibility working groups.
Is the RDM Glossary free to use under CC-BY?: Yes. Every entry in the CASRAI RDM Glossary is published under the Creative Commons Attribution 4.0 International licence (CC-BY 4.0), with no paywall and no registration. You may reuse, redistribute, translate, and bundle the definitions commercially or non-commercially, provided you give appropriate credit to CASRAI.
How do I cite a term from the RDM Glossary?: Each term has a stable URI at https://casrai.org/dictionary/term/<slug> together with the dictionary release version. The term page emits a citation widget producing APA, BibTeX, RIS, and Chicago forms. To cite the dictionary release as a whole, use the Zenodo DOI listed on /dictionary/cite.
How does this glossary relate to FAIR and DMPs?: The FAIR principles (Findable, Accessible, Interoperable, Reusable) and data management plans (DMPs) are the operational backbone of research data management, so they are central to this glossary. FAIR-related terms (data citation, persistent identifiers, repositories) and DMP terms (machine-actionable DMPs) each form a dedicated sub-theme on this page.
