Tag: FAIR Data Principles

Clinical Data Management Plan vs Research Data Management Plan: What’s the Difference

On this page:

What Is a Clinical Data Management Plan?
What Is a Research Data Management Plan?
CDMP vs RDMP: Side-by-Side Comparison
Common Questions Answered
Why the Distinction Matters for Research Administrators
Looking Ahead

A clinical data management plan and a research data management plan are two of the most frequently conflated documents in the clinical trial lifecycle. Both are shortened to “DMP” in casual conversation, both are drafted before a study starts, and both concern “data” in the broadest sense. But they answer to different authorities, cover different lifecycle stages, and are read by different audiences. Submitting the wrong one to the wrong reviewer is a recurring, avoidable compliance headache for trial units and research offices alike.

What Is a Clinical Data Management Plan?

A Clinical Data Management Plan (CDMP) is an operational, trial-specific document that describes exactly how data will move from case report form (CRF) to locked database. It is written by or with the clinical data management (CDM) function — not the principal investigator’s grants office — and it sits alongside the protocol as one of the working documents that Good Clinical Practice (GCP), per ICH E6, expects a sponsor to maintain and be able to produce on inspection.

A CDMP typically specifies:

CRF or eCRF design and the electronic data capture (EDC) system to be used
Database build, edit-check specifications and data validation rules
Data entry conventions (single vs double entry, query turnaround)
Medical coding dictionaries and versions, such as MedDRA and the WHO Drug Dictionary
Discrepancy management and serious adverse event reconciliation procedures
Roles, responsibilities and sign-off authority for database lock
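
To make the idea of an edit-check specification concrete, the sketch below applies one range check and one cross-field consistency check to a single eCRF record and raises a query for each failure. The field names (`systolic_bp`, `consent_date`, `visit_date`), the numeric limits and the `Query` structure are assumptions made for illustration; they do not come from any particular EDC system or protocol.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Query:
    """A data discrepancy raised for site review (hypothetical structure)."""
    field: str
    message: str


def run_edit_checks(record: dict) -> list[Query]:
    """Apply two illustrative edit checks to one eCRF record."""
    queries = []

    # Range check. The limits are placeholders; a real CDMP would cite the protocol.
    sbp = record.get("systolic_bp")
    if sbp is None:
        queries.append(Query("systolic_bp", "Value missing"))
    elif not 60 <= sbp <= 250:
        queries.append(Query("systolic_bp", f"Value {sbp} outside 60-250 mmHg"))

    # Cross-field consistency check: a visit cannot precede consent.
    if record["visit_date"] < record["consent_date"]:
        queries.append(Query("visit_date", "Visit date precedes consent date"))

    return queries


record = {
    "systolic_bp": 300,
    "consent_date": date(2026, 3, 1),
    "visit_date": date(2026, 2, 20),
}
queries = run_edit_checks(record)
for q in queries:
    print(f"{q.field}: {q.message}")
```

In a real study each check would be traceable to a numbered line in the edit-check specification, so that the query log can be reconciled against it at database lock.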

Because it is inspected against GCP, a CDMP is a living, version-controlled document updated through the study rather than filed once and forgotten.

What Is a Research Data Management Plan?

A Research Data Management Plan (RDMP) is a funder- or institution-facing document submitted at the grant proposal stage, well before a trial’s CDMP would even exist. Its job is compliance with funder and institutional data policy, not trial operations. UK Research and Innovation (UKRI) requires a data management plan for relevant grant applications, Horizon Europe provides a Data Management Plan template for projects that generate or reuse data, and the NIH Data Management and Sharing (DMS) Policy has required a DMS plan for NIH-funded research since January 2023.

An RDMP typically covers:

What data types and volumes the project will generate or reuse
How data will be described, documented and made findable (metadata, identifiers)
Storage, security and access-control arrangements during the project
Ethical, consent and legal constraints on sharing (particularly for identifiable participant data)
Long-term preservation and repository plans, often with a DOI issued via DataCite
Alignment with the FAIR principles — Findable, Accessible, Interoperable, Reusable

Unlike a CDMP, an RDMP is reviewed once (or at defined milestones) by a funder or research office, not audited line-by-line by a regulator during a GCP inspection.

CDMP vs RDMP: Side-by-Side Comparison

The table below sets out where the two documents genuinely diverge, so institutions running funded clinical trials know they usually need both — not one instead of the other.

Dimension	Clinical Data Management Plan (CDMP)	Research Data Management Plan (RDMP)
Primary purpose	Ensure trial data is accurate, complete and audit-ready for database lock	Satisfy funder/institutional policy on data stewardship and sharing
Governing framework	ICH E6 Good Clinical Practice; sponsor/CRO SOPs	Funder mandates (UKRI, NIH, Horizon Europe); institutional RDM policy
Typical author	Data manager / clinical data management lead	Principal investigator, often with library or research office support
Created at	Study set-up, before first patient enrolled	Grant proposal stage, before funding is awarded
Primary audience	CDM team, biostatisticians, sponsor, regulatory inspectors	Funder, ethics/IRB reviewers, institutional research office
Content focus	CRF design, edit checks, coding, database lock procedures	Data description, storage, ethics, sharing, long-term preservation
Review cadence	Continuously updated through study conduct; inspected on audit	Reviewed at proposal and, for some funders, at defined milestones

Common Questions Answered

What does a clinical data management plan include?

A clinical data management plan includes CRF or eCRF specification, database design, data entry and validation procedures, edit-check logic, medical coding dictionaries such as MedDRA, discrepancy and adverse-event reconciliation processes, and clearly defined roles and responsibilities through to database lock, all maintained as a living, version-controlled document inspected under Good Clinical Practice.

What should a data management plan include?

A funder-facing research data management plan should describe the data types and volumes a project will generate, how data will be documented and made findable through metadata, storage and security arrangements, ethical and consent constraints on sharing identifiable data, and the eventual repository and preservation route, typically aligned to the FAIR data principles.

What are the three phases of clinical data management?

Clinical data management is generally organised into three sequential phases: study set-up, covering database build and CRF design; study conduct, covering data entry, cleaning and query resolution; and study close-out, covering final reconciliation, coding sign-off and database lock ahead of statistical analysis.

Why the Distinction Matters for Research Administrators

Institutions running externally funded clinical trials almost always need both documents, produced by different teams on different timelines. A funder reviewer looking for a FAIR-aligned sharing and preservation strategy will not find it in a CDMP’s edit-check specification — and a GCP inspector auditing database lock will not accept an RDMP’s high-level data-sharing statement as evidence of query resolution procedure.

This is precisely the coordination gap that research administration functions increasingly exist to close: aligning the pre-award compliance document (the RDMP, owned by the grants office) with the operational trial document (the CDMP, owned by clinical data management) so that neither is quietly missing when a funder audit or a regulatory inspection arrives. Institutions that treat the two as interchangeable risk both funder non-compliance and GCP findings — for two entirely separate reasons.

Consistent terminology helps here. Reviewers, auditors and research offices benefit from a shared reference for what each document is called and what it covers; the CASRAI research administration dictionary maintains definitions for terms that span exactly this pre-award-to-conduct boundary.

Looking Ahead

The line between the two documents is not static. ICH’s ongoing revision of E6 Good Clinical Practice has pushed sponsors toward more explicit, risk-based data governance language inside the CDMP itself, while funders such as UKRI and the NIH continue to tighten expectations for FAIR-aligned sharing inside the RDMP. Institutions that keep the two plans distinct — but explicitly cross-referenced — will be best placed to satisfy both regulators and funders as each side’s requirements keep evolving.

July 2, 2026

Data Provenance: Tracking Research Data to Publication

What Is Data Provenance?
Data Provenance vs Data Lineage
Provenance Standards: W3C PROV, RDA and RO-Crate
Building a Custody Chain from Collection to Publication
Common Questions About Data Provenance
Why Provenance Completes FAIR: Implications for Institutions

Research funders increasingly ask not just whether a dataset is open, but where it came from. Data provenance is the discipline of documenting a dataset’s origin, custody, and every transformation it undergoes between collection and publication — a distinct concern from data lineage, which maps only the technical pathway data takes through systems. As data management plans, repository deposits, and AI-training-data audits come under closer scrutiny, provenance metadata is becoming the connective tissue between “collected” and “citable.”

What Is Data Provenance?

Data provenance is the historical record of a dataset’s origin, custody, and processing history — who created or collected it, under what conditions, and what happened to it before it reached its published form. It functions as a chain of custody: not a single field in a metadata record, but a continuous trail spanning collection instruments, transformation scripts, quality checks, and every hand the data passed through.

This differs from anonymisation or privacy-preserving techniques, which govern what can be disclosed about a dataset’s contents. Provenance governs what can be verified about a dataset’s history — a governance question, not a disclosure-control one.

Data Provenance vs Data Lineage

The two terms are frequently used interchangeably, but the ELIXIR Research Data Management Kit (RDMkit) draws a useful distinction: lineage traces the technical movement of data between systems — extract, transform, load, output — while provenance adds the contextual and authorship layer: who authorised each step, why it happened, and under what licence or methodology.

Data lineage answers: which pipeline stages did this data pass through, and in what order?
Data provenance answers: who is accountable for each stage, and can that history be trusted and cited?

In practice, a well-built pipeline produces both: lineage as the operational map, provenance as the governance record layered on top of it.
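
One way to picture the layering is to keep lineage steps and provenance records as separate structures and ask which pipeline stages have no accountability record attached. The sketch below does that; the class names, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineageStep:
    """Technical movement only: which stage ran, and in what order."""
    order: int
    stage: str


@dataclass
class ProvenanceRecord:
    """Governance layer attached to one lineage step."""
    step: LineageStep
    authorised_by: str
    reason: str
    licence: str


lineage = [
    LineageStep(1, "extract"),
    LineageStep(2, "transform"),
    LineageStep(3, "load"),
]

provenance = [
    ProvenanceRecord(lineage[0], "data steward A", "initial ingest", "CC-BY-4.0"),
    ProvenanceRecord(lineage[1], "analyst B", "unit normalisation", "CC-BY-4.0"),
]


def steps_without_provenance(lineage, provenance):
    """Return the stages that appear in the lineage but have no provenance record."""
    covered = {p.step.order for p in provenance}
    return [s.stage for s in lineage if s.order not in covered]


uncovered = steps_without_provenance(lineage, provenance)
print(uncovered)  # ['load']
```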

Provenance Standards: W3C PROV, RDA and RO-Crate

Provenance only becomes machine-actionable, and therefore auditable at scale, once it is captured against a shared model rather than free text. The W3C PROV family is the reference point: PROV-DM defines the data model of “entities,” “activities,” and “agents,” PROV-O expresses that model as an OWL ontology, and PROV-N provides a human-readable notation, so provenance graphs can be exchanged between systems. The Research Data Alliance (RDA) has convened interest groups aligning disciplinary metadata practices with PROV-DM, and repository-facing specifications build on top of it.

Standard / Framework	Steward	What It Captures
PROV-DM / PROV-O / PROV-N	W3C	Formal graph model of entities, activities and agents; RDF/OWL-serialisable provenance
RO-Crate	Research Object community (schema.org-based)	Packages a dataset with its licence, workflow-run history and provenance in one archive
ISO 19115-2	ISO	Lineage extension for geographic and imagery metadata
DataCite Metadata Schema	DataCite	Related-identifier relationship types (IsDerivedFrom, IsSourceOf) linking a dataset DOI to its origin and outputs

Discipline-specific profiles then sit on top of these: FAIRsharing and RDA’s standards directory catalogue hundreds of provenance and metadata schemas so groups do not reinvent the model for each field.
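
As a rough illustration of the entity/activity/agent model, the sketch below stores a tiny provenance graph as plain tuples and answers the question "how was this entity produced, from what, and by whom?". The relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasDerivedFrom`) are taken from PROV-DM; the `ex:` identifiers, the tuple layout and the `history` helper are assumptions for illustration, and no PROV library is involved.

```python
# Relation names follow PROV-DM; every identifier below is hypothetical.
relations = [
    # (relation, subject, object)
    ("used", "ex:cleaning_run", "ex:raw_survey"),               # activity used entity
    ("wasGeneratedBy", "ex:cleaned_survey", "ex:cleaning_run"),  # entity from activity
    ("wasAssociatedWith", "ex:cleaning_run", "ex:analyst"),      # activity to agent
    ("wasDerivedFrom", "ex:cleaned_survey", "ex:raw_survey"),    # entity from entity
]


def history(entity: str) -> dict:
    """Return the generating activity, its inputs and its responsible agents."""
    generated_by = [o for r, s, o in relations if r == "wasGeneratedBy" and s == entity]
    inputs = [o for r, s, o in relations if r == "used" and s in generated_by]
    responsible = [
        o for r, s, o in relations if r == "wasAssociatedWith" and s in generated_by
    ]
    return {"generated_by": generated_by, "inputs": inputs, "agents": responsible}


result = history("ex:cleaned_survey")
print(result)
```

A production system would more likely serialise the same statements as PROV-O triples or PROV-N text so that other tools can consume them; the tuple list here only shows the shape of the graph.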

Building a Custody Chain from Collection to Publication

A defensible provenance record follows the dataset through five stages, each logged with enough detail that a third party could reconstruct the history without contacting the original team.

Collection: instrument or method, collector identity (an ORCID iD is the practical anchor), date, and location captured at source.
Transformation: every cleaning, normalisation, aggregation or filtering step logged with the tool and version used.
Review: who validated the data, what checks were applied, and what was flagged or excluded.
Deposit: registration in a repository with a persistent identifier — a DataCite or CrossRef DOI — and an ROR identifier for the responsible institution.
Citation and reuse: downstream citations captured so the provenance trail extends forward into the published research output that relies on it.
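
The five stages above can be treated as a completeness check on a custody log. The following sketch lists stages that have no entry and entries that lack the detail a third party would need. The required fields per stage are an illustrative reading of the list, not a standard, and the ORCID iD shown is used purely as a placeholder value.

```python
# The five custody stages, in order.
STAGES = ["collection", "transformation", "review", "deposit", "citation"]

# Fields a third party would need at each stage (an illustrative choice).
REQUIRED = {
    "collection": ["actor", "method", "date"],
    "transformation": ["actor", "tool", "version"],
    "review": ["actor", "checks"],
    "deposit": ["doi", "ror"],
    "citation": ["citing_doi"],
}


def custody_gaps(log: list[dict]) -> list[str]:
    """List missing stages and missing fields in a custody log."""
    gaps = []
    by_stage = {entry["stage"]: entry for entry in log}
    for stage in STAGES:
        entry = by_stage.get(stage)
        if entry is None:
            gaps.append(f"{stage}: no entry")
            continue
        for name in REQUIRED[stage]:
            if not entry.get(name):
                gaps.append(f"{stage}: missing {name}")
    return gaps


log = [
    {"stage": "collection", "actor": "0000-0002-1825-0097",
     "method": "field survey", "date": "2026-01-15"},
    {"stage": "transformation", "actor": "0000-0002-1825-0097",
     "tool": "cleaning script"},  # tool version not recorded
    {"stage": "review", "actor": "reviewer-1", "checks": "range and duplicate checks"},
]
gaps = custody_gaps(log)
print(gaps)
```

Run against this sample log, the check reports the unrecorded tool version and the absent deposit and citation stages, which is the kind of gap that is cheap to close at the time and expensive to reconstruct later.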

Contributor-role taxonomies help name accountability at each stage. CRediT, a taxonomy CASRAI originated in 2014 and which NISO now stewards as ANSI/NISO Z39.104-2022, gives institutions a controlled vocabulary for naming who performed which custody step; its “Data Curation” role is one example. This complements PROV-O’s more technical entity/activity/agent model. Research administrators building data management plans can pair the two: CRediT roles for human accountability, PROV-DM for machine-actionable history.

Common Questions About Data Provenance

What is data provenance?

Data provenance is the documented history of a dataset’s origin and custody — who collected it, under what method, and what transformations it underwent before use. It functions as a chain of custody, supporting authenticity checks, quality auditing, and reproducibility of any research output that relies on the data.

What is data provenance vs lineage?

Data lineage maps the technical route data takes between systems — extraction, transformation, loading. Data provenance adds the accountability layer: who authorised each step, why it occurred, and under what licence. Lineage is the operational map; provenance is the governance record built on top of it.

What are the two classes of data provenance?

Provenance literature typically distinguishes retrospective provenance, which records what actually happened to a dataset (the steps executed and the conditions under which they ran), from prospective provenance, which captures the specification of a workflow, describing how data is expected to move and transform before it runs.

What does provenance mean?

Outside data contexts, provenance refers to the documented history of ownership or origin of an object — the term used to authenticate artworks and manuscripts. Applied to research data, the same principle holds: a verifiable record of origin that supports trust, exactly as a chain of custody supports evidentiary trust in other domains.

Why Provenance Completes FAIR: Implications for Institutions

The FAIR data principles (Findable, Accessible, Interoperable, Reusable) are frequently treated as a checklist for open deposit, but the Reusable facet explicitly requires more than a licence tag. Principle R1.2 states that “(meta)data are associated with detailed provenance” — a sub-principle that is easy to satisfy nominally and hard to satisfy meaningfully. A dataset can be technically Findable and Accessible while its provenance metadata is a single free-text sentence, which leaves reproducibility unverifiable in practice.

This gap matters more as scrutiny of dataset origin intensifies elsewhere. MIT Media Lab’s audit of over 1,800 AI training datasets found licence omission or miscategorisation in more than two-thirds of cases — a warning sign for any field, including research data management, that treats provenance as an afterthought rather than a captured-at-source discipline.

For institutions building or refreshing data management plans under UKRI or Horizon Europe funding requirements, the practical implication is straightforward: provenance capture belongs at collection time, encoded against PROV-DM or an equivalent model, not reconstructed retrospectively when a journal, repository, or auditor asks for it. Research administrators, repository managers, and publishers who build custody-chain logging into their research administration workflows now will find FAIR compliance — and reproducibility review — considerably less costly later.

July 2, 2026

FAIR Data Principles in 2026: A Practical Guide for Research Administrators

The FAIR data principles — Findable, Accessible, Interoperable, Reusable — turn ten in 2026. Since Mark Wilkinson and colleagues published the framework in Scientific Data in 2016, FAIR has moved from an aspirational statement of good practice to a hard requirement embedded in funder mandates, journal policies, and institutional research data management infrastructure. UKRI’s open access policy now expects data underpinning publications to be made available in line with FAIR, the US NIH data sharing policy is actively enforced for funded projects, and Horizon Europe applicants must demonstrate FAIR-compliant data management as a condition of award.

Yet a decade in, compliance remains uneven. Many institutions still treat FAIR as a checkbox on a data management plan template rather than a set of concrete technical and governance obligations. As the ten-year anniversary approaches and funders sharpen enforcement, research administrators need a working map from principle to practice — one that goes beyond restating the acronym and instead specifies what each letter actually requires of repositories, metadata schemas, and institutional policy.

This article revisits the original FAIR framework as stewarded by FORCE11 and the GO FAIR initiative, and translates each element into actions that research offices, data stewards, and library services can implement now, ahead of the next REF cycle and continued tightening of funder mandates.

What the FAIR Data Principles Actually Require

Wilkinson et al. (2016) deliberately wrote FAIR as a set of guiding principles rather than a rigid standard, which has allowed broad adoption but also created room for superficial interpretation. FORCE11, the scholarly communication community that convened the original working group, and GO FAIR, the international support and coordination initiative, both continue to publish implementation guidance. For research administrators, the practical translation looks like this:
- Findable — Every dataset needs a globally unique, persistent identifier (a DOI minted through DataCite is the de facto standard for research data) and rich, indexed metadata that describes the dataset independently of the data itself. Institutional repositories must expose this metadata to harvesters and search services, not bury it behind a login wall.
- Accessible — Data (and, critically, its metadata) should be retrievable via a standardised, open communication protocol, with clear authentication and authorisation procedures where restrictions are legitimate. Accessible does not mean “open by default” — it means the access conditions are documented, discoverable, and enforced consistently, even when the data itself is restricted for ethical or commercial reasons.
- Interoperable — Metadata and data should use formal, shared, broadly applicable vocabularies for knowledge representation, and reference other data and metadata using standard identifiers. This is where controlled vocabularies, ontologies, and cross-referencing to identifiers like ORCID (for contributors), ROR (for institutions), and CrossRef (for related publications) matter most.
- Reusable — Data must carry a clear, accessible data usage licence, detailed provenance, and be described with enough domain-relevant metadata that a future researcher — human or machine — can understand and reuse it without contacting the original team.

None of the four elements is optional or substitutable for another. A dataset with a DOI but no licence is findable but not reusable. A dataset described only in free-text notes is accessible but not interoperable. Institutions that treat FAIR as satisfied once a DOI is assigned are addressing roughly one letter out of four.
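
The point that no letter substitutes for another can be shown with a small per-letter check. This is a minimal sketch: the dictionary keys (`doi`, `metadata_indexed`, `access_conditions`, `vocabularies`, `licence`, `provenance`) and the DOI value are invented for illustration, and a real assessment would test far more than the presence of a field.

```python
def fair_checks(dataset: dict) -> dict:
    """Return a pass/fail flag per FAIR letter, based only on field presence."""
    return {
        "Findable": bool(dataset.get("doi")) and bool(dataset.get("metadata_indexed")),
        "Accessible": bool(dataset.get("access_conditions")),
        "Interoperable": bool(dataset.get("vocabularies")),
        "Reusable": bool(dataset.get("licence")) and bool(dataset.get("provenance")),
    }


# A dataset with a DOI and stated access conditions, but no licence,
# no provenance and no controlled vocabulary (hypothetical record).
dataset = {
    "doi": "10.1234/example",
    "metadata_indexed": True,
    "access_conditions": "open",
}

result = fair_checks(dataset)
failing = [letter for letter, ok in result.items() if not ok]
print(failing)  # ['Interoperable', 'Reusable']
```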

Persistent Identifiers, Metadata, and Vocabularies: The Infrastructure Layer

The technical backbone of FAIR compliance rests on three infrastructure decisions that research administrators are often best placed to influence, even without deep technical expertise.

First, persistent identifier coverage needs to extend beyond the dataset itself. Contributor identification through ORCID, organisational identification through ROR, and publication linkage through CrossRef and DataCite together create the graph of relationships that makes data genuinely findable and interoperable — not just archived. Institutions that mandate ORCID at the point of data deposit, rather than treating it as optional metadata, see materially better linkage between datasets, grants, and outputs.
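
Mandating ORCID at deposit is more useful if the identifier is validated on entry. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a deposit form can reject mistyped iDs without any network call. The sketch below is one way to do that; the function names are illustrative, and it checks structure only, not whether the iD is actually registered.

```python
def orcid_check_character(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of a hyphenated ORCID iD."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    return orcid_check_character(chars[:15]) == chars[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```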

Second, metadata schemas need to move beyond generic Dublin Core toward domain-specific standards where they exist — DataCite Metadata Schema as a baseline, supplemented by discipline-specific vocabularies (such as those maintained by biomedical, environmental, or social science data communities). Rich metadata is the single most under-invested element of FAIR compliance: it is unglamorous, resource-intensive to produce well, and rarely rewarded in the same way a publication or citation is.

Third, standard vocabularies and licensing need institutional defaults rather than case-by-case decisions. A repository that offers a menu of Creative Commons or equivalent licences at deposit, with a sensible institutional default and clear guidance on when to deviate, removes the single most common point of friction — researchers who simply skip the licensing step because no default is presented.

From FAIR to CARE: Data Governance Beyond Technical Compliance

FAIR was designed primarily to solve a technical and infrastructural problem: making data machine-actionable and reusable. It says comparatively little about who benefits from that reuse, who consented to it, and who retains authority over data concerning specific communities. This gap is precisely what the CARE Principles for Indigenous Data Governance — Collective Benefit, Authority to Control, Responsibility, and Ethics — were developed to address, and the two frameworks are increasingly discussed together rather than as alternatives.

Institutions building research data governance frameworks in 2026 need to treat FAIR and CARE as complementary rather than competing. FAIR asks “can this data be found, accessed, and reused efficiently?” CARE asks “should it be, on what terms, and who decides?” A research data management policy that addresses only FAIR risks applying technically excellent infrastructure to data (particularly Indigenous, community, or otherwise sensitive data) without adequate governance over consent, benefit-sharing, or ongoing authority. Data governance frameworks that reference both FAIR and CARE principles are becoming standard practice at institutions with significant Indigenous studies, community health, or population genomics portfolios, and reviewers increasingly expect to see both addressed in ethics and data management documentation, not just FAIR.

Building a Research Data Management Plan That Delivers FAIR

The research data management plan is where FAIR principles are supposed to become operational commitments, yet many plans are still written to satisfy a funder template rather than to genuinely guide the research team. A data management plan that actually delivers FAIR outcomes needs to specify, in concrete and checkable terms:
- Which repository will host the data, and whether that repository mints persistent identifiers and supports the metadata schema required for the discipline.
- Who is responsible for metadata creation and quality review before deposit — not left as an afterthought at project close-out.
- Which licence will apply to the data, decided at the planning stage rather than retrofitted at submission.
- What vocabularies or ontologies will be used to describe variables, samples, or methods, particularly where cross-study interoperability is a stated goal.
- How access will be governed for any data subject to ethical, commercial, or CARE-relevant restrictions, including who approves access requests after the project team disbands.
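
One way to make those commitments "concrete and checkable" is to hold them as structured fields rather than free text, so that unanswered items can be listed automatically. The sketch below assumes a hypothetical `DataManagementPlan` structure whose fields mirror the five bullets above; it is not any funder's template.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DataManagementPlan:
    """Hypothetical structured DMP; each field mirrors one commitment above."""
    repository: Optional[str] = None
    metadata_owner: Optional[str] = None
    licence: Optional[str] = None
    vocabularies: list = field(default_factory=list)
    access_approver: Optional[str] = None


def unresolved(plan: DataManagementPlan) -> list[str]:
    """Return the names of commitments that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


plan = DataManagementPlan(repository="institutional repository", licence="CC-BY-4.0")
open_items = unresolved(plan)
print(open_items)  # ['metadata_owner', 'vocabularies', 'access_approver']
```
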

Institutions preparing for REF 2029 and equivalent national assessment exercises have a particular incentive to get this right now: data management practice is increasingly scrutinised as part of research environment statements, and a portfolio of well-governed, genuinely FAIR datasets is a defensible evidence base in a way that a folder of unlinked spreadsheets is not.

What This Means for Research Administrators

For research administrators, EARMA and ARMA members, and institutional research office staff, the ten-year mark for FAIR is a natural prompt to audit rather than assume compliance. Three actions stand out:

First, audit repository defaults. Check whether your institutional repository mints DOIs automatically, requires a licence selection at deposit, and exposes metadata to standard harvesting protocols. If any of these is missing, that is a findability or reusability gap regardless of how the policy documents read.

Second, build ORCID, ROR, and DataCite/CrossRef linkage into deposit workflows as mandatory fields, not optional extras. This is the lowest-cost, highest-leverage intervention available to most institutions and directly strengthens the Findable and Interoperable pillars.

Third, extend data governance conversations to explicitly include CARE alongside FAIR wherever research involves Indigenous communities, sensitive population data, or community-held knowledge. Reviewers, ethics committees, and increasingly funders are asking for both.

Looking Ahead

As FAIR approaches its tenth anniversary, the framework’s core insight — that data value compounds when it is genuinely findable, accessible, interoperable, and reusable — remains sound. What has changed is the level of scrutiny applied to claims of compliance. Funders, publishers, and institutions themselves are moving from asking “do you have a data management plan?” to asking “does your data actually behave like FAIR data?” For research administrators, closing that gap between policy and practice — with the infrastructure, governance, and plan quality to match — is the work of the next decade, not just the anniversary year.

July 1, 2026

UKRI’s New Research Data Policy: A Plain-English Briefing for Institutional Administrators

UKRI is expected to publish an updated research data policy in summer 2026, and institutional research offices should not wait for the final text to start preparing. Signals from UKRI’s existing Common Principles on Data Policy, its 2022 open access policy, and the broader direction of travel across funders point clearly toward a single organising idea: “maximising data value.” For research administrators, that phrase is not a slogan — it is a compliance requirement in waiting, and it will touch data management plans, persistent identifiers, and the systems that track them long before any enforcement clock starts ticking.

The pattern is familiar. When the UKRI open access policy took effect for journal articles in 2022 and for monographs in 2024, institutions that had already invested in repository infrastructure, author identifier hygiene, and rights-retention workflows absorbed the change with minimal disruption. Those that had not scrambled. A forthcoming UKRI research data policy is likely to follow the same script, extending the funder’s open research agenda from published articles to the underlying datasets, code, and materials that support them.

This briefing sets out, in plain English, what “maximising data value” is likely to mean operationally, and what a research data management policy readiness checklist should contain before the formal text arrives.

What “Maximising Data Value” Means for a UKRI Research Data Policy

UKRI’s framing of data value draws directly on the FAIR principles — Findable, Accessible, Interoperable, and Reusable — first articulated in the scientific data community and now embedded in funder expectations across the UK, the EU’s Horizon Europe programme, and beyond. In practice, “maximising value” is unlikely to mean simply “publish more data.” It means data that can be discovered through standard metadata, accessed under clear licensing terms, described in formats other researchers’ tools can parse, and reused with enough provenance information to trust it.

For administrators, the operational translation is threefold:
- Findable — datasets need persistent identifiers and rich, machine-readable metadata, typically registered through services such as DataCite, so they surface in discovery tools rather than sitting on an unindexed institutional server.
- Accessible — access conditions (open, embargoed, or restricted for sensitive data) must be stated explicitly and consistently, not left to individual researcher discretion.
- Interoperable and Reusable — data needs documented standards, controlled vocabularies where they exist, and licensing that permits reuse, mirroring the rights-retention logic already familiar from open access compliance.

None of this is achievable researcher-by-researcher at the point of grant closeout. It requires infrastructure that exists before the data is generated, which is precisely why an anticipatory approach matters more than a reactive one.

Data Management Plans as the Compliance Backbone

Data management plans (DMPs) are the mechanism through which funders convert data policy principles into auditable commitments. UKRI councils already require DMPs for many grant types, but a unified data policy is likely to standardise expectations across councils that have historically varied — a source of persistent friction for multi-council and interdisciplinary awards.

Institutions should treat the DMP not as a one-off grant-application document but as a living compliance artefact, reviewed at key milestones: award, mid-project, and closeout. This is where the overlap with research integrity policy becomes explicit. Bodies such as COPE and the UK’s own research integrity infrastructure have repeatedly linked poor data stewardship — undocumented provenance, irreproducible datasets, unclear authorship of derived outputs — to the conditions that enable disputes and, in the worst cases, retractions tracked by services such as Retraction Watch. A robust DMP process is therefore not merely an administrative box to tick; it is a frontline research integrity control.

Administrators should also expect closer alignment between DMP compliance and the CRediT contributor role taxonomy, which clarifies who is responsible for which stage of data collection, curation, and analysis. CASRAI originated CRediT in 2014, and NISO now stewards it as ANSI/NISO Z39.104-2022. Institutions that already map CRediT roles into their publication workflows are well placed to extend the same logic to dataset contributorship statements.

Persistent Identifiers: The Infrastructure Layer Nobody Notices Until It’s Missing

Persistent identifiers (PIDs) are the connective tissue of any credible research data infrastructure, and they are the single most concrete thing an institution can fix before a policy lands. Three PIDs matter most:
- ORCID identifiers for researchers, now widely mandated across funder and publisher workflows, ensuring datasets are correctly attributed even when authors move institutions or change names.
- ROR (Research Organization Registry) identifiers for institutional affiliation, increasingly required alongside ORCID to disambiguate which organisation is accountable for which output.
- DataCite DOIs for the datasets themselves, giving each dataset a citable, resolvable, permanent address independent of where it happens to be hosted.

CrossRef DOIs for articles and DataCite DOIs for datasets should be linked bidirectionally wherever possible, so that a published paper and its underlying data form a verifiable pair. Institutions that have not yet audited their systems for consistent ORCID and ROR capture (particularly in their electronic research administration platforms, current research information systems, and repository intake forms) should treat this as the highest-priority, lowest-cost preparation step available. It requires no new policy to justify; it improves compliance readiness for every funder mandate, not just UKRI’s.
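
A bidirectional-link audit can be sketched as a search for relations that are asserted in one record but not reciprocated in the other. The record shape below loosely borrows DataCite's `relatedIdentifier` and `relationType` naming, and `IsSupplementTo` / `IsSupplementedBy` are DataCite relation types; CrossRef metadata is structured differently, so treating both sides with one shape is a simplifying assumption. The DOIs are invented.

```python
# Inverse pairs for the relation types covered by this sketch.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


def one_way_links(records: dict) -> list[tuple]:
    """Return (source, target) pairs whose inverse relation is not recorded."""
    missing = []
    for doi, links in records.items():
        for link in links:
            inverse = INVERSE.get(link["relationType"])
            if inverse is None:
                continue  # relation types outside this sketch are ignored
            target = link["relatedIdentifier"]
            reciprocated = any(
                back["relatedIdentifier"] == doi and back["relationType"] == inverse
                for back in records.get(target, [])
            )
            if not reciprocated:
                missing.append((doi, target))
    return missing


# The article points at its dataset, but the dataset record does not point back.
records = {
    "10.1234/article": [
        {"relationType": "IsSupplementedBy", "relatedIdentifier": "10.5678/dataset"}
    ],
    "10.5678/dataset": [],
}
unlinked = one_way_links(records)
print(unlinked)  # [('10.1234/article', '10.5678/dataset')]
```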

What This Means for Research Administrators

The institutions best positioned for a summer 2026 policy announcement will not be the ones that read it fastest — they will be the ones whose sponsored research administration infrastructure already produces compliant metadata as a by-product of normal grant management, rather than as a bolt-on exercise triggered by audit anxiety. Practical steps worth starting now include:
- Auditing current DMP templates against FAIR principles and standardising them across faculties or research councils where practice has diverged.
- Confirming that ORCID and ROR capture is mandatory, not optional, at the point of grant setup within the institution’s research administration system.
- Establishing or reviewing institutional agreements with DataCite (directly or via a national or subject repository) for dataset DOI minting.
- Mapping data stewardship responsibilities using a CRediT-style contributor framework, so accountability for data quality is documented rather than assumed.
- Briefing research integrity offices now, so that data policy compliance is understood as an extension of existing research integrity policy rather than a parallel, competing process.
Professional bodies including ARMA, NCURA, EARMA, and INORMS have all flagged funder data mandates as a growing training and resourcing need for research administrators; institutions that engage with these networks now will have a head start on interpreting whatever UKRI ultimately publishes.

Looking Ahead

A formal UKRI research data policy, when it arrives, will almost certainly be framed around the language of value, openness, and reuse rather than restriction. But the operational substance — FAIR-compliant metadata, disciplined data management plans, and consistent use of persistent identifiers — is already knowable, and already actionable. Institutions that treat the coming months as a compliance sprint rather than a waiting period will be the ones for whom “maximising data value” is simply a description of how they already work, not a new burden imposed from outside.
July 1, 2026

FAIR Data Principles in Action: A Practical Implementation Guide for Researchers

Introduction

The FAIR Data Principles, published in 2016, provide guidelines for improving the Findability, Accessibility, Interoperability, and Reusability of digital assets. They emphasize machine-actionability, meaning the capacity of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention, because humans increasingly rely on computational support to deal with data at scale and complexity.

Findable and Accessible: The Metadata Layer

To make data Findable (F), it must be assigned a persistent identifier (PID) like a DOI, described with rich metadata, and registered in a searchable resource. To make it Accessible (A), the data and metadata must be retrievable by their identifier using a standardized, open communication protocol (like HTTP or HTTPS). Crucially, metadata must remain available even if the data itself is no longer accessible.
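
As a rough illustration of retrieval by identifier over an open protocol, the sketch below turns a DOI into an HTTPS URL and prepares (without sending) a request that asks for machine-readable metadata. The helper names are hypothetical, the DOI is a placeholder, and the `Accept` media type is the one DataCite documents for content negotiation on DOIs it registers; other registration agencies may expect a different value.

```python
from urllib.parse import quote
from urllib.request import Request

# Media type documented by DataCite for content negotiation (assumed to apply
# only to DataCite-registered DOIs).
DATACITE_JSON = "application/vnd.datacite.datacite+json"


def doi_to_url(doi: str) -> str:
    """Normalise a DOI written as 'doi:...', a doi.org URL, or a bare DOI."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")


def metadata_request(doi: str) -> Request:
    """Build an HTTPS request for the metadata record, not the data itself."""
    return Request(doi_to_url(doi), headers={"Accept": DATACITE_JSON})


# Placeholder DOI used purely for illustration.
req = metadata_request("doi:10.1234/example.5678")
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup. Because the request targets the metadata record, it can in principle still succeed after the underlying files have been withdrawn, which is the point of the final sentence above.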

Interoperable and Reusable: Standards and Vocabularies

Interoperability (I) requires the data to use a formal, accessible, shared, and broadly applicable language for knowledge representation. It must use vocabularies that follow FAIR principles and include qualified references to other metadata. Reusability (R) is the ultimate goal, which is achieved by releasing data with a clear, accessible data usage license (like Creative Commons) and detailed provenance information.

Step-by-Step FAIR Assessment for Projects

Transforming theoretical principles into daily practice requires a systematic approach. Researchers should audit their workflows using online FAIR assessment tools, select open, non-proprietary file formats (for example, CSV rather than XLSX), and ensure that all metadata records contain linkable references to related publications and funding identifiers.
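
The file-format part of such an audit is easy to automate. The following is a minimal sketch, assuming a project keeps its deposit files in one list; the extension sets are illustrative and far from exhaustive, and the function name is hypothetical.

```python
from pathlib import Path

# Illustrative, non-exhaustive choices made for this sketch.
PREFERRED = {".csv", ".tsv", ".txt", ".json", ".xml"}
SUGGESTED_ALTERNATIVE = {".xlsx": ".csv", ".xls": ".csv", ".docx": ".txt"}


def review_formats(filenames):
    """Map each filename to 'ok', a suggested export format, or a manual flag."""
    report = {}
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in PREFERRED:
            report[name] = "ok"
        elif ext in SUGGESTED_ALTERNATIVE:
            report[name] = f"consider exporting to {SUGGESTED_ALTERNATIVE[ext]}"
        else:
            report[name] = "review manually"
    return report


# Hypothetical deposit contents.
report = review_formats(["results.xlsx", "readings.CSV", "scan.dcm"])
```

Anything flagged "review manually" is a prompt for a conversation with the data steward rather than an automatic failure, since many disciplines have accepted community formats that a generic list will not recognise.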

Key Comparison Matrix

FAIR Principle	Core Requirement	Practical Action
Findable (F)	Persistent Identifiers and rich metadata.	Deposit data in Zenodo or Figshare to obtain a DOI and fill out all metadata fields.
Accessible (A)	Open, standard protocols for retrieval.	Ensure repository utilizes open APIs (e.g., OAI-PMH) and protocols like HTTPS.
Interoperable (I)	Common vocabularies and ontologies.	Use standard schemas (e.g., Schema.org, Dublin Core) and structured JSON-LD.
Reusable (R)	Clear usage licenses and provenance.	Attach a CC-BY 4.0 license and document the data generation steps.
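
To make the matrix concrete, here is a hedged sketch of a Schema.org `Dataset` description serialised as JSON-LD, touching each row: a DOI as identifier, an HTTPS-resolvable address, a shared vocabulary, and an explicit license. The function name and all example values (title, DOIs, creator) are placeholders invented for illustration; the ORCID shown is the sample iD that ORCID uses in its own documentation. A real deposit would normally let the repository generate this record.

```python
import json


def dataset_jsonld(title, description, doi, creator_name, orcid, paper_doi):
    """Build a minimal Schema.org Dataset record as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        # Findable / Accessible: a DOI expressed as an HTTPS URL.
        "identifier": f"https://doi.org/{doi}",
        # Reusable: an explicit license (CC BY 4.0, as in the matrix).
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "creator": {
            "@type": "Person",
            "name": creator_name,
            "identifier": f"https://orcid.org/{orcid}",
        },
        # Link back to the related publication.
        "citation": f"https://doi.org/{paper_doi}",
    }


# Placeholder values for illustration only.
record = dataset_jsonld(
    title="Example survey responses",
    description="Anonymised responses collected for a hypothetical study.",
    doi="10.1234/example.dataset",
    creator_name="Example Researcher",
    orcid="0000-0002-1825-0097",
    paper_doi="10.1234/example.article",
)
serialised = json.dumps(record, indent=2)
```

The provenance half of the Reusable row (documenting how the data was generated) does not fit neatly into one field and is usually better handled in an accompanying README or a dedicated provenance record.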

Five Steps to Achieving FAIRness

Select non-proprietary open file formats for data deposits.
Obtain a DOI for all published datasets and source code.
Attach an open-source or Creative Commons license to all digital assets.
Map metadata fields to recognized standards like Dublin Core or Schema.org.
Include rich provenance details explaining how the data was gathered and processed.

June 13, 2026
