Tag: FAIR Data Principles

  • Clinical Data Management Plan vs Research Data Management Plan: What’s the Difference

    A clinical data management plan and a research data management plan are two of the most frequently conflated documents in the clinical trial lifecycle. Both use the acronym “DMP” in casual conversation, both get drafted before a study starts, and both concern “data” in the broadest sense — but they answer to different masters, cover different lifecycle stages, and are read by different audiences. Submitting the wrong one to the wrong reviewer is a recurring, avoidable compliance headache for trial units and research offices alike.

    What Is a Clinical Data Management Plan?

    A Clinical Data Management Plan (CDMP) is an operational, trial-specific document that describes exactly how data will move from case report form (CRF) to locked database. It is written by or with the clinical data management (CDM) function — not the principal investigator’s grants office — and it sits alongside the protocol as one of the working documents that Good Clinical Practice (GCP), per ICH E6, expects a sponsor to maintain and be able to produce on inspection.

    A CDMP typically specifies:

    • CRF or eCRF design and the electronic data capture (EDC) system to be used
    • Database build, edit-check specifications and data validation rules
    • Data entry conventions (single vs double entry, query turnaround)
    • Medical coding dictionaries and versions, such as MedDRA and the WHO Drug Dictionary
    • Discrepancy management and serious adverse event reconciliation procedures
    • Roles, responsibilities and sign-off authority for database lock

    Because it is inspected against GCP, a CDMP is a living, version-controlled document updated through the study rather than filed once and forgotten.
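
    As an illustration of what an edit-check specification can look like once it is made executable, the sketch below encodes three checks and turns each failure into a query message. The check IDs, field names and limits are hypothetical and chosen only for illustration; in a real study they would come from the protocol and the EDC system rather than from this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EditCheck:
    check_id: str                      # hypothetical identifier scheme
    field_name: str                    # hypothetical CRF field name
    description: str                   # text raised as a query on failure
    passes: Callable[[Optional[float]], bool]


# Illustrative checks only; the limits are assumptions, not protocol values.
EDIT_CHECKS = [
    EditCheck("EC001", "systolic_bp", "Systolic BP is required",
              lambda v: v is not None),
    EditCheck("EC002", "systolic_bp", "Systolic BP must be 60 to 260 mmHg",
              lambda v: v is None or 60 <= v <= 260),
    EditCheck("EC003", "age_years", "Age must be 18 or over",
              lambda v: v is None or v >= 18),
]


def run_edit_checks(record: dict) -> list:
    """Return one query message per failed check for a single CRF record."""
    queries = []
    for check in EDIT_CHECKS:
        value = record.get(check.field_name)
        if not check.passes(value):
            queries.append(
                f"{check.check_id} [{check.field_name}]: {check.description}"
            )
    return queries


if __name__ == "__main__":
    for query in run_edit_checks({"systolic_bp": 300, "age_years": 17}):
        print(query)
```

    Keeping each check as a small, versioned data item rather than free text is one way to make the "living, version-controlled" character of the CDMP easier to inspect, though teams will normally rely on their EDC system's own rule engine.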

    What Is a Research Data Management Plan?

    A Research Data Management Plan (RDMP) is a funder- or institution-facing document, usually first submitted at the grant proposal stage, well before a trial’s CDMP would even exist. Its job is compliance with funder and institutional data policy, not trial operations. UK Research and Innovation (UKRI) requires a data management plan for relevant grant applications; Horizon Europe asks applicants to outline data management in the proposal and then to deliver a full plan, based on the programme’s Data Management Plan template, as an early project deliverable; and the NIH Data Management and Sharing (DMS) Policy has required a DMS plan for NIH-funded research generating scientific data since January 2023.

    An RDMP typically covers:

    • What data types and volumes the project will generate or reuse
    • How data will be described, documented and made findable (metadata, identifiers)
    • Storage, security and access-control arrangements during the project
    • Ethical, consent and legal constraints on sharing (particularly for identifiable participant data)
    • Long-term preservation and repository plans, often with a DOI issued via DataCite
    • Alignment with the FAIR principles — Findable, Accessible, Interoperable, Reusable

    Unlike a CDMP, an RDMP is reviewed once (or at defined milestones) by a funder or research office, not audited line-by-line by a regulator during a GCP inspection.

    CDMP vs RDMP: Side-by-Side Comparison

    The table below sets out where the two documents genuinely diverge, so institutions running funded clinical trials know they usually need both — not one instead of the other.

    | Dimension | Clinical Data Management Plan (CDMP) | Research Data Management Plan (RDMP) |
    | --- | --- | --- |
    | Primary purpose | Ensure trial data is accurate, complete and audit-ready for database lock | Satisfy funder/institutional policy on data stewardship and sharing |
    | Governing framework | ICH E6 Good Clinical Practice; sponsor/CRO SOPs | Funder mandates (UKRI, NIH, Horizon Europe); institutional RDM policy |
    | Typical author | Data manager / clinical data management lead | Principal investigator, often with library or research office support |
    | Created at | Study set-up, before first patient enrolled | Grant proposal stage, before funding is awarded |
    | Primary audience | CDM team, biostatisticians, sponsor, regulatory inspectors | Funder, ethics/IRB reviewers, institutional research office |
    | Content focus | CRF design, edit checks, coding, database lock procedures | Data description, storage, ethics, sharing, long-term preservation |
    | Review cadence | Continuously updated through study conduct; inspected on audit | Reviewed at proposal and, for some funders, at defined milestones |

    Common Questions Answered

    What does a clinical data management plan include?

    A clinical data management plan includes CRF or eCRF specification, database design, data entry and validation procedures, edit-check logic, medical coding dictionaries such as MedDRA, discrepancy and adverse-event reconciliation processes, and clearly defined roles and responsibilities through to database lock. It is maintained as a living, version-controlled document and inspected under Good Clinical Practice.

    What should a data management plan include?

    A funder-facing research data management plan should describe the data types and volumes a project will generate, how data will be documented and made findable through metadata, storage and security arrangements, ethical and consent constraints on sharing identifiable data, and the eventual repository and preservation route, typically aligned to the FAIR data principles.

    What are the three phases of clinical data management?

    Clinical data management is generally organised into three sequential phases: study set-up, covering database build and CRF design; study conduct, covering data entry, cleaning and query resolution; and study close-out, covering final reconciliation, coding sign-off and database lock ahead of statistical analysis.

    Why the Distinction Matters for Research Administrators

    Institutions running externally funded clinical trials almost always need both documents, produced by different teams on different timelines. A funder reviewer looking for a FAIR-aligned sharing and preservation strategy will not find it in a CDMP’s edit-check specification — and a GCP inspector auditing database lock will not accept an RDMP’s high-level data-sharing statement as evidence of query resolution procedure.

    This is precisely the coordination gap that research administration functions increasingly exist to close: aligning the pre-award compliance document (the RDMP, owned by the grants office) with the operational trial document (the CDMP, owned by clinical data management) so that neither is quietly missing when a funder audit or a regulatory inspection arrives. Institutions that treat the two as interchangeable risk both funder non-compliance and GCP findings — for two entirely separate reasons.

    Consistent terminology helps here. Reviewers, auditors and research offices benefit from a shared reference for what each document is called and what it covers; the CASRAI research administration dictionary maintains definitions for terms that span exactly this pre-award-to-conduct boundary.

    Looking Ahead

    The line between the two documents is not static. ICH’s revision of E6 Good Clinical Practice, E6(R3), has pushed sponsors toward more explicit, risk-based data governance language inside the CDMP itself, while funders such as UKRI and the NIH continue to tighten expectations for FAIR-aligned sharing inside the RDMP. Institutions that keep the two plans distinct but explicitly cross-referenced will be best placed to satisfy both regulators and funders as each side’s requirements keep evolving.

  • Data Provenance: Tracking Research Data to Publication

    Research funders increasingly ask not just whether a dataset is open, but where it came from. Data provenance is the discipline of documenting a dataset’s origin, custody, and every transformation it undergoes between collection and publication — a distinct concern from data lineage, which maps only the technical pathway data takes through systems. As data management plans, repository deposits, and AI-training-data audits come under closer scrutiny, provenance metadata is becoming the connective tissue between “collected” and “citable.”

    What Is Data Provenance?

    Data provenance is the historical record of a dataset’s origin, custody, and processing history — who created or collected it, under what conditions, and what happened to it before it reached its published form. It functions as a chain of custody: not a single field in a metadata record, but a continuous trail spanning collection instruments, transformation scripts, quality checks, and every hand the data passed through.

    This differs from anonymisation or privacy-preserving techniques, which govern what can be disclosed about a dataset’s contents. Provenance governs what can be verified about a dataset’s history — a governance question, not a disclosure-control one.

    Data Provenance vs Data Lineage

    The two terms are frequently used interchangeably, but the ELIXIR Research Data Management Kit (RDMkit) draws a useful distinction: lineage traces the technical movement of data between systems — extract, transform, load, output — while provenance adds the contextual and authorship layer: who authorised each step, why it happened, and under what licence or methodology.

    • Data lineage answers: which pipeline stages did this data pass through, and in what order?
    • Data provenance answers: who is accountable for each stage, and can that history be trusted and cited?

    In practice, a well-built pipeline produces both: lineage as the operational map, provenance as the governance record layered on top of it.

    Provenance Standards: W3C PROV, RDA and RO-Crate

    Provenance only becomes machine-actionable, and therefore auditable at scale, once it is captured against a shared model rather than free text. The W3C PROV family is the reference model: PROV-DM defines the data model, PROV-O expresses it as an OWL ontology, and PROV-N provides a human-readable notation. Together, these W3C specifications (published as Recommendations in 2013) describe “entities,” “activities,” and “agents” so that provenance graphs can be exchanged between systems. The Research Data Alliance (RDA) has convened interest groups aligning disciplinary metadata practices with PROV-DM, and repository-facing specifications build on top of it.

    | Standard / Framework | Steward | What It Captures |
    | --- | --- | --- |
    | PROV-DM / PROV-O / PROV-N | W3C | Formal graph model of entities, activities and agents; RDF/OWL-serialisable provenance |
    | RO-Crate | Research Object community (schema.org-based) | Packages a dataset with its licence, workflow-run history and provenance in one archive |
    | ISO 19115-2 | ISO | Lineage extension for geographic and imagery metadata |
    | DataCite Metadata Schema | DataCite | Related-identifier relationship types (IsDerivedFrom, IsSourceOf) linking a dataset DOI to its origin and outputs |

    Discipline-specific profiles then sit on top of these: FAIRsharing and RDA’s standards directory catalogue hundreds of provenance and metadata schemas so groups do not reinvent the model for each field.
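
    To make the entity/activity/agent model concrete, here is a minimal sketch in plain Python. It borrows four relation names from PROV-DM (used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom) but is not a PROV implementation or serialiser; the class, the method names and the "ex:" identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Toy provenance graph; a simplification, not a PROV-DM implementation."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    relations: list = field(default_factory=list)  # (relation, subject, object)

    def record_step(self, activity, used, generated, agent):
        """Log one step: an agent runs an activity on one entity to produce another."""
        self.entities.update({used, generated})
        self.activities.add(activity)
        self.agents.add(agent)
        self.relations += [
            ("used", activity, used),
            ("wasGeneratedBy", generated, activity),
            ("wasAssociatedWith", activity, agent),
            ("wasDerivedFrom", generated, used),
        ]

    def history(self, entity):
        """Follow wasDerivedFrom links from an entity back to its source."""
        derived = {s: o for rel, s, o in self.relations if rel == "wasDerivedFrom"}
        chain = [entity]
        # The second condition stops the walk if the links ever form a loop.
        while chain[-1] in derived and derived[chain[-1]] not in chain:
            chain.append(derived[chain[-1]])
        return chain


if __name__ == "__main__":
    graph = ProvGraph()
    graph.record_step("ex:cleaning", "ex:raw.csv", "ex:clean.csv", "ex:data_manager")
    graph.record_step("ex:aggregation", "ex:clean.csv", "ex:summary.csv", "ex:analyst")
    print(graph.history("ex:summary.csv"))
```

    The point of the sketch is the separation of concerns: the wasDerivedFrom chain is the lineage, while the wasAssociatedWith records carry the accountability that distinguishes provenance from lineage.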

    Building a Custody Chain from Collection to Publication

    A defensible provenance record follows the dataset through five stages, each logged with enough detail that a third party could reconstruct the history without contacting the original team.

    • Collection: instrument or method, collector identity (an ORCID iD is the practical anchor), date, and location captured at source.
    • Transformation: every cleaning, normalisation, aggregation or filtering step logged with the tool and version used.
    • Review: who validated the data, what checks were applied, and what was flagged or excluded.
    • Deposit: registration in a repository with a persistent identifier — a DataCite or CrossRef DOI — and an ROR identifier for the responsible institution.
    • Citation and reuse: downstream citations captured so the provenance trail extends forward into the published research output that relies on it.
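
    The five stages above can be treated as a completeness check on whatever log the project keeps. The sketch below is one possible shape for that check; the stage keys and field names are hypothetical, and the required fields are simply read off the list above rather than taken from any standard.

```python
# Hypothetical field names per stage, derived from the five stages listed above.
REQUIRED_FIELDS = {
    "collection": ["method", "collector_orcid", "date", "location"],
    "transformation": ["step", "tool", "tool_version"],
    "review": ["reviewer", "checks_applied", "exclusions"],
    "deposit": ["repository", "doi", "ror"],
    "citation": ["citing_outputs"],
}


def custody_gaps(log: dict) -> list:
    """List every stage or field a third party would find missing in the log."""
    gaps = []
    for stage, fields in REQUIRED_FIELDS.items():
        entries = log.get(stage, [])
        if not entries:
            gaps.append(f"{stage}: no entries logged")
            continue
        for index, entry in enumerate(entries):
            for name in fields:
                # An absent key, None or an empty string counts as missing;
                # an empty list (e.g. nothing excluded) is a valid answer.
                if name not in entry or entry[name] in (None, ""):
                    gaps.append(f"{stage}[{index}]: missing {name}")
    return gaps


if __name__ == "__main__":
    example_log = {
        "collection": [{"method": "survey", "collector_orcid": "0000-0002-1825-0097",
                        "date": "2025-03-01", "location": "site A"}],
        "transformation": [{"step": "deduplicate", "tool": "pandas", "tool_version": ""}],
        "review": [{"reviewer": "QA lead", "checks_applied": ["range"], "exclusions": []}],
        # Placeholder DOI and ROR strings, used only to fill the fields.
        "deposit": [{"repository": "institutional", "doi": "10.1234/example",
                     "ror": "https://ror.org/05gq02987"}],
    }
    for gap in custody_gaps(example_log):
        print(gap)
```

    Run against the example log, the check reports the unversioned tool and the absent citation stage, which are the kind of gaps that are cheap to fix at the time and hard to reconstruct later.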

    Contributor-role taxonomies help name accountability at each stage. CRediT, the contributor roles taxonomy that CASRAI originated in 2014 and that NISO now stewards as ANSI/NISO Z39.104-2022, includes a “Data Curation” role, for example. It gives institutions a controlled vocabulary for naming who performed which custody step, complementing PROV-O’s more technical entity/activity/agent model. Research administrators building data management plans can pair the two: CRediT roles for human accountability, PROV-DM for machine-actionable history.

    Common Questions About Data Provenance

    What is data provenance?

    Data provenance is the documented history of a dataset’s origin and custody — who collected it, under what method, and what transformations it underwent before use. It functions as a chain of custody, supporting authenticity checks, quality auditing, and reproducibility of any research output that relies on the data.

    What is data provenance vs lineage?

    Data lineage maps the technical route data takes between systems — extraction, transformation, loading. Data provenance adds the accountability layer: who authorised each step, why it occurred, and under what licence. Lineage is the operational map; provenance is the governance record built on top of it.

    What are the two classes of data provenance?

    Provenance literature typically distinguishes retrospective provenance, which records what actually happened to a dataset (the steps executed, the inputs used and the outputs produced), from prospective provenance, which specifies in advance how data is expected to move and transform in a defined workflow.

    What does provenance mean?

    Outside data contexts, provenance refers to the documented history of ownership or origin of an object — the term used to authenticate artworks and manuscripts. Applied to research data, the same principle holds: a verifiable record of origin that supports trust, exactly as a chain of custody supports evidentiary trust in other domains.

    Why Provenance Completes FAIR: Implications for Institutions

    The FAIR data principles (Findable, Accessible, Interoperable, Reusable) are frequently treated as a checklist for open deposit, but the Reusable facet explicitly requires more than a licence tag. Principle R1.2 states that “(meta)data are associated with detailed provenance” — a sub-principle that is easy to satisfy nominally and hard to satisfy meaningfully. A dataset can be technically Findable and Accessible while its provenance metadata is a single free-text sentence, which leaves reproducibility unverifiable in practice.

    This gap matters more as scrutiny of dataset origin intensifies elsewhere. MIT Media Lab’s audit of over 1,800 AI training datasets found licence omission or miscategorisation in more than two-thirds of cases — a warning sign for any field, including research data management, that treats provenance as an afterthought rather than a captured-at-source discipline.

    For institutions building or refreshing data management plans under UKRI or Horizon Europe funding requirements, the practical implication is straightforward: provenance capture belongs at collection time, encoded against PROV-DM or an equivalent model, not reconstructed retrospectively when a journal, repository, or auditor asks for it. Research administrators, repository managers, and publishers who build custody-chain logging into their research administration workflows now will find FAIR compliance — and reproducibility review — considerably less costly later.

  • FAIR Data Principles in 2026: A Practical Guide for Research Administrators

    The FAIR data principles — Findable, Accessible, Interoperable, Reusable — turn ten in 2026. Since Mark Wilkinson and colleagues published the framework in Scientific Data in 2016, FAIR has moved from an aspirational statement of good practice to a hard requirement embedded in funder mandates, journal policies, and institutional research data management infrastructure. UKRI’s open access policy now expects data underpinning publications to be made available in line with FAIR, the US NIH data sharing policy is actively enforced for funded projects, and Horizon Europe applicants must demonstrate FAIR-compliant data management as a condition of award.

    Yet a decade in, compliance remains uneven. Many institutions still treat FAIR as a checkbox on a data management plan template rather than a set of concrete technical and governance obligations. As the ten-year anniversary approaches and funders sharpen enforcement, research administrators need a working map from principle to practice — one that goes beyond restating the acronym and instead specifies what each letter actually requires of repositories, metadata schemas, and institutional policy.

    This article revisits the original FAIR framework as stewarded by FORCE11 and the GO FAIR initiative, and translates each element into actions that research offices, data stewards, and library services can implement now, ahead of the next REF cycle and continued tightening of funder mandates.

    What the FAIR Data Principles Actually Require

    Wilkinson et al. (2016) deliberately wrote FAIR as a set of guiding principles rather than a rigid standard, which has allowed broad adoption but also created room for superficial interpretation. FORCE11, the scholarly communication community that convened the original working group, and GO FAIR, the international support and coordination initiative, both continue to publish implementation guidance. For research administrators, the practical translation looks like this:

    • Findable — Every dataset needs a globally unique, persistent identifier (a DOI minted through DataCite is the de facto standard for research data) and rich, indexed metadata that describes the dataset independently of the data itself. Institutional repositories must expose this metadata to harvesters and search services, not bury it behind a login wall.
    • Accessible — Data (and, critically, its metadata) should be retrievable via a standardised, open communication protocol, with clear authentication and authorisation procedures where restrictions are legitimate. Accessible does not mean “open by default” — it means the access conditions are documented, discoverable, and enforced consistently, even when the data itself is restricted for ethical or commercial reasons.
    • Interoperable — Metadata and data should use formal, shared, broadly applicable vocabularies for knowledge representation, and reference other data and metadata using standard identifiers. This is where controlled vocabularies, ontologies, and cross-referencing to identifiers like ORCID (for contributors), ROR (for institutions), and CrossRef (for related publications) matter most.
    • Reusable — Data must carry a clear, accessible data usage licence, detailed provenance, and be described with enough domain-relevant metadata that a future researcher — human or machine — can understand and reuse it without contacting the original team.

    None of the four elements is optional or substitutable for another. A dataset with a DOI but no licence is findable but not reusable. A dataset described only in free-text notes is accessible but not interoperable. Institutions that treat FAIR as satisfied once a DOI is assigned are addressing roughly one letter out of four.
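
    The "one letter out of four" problem is easy to demonstrate with a per-letter check. The sketch below is a deliberately crude approximation: the two items tested under each letter and all the record keys are assumptions for illustration, and passing it would not by itself show that a dataset is FAIR.

```python
# Hypothetical record keys; two basic items per FAIR letter, not a full assessment.
FAIR_CHECKS = {
    "Findable": ["persistent_identifier", "indexed_metadata"],
    "Accessible": ["access_protocol", "access_conditions"],
    "Interoperable": ["vocabularies", "linked_identifiers"],
    "Reusable": ["licence", "provenance"],
}


def fair_gaps(record: dict) -> dict:
    """Map each FAIR letter to the basic items missing from a dataset record."""
    return {
        letter: [key for key in keys if not record.get(key)]
        for letter, keys in FAIR_CHECKS.items()
    }


if __name__ == "__main__":
    # A dataset with a DOI and good metadata but no licence recorded.
    dataset = {
        "persistent_identifier": "10.1234/example",  # placeholder DOI
        "indexed_metadata": True,
        "access_protocol": "https",
        "access_conditions": "open",
        "vocabularies": ["MeSH"],
        "linked_identifiers": ["orcid", "ror"],
        "provenance": "prov.json",
    }
    for letter, missing in fair_gaps(dataset).items():
        print(letter, "gaps:", missing)
```

    For the example record the Findable list comes back empty while Reusable reports the missing licence, mirroring the DOI-without-licence case described above.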

    Persistent Identifiers, Metadata, and Vocabularies: The Infrastructure Layer

    The technical backbone of FAIR compliance rests on three infrastructure decisions that research administrators are often best placed to influence, even without deep technical expertise.

    First, persistent identifier coverage needs to extend beyond the dataset itself. Contributor identification through ORCID, organisational identification through ROR, and publication linkage through CrossRef and DataCite together create the graph of relationships that makes data genuinely findable and interoperable — not just archived. Institutions that mandate ORCID at the point of data deposit, rather than treating it as optional metadata, see materially better linkage between datasets, grants, and outputs.

    Second, metadata schemas need to move beyond generic Dublin Core toward domain-specific standards where they exist — DataCite Metadata Schema as a baseline, supplemented by discipline-specific vocabularies (such as those maintained by biomedical, environmental, or social science data communities). Rich metadata is the single most under-invested element of FAIR compliance: it is unglamorous, resource-intensive to produce well, and rarely rewarded in the same way a publication or citation is.

    Third, standard vocabularies and licensing need institutional defaults rather than case-by-case decisions. A repository that offers a menu of Creative Commons or equivalent licences at deposit, with a sensible institutional default and clear guidance on when to deviate, removes the single most common point of friction — researchers who simply skip the licensing step because no default is presented.

    From FAIR to CARE: Data Governance Beyond Technical Compliance

    FAIR was designed primarily to solve a technical and infrastructural problem: making data machine-actionable and reusable. It says comparatively little about who benefits from that reuse, who consented to it, and who retains authority over data concerning specific communities. This gap is precisely what the CARE Principles for Indigenous Data Governance — Collective Benefit, Authority to Control, Responsibility, and Ethics — were developed to address, and the two frameworks are increasingly discussed together rather than as alternatives.

    Institutions building research data governance frameworks in 2026 need to treat FAIR and CARE as complementary rather than competing. FAIR asks “can this data be found, accessed, and reused efficiently?” CARE asks “should it be, on what terms, and who decides?” A research data management policy that addresses only FAIR risks applying technically excellent infrastructure to data (particularly Indigenous, community, or otherwise sensitive data) without adequate governance over consent, benefit-sharing, or ongoing authority. Data governance frameworks that reference both sets of principles are becoming standard practice at institutions with significant Indigenous studies, community health, or population genomics portfolios, and reviewers increasingly expect to see both addressed in ethics and data management documentation.

    Building a Research Data Management Plan That Delivers FAIR

    The research data management plan is where FAIR principles are supposed to become operational commitments, yet many plans are still written to satisfy a funder template rather than to genuinely guide the research team. A data management plan that actually delivers FAIR outcomes needs to specify, in concrete and checkable terms:

    • Which repository will host the data, and whether that repository mints persistent identifiers and supports the metadata schema required for the discipline.
    • Who is responsible for metadata creation and quality review before deposit — not left as an afterthought at project close-out.
    • Which licence will apply to the data, decided at the planning stage rather than retrofitted at submission.
    • What vocabularies or ontologies will be used to describe variables, samples, or methods, particularly where cross-study interoperability is a stated goal.
    • How access will be governed for any data subject to ethical, commercial, or CARE-relevant restrictions, including who approves access requests after the project team disbands.
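
    One way to keep these commitments "concrete and checkable" is to hold them as structured fields rather than free text. The sketch below assumes a hypothetical record with one field per item in the list above; the class name, field names and messages are illustrative and do not correspond to any funder template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DmpCommitments:
    """Hypothetical structured view of the commitments a plan should pin down."""
    repository: Optional[str] = None
    repository_mints_pids: bool = False
    metadata_owner: Optional[str] = None
    licence: Optional[str] = None
    vocabularies: tuple = ()
    access_approver: Optional[str] = None  # who decides after the team disbands

    def unresolved(self) -> list:
        """Return the commitments that are still undecided."""
        issues = []
        if not self.repository:
            issues.append("no repository named")
        elif not self.repository_mints_pids:
            issues.append("repository does not mint persistent identifiers")
        if not self.metadata_owner:
            issues.append("no owner for metadata creation and review")
        if not self.licence:
            issues.append("no licence chosen")
        if not self.vocabularies:
            issues.append("no vocabularies or ontologies named")
        if not self.access_approver:
            issues.append("no long-term access approver")
        return issues


if __name__ == "__main__":
    draft = DmpCommitments(repository="Institutional repository", licence="CC BY 4.0")
    for issue in draft.unresolved():
        print(issue)
```

    A research office could run a check of this kind at award and again at mid-project review, so that open items surface before close-out rather than at deposit.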

    Institutions preparing for REF 2029 and equivalent national assessment exercises have a particular incentive to get this right now: data management practice is increasingly scrutinised as part of research environment statements, and a portfolio of well-governed, genuinely FAIR datasets is a defensible evidence base in a way that a folder of unlinked spreadsheets is not.

    What This Means for Research Administrators

    For research administrators, EARMA and ARMA members, and institutional research office staff, the ten-year mark for FAIR is a natural prompt to audit rather than assume compliance. Three actions can be started immediately:

    First, audit repository defaults. Check whether your institutional repository mints DOIs automatically, requires a licence selection at deposit, and exposes metadata to standard harvesting protocols. If any of these is missing, that is a findability or reusability gap regardless of how the policy documents read.

    Second, build ORCID, ROR, and DataCite/CrossRef linkage into deposit workflows as mandatory fields, not optional extras. This is the lowest-cost, highest-leverage intervention available to most institutions and directly strengthens the Findable and Interoperable pillars.

    Third, extend data governance conversations to explicitly include CARE alongside FAIR wherever research involves Indigenous communities, sensitive population data, or community-held knowledge. Reviewers, ethics committees, and increasingly funders are asking for both.

    Looking Ahead

    As FAIR approaches its tenth anniversary, the framework’s core insight — that data value compounds when it is genuinely findable, accessible, interoperable, and reusable — remains sound. What has changed is the level of scrutiny applied to claims of compliance. Funders, publishers, and institutions themselves are moving from asking “do you have a data management plan?” to asking “does your data actually behave like FAIR data?” For research administrators, closing that gap between policy and practice — with the infrastructure, governance, and plan quality to match — is the work of the next decade, not just the anniversary year.

  • UKRI’s New Research Data Policy: A Plain-English Briefing for Institutional Administrators

    UKRI is expected to publish an updated research data policy in summer 2026, and institutional research offices should not wait for the final text to start preparing. Signals from UKRI’s existing Common Principles on Data Policy, its 2022 open access policy, and the broader direction of travel across funders point clearly toward a single organising idea: “maximising data value.” For research administrators, that phrase is not a slogan — it is a compliance requirement in waiting, and it will touch data management plans, persistent identifiers, and the systems that track them long before any enforcement clock starts ticking.

    The pattern is familiar. When the UKRI open access policy took effect for journal articles in 2022 and for monographs in 2024, institutions that had already invested in repository infrastructure, author identifier hygiene, and rights-retention workflows absorbed the change with minimal disruption. Those that had not scrambled. A forthcoming UKRI research data policy is likely to follow the same script, extending the funder’s open research agenda from published articles to the underlying datasets, code, and materials that support them.

    This briefing sets out, in plain English, what “maximising data value” is likely to mean operationally, and what a research data management policy readiness checklist should contain before the formal text arrives.

    What “Maximising Data Value” Means for a UKRI Research Data Policy

    UKRI’s framing of data value draws directly on the FAIR principles — Findable, Accessible, Interoperable, and Reusable — first articulated in the scientific data community and now embedded in funder expectations across the UK, the EU’s Horizon Europe programme, and beyond. In practice, “maximising value” is unlikely to mean simply “publish more data.” It means data that can be discovered through standard metadata, accessed under clear licensing terms, described in formats other researchers’ tools can parse, and reused with enough provenance information to trust it.

    For administrators, the operational translation is threefold:

    • Findable — datasets need persistent identifiers and rich, machine-readable metadata, typically registered through services such as DataCite, so they surface in discovery tools rather than sitting on an unindexed institutional server.
    • Accessible — access conditions (open, embargoed, or restricted for sensitive data) must be stated explicitly and consistently, not left to individual researcher discretion.
    • Interoperable and Reusable — data needs documented standards, controlled vocabularies where they exist, and licensing that permits reuse, mirroring the rights-retention logic already familiar from open access compliance.

    None of this is achievable researcher-by-researcher at the point of grant closeout. It requires infrastructure that exists before the data is generated — which is precisely why an anticipatory approach matters more than a reactive one.

    Data Management Plans as the Compliance Backbone

    Data management plans (DMPs) are the mechanism through which funders convert data policy principles into auditable commitments. UKRI councils already require DMPs for many grant types, but a unified data policy is likely to standardise expectations across councils that have historically varied — a source of persistent friction for multi-council and interdisciplinary awards.

    Institutions should treat the DMP not as a one-off grant-application document but as a living compliance artefact, reviewed at key milestones: award, mid-project, and closeout. This is where the overlap with research integrity policy becomes explicit. Bodies such as COPE and the UK’s own research integrity infrastructure have repeatedly linked poor data stewardship — undocumented provenance, irreproducible datasets, unclear authorship of derived outputs — to the conditions that enable disputes and, in the worst cases, retractions tracked by services such as Retraction Watch. A robust DMP process is therefore not merely an administrative box to tick; it is a frontline research integrity control.

    Administrators should also expect closer alignment between DMP compliance and the CRediT contributor role taxonomy, which clarifies who is responsible for which stage of data collection, curation, and analysis. CASRAI originated CRediT in 2014, and NISO now stewards it as ANSI/NISO Z39.104-2022. Institutions that already map CRediT roles into their publication workflows are well placed to extend the same logic to dataset contributorship statements.

    Persistent Identifiers: The Infrastructure Layer Nobody Notices Until It’s Missing

    Persistent identifiers (PIDs) are the connective tissue of any credible research data infrastructure, and they are the single most concrete thing an institution can fix before a policy lands. Three PIDs matter most:

    • ORCID identifiers for researchers, now widely mandated across funder and publisher workflows, ensuring datasets are correctly attributed even when authors move institutions or change names.
    • ROR (Research Organization Registry) identifiers for institutional affiliation, increasingly required alongside ORCID to disambiguate which organisation is accountable for which output.
    • DataCite DOIs for the datasets themselves, giving each dataset a citable, resolvable, permanent address independent of where it happens to be hosted.
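
    Auditing intake forms for these three identifiers usually starts with basic syntax checks. The sketch below validates an ORCID iD using its ISO 7064 MOD 11-2 check character and applies format-only patterns to DOIs and ROR IDs. The regular expressions are working assumptions rather than official definitions, the ROR check does not verify that identifier's own checksum, and none of the functions confirm that an identifier is actually registered.

```python
import re

# Format-only patterns; assumptions for illustration, not official grammars.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")
DOI_PATTERN = re.compile(r"^10\.[0-9]{4,9}/\S+$")
ROR_PATTERN = re.compile(r"^https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:          # every character except the check character
        total = (total + int(ch)) * 2
    expected = (12 - total % 11) % 11
    return digits[-1] == ("X" if expected == 10 else str(expected))


def looks_like_doi(value: str) -> bool:
    """True if the string has the general shape of a bare DOI."""
    return bool(DOI_PATTERN.match(value))


def looks_like_ror(value: str) -> bool:
    """True if the string has the general shape of a ROR ID URL."""
    return bool(ROR_PATTERN.match(value))


if __name__ == "__main__":
    print(orcid_checksum_ok("0000-0002-1825-0097"))
    print(looks_like_doi("10.1234/example"))          # placeholder DOI
    print(looks_like_ror("https://ror.org/05gq02987"))
```

    Checks like these catch typing errors at the point of entry; confirming that the identifier resolves to the right person, organisation or dataset still requires a lookup against the relevant registry.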

    CrossRef DOIs for articles and DataCite DOIs for datasets should be linked bidirectionally wherever possible, so that a published paper and its underlying data form a verifiable pair. Institutions that have not yet audited their systems for consistent ORCID and ROR capture — particularly in their electronic research administration platforms, current research information systems, and repository intake forms — should treat this as the highest-priority, lowest-cost preparation step available. It requires no new policy to justify; it improves compliance readiness for every funder mandate, not just UKRI’s.
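
    A bidirectional link can be audited by checking that every relation recorded on one DOI has its inverse recorded on the other. The sketch below uses relation type names from the DataCite Metadata Schema for both sides, which is a simplifying assumption: article records held elsewhere may express the same relationship with a different vocabulary. The in-memory `records` mapping and the DOIs in it are placeholders.

```python
# Inverse pairs taken from DataCite relation type names.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "IsDerivedFrom": "IsSourceOf",
    "IsSourceOf": "IsDerivedFrom",
}


def one_way_links(records: dict) -> list:
    """Return (doi, relation, target) for every link whose inverse is not recorded.

    `records` maps a DOI to a list of (relation_type, target_doi) pairs.
    """
    missing = []
    for doi, links in records.items():
        for relation, target in links:
            inverse = INVERSE.get(relation)
            if inverse is None:
                continue  # relation type outside this sketch's small table
            if (inverse, doi) not in records.get(target, []):
                missing.append((doi, relation, target))
    return missing


if __name__ == "__main__":
    records = {
        "10.1234/dataset": [("IsSupplementTo", "10.5678/article")],
        "10.5678/article": [],  # the article does not yet point back at the data
    }
    print(one_way_links(records))
```

    In the example the dataset points at the article but not the reverse, so the pair is reported as a one-way link until the article's metadata is updated.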

    What This Means for Research Administrators

    The institutions best positioned for a summer 2026 policy announcement will not be the ones that read it fastest — they will be the ones whose sponsored research administration infrastructure already produces compliant metadata as a by-product of normal grant management, rather than as a bolt-on exercise triggered by audit anxiety. Practical steps worth starting now include:

    • Auditing current DMP templates against FAIR principles and standardising them across faculties or research councils where practice has diverged.
    • Confirming that ORCID and ROR capture is mandatory, not optional, at the point of grant setup within the institution’s research administration system.
    • Establishing or reviewing institutional agreements with DataCite (directly or via a national or subject repository) for dataset DOI minting.
    • Mapping data stewardship responsibilities using a CRediT-style contributor framework, so accountability for data quality is documented rather than assumed.
    • Briefing research integrity offices now, so that data policy compliance is understood as an extension of existing research integrity policy rather than a parallel, competing process.

    Professional bodies including ARMA, NCURA, EARMA, and INORMS have all flagged funder data mandates as a growing training and resourcing need for research administrators; institutions that engage with these networks now will have a head start on interpreting whatever UKRI ultimately publishes.

    Looking Ahead

    A formal UKRI research data policy, when it arrives, will almost certainly be framed around the language of value, openness, and reuse rather than restriction. But the operational substance — FAIR-compliant metadata, disciplined data management plans, and consistent use of persistent identifiers — is already knowable, and already actionable. Institutions that treat the coming months as a compliance sprint rather than a waiting period will be the ones for whom “maximising data value” is simply a description of how they already work, not a new burden imposed from outside.