Tag: data management and sharing plan

  • Data Sharing Policy: A Research Office Template

    A data sharing policy is the institution-wide governance document that sets expectations for how researchers plan, deposit, and share research data — distinct from a data sharing agreement, which is the specific legal contract governing one data transfer. Research offices write policies to translate funder FAIR data mandates, such as the NIH’s 2023 Data Management and Sharing Policy, into consistent local practice.

    A data sharing policy is an institutional statement of principle and requirement: it tells every researcher, department, and grant applicant what the organisation expects of them before, during, and after a funded project, regardless of discipline or funder. It is not a substitute for a project-level data management plan (DMP), and it is not the same document as a data sharing agreement. Confusing the policy with the agreement is one of the most common drafting mistakes research offices make.

    What is an institutional data sharing policy?

    An institutional data sharing policy is a governance document, usually owned jointly by the research office, library, and IT services, that sets baseline rules for how the organisation’s researchers manage and share the data underlying their published outputs. It applies across all disciplines and funders, rather than to a single grant.

    Published examples illustrate the range: the Office for National Statistics operates a data sharing policy governing record-level personal information, while Cancer Research UK’s data sharing and management policy sets FAIR-aligned requirements as a condition of every grant it awards. Both share a common shape (purpose, scope, principles, requirements, and named responsibilities), even though one governs a public body’s statistical data and the other a funder’s grant conditions.

    For a research office, the policy is the document that makes funder requirements operational at institutional scale: instead of each principal investigator interpreting a funder’s data mandate independently, the institution issues one interpretation, one set of approved repositories, and one escalation route for exceptions.

    Why research offices need a data sharing policy now

    Research offices need a written policy because funders increasingly make data sharing a condition of funding, not a recommendation, and institutions without a policy leave researchers to interpret those conditions inconsistently — which creates compliance risk at renewal, audit, and publication stages.

    The mandate landscape has hardened over the past decade:

    • NIH’s 2023 Data Management and Sharing Policy took effect on 25 January 2023 and requires a data management and sharing plan for essentially all NIH-funded research, reviewed alongside the science.
    • UKRI is a signatory to the 2016 Concordat on Open Research Data, which commits funded institutions to making research data openly available with as few restrictions as possible.
    • Horizon Europe’s Model Grant Agreement requires a FAIR-aligned data management plan for participating projects, applying the “as open as possible, as closed as necessary” principle carried over from Horizon 2020.
    • ICMJE’s data sharing statement requirement has applied to clinical trials that began enrolling participants on or after 1 January 2019, requiring a data availability statement as a condition of publication in ICMJE-following journals.

    Each of these mandates is written at the funder level. The institutional policy is what converts them into a single, consistent set of expectations that a research office can actually train staff on and audit against.

    Data sharing policy vs data sharing agreement

    A data sharing policy and a data sharing agreement solve different problems: the policy is a standing, institution-wide statement of expectations, while the agreement is a one-off legal contract governing a specific transfer of specific data between specific parties. Research offices need both, but they are drafted, owned, and reviewed differently.

    | Aspect | Institutional data sharing policy | Data sharing agreement |
    | --- | --- | --- |
    | Scope | All researchers, all funded projects, ongoing | One dataset, one recipient, one purpose |
    | Trigger | Institutional governance cycle | A specific request or collaboration |
    | Legal status | Internal policy; not itself a contract | Binding contract, often referencing UK GDPR |
    | Typical owner | Research office, library, IT, ethics committee | Data protection officer, legal counsel |
    | Reviewed by | Institution, periodically | Both parties, per transfer |

    A well-written policy should explicitly state this distinction and point researchers to the correct process for each: the policy for general expectations and deposit requirements, the agreement (or a data protection impact assessment) for any transfer involving personal, sensitive, or third-party data governed by UK GDPR.
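
    The routing rule in the paragraph above can be sketched as a small function. This is a minimal illustration, not an institutional procedure: the function name and its boolean parameters are hypothetical, and a real office would still apply human judgement to borderline cases.

```python
def governing_process(is_transfer, personal=False, sensitive=False, third_party=False):
    """Return the process a researcher should follow (hypothetical helper).

    Assumption: the three flags describe the data being moved; they are
    illustrative names, not terms defined by UK GDPR.
    """
    if is_transfer and (personal or sensitive or third_party):
        # Transfers of personal, sensitive, or third-party data need a contract
        # (or a data protection impact assessment), not just the policy.
        return "data sharing agreement or data protection impact assessment"
    # Everything else falls under the standing institutional expectations.
    return "institutional data sharing policy"
```

    Calling `governing_process(True, personal=True)` points to the agreement route, while a routine repository deposit of non-personal data stays under the policy.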

    Template structure: what to include

    A usable institutional data sharing policy needs roughly eight components, moving from purpose through to enforcement, so that researchers and reviewers can find any given requirement in under a minute.

    1. Preamble and purpose — why the institution requires data sharing and its relationship to the FAIR principles, first published in Scientific Data in 2016.
    2. Scope — which staff, students, and data (all disciplines, all funders, or funder-specific) the policy covers.
    3. Definitions — research data, metadata, persistent identifier, data management plan, repository.
    4. Policy statements — the DMP requirement, repository and persistent-identifier expectations, metadata standards, data licensing, and minimum retention period.
    5. Data availability statements — a requirement that publications state how and where the underlying data can be accessed.
    6. Roles and responsibilities — what is expected of researchers, the research office, the library, IT, and departmental leadership.
    7. Exceptions and embargoes — the process for restricting access on ethical, legal, or commercial grounds.
    8. Review and implementation — the cycle on which the policy itself is revisited against evolving funder mandates.

    | Section | What it should specify |
    | --- | --- |
    | Data deposit | Named or criteria-based approved repositories, with a preference for those issuing DOIs via DataCite |
    | Persistent identifiers | ORCID for researchers; DOIs for datasets |
    | Contributor recognition | Use of Contributor Role Taxonomy (CRediT) statements so data curation and stewardship work is credited |
    | Retention | A specific minimum period (commonly ten years post-publication) rather than an open-ended commitment |
    | Sensitive data | A named route to ethics and data protection review before any exception is granted |
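
    A research office could check a draft against the eight components and the retention rule with a short script. The sketch below is illustrative only: the function names are hypothetical, heading matching is a naive exact comparison, and the ten-year default simply mirrors the "commonly ten years" figure rather than any mandated period.

```python
from datetime import date

# The eight components named in the template structure.
REQUIRED_COMPONENTS = (
    "preamble and purpose",
    "scope",
    "definitions",
    "policy statements",
    "data availability statements",
    "roles and responsibilities",
    "exceptions and embargoes",
    "review and implementation",
)


def missing_components(draft_headings):
    """Return required components absent from a draft, in template order."""
    present = {heading.strip().lower() for heading in draft_headings}
    return [name for name in REQUIRED_COMPONENTS if name not in present]


def retention_end(publication_date, years=10):
    """Earliest disposal date under a fixed minimum retention period.

    Assumption: retention runs from the publication date; `years` defaults
    to ten only because that figure is described as common.
    """
    target_year = publication_date.year + years
    try:
        return publication_date.replace(year=target_year)
    except ValueError:
        # 29 February published, non-leap target year: use 28 February.
        return publication_date.replace(year=target_year, day=28)
```

    For example, `missing_components(["Scope", "Definitions"])` lists the six sections still to be drafted, and `retention_end(date(2023, 1, 25))` returns 25 January 2033.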

    Note that CASRAI originated the CRediT contributor role taxonomy in 2014; the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, and institutional policies that reference it should cite NISO, not CASRAI, as the current maintaining body.

    Frequently asked questions and next steps

    Is a data sharing agreement legally required?

    A data sharing agreement is not universally mandated by statute in the UK, but it is required in practice whenever personal or confidential data is transferred between organisations under UK GDPR, and it is frequently a condition set by funders or ethics committees. An institutional data sharing policy is separate and is typically a funder or institutional requirement rather than a legal one.

    What is the data sharing law in the UK?

    UK data sharing is governed primarily by the UK GDPR and the Data Protection Act 2018, which set the rules for handling personal data, alongside the common law of confidentiality. Research data policies must operate within this framework whenever datasets contain identifiable or sensitive personal information, in addition to meeting funder FAIR requirements.

    What are the six key data sharing principles?

    Widely cited data sharing principles hold that shared information should be necessary, proportionate, relevant, accurate, timely, and secure. Institutional research data policies should apply the same discipline alongside FAIR — findable, accessible, interoperable, reusable — so that openness and data protection obligations are handled together rather than in conflict.

    Once a first draft exists, research offices should route it through the same stakeholders named in the policy itself — library, IT, ethics, and legal — before it goes to institutional governance for sign-off, and set a firm review date rather than leaving the document to lapse.

    As funders continue tightening data mandates, from NIH’s 2023 policy to Horizon Europe’s FAIR requirements, institutions without a current, clearly scoped policy will increasingly find researchers improvising compliance at the point of grant application — precisely the risk a written data sharing policy is designed to remove. Research offices that keep the policy distinct from the data sharing agreement, and review it on a fixed cycle, are best placed to keep pace with the next round of funder requirements.

  • NIH Genomic Data Sharing Policy vs DMS Policy

    The NIH Genomic Data Sharing (GDS) Policy and the NIH Data Management and Sharing (DMS) Policy are two separate, still-active NIH policies with different effective dates, different scopes and different submission points — the GDS Policy (2015) governs consent and controlled access for large-scale genomic data, while the DMS Policy (2023) governs data management planning for all NIH-funded scientific data. Grantees who assume the 2023 policy absorbed the 2015 one risk missing a distinct compliance step.

    The NIH Genomic Data Sharing Policy is the funder requirement, effective since 25 January 2015 under Notice NOT-OD-14-124, that governs consent-based data use limitations, controlled-access repositories and data release timelines for large-scale human and non-human genomic data generated with NIH support.

    What Is the NIH Genomic Data Sharing (GDS) Policy?

    The GDS Policy replaced NIH’s 2007 Genome-Wide Association Studies (GWAS) data-sharing policy and extended its logic to a wider set of genomic technologies. It applies to studies that generate large-scale human or non-human genomic data, including genome-wide association studies, single nucleotide polymorphism (SNP) arrays, whole-genome and whole-exome sequence data, transcriptomic data and epigenomic data produced by array-based or high-throughput sequencing platforms.

    Two features distinguish it from a generic sharing mandate:

    • A two-tiered access model — unrestricted (open) data versus controlled-access data held in a repository such as dbGaP, the NIH database of Genotypes and Phenotypes.
    • A consent-based data use limitation system, under which informed consent documents must state what data types will be shared and whether access will be open or controlled, so that secondary users are legally and ethically bound to the participant’s original consent.

    The National Human Genome Research Institute (NHGRI) implements the policy operationally through Notices NOT-HG-15-038 and NOT-HG-20-011, and designates AnVIL alongside dbGaP as primary repositories for NHGRI-funded genomic data.

    How Does the GDS Policy Differ From the DMS Policy?

    The NIH Data Management and Sharing Policy, effective 25 January 2023 under Notice NOT-OD-21-013, is far broader in scope. It applies to essentially all NIH-funded research producing “scientific data” — any data commonly accepted in the field as sufficient to validate and replicate findings — not only genomic data. It requires a data management and sharing plan with every competing grant application, whereas the GDS Policy’s genomic-specific requirements historically attached at the Just-in-Time stage, after review but before award.

    NIH has since directed that the two policies be harmonised into a single submission: where a project is subject to both, the genomic-specific elements (consent language, data type, repository choice, controlled- versus open-access designation) are folded into one data management and sharing plan rather than filed as two separate documents. The table below sets out where the policies still diverge.

    | Feature | GDS Policy (2015) | DMS Policy (2023) |
    | --- | --- | --- |
    | Governing notice | NOT-OD-14-124 | NOT-OD-21-013 |
    | Effective date | 25 January 2015 | 25 January 2023 |
    | Scope | Large-scale human and non-human genomic data | All NIH-funded scientific data, any type |
    | Core document | Genomic Data Sharing Plan + Institutional Certification | Data management and sharing plan |
    | Consent mechanism | Consent-based data use limitations, enforced via dbGaP Data Access Committees | General “justifiable limitations” language; no genomic-specific consent tiers |
    | Typical repository | dbGaP, AnVIL (controlled- or open-access) | Any NIH-designated or discipline-appropriate research data repository |
    | Budget provision | Not addressed directly | Explicitly allows data management and sharing costs in the budget |

    Who Must Submit an Institutional Certification?

    An Institutional Certification is a GDS Policy-specific attestation — separate from the data management and sharing plan — that the institution has reviewed the consent language, IRB approval and data use limitations attached to the human genomic data before it is deposited in a controlled-access repository. It is not required by the DMS Policy for non-genomic data.

    Institutions must certify, among other things, that:

    • The data was collected in a manner consistent with 45 CFR 46 (the Common Rule) and applicable state and local laws.
    • Consent forms permit the specific type of data use requested (general research use versus disease-specific use).
    • Identifiers have been removed or the data otherwise meets the applicable de-identification standard.

    Because this certification is a distinct compliance artefact from the data management and sharing plan, research administrators who track only DMS Plan compliance can miss it entirely on genomic awards.

    How Does Controlled Access Work Under the GDS Policy?

    Controlled-access genomic data sits in dbGaP behind a Data Access Committee (DAC) review process. Secondary users submit a data access request describing their intended research use; the DAC checks that use against the consent-based data use limitation recorded for that dataset before granting access. This is materially different from the DMS Policy’s general expectation of “broadest appropriate sharing,” which does not itself impose a use-limitation enforcement layer — that enforcement mechanism is a GDS-specific feature.
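
    The check a DAC performs can be pictured as a comparison between the recorded data use limitation and the requested use. The sketch below is a deliberately simplified model, not dbGaP's actual process: the class, field names, and matching rule are hypothetical, it covers only the two use categories mentioned in this article (general research use and disease-specific use), and real review is carried out by people, not string comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataUseLimitation:
    """Hypothetical record of what participants consented to."""
    general_research_use: bool
    disease: Optional[str] = None  # set when use is limited to one disease


def request_consistent(limitation, proposed_disease):
    """Return True if a proposed use fits the recorded limitation.

    `proposed_disease` is None for research not tied to a single disease.
    """
    if limitation.general_research_use:
        # General research use permits any research purpose.
        return True
    if limitation.disease is None or proposed_disease is None:
        # A disease-specific dataset cannot support a non-specific request.
        return False
    return limitation.disease.strip().lower() == proposed_disease.strip().lower()
```

    Under this model a dataset limited to one disease would be released only for requests naming that disease, which is the enforcement layer the DMS Policy does not itself provide.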

    Answer-First Q&A

    Does the 2023 DMS Policy Replace the 2015 GDS Policy?

    No. The DMS Policy did not replace or repeal the GDS Policy; both remain in force. NIH’s own guidance directs grantees generating large-scale genomic data to satisfy GDS-specific requirements — informed consent language, Institutional Certification, controlled-access designation — within the single data management and sharing plan required by the DMS Policy, rather than as an independent document.

    What Counts as “Large-Scale” Genomic Data Under the GDS Policy?

    NIH does not set one fixed threshold; NHGRI and other institutes assess scale case by case, typically referencing genome-wide association studies, whole-genome or whole-exome sequencing, and array-based platforms as presumptively “large-scale.” Investigators with borderline projects should confirm applicability with their institute’s program officer before submission, since NHGRI also encourages voluntary sharing of smaller datasets.

    When Is the Institutional Certification Submitted?

    The Institutional Certification is submitted at the Just-in-Time stage — after peer review, once an application is being considered for funding — not with the initial application. This differs from the data management and sharing plan itself, which NIH requires as part of the competing application under the DMS Policy.

    Which Repository Satisfies the GDS Policy?

    NIH designates dbGaP for controlled-access human genomic data and, for NHGRI-funded work specifically, AnVIL as the primary repository accepting both controlled- and open-access data. Investigators may propose an alternative repository in the data management and sharing plan, subject to institute approval before funding.

    Implications for Research Administrators

    The practical risk is not policy conflict but a compliance gap: an office that maps its DMS Policy checklist to grant application review alone will miss the GDS Policy’s Just-in-Time Institutional Certification and its ongoing dbGaP registration obligations. Research administration offices supporting genomic PIs need two intake questions, not one — does this award generate large-scale genomic data, and if so, has the Institutional Certification been routed separately from the data management and sharing plan.
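
    The two intake questions can be expressed as a small lookup that returns each compliance artefact with its submission stage. This is a sketch under stated assumptions: the `Award` record and its field names are hypothetical, and the rule that the Institutional Certification attaches only to human genomic data follows this article's description rather than a complete reading of NIH guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Award:
    """Hypothetical intake record; field names are illustrative, not NIH terms."""
    generates_scientific_data: bool
    large_scale_genomic_data: bool = False
    human_genomic_data: bool = False


def required_artefacts(award):
    """Return (artefact, submission stage) pairs for an award."""
    artefacts = []
    if award.generates_scientific_data:
        artefacts.append(("data management and sharing plan", "competing application"))
    if award.large_scale_genomic_data:
        # GDS-specific elements go inside the same plan, not a second document.
        artefacts.append(("GDS elements folded into the plan", "competing application"))
        if award.human_genomic_data:
            # The certification is routed separately, at a later stage.
            artefacts.append(("Institutional Certification", "Just-in-Time"))
    return artefacts
```

    An office that only ever sees the first tuple in the result is tracking DMS compliance alone; the third tuple is the item most likely to be missed.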

    As NIH continues to harmonise guidance across institutes, expect more sub-policies — clinical trials data sharing, foreign genomic data transfer rules — to layer onto rather than replace the DMS Policy’s baseline. Treating “DMS compliance” as a single checkbox will increasingly understate what a genomics-heavy award actually requires.

  • Data Availability Statement Not Applicable Rules

    A data availability statement (DAS) reading “not applicable” is defensible only in narrow, specific circumstances: chiefly when no new data were generated or analysed, when data are proprietary clinical or commercial records, or when a legal or ethical restriction genuinely blocks disclosure. Outside those cases, editors and funders increasingly treat “not applicable” as a red flag rather than a compliant statement.

    A data availability statement is a mandatory or recommended manuscript section, usually placed before the references, that tells readers where the data underpinning a study’s findings can be found and under what conditions they can be accessed. Since most major publishers (Springer Nature, Wiley, Taylor & Francis, PLOS) now require a DAS on every research article, “not applicable” has become one of the most commonly misused entries in it — and one of the most commonly queried at copyediting or peer-review stage.

    When is “not applicable” a defensible data availability statement?

    “Not applicable” is defensible when it is factually true that no dataset exists to disclose. Taylor & Francis’s author-services template lists this explicitly as one option among many, with the standard wording: “Data sharing is not applicable to this article as no new data were created or analyzed in this study.” Springer Nature uses near-identical phrasing for theoretical and mathematical papers that involve no empirical dataset.

    Three case types consistently pass editorial and funder scrutiny:

    • No new data generated. Review articles, theoretical papers, editorials, commentaries, book reviews, and hypothesis or proposal papers that synthesise existing literature rather than produce new datasets.
    • Genuinely proprietary or clinical data under contractual control. Data held by a third-party sponsor, clinical trial data governed by a data-use agreement the author cannot unilaterally waive, or commercially embargoed findings pending patent filing.
    • Data restricted by law or binding ethics approval. National statistical agency microdata, patient-level clinical records where the original informed-consent language did not cover public sharing, or datasets covered by data-protection legislation such as UK GDPR.

    When does “not applicable” trigger an editorial or funder query?

    “Not applicable” triggers a query whenever a study plainly did generate or analyse data but the statement fails to say why access is restricted. PLOS’s data-availability policy, in force for all research articles submitted since March 2014, states that the “not applicable” exemption applies only to article types that structurally contain no dataset — not to empirical studies that simply prefer not to share.

    Cranfield University’s research-data-management guidance explicitly names “Availability of data and materials: ‘Not applicable’” as an example of an unclear statement when used on an empirical paper, because it gives the reader no route to verification. That is the core distinction editors are trained to apply: “not applicable” answers “does a dataset exist?”, not “will you share it?” Using it to avoid disclosing data that does exist — without stating a legal, ethical or commercial restriction — is what draws a production-stage or peer-review query.

    | Statement pattern | Typically accepted? | Why |
    | --- | --- | --- |
    | “Not applicable: no new data generated” | Yes | Factually verifiable from article type |
    | “Not applicable” on an empirical/quantitative study | No (triggers query) | Data exists; statement misrepresents the situation |
    | “Data available on request from the corresponding author” | Conditional | Only under Basic or Share Upon Reasonable Request publisher policies; must name the restriction |
    | “Data not available due to [named] ethical/legal/commercial restriction” | Yes | Restriction is stated and attributable |
    | Silence / statement omitted entirely | No (triggers query) | Most publishers now mandate a DAS on every submission |
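
    A pre-submission check could apply the patterns in the table mechanically before a person reviews the manuscript. The sketch below is a rough first-pass filter, not a publisher's actual validation logic: the function name, return labels, and keyword matching are all assumptions, and keyword matching will misclassify unusual wording.

```python
RESTRICTION_WORDS = ("ethical", "legal", "commercial")


def triage_statement(statement, study_has_data):
    """Classify a data availability statement as accept, conditional, or query.

    `study_has_data` is supplied by the checker: did the study generate or
    analyse data? The statement text alone cannot answer that.
    """
    text = (statement or "").strip().lower()
    if not text:
        return "query"  # omitted statements are queried
    if "not applicable" in text:
        # Only defensible when no dataset exists at all.
        return "query" if study_has_data else "accept"
    if "request" in text:
        return "conditional"  # depends on the journal's policy tier
    if "not available" in text:
        named = any(word in text for word in RESTRICTION_WORDS)
        return "accept" if named else "query"
    return "review manually"
```

    The `study_has_data` argument is the point of the exercise: the same “not applicable” text is accepted for a review article and queried for an empirical study.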

    “Available on request” versus “not applicable”: are they the same thing?

    No. They answer different questions and are not interchangeable. An “available on request” statement concedes that a dataset exists but sets a conditional access route (typically via the corresponding author), whereas “not applicable” asserts that no dataset exists at all. Taylor & Francis restricts “available on request” wording to journals operating under its Basic or Share Upon Reasonable Request policies; it is not a universal fallback.

    Editors increasingly scrutinise “available on request” statements too, following widely reported non-responsiveness rates in follow-up author contact — a dynamic documented in reproducibility literature and discussed on researcher forums such as Reddit’s r/AskAcademia. A defensible “on request” statement names the corresponding author’s role, the reason data are not openly deposited (privacy, participant consent, third-party licence), and — where a repository embargo applies — the release date.

    How do funder data-sharing mandates change the calculus?

    Funder policy increasingly overrides publisher-level flexibility on “not applicable.” Under the NIH Data Management and Sharing Policy, effective for all applications submitted on or after 25 January 2023, NIH-funded research that generates scientific data must include a Data Management and Sharing Plan — “not applicable” is only accepted where the award genuinely produces no scientific data (e.g. some career-development or infrastructure awards).

    In the UK, UKRI’s Common Principles on Data Policy and the underlying Concordat on Open Research Data set an expectation that publicly funded research data be made as open as possible, as restricted as necessary — meaning a “not applicable” statement on a UKRI-funded empirical study should be paired with a funder-facing data management plan explaining the exemption, not left to stand alone. The ICMJE data-sharing statement requirement, in effect for clinical trials that began enrolment on or after 1 January 2019, similarly mandates a specific data-sharing statement in the trial registration and the manuscript; a bare “not applicable” does not satisfy it for an enrolling trial.

    • Check the specific funder mandate before defaulting to “not applicable” — publisher policy and funder policy are separate compliance layers.
    • Where a funder plan exists (e.g. an NIH DMS Plan or a Horizon Europe data management plan), reference it rather than repeating a bare exemption.
    • For systematic reviews specifically, the data availability statement should confirm whether extracted data tables, search strategies, or code are available, even though no primary dataset was generated. “Not applicable” applies only to the absence of new primary data, not to the review’s own extraction materials.

    Answer-first Q&A

    What do you write in a data availability statement?

    A compliant data availability statement names where the data live (repository, supplementary file, or “not applicable” with a reason), includes a DOI or accession number where one exists, and states any access conditions. Reviews, theoretical papers, and studies with no new dataset should say so explicitly rather than leaving the section blank.
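
    The three elements of a compliant statement (location, identifier, access conditions) can be assembled by a small template function. This is a minimal sketch: the function and parameter names are hypothetical, the “not applicable” sentence follows the Taylor & Francis wording quoted earlier, and the repository sentence is illustrative phrasing rather than any publisher's required text.

```python
def compose_statement(location=None, identifier=None, conditions=None, no_data_reason=None):
    """Build a data availability statement from its parts (hypothetical helper)."""
    if location is None:
        if not no_data_reason:
            # Refuse to emit a bare "not applicable" with no reason attached.
            raise ValueError("State why no dataset exists, or name where the data live.")
        return f"Data sharing is not applicable to this article as {no_data_reason}."
    sentence = f"The data that support the findings of this study are available in {location}"
    if identifier:
        sentence += f" ({identifier})"  # DOI or accession number, where one exists
    sentence += "."
    if conditions:
        sentence += f" Access conditions: {conditions}."
    return sentence
```

    Raising an error when neither a location nor a reason is supplied is a deliberate design choice: it makes the blank or unexplained statement impossible to generate by default.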

    What does a “data not available” statement mean?

    A “data not available” statement means the underlying data exist but access is restricted for ethical, legal, or commercial reasons, and the restriction must be named. This differs from “not applicable,” which asserts no dataset was ever created. Conflating the two is a common cause of an editorial query at submission or production stage.

    What does data availability mean?

    Data availability describes whether, and how, the dataset behind a study’s findings can be accessed by readers and reproducibility auditors. Publishers including Springer Nature and PLOS treat the statement as a mandatory element of the peer-review record, on equal footing with author contributions and conflict-of-interest disclosures.

    Implications for research administrators

    Research offices and library data-management teams are best placed to catch a misapplied “not applicable” before submission, because they hold institutional visibility across a researcher’s funder obligations that a single-article editor does not. A pre-submission check against the relevant funder’s data policy — UKRI, NIH, or a Horizon Europe grant agreement — will catch the majority of cases where “not applicable” would otherwise be accepted by a publisher’s automated submission system but later queried by a funder compliance audit.

    As funder data-sharing mandates tighten and publishers add automated DAS-completeness checks at submission, the margin for a generic “not applicable” will keep narrowing. Authors and administrators who document the specific reason — no new data, named legal restriction, or named commercial embargo — will clear both editorial and funder review; those who use it as a default will increasingly find it queried, not accepted.

    For related terminology, see the CASRAI Research Glossary and the CASRAI-originated CRediT contributor role taxonomy, now stewarded by NISO as ANSI/NISO Z39.104-2022, which governs how data-curation and formal-analysis contributions are credited alongside data availability disclosures.

  • Clinical Data Management Plan vs Research Data Management Plan: What’s the Difference

    A clinical data management plan and a research data management plan are two of the most frequently conflated documents in the clinical trial lifecycle. Both use the acronym “DMP” in casual conversation, both get drafted before a study starts, and both concern “data” in the broadest sense — but they answer to different masters, cover different lifecycle stages, and are read by different audiences. Submitting the wrong one to the wrong reviewer is a recurring, avoidable compliance headache for trial units and research offices alike.

    What Is a Clinical Data Management Plan?

    A Clinical Data Management Plan (CDMP) is an operational, trial-specific document that describes exactly how data will move from case report form (CRF) to locked database. It is written by or with the clinical data management (CDM) function — not the principal investigator’s grants office — and it sits alongside the protocol as one of the working documents that Good Clinical Practice (GCP), per ICH E6, expects a sponsor to maintain and be able to produce on inspection.

    A CDMP typically specifies:

    • CRF or eCRF design and the electronic data capture (EDC) system to be used
    • Database build, edit-check specifications and data validation rules
    • Data entry conventions (single vs double entry, query turnaround)
    • Medical coding dictionaries and versions, such as MedDRA and the WHO Drug Dictionary
    • Discrepancy management and serious adverse event reconciliation procedures
    • Roles, responsibilities and sign-off authority for database lock

    Because it is inspected against GCP, a CDMP is a living, version-controlled document updated through the study rather than filed once and forgotten.

    What Is a Research Data Management Plan?

    A Research Data Management Plan (RDMP) is a funder- or institution-facing document submitted at the grant proposal stage, well before a trial’s CDMP would even exist. Its job is compliance with funder and institutional data policy, not trial operations. UK Research and Innovation (UKRI) requires a data management plan for relevant grant applications, Horizon Europe applicants complete one through the Data Management Plan template built into the Horizon Europe Programme Guide, and the NIH Data Management and Sharing (DMS) Policy has required a DMS plan for NIH-funded research since January 2023.

    An RDMP typically covers:

    • What data types and volumes the project will generate or reuse
    • How data will be described, documented and made findable (metadata, identifiers)
    • Storage, security and access-control arrangements during the project
    • Ethical, consent and legal constraints on sharing (particularly for identifiable participant data)
    • Long-term preservation and repository plans, often with a DOI issued via DataCite
    • Alignment with the FAIR principles — Findable, Accessible, Interoperable, Reusable

    Unlike a CDMP, an RDMP is reviewed once (or at defined milestones) by a funder or research office, not audited line-by-line by a regulator during a GCP inspection.

    CDMP vs RDMP: Side-by-Side Comparison

    The table below sets out where the two documents genuinely diverge, so institutions running funded clinical trials know they usually need both — not one instead of the other.

    | Dimension | Clinical Data Management Plan (CDMP) | Research Data Management Plan (RDMP) |
    | --- | --- | --- |
    | Primary purpose | Ensure trial data is accurate, complete and audit-ready for database lock | Satisfy funder/institutional policy on data stewardship and sharing |
    | Governing framework | ICH E6 Good Clinical Practice; sponsor/CRO SOPs | Funder mandates (UKRI, NIH, Horizon Europe); institutional RDM policy |
    | Typical author | Data manager / clinical data management lead | Principal investigator, often with library or research office support |
    | Created at | Study set-up, before first patient enrolled | Grant proposal stage, before funding is awarded |
    | Primary audience | CDM team, biostatisticians, sponsor, regulatory inspectors | Funder, ethics/IRB reviewers, institutional research office |
    | Content focus | CRF design, edit checks, coding, database lock procedures | Data description, storage, ethics, sharing, long-term preservation |
    | Review cadence | Continuously updated through study conduct; inspected on audit | Reviewed at proposal and, for some funders, at defined milestones |
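
    The "usually both" conclusion can be stated as a simple rule keyed on two facts about a study. The sketch below is illustrative only: the function name and its two flags are hypothetical, and it ignores funder-specific exemptions and institutional variations.

```python
def plans_needed(externally_funded, clinical_trial):
    """Return (document, when it is created) pairs for a study.

    Assumption: funding triggers the RDMP and trial status triggers the
    CDMP, independently of each other.
    """
    plans = []
    if externally_funded:
        plans.append(("RDMP", "grant proposal stage, before funding is awarded"))
    if clinical_trial:
        plans.append(("CDMP", "study set-up, before first patient enrolled"))
    return plans
```

    For an externally funded trial the function returns both entries, each with a different creation point, which is why neither document can stand in for the other.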

    Common Questions Answered

    What does a clinical data management plan include?

    A clinical data management plan includes CRF or eCRF specification, database design, data entry and validation procedures, edit-check logic, medical coding dictionaries such as MedDRA, discrepancy and adverse-event reconciliation processes, and clearly defined roles and responsibilities through to database lock, all maintained as a living, version-controlled document inspected under Good Clinical Practice.

    What should a data management plan include?

    A funder-facing research data management plan should describe the data types and volumes a project will generate, how data will be documented and made findable through metadata, storage and security arrangements, ethical and consent constraints on sharing identifiable data, and the eventual repository and preservation route, typically aligned to the FAIR data principles.

    What are the three phases of clinical data management?

    Clinical data management is generally organised into three sequential phases: study set-up, covering database build and CRF design; study conduct, covering data entry, cleaning and query resolution; and study close-out, covering final reconciliation, coding sign-off and database lock ahead of statistical analysis.

    Why the Distinction Matters for Research Administrators

    Institutions running externally funded clinical trials almost always need both documents, produced by different teams on different timelines. A funder reviewer looking for a FAIR-aligned sharing and preservation strategy will not find it in a CDMP’s edit-check specification, and a GCP inspector auditing database lock will not accept an RDMP’s high-level data-sharing statement as evidence of a query-resolution procedure.

    This is precisely the coordination gap that research administration functions increasingly exist to close: aligning the pre-award compliance document (the RDMP, owned by the grants office) with the operational trial document (the CDMP, owned by clinical data management) so that neither is quietly missing when a funder audit or a regulatory inspection arrives. Institutions that treat the two as interchangeable risk both funder non-compliance and GCP findings — for two entirely separate reasons.

    Consistent terminology helps here. Reviewers, auditors and research offices benefit from a shared reference for what each document is called and what it covers; the CASRAI research administration dictionary maintains definitions for terms that span exactly this pre-award-to-conduct boundary.

    Looking Ahead

    The line between the two documents is not static. ICH’s ongoing revision of E6 Good Clinical Practice has pushed sponsors toward more explicit, risk-based data governance language inside the CDMP itself, while funders such as UKRI and the NIH continue to tighten expectations for FAIR-aligned sharing inside the RDMP. Institutions that keep the two plans distinct — but explicitly cross-referenced — will be best placed to satisfy both regulators and funders as each side’s requirements keep evolving.

  • Machine-Actionable Data Management Plans: What Changes

    Data management plans (DMPs) have traditionally been static, prose documents written once at proposal stage and rarely opened again. That is changing. Funders, repositories and institutional systems are converging on machine-actionable data management plans (maDMPs) — DMPs structured so that software, not just people, can read and act on them. The shift is being driven by the RDA DMP Common Standard, a specification from the Research Data Alliance that turns free-text plans into structured, exchangeable data. This article explains what “machine-actionable” means in practice, what the standard actually changes, which tools implement it, and why funders are pushing the sector in this direction.

    What “Machine-Actionable” Actually Means

    A conventional DMP is a Word document or PDF: a human writes prose describing what data will be collected, how it will be stored, and where it will end up. A reviewer reads it once, files it, and rarely revisits it. Nothing in that document can be queried, validated automatically, or passed to another system without someone re-typing it.

    A machine-actionable DMP replaces (or accompanies) that prose with structured fields, such as dataset descriptions, distribution details, metadata standards, licences and repository identifiers, encoded so that a repository, funder portal, or current research information system (CRIS) can parse them directly. The foundational framing paper, Ten Principles for Machine-Actionable Data Management Plans (Miksa, Simms, Mietchen & Jones, PLOS Computational Biology, 2019, cited over 130 times), describes the goal as embedding DMPs in existing research workflows so parts of the plan can be generated, validated and updated automatically rather than retyped at every stage.

    • Structured, not free-text — fields for dataset type, format, volume, access conditions and repository are discrete and machine-parseable.
    • A living document — updated through the project lifecycle rather than filed once and forgotten.
    • Interoperable — exportable between DMP tools, repositories, CRIS platforms and funder systems without manual re-entry.
    • Partially automatable — some fields (e.g. ORCID iDs, grant metadata, repository policies) can be pre-filled from connected systems.
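
    The "partially automatable" point can be pictured as a merge in which values pulled from connected systems fill gaps in a plan but never overwrite what the researcher has already entered. The Python sketch below is illustrative only: the `prefill` helper, the flat field names and the placeholder identifiers are all hypothetical, and no real ORCID or grant-system API is called.

```python
def prefill(plan: dict, connected: dict) -> dict:
    """Return a copy of `plan` with empty fields filled from connected-system data.

    Fields the researcher has already completed are left untouched.
    """
    merged = dict(plan)  # shallow copy so the draft itself is not modified
    for field, value in connected.items():
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged


# Hypothetical draft plan with two empty fields.
draft = {
    "title": "Soil microbiome survey",
    "contact_orcid": "",
    "grant_id": None,
}

# Hypothetical values that a connected system might supply (placeholders only).
from_systems = {
    "title": "Title held in the grants system",
    "contact_orcid": "0000-0000-0000-0000",
    "grant_id": "EXAMPLE-123",
}

plan = prefill(draft, from_systems)
```

    Here `plan` keeps the researcher's own title while picking up the identifier fields, which is the behaviour a pre-fill step would usually want: automation supplies defaults, and people remain the authority on anything they have typed.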

    Definitions of related research-data terms are catalogued in the CASRAI Dictionary.

    The RDA DMP Common Standard: What It Changes

    The RDA DMP Common Standard for Machine-actionable Data Management Plans, developed by an RDA working group, defines a shared JSON schema for representing a DMP’s core elements: project and funder metadata, one or more datasets, each dataset’s distribution (repository, licence, access level), and the metadata standards applied to it. The schema is published and version-controlled openly on GitHub, so any tool builder can implement it without licensing constraints.
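
    To make the shape of such a record concrete, the Python sketch below builds a deliberately partial maDMP-style structure and serialises it to JSON. The field names reflect one reading of the Common Standard and should be checked against the published schema; several fields a complete plan would normally carry (such as identifiers and contact details) are omitted for brevity, and every value, URL and title is a placeholder.

```python
import json

# Illustrative, partial maDMP-style record. Field names are an assumption
# based on the RDA DMP Common Standard; verify them against the published
# schema before relying on them. All values are placeholders.
madmp = {
    "dmp": {
        "title": "Example project DMP",
        "created": "2024-01-15T09:00:00Z",
        "modified": "2024-06-01T09:00:00Z",
        "project": [
            {
                "title": "Example project",
                "funding": [
                    {
                        "funder_id": {
                            "identifier": "https://example.org/funder",
                            "type": "url",
                        }
                    }
                ],
            }
        ],
        "dataset": [
            {
                "title": "Survey responses",
                "distribution": [
                    {
                        "title": "Anonymised CSV export",
                        "data_access": "open",
                        "host": {
                            "title": "Example Repository",
                            "url": "https://repository.example.org",
                        },
                        "license": [
                            {
                                "license_ref": "https://creativecommons.org/licenses/by/4.0/",
                                "start_date": "2025-01-01",
                            }
                        ],
                    }
                ],
                "metadata": [
                    {
                        "metadata_standard_id": {
                            "identifier": "https://example.org/metadata-standard",
                            "type": "url",
                        }
                    }
                ],
            }
        ],
    }
}

# Serialise to the JSON text that tools would exchange.
serialised = json.dumps(madmp, indent=2)
```

    The nesting mirrors the elements listed above: project and funder metadata at the top, then each dataset with its own distribution (repository, licence, access level) and metadata standards.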

    Before a common schema existed, each DMP tool stored plans in its own proprietary structure. A plan created in one system could not be meaningfully exported to another, and funders could not aggregate structured data across grant portfolios without manual extraction. The Common Standard changes that by giving every participating tool the same underlying data model, so a DMP authored in one platform can, in principle, be exported as valid maDMP JSON and ingested by another.

    This matters most at the points where a DMP currently has to be re-keyed: submitting to a funder portal, registering a dataset with a repository, and reporting compliance at project close. A structured, standard-conformant DMP removes several of those manual hand-offs.

    Which Tools Implement the Standard

    Three tools dominate current maDMP practice, each maintained by a different non-profit research-infrastructure organisation:

    Tool | Maintaining organisation | Primary user base | maDMP support
    --- | --- | --- | ---
    DMPonline | Digital Curation Centre (DCC), University of Edinburgh | UK and international institutions | API and structured export aligned to the RDA Common Standard
    DMPTool | California Digital Library (CDL/UC3) | US universities and federal-grant researchers | Templates mapped to funder requirements; RDA-aligned export in progress
    ARGOS | OpenAIRE, originally built under the EU FAIRsFAIR project | Horizon Europe and EOSC-affiliated researchers | Native maDMP JSON, direct repository and metadata-standard linking

    DMPonline and DMPTool both originated as template-driven questionnaires aligned to specific funder wording, then layered structured export on top as the Common Standard matured. ARGOS was built later, directly on the RDA schema, as part of the EU-funded FAIRsFAIR (“Fostering FAIR Data Practices in Europe”) project, which is why it links more natively to repositories and metadata standards rather than treating them as free-text fields. Institutions choosing between them should check which one their funder or repository already exchanges data with, rather than assuming full interoperability across all three.

    Why Funders Are Moving in This Direction

    Funders adopted DMP requirements originally to make researchers think about data stewardship before, not after, the fact. Horizon Europe requires a DMP as a formal deliverable for data-generating projects, due within six months of the project start and updated at least at the mid-term and final reporting points — a recurring obligation that is far easier to track programmatically than by re-reading prose each time. The US National Institutes of Health introduced its Data Management and Sharing Policy in 2023, requiring a DMS plan for every funded project involving scientific data, which has pushed US institutions toward tools that can validate plans at scale rather than review them manually.
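
    As a small illustration of tracking that kind of recurring obligation programmatically, the Python sketch below computes the latest date for a first DMP deliverable from a project start date. It assumes "six months" means six calendar months, clamped to the last day of shorter months; how a particular grant agreement counts the period is not stated here, and the helper names are hypothetical. Mid-term and final reporting dates are grant-specific and would need to be supplied separately.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months to a date, clamping to the end of shorter months."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def initial_dmp_due(project_start: date) -> date:
    """Latest date for the first DMP, assuming a six-calendar-month window."""
    return add_months(project_start, 6)


# A project starting on 31 August 2024: February has no 31st, so the
# computed date is clamped to the last day of February 2025.
due = initial_dmp_due(date(2024, 8, 31))
```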

    For funders managing thousands of active grants, machine-actionable plans mean compliance can be checked computationally — flagging, for instance, a dataset with no named repository or an access licence inconsistent with funder policy — instead of requiring programme officers to re-read each document individually. For research administrators, the practical benefit is fewer duplicate data-entry tasks across grant systems, repositories and institutional CRIS platforms, and DMPs that can be audited at renewal or close-out without starting from scratch.
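
    The two example checks mentioned above (a dataset with no named repository, and a licence inconsistent with funder policy) can be sketched in a few lines of Python. This is a minimal illustration rather than any funder's actual validation logic: the `check_datasets` function is hypothetical, the field names assume an RDA-style structure, and the allowed-licence list stands in for whatever a real policy specifies.

```python
def check_datasets(dmp: dict, allowed_licences: set[str]) -> list[str]:
    """Return human-readable problems found in a maDMP-style `dmp` object."""
    problems = []
    for dataset in dmp.get("dataset", []):
        name = dataset.get("title", "<untitled>")
        distributions = dataset.get("distribution", [])

        # Check 1: at least one distribution must name a host repository.
        has_host = any((d.get("host") or {}).get("title") for d in distributions)
        if not has_host:
            problems.append(f"{name}: no named repository")

        # Check 2: every declared licence must be on the funder's list.
        for d in distributions:
            for licence in d.get("license", []):
                ref = licence.get("license_ref")
                if ref not in allowed_licences:
                    problems.append(f"{name}: licence not permitted by funder policy: {ref}")
    return problems


# Hypothetical policy and plan, with placeholder titles and URLs.
CC_BY = "https://creativecommons.org/licenses/by/4.0/"
sample_dmp = {
    "dataset": [
        {
            "title": "Survey responses",
            "distribution": [
                {"host": {"title": "Example Repository"}, "license": [{"license_ref": CC_BY}]}
            ],
        },
        {
            "title": "Interview audio",
            "distribution": [
                {"license": [{"license_ref": "https://example.org/custom-licence"}]}
            ],
        },
    ]
}

problems = check_datasets(sample_dmp, allowed_licences={CC_BY})
```

    Run across a whole portfolio of structured plans, a function like this would produce a short list of items that need a programme officer's attention, leaving the compliant plans unread.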

    Common Questions About Machine-Actionable DMPs

    What is a machine-actionable data management plan?

    A machine-actionable data management plan (maDMP) is a DMP whose content is structured — typically as JSON conforming to the RDA DMP Common Standard — so that repositories, funder systems and research information platforms can read, validate and act on it automatically, rather than relying on a human re-reading free-form prose.

    What should a data management plan include?

    A DMP typically describes the types and volume of data to be generated, metadata standards applied, storage and security arrangements, ethical and legal considerations, roles and responsibilities, and the data-sharing and long-term preservation plan, including the intended repository and access licence.

    Why is research data management important?

    Sound research data management improves the integrity, reproducibility and reuse value of research outputs. It ensures data remain findable and accessible after a project ends, satisfies funder and publisher mandates, and reduces the risk that valuable data become unusable or unrecoverable once the original team disperses.

    The direction of travel is clear: DMPs are moving from a one-off compliance document to structured metadata that persists and updates across a project’s life, feeding repositories, funder reporting and institutional systems without re-transcription. Institutions that adopt an RDA-aligned tool now — DMPonline, DMPTool or ARGOS — are better positioned as more funders begin to require, rather than merely accept, structured plans.