Tag: fair data management

  • Data Management Plan Templates by Discipline

    A data management plan template sets out what data a project will produce, how it will be documented, stored, protected and shared, and who is responsible for each step. There is no single universal template: a physical-science plan built around instrument calibration and file formats looks very different from a life-science plan built around clinical consent and genomic repositories, or a social-science plan built around participant anonymisation and survey metadata. A data management plan (DMP) is a formal document, usually required at grant application or award stage, that describes how research data will be handled across the full research lifecycle, from collection through to long-term preservation or disposal.

    What every data management plan template must include

    Regardless of discipline, funders expect the same core sections. The Science Europe RDM Guide structures its template around 15 questions covering six core requirements: data description, documentation and metadata, storage and backup, legal and ethical requirements, data sharing, and responsibilities and resources. The UK’s Digital Curation Centre (DCC) publishes a parallel checklist used by most UK universities as the basis of their local DMP guidance.

    • Data description — types, formats, volumes and provenance of data to be generated or reused
    • Documentation and metadata standards that will make the data intelligible to others
    • Storage, backup and security arrangements during the active project
    • Ethical, legal and consent considerations, including any restrictions on sharing
    • Preservation, repository choice and long-term access arrangements
    • Roles, responsibilities and resourcing for data management tasks

    These core sections are then adapted to the realities of the data itself. That adaptation — not the boilerplate headings — is where discipline-specific templates diverge, and where a generic one-size-fits-all template becomes a liability rather than a help.

    Physical sciences: instrument and sensor data

    Physical-science DMPs — physics, chemistry, astronomy, earth and environmental sciences — are dominated by high-volume instrument, sensor and simulation output rather than by human-subjects concerns. The template needs to say more about formats, calibration and reduction, and comparatively little about consent.

    • Community-accepted file formats (for example FITS in astronomy, NetCDF in climate and earth science) to guarantee interoperability
    • Instrument and calibration metadata, so raw readings remain interpretable and reproducible years later
    • Data volume and velocity planning — strategies for reduction, transfer and storage of large or continuous streams from sensors, telescopes or particle detectors
    • Software and code versioning, since simulation and analysis code is often as essential to reproducibility as the raw data itself
    • Discipline repositories such as PANGAEA (earth and environmental data) or domain-specific archives maintained by observatories and facilities

    The European Research Council’s DMP template for Horizon Europe-funded projects follows the same emphasis: it requires beneficiaries to describe data volumes, formats and FAIR compliance rather than consent procedures, reflecting the instrument-heavy profile of much ERC-funded physical-science work.

    Life sciences: biological and clinical data

    Life-science DMPs — biology, medicine, genomics, clinical research — carry a heavier ethical and regulatory load. A strong template treats consent, de-identification and repository choice as first-class sections, not footnotes.

    • Ethical and legal compliance: how human-subject or animal data will be de-identified, and how the plan aligns with UK GDPR and relevant research ethics committee approvals
    • Community ontologies and minimum-information standards such as MIAME for microarray data or Darwin Core for biodiversity records, which allow datasets to be compared across studies
    • Persistent identifiers for samples, datasets and participants, supporting findability without compromising anonymity
    • Genomic and clinical data deposition requirements: for example, NIH policy requires eligible genomic datasets to be deposited in designated repositories such as dbGaP (controlled access) or GEO
    • Funder-specific plan formats: the UK’s Medical Research Council (MRC), for example, requires a personalised DMP, using a published UKRI template, for MRC-funded studentships and non-doctoral training grants

    Because clinical and genomic data are rarely fully open, life-science templates typically distinguish between what will be shared openly, what will be shared under a data access agreement, and what cannot be shared at all — a three-tier distinction that is largely absent from physical-science templates.

    Social sciences: qualitative and survey data

    Social-science DMPs — sociology, psychology, education, economics — centre on informed consent, anonymisation and the management of qualitative material such as interview transcripts and survey responses.

    • Informed consent procedures and how participants are told what will happen to their data, including any future-use or secondary-use provisions
    • Anonymisation and de-identification plans for both qualitative data (transcripts, recordings, field notes) and quantitative survey data
    • Metadata standards for survey and social data, such as the Data Documentation Initiative (DDI), used by archives to describe questionnaires and variables
    • Data access agreements for restricted or sensitive datasets, specifying who can apply for access and under what conditions
    • Long-term archiving through a recognised social-science data service — in the UK, the UK Data Service (based at the UK Data Archive, University of Essex) is the standard repository for shareable social and economic data

    Qualitative data management also needs its own sub-plan: transcription protocols, version control across iterative coding, and a named point at which raw recordings are destroyed or securely archived under the original consent terms.

    | Dimension | Physical sciences | Life sciences | Social sciences |
    |---|---|---|---|
    | Dominant data type | Instrument, sensor, simulation output | Genomic, clinical, biological samples | Interview, survey, observational records |
    | Key standards | FITS, NetCDF | MIAME, Darwin Core | DDI (Data Documentation Initiative) |
    | Primary risk to manage | Volume, format interoperability | Participant privacy, consent, regulation | Anonymisation, re-identification risk |
    | Typical repository | PANGAEA, facility archives | dbGaP, GEO, ENA | UK Data Service / UK Data Archive |

    Frequently asked questions

    How do you write a data management plan?

    Start from your funder’s own template where one exists — UKRI, the ERC and most UK universities publish one — then work through data description, storage, ethics, sharing and preservation in turn. Discipline-specific detail, such as file formats or consent procedures, should be added within each section, not bolted on afterwards.

    What is included in a data management plan?

    A complete DMP describes the types of data that will be produced, the metadata and documentation standards used, storage and backup arrangements, ethical and legal requirements, sharing and access conditions, and preservation plans, with named roles for each task.

    Do all researchers need a data management plan?

    Most UK and EU research funders — including UKRI, Horizon Europe and members of cOAlition S — now require a DMP as a condition of funding. Even without a mandate, a DMP reduces the risk of data loss and supports compliance with institutional data protection policy.

    What does a good data management plan look like?

    A good DMP is specific rather than generic: it names the exact repository, format and access conditions that apply to the project’s actual data, not boilerplate language borrowed from an unrelated template. It is also a living document, revisited as the project changes.

    What this means for research administrators

    Research offices that hand every applicant the same generic DMP template are setting up avoidable review delays: reviewers increasingly expect discipline-appropriate detail on formats, consent or anonymisation rather than restated boilerplate. Building three lightweight discipline variants — physical, life and social science — from one core checklist, as outlined above, lets an institution keep a single governance structure while giving each researcher a template that actually matches their data.
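    The "one core checklist, three variants" arrangement can be sketched as a small data structure. This is a minimal illustration only: the section names follow the core list earlier in this article, while the placement of each discipline prompt under a particular section, and the names `CORE_SECTIONS`, `DISCIPLINE_PROMPTS` and `build_template`, are assumptions made for the example.

```python
from typing import Dict, List

# Core sections shared by every variant, following the checklist earlier in this article.
CORE_SECTIONS: List[str] = [
    "Data description",
    "Documentation and metadata",
    "Storage, backup and security",
    "Ethical, legal and consent considerations",
    "Preservation and repository choice",
    "Roles, responsibilities and resourcing",
]

# Discipline prompts come from the lists above; which core section each prompt
# sits under is an illustrative assumption, not a funder requirement.
DISCIPLINE_PROMPTS: Dict[str, Dict[str, List[str]]] = {
    "physical": {
        "Data description": ["Data volume and velocity planning"],
        "Documentation and metadata": [
            "Community file formats (e.g. FITS, NetCDF)",
            "Instrument and calibration metadata",
            "Software and code versioning",
        ],
        "Preservation and repository choice": ["Discipline repository (e.g. PANGAEA)"],
    },
    "life": {
        "Documentation and metadata": [
            "Minimum-information standards (e.g. MIAME, Darwin Core)",
            "Persistent identifiers for samples and datasets",
        ],
        "Ethical, legal and consent considerations": [
            "De-identification and ethics committee approvals",
            "Open, access-agreement and non-shareable tiers",
        ],
        "Preservation and repository choice": ["Genomic and clinical deposition requirements"],
    },
    "social": {
        "Documentation and metadata": ["DDI survey metadata"],
        "Ethical, legal and consent considerations": [
            "Informed consent, including secondary use",
            "Anonymisation of qualitative and survey data",
            "Data access agreements for restricted datasets",
        ],
        "Preservation and repository choice": ["Social-science data service (e.g. UK Data Service)"],
    },
}


def build_template(discipline: str) -> Dict[str, List[str]]:
    """Return every core section, each with its discipline-specific prompts."""
    if discipline not in DISCIPLINE_PROMPTS:
        raise ValueError(f"unknown discipline: {discipline!r}")
    extras = DISCIPLINE_PROMPTS[discipline]
    # Every variant keeps all core sections, even those with no extra prompts.
    return {section: list(extras.get(section, [])) for section in CORE_SECTIONS}
```

    Because every variant is generated from the same `CORE_SECTIONS` list, a change to the governance structure is made once, and only the discipline prompts need separate maintenance.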

    The underlying reference point across all three variants is the FAIR data principles (Findable, Accessible, Interoperable, Reusable), first formalised by Wilkinson et al. in Scientific Data (2016) and now embedded in funder policy from Horizon Europe to UKRI. FAIR data management is the common thread; the template detail is where disciplines genuinely diverge. Institutions building or revising DMP guidance should treat the research administration function, not just the library, as the natural owner of discipline-specific template maintenance, and consult the CASRAI Dictionary for consistent definitions of the terms used across templates.

  • NSF Data Management Plan: A Directorate Guide

    An NSF data management plan (DMP) is a required proposal component describing how a project will handle, share, and preserve research data. Since 27 April 2026, its exact content has been set by a structured Research.gov webform that adapts to the proposal’s lead directorate, so BIO, ENG, GEO, MPS, OPP, SBE, and EDU proposals no longer follow one identical template. Treating NSF as a single monolithic funder, the default approach in most DMP guides, now produces plans that miss directorate-specific expectations.

    A data management and sharing plan (DMSP) is the National Science Foundation’s formal proposal document setting out how a funded project will manage, share, and archive the data, samples, and other research products it produces. Under the Proposal & Award Policies and Procedures Guide (PAPPG) section II.D.2(ii) and Policy Notice NSF 26-202, every full proposal must include one — or a documented justification if the project will not generate data.

    What Does an NSF Data Management Plan Require in 2026?

    Every NSF proposal must address five general elements under PAPPG II.D.2(ii), regardless of directorate. These form the baseline that directorate-specific guidance then narrows or extends.

    • The types of data, samples, physical collections, software, and curriculum materials the project will produce
    • The standards used for data and metadata format and content, with a documented workaround where no standard exists
    • Policies for data access and sharing, including privacy, confidentiality, security, and intellectual-property protections
    • Policies for data reuse, redistribution, and the production of derivative products
    • Plans for archiving data and other research products and preserving long-term access to them

    Under Policy Notice NSF 26-202, a proposal that will not generate data does not need a full plan — a short justification statement satisfies the requirement instead.

    How Do NSF DMP Requirements Differ by Directorate?

    NSF publishes supplementary DMP guidance for seven directorates and offices — BIO, ENG, GEO, MPS, OPP, SBE, and EDU — plus at least one program-specific supplement (DMREF, under MPS). Where a directorate has issued no supplement, the general PAPPG rules apply by default. The table below summarises what each adds on top of the baseline.

    | Directorate/Office | Distinguishing emphasis |
    |---|---|
    | Biological Sciences (BIO) | Deposit in community-recognized public repositories (e.g. GenBank-class databases) with persistent identifiers linking data to publications |
    | Engineering (ENG) | Broad coverage of software, models, and physical collections; attention to intellectual property where research has commercial potential |
    | Geosciences (GEO) | Discipline-specific repository requirements; Ocean Sciences awardees, for example, are directed to the Biological and Chemical Oceanography Data Management Office (BCO-DMO) |
    | Mathematical & Physical Sciences (MPS) | General guidance, plus a dedicated program-level supplement for the Designing Materials to Revolutionize and Engineer our Future (DMREF) programme |
    | Office of Polar Programs (OPP) | Governed by a separate Dear Colleague Letter establishing a distinct data and code/sample management policy rather than the standard DMSP framework alone |
    | Social, Behavioral & Economic Sciences (SBE) | Heavy emphasis on human-subjects protections: anonymisation, handling of personally identifiable information, informed consent for data sharing, and deposit in recognised social-science archives |
    | STEM Education (EDU) | Maintains its own directorate-level data-management-plans page addressing education-research data and human-subjects considerations |

    This directorate layering is why a plan drafted for a BIO proposal will read very differently from one drafted for an SBE or OPP proposal, even though both start from the same PAPPG baseline.

    What Changed With the April 2026 Research.gov Webform?

    Effective 27 April 2026, NSF replaced the free-standing two-page PDF data management and sharing plan with a structured webform submitted directly through Research.gov. This is the single most consequential change to NSF DMP practice since the policy’s creation, and it makes most “download the template” search results obsolete.

    • The plan is now entered field-by-field in Research.gov rather than uploaded as a standalone PDF attachment
    • The webform adapts its prompts to the proposal’s selected lead directorate, formalising the differences long implied but not enforced by the old PDF format
    • Investigators should verify current guidance for their directorate before assuming a saved PDF template from a prior submission still matches the required fields

    Institutional research offices that maintain locally cached “NSF DMP template” documents should retire the PDF version and point investigators to the live Research.gov webform for the current submission year.

    NSF Data Management Plan Checklist, by Directorate

    Use this sequence to build a directorate-appropriate plan rather than a generic one:

    • Confirm the proposal’s lead directorate and pull its specific guidance page (BIO, ENG, GEO, MPS, OPP, SBE, or EDU) alongside the general PAPPG II.D.2(ii) requirements
    • List every data type, sample, and software product the project will generate
    • Identify the metadata standard and, for BIO or GEO proposals, the target public repository (e.g. BCO-DMO for ocean sciences)
    • For SBE or human-subjects work, document anonymisation, consent, and PII-handling procedures explicitly
    • For OPP proposals, check whether the relevant Dear Colleague Letter policy supersedes the standard DMSP structure
    • Set an archiving and long-term preservation plan with a named repository or institutional data service
    • Submit through the Research.gov webform rather than attaching a standalone PDF
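
    The checklist above is conditional on the lead directorate, which makes it straightforward to express as a small function. The sketch below is illustrative only: `nsf_dmp_checklist` and its `human_subjects` flag are hypothetical names, and the step wording paraphrases this article rather than quoting NSF guidance.

```python
from typing import List

DIRECTORATES = {"BIO", "ENG", "GEO", "MPS", "OPP", "SBE", "EDU"}


def nsf_dmp_checklist(directorate: str, human_subjects: bool = False) -> List[str]:
    """Build the ordered checklist for one lead directorate (hypothetical helper)."""
    code = directorate.upper()
    if code not in DIRECTORATES:
        raise ValueError(f"unknown directorate: {directorate!r}")

    steps = [
        f"Pull the {code} guidance page alongside PAPPG II.D.2(ii)",
        "List every data type, sample, and software product",
        "Identify the metadata standard",
    ]
    # Repository targeting is called out for BIO and GEO in the checklist above.
    if code in {"BIO", "GEO"}:
        steps.append("Name the target public repository")
    # Human-subjects procedures apply to SBE and to any other human-subjects work.
    if code == "SBE" or human_subjects:
        steps.append("Document anonymisation, consent, and PII handling")
    if code == "OPP":
        steps.append("Check whether the Dear Colleague Letter policy supersedes the standard DMSP structure")
    steps.append("Set an archiving and preservation plan with a named repository")
    steps.append("Submit through the Research.gov webform")
    return steps
```

    Calling `nsf_dmp_checklist("GEO")` adds the repository step, while `nsf_dmp_checklist("EDU", human_subjects=True)` adds the anonymisation step; a research office could use a structure like this to generate directorate-aware review checkpoints.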

    Common Questions About NSF Data Management Plans

    Does NSF require a data management plan?

    Yes. NSF requires a data management and sharing plan as a mandatory component of every full proposal, per PAPPG II.D.2(ii). Proposals that will not produce data must instead include a written justification explaining why no plan is needed.

    What is included in a data management plan?

    A complete plan covers the types of data produced, the metadata standards applied, access and sharing policies, provisions for reuse and derivatives, and an archiving and preservation plan for long-term accessibility.

    Do I need a data management plan?

    Any NSF full proposal needs one unless the project genuinely generates no data, samples, or research products — in which case a short justification statement, not a full plan, satisfies the requirement under Policy Notice NSF 26-202.

    What This Means for Research Administrators

    Directorate-tailored webforms shift the compliance burden earlier in the proposal cycle. Research offices that previously offered a single boilerplate DMP template now need directorate-aware review checkpoints, because a plan that satisfies BIO’s repository expectations will not automatically satisfy SBE’s human-subjects requirements or OPP’s separate policy letter. Institutions supporting multi-directorate portfolios should update internal guidance documents to reference the correct directorate page rather than a single generic NSF DMP resource.

    The Outlook for NSF Data Management Requirements

    The move to a structured, directorate-tailored webform signals that NSF intends to enforce, rather than merely suggest, discipline-specific data practices. Investigators and research offices that continue treating the NSF data management plan as a single generic two-pager risk submitting plans that technically comply with PAPPG but miss the sharper, directorate-specific expectations now built into the submission system itself.

  • FAIR Dataset Mandates Risk Becoming a Checkbox

    A FAIR dataset is one that meets the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016 — but a funder mandate requiring deposit and a data management plan does not, on its own, guarantee this. Genuine FAIR compliance demands rich metadata, persistent identifiers and community-standard formats that most minimally compliant deposits skip entirely, because current incentive structures reward the act of depositing, not the work of curating.

    A FAIR dataset is a digital research object — data or its metadata — that satisfies the Findable, Accessible, Interoperable and Reusable principles first formalised by the FORCE11 community and published in Scientific Data in March 2016. The principles were designed to be applied in degrees, not as a pass/fail gate, which is precisely where funder policy and researcher practice have diverged.

    What does a FAIR dataset actually require?

    The FAIR principles set out four categories of requirement, each broken into specific sub-criteria. They are deliberately conceptual rather than prescriptive, which is a strength for cross-disciplinary adoption and a weakness for enforcement.

    • Findable — data and metadata carry a globally unique, persistent identifier and are indexed in a searchable resource.
    • Accessible — retrieval uses a standardised, open protocol, with metadata remaining accessible even when the underlying data cannot be.
    • Interoperable — data and metadata use a shared, formal language and vocabularies that follow FAIR principles themselves.
    • Reusable — data carry a clear licence, detailed provenance, and conform to domain-relevant community standards.

    The Research Data Alliance’s FAIR Data Maturity Model, published in 2020, decomposes these four principles into 41 discrete indicators covering both data and metadata. That granularity matters: a dataset can satisfy some indicators and fail most others while still being described, loosely, as “FAIR.” A funder checking only for repository deposit is verifying perhaps one or two of the 41.
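
    To see how a deposit can be "in a repository" yet evidence little else, consider a coarse per-letter check on a metadata record. This is a deliberately simplified sketch, not the RDA indicator set: the field names (`identifier`, `indexed_in`, `access_url`, `vocabulary_terms`, `licence`, `provenance`) and the example record are hypothetical, and an HTTPS URL is used only as a rough proxy for an open, standardised protocol.

```python
from typing import Dict


def fair_evidence(record: Dict[str, object]) -> Dict[str, bool]:
    """Report, per FAIR letter, whether the record carries any basic evidence."""
    return {
        # Findable: a persistent identifier plus an index or catalogue entry.
        "Findable": bool(record.get("identifier")) and bool(record.get("indexed_in")),
        # Accessible: retrieval over a standard open protocol (HTTPS as a proxy).
        "Accessible": str(record.get("access_url") or "").startswith("https://"),
        # Interoperable: at least some shared-vocabulary terms are attached.
        "Interoperable": bool(record.get("vocabulary_terms")),
        # Reusable: both a licence and provenance information are present.
        "Reusable": bool(record.get("licence")) and bool(record.get("provenance")),
    }


# A hypothetical deposit that satisfies a "deposit in a repository" mandate only.
deposit_only = {
    "identifier": "doi:10.0000/hypothetical",
    "indexed_in": "institutional repository catalogue",
    "access_url": "https://repository.example.org/records/1",
}

evidence = fair_evidence(deposit_only)
```

    For this record the check reports evidence for Findable and Accessible but none for Interoperable or Reusable, which is the pattern a deposit-only mandate tends to leave behind.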

    Why do funder mandates default to minimal compliance?

    Funder FAIR requirements typically operationalise as two things: a submitted data management plan and a deposit in a recognised repository at the end of the project. Neither step audits metadata richness, vocabulary use, or licensing clarity. The result is a policy that is easy to comply with and easy to satisfy without producing a dataset anyone outside the original team could actually reuse.

    Three structural gaps explain why:

    • Resourcing. Science Europe’s funders’ briefing on data management planning recommends that compliant curation cost roughly 5% of total research budget — a figure rarely built into grant awards, leaving curation as unfunded overhead.
    • Recognition. Data curation is not weighted in hiring, promotion or tenure decisions in most institutions, so time spent enriching metadata competes directly with time spent on publications that do count.
    • Standards gaps. Many disciplines still lack the domain-relevant community vocabularies that Interoperability and Reusability depend on, so even willing depositors have nothing FAIR-compliant to conform to.

    Horizon Europe requires that all data produced under the programme be FAIR “by default,” which is the strongest funder-level statement of intent currently in force. Yet the European Commission’s own guidance materials acknowledge that FAIRness is a spectrum, not a binary condition — an admission that sits uneasily alongside a compliance model built around a single deposit checkpoint.

    The maturity gap: from “FAIR start” to genuine reusability

    The European Commission’s Joint Research Centre published FAIR Data Guidelines in 2025 that organise the RDA’s 41 indicators into five progressive maturity levels. The framework is useful precisely because it makes visible how far “minimally compliant” sits from “genuinely reusable.”

    | Maturity level | What it requires |
    |---|---|
    | FAIR start | Published in a catalogue with mandatory metadata; data itself is not structured for machine reuse. |
    | FAIR play | Links added between datasets and related resources, with enriched provenance and cross-referencing. |
    | FAIR go | Data structured to community standards, with defined terminologies (not necessarily machine-readable). |
    | FAIR share | Machine-readable data models (JSON Schema, XML Schema, SHACL) with richly documented provenance. |
    | FAIRest of them all | Machine-readable model endorsed by the domain community; terms exposed via shared FAIR vocabularies. |

    Most mandate-driven deposits land at “FAIR start” — indexed, licensed, discoverable, but not structured for genuine machine or cross-team reuse. The JRC guidelines are explicit that not every dataset needs the top tier, but they are equally explicit that FAIRness can degrade over time if metadata and platforms are not actively maintained. A one-off deposit satisfying a funder’s closeout requirement is not maintenance; it is a snapshot.
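
    The progressive nature of the five levels can be illustrated with a short classifier. The sketch assumes the levels are strictly cumulative (a dataset reaches a level only if it also meets every lower one), which is a reading of "progressive" rather than something the guidelines are quoted as stating here; the feature names are hypothetical labels for the requirements in the table.

```python
from typing import Set

# Ordered from lowest to highest; each level is paired with a hypothetical
# feature flag standing in for the requirement described in the table.
LEVELS = [
    ("FAIR start", "catalogued_with_mandatory_metadata"),
    ("FAIR play", "linked_to_related_resources"),
    ("FAIR go", "structured_to_community_standard"),
    ("FAIR share", "machine_readable_model"),
    ("FAIRest of them all", "community_endorsed_model"),
]


def maturity_level(features: Set[str]) -> str:
    """Return the highest level reached without skipping a lower one."""
    reached = "below FAIR start"
    for level, requirement in LEVELS:
        if requirement not in features:
            break  # a missing lower requirement caps the level
        reached = level
    return reached
```

    Under this assumption, a dataset with a machine-readable model but no links to related resources is still reported as "FAIR start", which makes the remediation order explicit.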

    Rebuilding incentives for genuine data stewardship

    Treating FAIR as a compliance checkbox is a governance failure, not a researcher failure. Three changes would shift the incentive structure toward genuine stewardship rather than deposit-and-forget behaviour.

    1. Credit the labour. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. “Data curation” is one of its fourteen roles, offering institutions an existing, citable mechanism to formally recognise stewardship work in author contribution statements — a mechanism that remains inconsistently applied in promotion and tenure review.
    2. Fund it explicitly. Grant budgets should ring-fence curation costs at the level Science Europe’s own guidance recommends, rather than treating data management plans as an unfunded compliance document.
    3. Audit maturity, not deposit. Funders and institutions should reference maturity models such as the RDA’s 41 indicators or the JRC’s five-level scale in closeout review, rather than accepting repository deposit as sufficient evidence of FAIR compliance.

    FAIR is also not a complete governance answer on its own. The CARE Principles for Indigenous Data Governance, released by the Global Indigenous Data Alliance in 2019, extend the framework to cover collective benefit, authority to control, responsibility and ethics — dimensions that a pure findability-and-format checklist does not touch. Institutions building data policy around FAIR alone are optimising for machine reuse while leaving governance and consent questions unaddressed.

    Frequently asked questions

    What is a FAIR dataset?

    A FAIR dataset satisfies the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016. It carries a persistent identifier, standardised access, shared vocabularies, and clear licensing and provenance — not merely a repository listing.

    What does FAIR stand for with data?

    FAIR stands for Findable, Accessible, Interoperable and Reusable. The acronym describes a framework for data stewardship, not a certification; the Research Data Alliance breaks it into 41 measurable indicators rather than a single pass condition.

    What does FAIR stand for in data management?

    In data management, FAIR describes the target state a data management plan should work toward: identifiers, rich metadata, open protocols and community-standard formats. It guides curation decisions throughout a project, not just the final deposit.

    Why does FAIR data matter?

    FAIR data matters because it lets both humans and machines discover, verify and reuse research outputs without contacting the original authors. Poorly curated “FAIR” deposits undermine reproducibility and waste the public investment funders intended the mandate to protect.

    Implications and outlook

    Funder FAIR mandates have succeeded in one respect: deposit rates have risen sharply since 2016. They have not, on current evidence, produced datasets that are reliably machine-actionable or cross-team reusable at scale. That gap will not close through stricter wording in policy documents; it requires funders to resource curation at realistic cost, institutions to credit it in career progression via mechanisms such as CRediT’s Data curation role, and disciplines to finish building the community standards that Interoperability depends on. Until those three conditions are met, “FAIR by default” will remain a policy aspiration rather than a description of the average deposited dataset.

  • FAIR Principles Data Maturity: Score Against RDA

    FAIR data maturity is scored by testing each dataset against the 41 indicators of the Research Data Alliance’s FAIR Data Maturity Model, grading Findability, Accessibility, Interoperability and Reusability separately, then weighting results by each indicator’s priority tier: essential, important, or useful. Data management under the FAIR principles moves from an abstract commitment to a measurable score once an institution runs this test consistently across its repository.

    FAIR data is data that meets the Findable, Accessible, Interoperable and Reusable criteria first published by Wilkinson et al. in Scientific Data in 2016 — a paper now cited more than 22,000 times. This guide is a practical scoring walkthrough, not another explainer of what the four letters mean: it shows research offices how to actually audit existing datasets and repositories against the RDA model and turn the result into a remediation plan.

    What is the RDA FAIR Data Maturity Model?

    The RDA FAIR Data Maturity Model is a specification published by a Research Data Alliance working group in 2020 to standardise how organisations test FAIRness. Before it existed, dozens of institutions had built incompatible local checklists, making it impossible to compare a “FAIR score” from one repository against another.

    The model does not ship as software. It is a reference document that defines:

    • 41 indicators — testable statements mapped to the fifteen core GO FAIR sub-principles (F1–F4, A1–A2, I1–I3, R1–R1.3)
    • Three priority tiers — essential, important and useful — so institutions can triage effort rather than treat every indicator as equally urgent
    • Evaluation guidance — worked examples for testing each indicator against real metadata and data objects, rather than self-reported compliance

    Because the indicators trace directly to the GO FAIR principles, a dataset that scores well against the RDA model is, by construction, meeting the same criteria described in the original 2016 Scientific Data paper — just with a repeatable measurement attached.

    How does FAIR maturity scoring actually work?

    Scoring is done indicator by indicator, not principle by principle. Most institutions that implement the RDA model score each of the 41 indicators on a simple 0–4 scale — 0 (not implemented) through 4 (fully implemented) — then multiply by a priority weight before aggregating to a per-dataset and per-repository total.

    | FAIR letter | Sub-principles tested | Typical essential-tier evidence |
    |---|---|---|
    | Findable | F1–F4 | Persistent identifier (DOI via DataCite), indexed metadata record, machine-readable catalogue entry |
    | Accessible | A1–A2 | Retrieval via an open protocol (HTTPS), metadata that resolves even if the data itself is restricted |
    | Interoperable | I1–I3 | Structured, non-proprietary format; controlled vocabularies; qualified links to related records |
    | Reusable | R1–R1.3 | Machine-readable licence, documented provenance, alignment with a domain metadata standard |

    A dataset that carries a DOI and open licence but lacks controlled vocabulary terms will score high on Findable and Reusable, and low on Interoperable — the point of indicator-level scoring is precisely to surface that kind of uneven profile, which a single pass/fail “is it FAIR?” verdict would hide.
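
    Indicator-level scoring with priority weights can be sketched in a few lines. The weights (3, 2, 1) are hypothetical, since the choice of weights is left to the institution, and the result is normalised by the sum of weights so each letter stays on the 0–4 scale; that normalisation is a design choice for this example, not part of the RDA specification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights per priority tier; an institution would set its own.
PRIORITY_WEIGHTS = {"essential": 3, "important": 2, "useful": 1}


def letter_scores(indicators: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Weighted mean score per FAIR letter from (letter, priority, score) tuples."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    weight_total: Dict[str, float] = defaultdict(float)
    for letter, priority, score in indicators:
        if not 0 <= score <= 4:
            raise ValueError(f"score out of range 0-4: {score}")
        weight = PRIORITY_WEIGHTS[priority]
        weighted_sum[letter] += score * weight
        weight_total[letter] += weight
    # Dividing by the total weight keeps each letter on the original 0-4 scale.
    return {k: round(weighted_sum[k] / weight_total[k], 2) for k in weighted_sum}


# Illustrative scores for a dataset with a DOI and open licence but no
# controlled vocabulary; the individual values are invented for the example.
example = [
    ("Findable", "essential", 4),
    ("Findable", "important", 3),
    ("Interoperable", "essential", 1),
    ("Interoperable", "useful", 0),
    ("Reusable", "essential", 4),
    ("Reusable", "important", 3),
]
scores = letter_scores(example)
```

    With these inputs Findable and Reusable each come out at 3.6 and Interoperable at 0.75, reproducing the uneven profile described above.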

    Manual vs automated assessment: which tool fits?

    Two complementary assessment routes exist. Automated tools are fast but only test what a machine can verify; manual review is slower but catches the indicators that require human judgement, such as whether a licence is genuinely clear or a vocabulary is genuinely domain-appropriate.

    | Tool / method | Coverage of the 41 indicators | Output | Best suited to |
    |---|---|---|---|
    | F-UJI (FAIRsFAIR project) | Machine-testable subset only: roughly 17 metrics derived from the RDA indicators | Automated percentage score per FAIR letter, run against a DOI | Bulk baseline scans across a whole repository |
    | FAIR-Aware (DANS) | Self-assessment questionnaire, not indicator-scored | Qualitative readiness report and recommendations | Researchers preparing a dataset before deposit |
    | Manual RDA specification review | All 41 indicators, including human-judgement ones | Full indicator-by-indicator score with evidence notes | Institutional audits and remediation planning |

    A hybrid approach is the most defensible for an institution-wide programme: run an automated scan across every repository record for a fast baseline, then reserve manual review for the essential-tier indicators no tool can verify — licence clarity, provenance completeness and domain-standard alignment.
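
    The hybrid split can be written down as a simple triage rule. In this sketch the indicator identifiers are placeholders rather than official RDA indicator IDs, and the `machine_testable` flag is an assumption an institution would have to set itself for each indicator.

```python
from typing import Dict, List


def plan_review(indicators: List[dict]) -> Dict[str, List[str]]:
    """Route each indicator to the automated scan, manual review, or a later pass."""
    plan: Dict[str, List[str]] = {"automated": [], "manual": [], "deferred": []}
    for indicator in indicators:
        if indicator["machine_testable"]:
            plan["automated"].append(indicator["id"])
        elif indicator["priority"] == "essential":
            # Essential indicators no tool can verify get human review first.
            plan["manual"].append(indicator["id"])
        else:
            plan["deferred"].append(indicator["id"])
    return plan


# Placeholder indicators, named after the evidence types discussed above.
sample = [
    {"id": "persistent-identifier", "priority": "essential", "machine_testable": True},
    {"id": "licence-clarity", "priority": "essential", "machine_testable": False},
    {"id": "provenance-completeness", "priority": "essential", "machine_testable": False},
    {"id": "related-record-links", "priority": "useful", "machine_testable": False},
]
review_plan = plan_review(sample)
```

    The manual queue here contains only licence clarity and provenance completeness, which keeps reviewer time focused on the essential-tier judgements.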

    A step-by-step scoring walkthrough

    The following sequence turns the RDA model from a reference document into a repeatable institutional process.

    1. Select a representative sample. Pull datasets across disciplines, repository platforms, and funder mandates — a sample skewed toward one department will misstate institutional maturity.
    2. Map each dataset’s DOI or identifier record and run an automated F-UJI scan for the machine-testable indicators before any manual work begins.
    3. Score the remaining essential-tier indicators manually, checking licence text, metadata schema, and vocabulary choice against the evidence guidance in the RDA specification.
    4. Weight and aggregate. Multiply each indicator score by its priority weight, sum within each FAIR letter, then average across the sample to produce a repository-level maturity profile.
    5. Report by weakest letter, not overall average. An institution scoring 3.6/4 on Findable but 1.2/4 on Interoperable needs a vocabulary-adoption project, not a generic “improve FAIR compliance” action item.

    Worked example — three datasets from the same institutional repository, scored on the 0–4 scale before weighting:

    | Dataset | Findable | Accessible | Interoperable | Reusable |
    |---|---|---|---|---|
    | Clinical trial dataset (restricted access) | 4 | 3 | 2 | 3 |
    | Environmental sensor archive | 3 | 4 | 3 | 2 |
    | Survey microdata (open) | 2 | 4 | 1 | 4 |

    This profile, with Accessible the strongest letter (mean 3.7) and Interoperable the weakest (mean 2.0, and never above 3 for any dataset), is a genuinely institution-specific finding that a generic FAIR explainer cannot give you. Only a scored audit surfaces it, and it points to a single fix (adopting a shared controlled vocabulary at ingest) rather than four separate ones.
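
    The per-letter averages behind that reading can be checked directly. The short Python sketch below uses only the unweighted scores from the worked-example table; the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the worked-example table (0-4 scale, before weighting).
scores = {
    "Clinical trial dataset (restricted access)": {"F": 4, "A": 3, "I": 2, "R": 3},
    "Environmental sensor archive": {"F": 3, "A": 4, "I": 3, "R": 2},
    "Survey microdata (open)": {"F": 2, "A": 4, "I": 1, "R": 4},
}

# Mean score per FAIR letter across the three datasets.
letter_means = {
    letter: mean(row[letter] for row in scores.values())
    for letter in "FAIR"
}
weakest = min(letter_means, key=letter_means.get)
strongest = max(letter_means, key=letter_means.get)

for letter in "FAIR":
    print(f"{letter}: {letter_means[letter]:.2f}")
print("weakest:", weakest, "| strongest:", strongest)
```

    On these figures Findable and Reusable both average 3.0, Accessible about 3.67 and Interoperable 2.0, which is why the remediation priority falls on vocabulary adoption.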

    Common questions about FAIR data scoring

    What are FAIR principles for data?

    FAIR principles are four criteria — Findable, Accessible, Interoperable and Reusable — first published in a 2016 Scientific Data paper by Wilkinson et al. They require datasets to carry a persistent identifier, standardised retrieval protocols, shared vocabularies and machine-readable licensing, so both humans and software can locate and reuse research data reliably.

    What are the four pillars of the FAIR data principles?

    The four pillars are Findable (unique persistent identifiers and rich metadata), Accessible (standardised, open retrieval protocols), Interoperable (shared vocabularies and qualified references) and Reusable (clear licensing, provenance and community standards). The RDA FAIR Data Maturity Model breaks these four pillars into 41 individually testable indicators.

    What are the FAIR data principles of UKRI?

    UKRI does not publish a separate FAIR standard. Its research councils, including NERC’s Environmental Data Service, require grant-funded datasets to follow the same Findable, Accessible, Interoperable and Reusable principles promoted by the GO FAIR initiative, citing benefits including increased citation, stronger research integrity, and compliance with data management plan commitments.

    What are the FAIR principles of GDPR?

    FAIR and GDPR address different concerns and are not in conflict. FAIR governs discoverability and reuse of metadata, while GDPR governs lawful processing of personal data. A dataset containing personal information can be fully FAIR — richly described and findable — while access to the underlying records stays restricted under GDPR-compliant authorisation.

    What this means for research data offices

    A scored FAIR audit gives research offices something a qualitative checklist cannot: a repository-level baseline that can be re-measured after each remediation cycle. Institutions preparing data management plan compliance evidence for UKRI, Horizon Europe, or cOAlition S-aligned funders can cite the same indicator scores as their supporting evidence, rather than producing a fresh narrative justification each time.

    Scoring also clarifies where FAIR and openness diverge. Following the “as open as possible, as closed as necessary” principle, a dataset can score highly on all four FAIR letters while remaining access-controlled — the metadata is open and richly described even when the underlying records are not. Institutions handling Indigenous or community-originated data should additionally weigh the CARE Principles — Collective Benefit, Authority to Control, Responsibility and Ethics — published by the Global Indigenous Data Alliance, which govern who controls reuse decisions rather than how discoverable the data is.

    The practical next step after a first scoring pass is not a single “get to 100%” target — no dataset needs every useful-tier indicator satisfied — but a prioritised backlog built from essential-tier gaps, feeding directly into repository ingest workflows and metadata templates so the next deposit scores higher without a second audit.

  • Data Sharing Policy: A Research Office Template

    A data sharing policy is the institution-wide governance document that sets expectations for how researchers plan, deposit, and share research data — distinct from a data sharing agreement, which is the specific legal contract governing one data transfer. Research offices write policies to translate funder FAIR data mandates, such as the NIH’s 2023 Data Management and Sharing Policy, into consistent local practice.

    A data sharing policy is an institutional statement of principle and requirement: it tells every researcher, department, and grant applicant what the organisation expects of them before, during, and after a funded project, regardless of discipline or funder. It is not a substitute for a project-level data management plan (DMP), and it is not the same document as a data sharing agreement — the confusion between the two is the single most common drafting mistake research offices make.

    What is an institutional data sharing policy?

    An institutional data sharing policy is a governance document, usually owned jointly by the research office, library, and IT services, that sets baseline rules for how the organisation’s researchers manage and share the data underlying their published outputs. It applies across all disciplines and funders, rather than to a single grant.

    Published examples illustrate the range: the Office for National Statistics operates a data sharing policy governing record-level personal information, while Cancer Research UK’s data sharing and management policy sets FAIR-aligned requirements as a condition of every grant it awards. Both share a common shape (purpose, scope, principles, requirements, and named responsibilities) even though the first governs a public body’s statistical data and the second governs a funder’s grant conditions.

    For a research office, the policy is the document that makes funder requirements operational at institutional scale: instead of each principal investigator interpreting a funder’s data mandate independently, the institution issues one interpretation, one set of approved repositories, and one escalation route for exceptions.

    Why research offices need a data sharing policy now

    Research offices need a written policy because funders increasingly make data sharing a condition of funding, not a recommendation, and institutions without a policy leave researchers to interpret those conditions inconsistently — which creates compliance risk at renewal, audit, and publication stages.

    The mandate landscape has hardened over the past decade:

    • NIH’s 2023 Data Management and Sharing Policy took effect on 25 January 2023 and requires a data management and sharing plan for essentially all NIH-funded research, reviewed alongside the science.
    • UKRI is a signatory to the 2016 Concordat on Open Research Data, which commits funded institutions to making research data openly available with as few restrictions as possible.
    • Horizon Europe’s Model Grant Agreement requires a FAIR-aligned data management plan for participating projects, applying the “as open as possible, as closed as necessary” principle carried over from Horizon 2020.
    • ICMJE has required a data sharing statement in clinical trial manuscripts submitted to ICMJE-following journals since 1 July 2018, as a condition of consideration for publication, and a data sharing plan in the registration record of any trial that began enrolling participants on or after 1 January 2019.

    Each of these mandates is written at the funder level. The institutional policy is what converts them into a single, consistent set of expectations that a research office can actually train staff on and audit against.

    Data sharing policy vs data sharing agreement

    A data sharing policy and a data sharing agreement solve different problems: the policy is a standing, institution-wide statement of expectations, while the agreement is a one-off legal contract governing a specific transfer of specific data between specific parties. Research offices need both, but they are drafted, owned, and reviewed differently.

    Aspect | Institutional data sharing policy | Data sharing agreement
    Scope | All researchers, all funded projects, ongoing | One dataset, one recipient, one purpose
    Trigger | Institutional governance cycle | A specific request or collaboration
    Legal status | Internal policy; not itself a contract | Binding contract, often referencing UK GDPR
    Typical owner | Research office, library, IT, ethics committee | Data protection officer, legal counsel
    Reviewed by | Institution, periodically | Both parties, per transfer

    A well-written policy should explicitly state this distinction and point researchers to the correct process for each: the policy for general expectations and deposit requirements, the agreement (or a data protection impact assessment) for any transfer involving personal, sensitive, or third-party data governed by UK GDPR.

    Template structure: what to include

    A usable institutional data sharing policy needs roughly eight components, moving from purpose through to enforcement, so that researchers and reviewers can find any given requirement in under a minute.

    1. Preamble and purpose — why the institution requires data sharing and its relationship to the FAIR principles, first published in Scientific Data in 2016.
    2. Scope — which staff, students, and data (all disciplines, all funders, or funder-specific) the policy covers.
    3. Definitions — research data, metadata, persistent identifier, data management plan, repository.
    4. Policy statements — the DMP requirement, repository and persistent-identifier expectations, metadata standards, data licensing, and minimum retention period.
    5. Data availability statements — a requirement that publications state how and where the underlying data can be accessed.
    6. Roles and responsibilities — what is expected of researchers, the research office, the library, IT, and departmental leadership.
    7. Exceptions and embargoes — the process for restricting access on ethical, legal, or commercial grounds.
    8. Review and implementation — the cycle on which the policy itself is revisited against evolving funder mandates.

    Section | What it should specify
    Data deposit | Named or criteria-based approved repositories, with a preference for those issuing DOIs via DataCite
    Persistent identifiers | ORCID for researchers; DOIs for datasets
    Contributor recognition | Use of Contributor Roles Taxonomy (CRediT) statements so data curation and stewardship work is credited
    Retention | A specific minimum period (commonly ten years post-publication) rather than an open-ended commitment
    Sensitive data | A named route to ethics and data protection review before any exception is granted

    Note that CASRAI originated the CRediT Contributor Roles Taxonomy in 2014; the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, and institutional policies that reference it should cite NISO, not CASRAI, as the current maintaining body.

    Frequently asked questions and next steps

    Is a data sharing agreement legally required?

    A data sharing agreement is not universally mandated by statute in the UK, but it is required in practice whenever personal or confidential data is transferred between organisations under UK GDPR, and it is frequently a condition set by funders or ethics committees. An institutional data sharing policy is separate and is typically a funder or institutional requirement rather than a legal one.

    What is the data sharing law in the UK?

    UK data sharing is governed primarily by the UK GDPR and the Data Protection Act 2018, which set the rules for handling personal data, alongside the common law of confidentiality. Research data policies must operate within this framework whenever datasets contain identifiable or sensitive personal information, in addition to meeting funder FAIR requirements.

    What are the six key data sharing principles?

    Widely cited data sharing principles hold that shared information should be necessary, proportionate, relevant, accurate, timely, and secure. Institutional research data policies should apply the same discipline alongside FAIR — findable, accessible, interoperable, reusable — so that openness and data protection obligations are handled together rather than in conflict.

    Once a first draft exists, research offices should route it through the same stakeholders named in the policy itself — library, IT, ethics, and legal — before it goes to institutional governance for sign-off, and set a firm review date rather than leaving the document to lapse.

    As funders continue tightening data mandates, from NIH’s 2023 policy to Horizon Europe’s FAIR requirements, institutions without a current, clearly scoped policy will increasingly find researchers improvising compliance at the point of grant application — precisely the risk a written data sharing policy is designed to remove. Research offices that keep the policy distinct from the data sharing agreement, and review it on a fixed cycle, are best placed to keep pace with the next round of funder requirements.

  • UKRI Data Management Plan Template Guide for Multi-Council Grants

    UKRI’s common data management plan template asks applicants to describe, section by section, how research data will be generated, documented, stored, shared and preserved — but the level of detail, word limit and submission requirement differ by council: MRC and BBSRC mandate a full plan, NERC requires only a one-page outline, and EPSRC does not require submission at all.

    A data management plan (DMP) is a structured document, submitted with or alongside a grant application, that specifies how research data will be collected, documented, stored, shared and preserved throughout and after a funded project. For UKRI-funded researchers, the practical difficulty is not knowing what a DMP is — it is knowing which version of the UKRI data management plan template applies to their council, how long it should be, and what each field is actually asking for. This walkthrough goes section by section across the four councils most research administrators handle together on multi-strand or interdisciplinary awards: MRC, BBSRC, NERC and EPSRC.

    What does UKRI’s common data management plan template cover?

    UKRI does not operate a single, mandatory template across all seven research councils. Instead, each council publishes its own guidance built around a common core of questions: what data will be produced, how it will be documented, where it will be stored, who can access it, and how long it will be retained. This shared structure is why researchers refer informally to a “UKRI data management plan template”, even though the actual document you complete depends on which council is funding the work.

    The starting point for most multi-council applicants is the MRC data management plan template, a Word document published via UKRI’s publications library, because several other templates hosted in the same library (including the one used for NC3Rs-badged studies) reuse its structure. NERC, BBSRC and EPSRC each layer council-specific expectations (word limits, submission timing, and retention periods) on top of that shared skeleton.

    How do requirements differ across MRC, BBSRC, NERC and EPSRC?

    The single biggest source of error in multi-council DMPs is applying one council’s rules to another council’s proposal. The table below sets out the four core differences research administrators need to check before drafting.

    Council | DMP required at application? | Template source | Length | Minimum data retention
    MRC | Yes: mandatory for all funding proposals | MRC Data Management Plan template (UKRI publications library) | 500–1,500 words; 1,500 words for longitudinal studies, population cohorts, genetic, omics, imaging data and biobanks | 10 years (20 years for population health and clinical studies)
    BBSRC | Yes: mandatory for grant applications | BBSRC template via DMPonline (Digital Curation Centre) | Maximum 500 words (check individual grant-stream variation) | 10 years after project completion
    NERC | Yes: one-page outline at application; full plan later | NERC Outline Data Management Plan template and guidance (UKRI publications library) | One page at application; full plan agreed with the relevant NERC data centre within 3–6 months of award start | 10 years minimum
    EPSRC | No: not submitted with the application | No dedicated EPSRC council template; DMPonline hosts an EPSRC-structured version for internal use | No fixed limit; proportionate to the project | 10 years from the end of any privileged-access period

    EPSRC is the outlier: it does not require a DMP to be submitted with the proposal, but most host institutions’ own research data policies still require one to exist internally so costs and storage needs are planned accurately. STFC sits closer to MRC and BBSRC — a DMP is mandatory for most schemes and capped at two sides of A4 — but, unlike MRC, STFC does not prescribe a fixed template.

    Completing the template field by field

    Across MRC, BBSRC, NERC and EPSRC guidance, the same seven fields recur, even where wording and word allowances differ. Address each one in this order.

    • Data collection and generation. State the type of data (quantitative, qualitative, imaging, genomic, environmental sensor data, software), the format, the estimated volume, and whether it is newly generated or reused from an existing source.
    • Documentation and metadata. Name the metadata standard you will apply and describe accompanying documentation — a data dictionary, README file or laboratory notebook — needed for another researcher to interpret the dataset without you.
    • Ethics, consent and legal basis. Cover informed consent, anonymisation or pseudonymisation methods, and who holds intellectual property rights, particularly for MRC-funded clinical or population studies, where this field is scrutinised most closely.
    • Storage and security during the project. Specify where data will sit while the grant is active, backup frequency, and access controls — this is where EPSRC-funded teams should still document internal practice even though nothing is submitted to the council.
    • Long-term preservation. Name the repository (an institutional archive, a NERC environmental data centre, or the UK Data Service for ESRC-adjacent social science data) and confirm the retention period matches your council’s minimum from the table above.
    • Data sharing and access conditions. State which datasets will be shared openly, any embargo or proprietary period, and the justification if some data cannot be shared — commercial sensitivity, participant privacy or national security are the standard justifications UKRI accepts.
    • Responsibilities and resourcing. Name who owns data management delivery after the grant ends and itemise any storage, curation or specialist-staff costs, which can — and should — be included in the full economic cost of the proposal.

    For MRC and NERC applications specifically, the plan text is typically copied directly into the Je-S or UKRI Funding Service application form rather than uploaded as a separate attachment. Check the individual call documentation, since attachment rules vary by scheme and change between funding rounds.

    Common questions about the UKRI data management plan

    How do you write a data management plan?

    Start from your funding council’s specific template rather than a generic one, then work through data collection, documentation, storage, sharing and retention in turn. Keep language concrete and proportionate to your project’s data volume, and justify any decision not to share data rather than leaving it unexplained.

    What is included in a data management plan?

    A complete plan covers the types of data produced, the metadata and documentation standards used, storage and security arrangements, the repository chosen for preservation, access and sharing conditions, and the retention period. UKRI councils also expect a statement of who is responsible for delivery and what resources this requires.

    Do you need a data management plan for a UKRI grant?

    It depends on the council. MRC, BBSRC, NERC and STFC require a DMP to be submitted with most funding proposals, while EPSRC does not require submission, and AHRC has no general DMP requirement at all. Always confirm the specific call documentation, since requirements can vary by scheme within a single council.

    What does a good data management plan look like?

    A strong plan is specific to the project rather than generic, stays within the council’s stated word or page limit, and answers every field with a concrete detail — a named repository, a defined retention period, a stated metadata standard — instead of a vague intention. Reviewers assess it alongside the rest of the proposal during peer review.

    What this means for multi-council applicants

    Institutions running interdisciplinary programmes — a BBSRC-MRC joint call, or a NERC-EPSRC environmental engineering award — cannot draft one DMP and submit it unchanged to both funders. Word limits alone range from 500 words (BBSRC) to 1,500 words (MRC’s most data-intensive study types), and only NERC requires a two-stage outline-then-full-plan process. Research administration teams supporting these awards should build a field-by-field checklist per council into their proposal workflow, rather than relying on a single house template.
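
    A per-council checklist of this kind could be kept as a small lookup table. The Python sketch below is one possible shape, not a UKRI tool. The figures are transcribed from the comparison table earlier in this guide, with simplifications that are assumptions: MRC is given its 1,500-word ceiling, NERC's one-page outline is not converted to a word count, and retention is reduced to the 10-year baseline. The names `COUNCIL_RULES`, `check_plan` and `check_joint_call` are hypothetical.

```python
# Application-stage rules, simplified from the comparison table.
# max_words of None means no word limit is modelled for that council.
COUNCIL_RULES = {
    "MRC": {"submit_at_application": True, "max_words": 1500, "min_retention_years": 10},
    "BBSRC": {"submit_at_application": True, "max_words": 500, "min_retention_years": 10},
    "NERC": {"submit_at_application": True, "max_words": None, "min_retention_years": 10},
    "EPSRC": {"submit_at_application": False, "max_words": None, "min_retention_years": 10},
}


def check_plan(council, word_count, retention_years):
    """Return a list of problems found in one council's DMP draft."""
    rules = COUNCIL_RULES[council]
    problems = []
    limit = rules["max_words"]
    if limit is not None and word_count > limit:
        problems.append(f"{council}: {word_count} words exceeds the {limit}-word limit")
    if retention_years < rules["min_retention_years"]:
        problems.append(
            f"{council}: retention below {rules['min_retention_years']} years"
        )
    return problems


def check_joint_call(councils, word_count, retention_years):
    """Run the same draft against every council named on a joint call."""
    return {c: check_plan(c, word_count, retention_years) for c in councils}


# A 1,200-word draft fits MRC's ceiling but not BBSRC's 500-word cap.
report = check_joint_call(["BBSRC", "MRC"], word_count=1200, retention_years=10)
for council, problems in report.items():
    print(council, "OK" if not problems else problems)
```

    Keeping the rules in data rather than in prose makes it easier to update a single entry when a council changes its guidance between funding rounds.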

    As UKRI continues to consolidate open-research expectations across its councils, researchers should expect incremental convergence on shared metadata and repository standards — but not, in the near term, a single mandatory cross-council template. Until that happens, matching the right template to the right council, at the right length, remains the determining factor in a compliant submission.

    For teams coordinating research administration workflows across funders and councils, see CASRAI’s research administration resources, and consult the CASRAI Dictionary for definitions of related research data terminology.

  • NIH Genomic Data Sharing Policy vs DMS Policy

    The NIH Genomic Data Sharing (GDS) Policy and the NIH Data Management and Sharing (DMS) Policy are two separate, still-active NIH policies with different effective dates, different scopes and different submission points — the GDS Policy (2015) governs consent and controlled access for large-scale genomic data, while the DMS Policy (2023) governs data management planning for all NIH-funded scientific data. Grantees who assume the 2023 policy absorbed the 2015 one risk missing a distinct compliance step.

    The NIH Genomic Data Sharing Policy is the funder requirement, effective since 25 January 2015 under Notice NOT-OD-14-124, that governs consent-based data use limitations, controlled-access repositories and data release timelines for large-scale human and non-human genomic data generated with NIH support.

    What Is the NIH Genomic Data Sharing (GDS) Policy?

    The GDS Policy replaced NIH’s 2007 Genome-Wide Association Studies (GWAS) data-sharing policy and extended its logic to a wider set of genomic technologies. It applies to studies that generate large-scale human or non-human genomic data, including genome-wide association studies, single nucleotide polymorphism (SNP) arrays, whole-genome and whole-exome sequence data, transcriptomic data and epigenomic data produced by array-based or high-throughput sequencing platforms.

    Two features distinguish it from a generic sharing mandate:

    • A two-tiered access model — unrestricted (open) data versus controlled-access data held in a repository such as dbGaP, the NIH database of Genotypes and Phenotypes.
    • A consent-based data use limitation system, under which informed consent documents must state what data types will be shared and whether access will be open or controlled, so that secondary users are legally and ethically bound to the participant’s original consent.

    The National Human Genome Research Institute (NHGRI) implements the policy operationally through Notices NOT-HG-15-038 and NOT-HG-20-011, and designates AnVIL alongside dbGaP as primary repositories for NHGRI-funded genomic data.

    How Does the GDS Policy Differ From the DMS Policy?

    The NIH Data Management and Sharing Policy, effective 25 January 2023 under Notice NOT-OD-21-013, is far broader in scope. It applies to essentially all NIH-funded research producing “scientific data” — any data commonly accepted in the field as sufficient to validate and replicate findings — not only genomic data. It requires a data management and sharing plan with every competing grant application, whereas the GDS Policy’s genomic-specific requirements historically attached at the Just-in-Time stage, after review but before award.

    NIH has since directed that the two policies be harmonised into a single submission: where a project is subject to both, the genomic-specific elements (consent language, data type, repository choice, controlled- versus open-access designation) are folded into one data management and sharing plan rather than filed as two separate documents. The table below sets out where the policies still diverge.

    Feature | GDS Policy (2015) | DMS Policy (2023)
    Governing notice | NOT-OD-14-124 | NOT-OD-21-013
    Effective date | 25 January 2015 | 25 January 2023
    Scope | Large-scale human and non-human genomic data | All NIH-funded scientific data, any type
    Core document | Genomic Data Sharing Plan + Institutional Certification | Data management and sharing plan
    Consent mechanism | Consent-based data use limitations, enforced via dbGaP Data Access Committees | General “justifiable limitations” language; no genomic-specific consent tiers
    Typical repository | dbGaP, AnVIL (controlled- or open-access) | Any NIH-designated or discipline-appropriate research data repository
    Budget provision | Not addressed directly | Explicitly allows data management and sharing costs in the budget

    Who Must Submit an Institutional Certification?

    An Institutional Certification is a GDS Policy-specific attestation — separate from the data management and sharing plan — that the institution has reviewed the consent language, IRB approval and data use limitations attached to the human genomic data before it is deposited in a controlled-access repository. It is not required by the DMS Policy for non-genomic data.

    Institutions must certify, among other things, that:

    • The data was collected in a manner consistent with 45 CFR 46 (the Common Rule) and applicable state and local laws.
    • Consent forms permit the specific type of data use requested (general research use versus disease-specific use).
    • Identifiers have been removed or the data otherwise meets the applicable de-identification standard.

    Because this certification is a distinct compliance artefact from the data management and sharing plan, research administrators who track only DMS Plan compliance can miss it entirely on genomic awards.

    How Does Controlled Access Work Under the GDS Policy?

    Controlled-access genomic data sits in dbGaP behind a Data Access Committee (DAC) review process. Secondary users submit a data access request describing their intended research use; the DAC checks that use against the consent-based data use limitation recorded for that dataset before granting access. This is materially different from the DMS Policy’s general expectation of “broadest appropriate sharing,” which does not itself impose a use-limitation enforcement layer — that enforcement mechanism is a GDS-specific feature.

    Common Questions About the GDS and DMS Policies

    Does the 2023 DMS Policy Replace the 2015 GDS Policy?

    No. The DMS Policy did not replace or repeal the GDS Policy; both remain in force. NIH’s own guidance directs grantees generating large-scale genomic data to satisfy GDS-specific requirements — informed consent language, Institutional Certification, controlled-access designation — within the single data management and sharing plan required by the DMS Policy, rather than as an independent document.

    What Counts as “Large-Scale” Genomic Data Under the GDS Policy?

    NIH does not set one fixed threshold; NHGRI and other institutes assess scale case by case, typically referencing genome-wide association studies, whole-genome or whole-exome sequencing, and array-based platforms as presumptively “large-scale.” Investigators with borderline projects should confirm applicability with their institute’s program officer before submission, since NHGRI also encourages voluntary sharing of smaller datasets.

    When Is the Institutional Certification Submitted?

    The Institutional Certification is submitted at the Just-in-Time stage — after peer review, once an application is being considered for funding — not with the initial application. This differs from the data management and sharing plan itself, which NIH requires as part of the competing application under the DMS Policy.

    Which Repository Satisfies the GDS Policy?

    NIH designates dbGaP for controlled-access human genomic data and, for NHGRI-funded work specifically, AnVIL as the primary repository accepting both controlled- and open-access data. Investigators may propose an alternative repository in the data management and sharing plan, subject to institute approval before funding.

    Implications for Research Administrators

    The practical risk is not policy conflict but a compliance gap: an office that maps its DMS Policy checklist to grant application review alone will miss the GDS Policy’s Just-in-Time Institutional Certification and its ongoing dbGaP registration obligations. Research administration offices supporting genomic PIs need two intake questions, not one: does this award generate large-scale genomic data and, if so, has the Institutional Certification been routed separately from the data management and sharing plan?
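
    Those intake questions can be written down as a small triage function. The Python sketch below is a hypothetical illustration of the logic described in this article, not an NIH tool. It assumes the award is NIH-funded and produces scientific data, so the DMS plan always applies, and it treats the Institutional Certification as applying only to human genomic data.

```python
def required_artifacts(large_scale_genomic, human_data):
    """List (artefact, submission stage) pairs for an NIH award.

    Assumes the award is NIH-funded and generates scientific data,
    so the data management and sharing plan is always required.
    """
    artifacts = [("Data management and sharing plan", "competing application")]
    if large_scale_genomic:
        # GDS-specific elements are folded into the single DMS plan.
        artifacts.append(
            ("GDS elements within the DMS plan", "competing application")
        )
        if human_data:
            # Separate attestation, routed after peer review.
            artifacts.append(("Institutional Certification", "Just-in-Time"))
    return artifacts


for name, stage in required_artifacts(large_scale_genomic=True, human_data=True):
    print(f"{name}: due at {stage}")
```

    The point of the branch is that the certification is a second artefact with its own stage, so a checklist keyed only to the application will never reach it.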

    As NIH continues to harmonise guidance across institutes, expect more sub-policies — clinical trials data sharing, foreign genomic data transfer rules — to layer onto rather than replace the DMS Policy’s baseline. Treating “DMS compliance” as a single checkbox will increasingly understate what a genomics-heavy award actually requires.