Tag: data sharing policy

  • MRC Data Management Plan vs Wellcome Rules for Bioscience Grantees

    An MRC data management plan (DMP) sets out how researchers will collect, document, store, secure and share data on an MRC-funded project, using UK Research and Innovation’s (UKRI) official template. Wellcome instead requires a broader “outputs management plan” covering data, software and physical materials, with no fixed template. Both are due at application stage, but their scope, sharing timelines and repository rules differ in ways that matter for bioscience grantees.

    A data management plan is a funder-mandated document that specifies how research data will be handled, from creation through to long-term preservation and reuse.

    What must an MRC data management plan include?

    The Medical Research Council (MRC), a UKRI council, requires all funding applicants to submit a DMP as part of their research proposal. Applicants must use UKRI’s official MRC data management plan template, an ODT document last revised on 1 April 2024 to align with the MRC’s revised data sharing policy.

    The template asks researchers to address:

    • Data types and volumes — what will be generated or reused, and in what formats.
    • Documentation and metadata — how the data will be made interpretable to other researchers.
    • Ethics and legal compliance — data protection, consent and confidentiality arrangements.
    • Storage, backup and security — arrangements during the life of the project.
    • Sharing and preservation — the named repository and any restrictions on access.
    • Trusted research and innovation (TRI) considerations — a requirement added in the April 2024 revision, reflecting UKRI-wide guidance on research security.

    The underlying MRC data sharing policy was itself revised on 29 November 2023 to reflect the commitments in the MRC’s Strategic Delivery Plan 2022 to 2025. The revision incorporates a wider definition of “research data” and takes account of changes in open access requirements and data protection law. Reviewers assess DMPs against a published rubric, and MRC guidance states it expects valuable data to be shared with as few restrictions as possible.

    How does Wellcome’s outputs management plan differ?

    Wellcome does not ask for a “data management plan” in the MRC sense. Its Policy on Data, Software and Materials Management and Sharing — released on 10 July 2017, replacing an earlier Policy on Data Management and Sharing — requires an outputs management plan wherever a project will generate data, software or materials of clear value to others.

    Three features distinguish the Wellcome approach from MRC’s:

    • Broader scope — the plan must cover physical materials such as antibodies and cell lines, not only digital data and software.
    • No fixed template — applicants draft a plan “proportionate” to the scale and likely value of the outputs, rather than completing a standard form.
    • Living document — the plan is expected to be maintained and reviewed throughout the research lifecycle, not filed once at application stage.

    Wellcome frames its position as “as open as possible, as closed as necessary” — language that mirrors the European Commission’s Horizon Europe open-data principle — allowing restrictions to protect participant confidentiality or to enable intellectual property to be developed under its IP and patenting policy.

    MRC vs Wellcome: data-sharing requirements compared

    The table below summarises the structural differences a bioscience grantee applying to both funders needs to reconcile.

    Feature | MRC | Wellcome
    Plan name | Data management plan (DMP) | Outputs management plan
    Template | Fixed UKRI ODT template (rev. April 2024) | No template; proportionate free-text plan
    Scope | Research data | Data, software and physical materials
    Governing policy | MRC data sharing policy (rev. Nov 2023) | Policy on Data, Software and Materials Management and Sharing (2017)
    Review | Assessed by peer reviewers against a published rubric | Assessed as part of the wider proposal; monitored at end-of-grant reporting
    Extra checks | Trusted research and innovation considerations required | IP and patenting policy considerations required
    Repository expectation | Discipline-specific repository, minimal restrictions | Recognised community repository with persistent identifiers

    What are the sharing timelines and repository rules?

    Wellcome sets the more explicit timeline of the two funders. Its policy states that, as a minimum, data underpinning a research paper must be made available at the time of publication, along with any original software needed to view the dataset or replicate the analysis. For research related to public health emergencies, Wellcome requires quality-assured interim and final data to be shared “as rapidly and widely as possible”, ahead of journal publication.

    MRC’s policy is principles-based rather than date-bound: it asks applicants to share data “in a timely and responsible manner” with as few restrictions as possible, leaving the specific timeline to be justified case by case in the DMP itself.

    On repositories, both funders expect deposit in a recognised, discipline-appropriate service with persistent identifiers where possible. Wellcome additionally operates Wellcome Open Research, a publishing platform for rapid dissemination of funded results. On costs, both funders will fund justified data-sharing expenses within the grant; notably, in early 2018 Wellcome, the MRC, Cancer Research UK and the Bill & Melinda Gates Foundation jointly announced they would cover the costs of sharing clinical trials data via the Clinical Study Data Request (CSDR) platform — a rare example of aligned funder practice that removes cost as a barrier to compliance.

    Common questions about data management plans

    What is a data management plan?

    A data management plan (DMP) is a formal document describing how research data will be collected, documented, stored, secured and shared throughout and after a project. UK funders including MRC and Wellcome require a DMP, or an equivalent outputs plan, at application stage to demonstrate researchers have planned for responsible data stewardship and future reuse.

    How do you write a data management plan?

    Writing a DMP means addressing data type and volume, documentation and metadata standards, ethical and legal compliance, storage and security arrangements, and a sharing and preservation route via a named repository. MRC applicants must use UKRI’s fixed template; Wellcome applicants draft a proportionate outputs management plan without a set format.

    What are the 5 steps to data management?

    Most funder templates cover five areas: data description, documentation and metadata, ethical and legal compliance, storage and security, and data sharing and preservation. MRC and Wellcome both map onto this structure, though Wellcome extends the final step to cover software and physical materials alongside data.

    What this means for UK bioscience grant applicants

    Researchers holding, or applying for, both MRC and Wellcome funding on related bioscience or clinical work cannot use a single generic DMP. The MRC’s fixed template and trusted-research-and-innovation checks demand a structured, form-based response; Wellcome’s proportionate outputs management plan demands editorial judgement about what counts as a “significant” output and how physical materials will be tracked alongside data.

    For institutional research administration teams, the practical implication is a checklist mismatch: MRC compliance is verified against a rubric at peer review, while Wellcome compliance is verified narratively at end-of-grant reporting. Multi-funder consortium grants — increasingly common in UK bioscience — should draft to the stricter of the two requirements (typically Wellcome’s publication-time data availability) and then map that single commitment back into each funder’s own plan format, rather than drafting two plans independently.
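
    The "draft once, map twice" approach can be sketched in a few lines of Python. This is only an illustration: the `SharingCommitment` class, the two view functions and the example outputs are hypothetical names invented here, and the MRC section headings are borrowed from the template list earlier in this article rather than from the template file itself.

```python
from dataclasses import dataclass, field


@dataclass
class SharingCommitment:
    """A single set of commitments, drafted to the stricter requirement."""
    repository: str
    available_at_publication: bool
    # Output kind ("data", "software", "materials") -> list of outputs.
    outputs: dict = field(default_factory=dict)


def _timing(commitment):
    if commitment.available_at_publication:
        return "available at publication"
    return "timeline justified case by case"


def mrc_view(commitment):
    """Project the commitment onto MRC-style DMP sections (research data only)."""
    return {
        "Data types and volumes": list(commitment.outputs.get("data", [])),
        "Sharing and preservation": f"{commitment.repository}; {_timing(commitment)}",
    }


def wellcome_view(commitment):
    """Project the same commitment onto an outputs plan covering all three kinds."""
    kinds = ("data", "software", "materials")
    return {
        "outputs": {k: list(commitment.outputs.get(k, [])) for k in kinds},
        "sharing": f"{commitment.repository}; {_timing(commitment)}",
    }


# Made-up example project.
commitment = SharingCommitment(
    repository="a named discipline repository",
    available_at_publication=True,
    outputs={
        "data": ["imaging dataset"],
        "software": ["analysis scripts"],
        "materials": ["cell lines"],
    },
)
mrc_plan = mrc_view(commitment)
wellcome_plan = wellcome_view(commitment)
```

    Because both views read from the same object, a change to the sharing timeline or repository propagates to both plans, which is the point of drafting a single commitment first.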

    As UKRI continues to harmonise data policy guidance across its seven councils, and as Wellcome’s outputs-based model gains attention from other biomedical funders, expect further convergence — but for now, grantees still need to satisfy two distinct documents, two distinct review processes, and two distinct definitions of what “data” even covers.

  • FAIR Dataset Mandates Risk Becoming a Checkbox

    A FAIR dataset is one that meets the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016 — but a funder mandate requiring deposit and a data management plan does not, on its own, guarantee this. Genuine FAIR compliance demands rich metadata, persistent identifiers and community-standard formats that most minimally compliant deposits skip entirely, because current incentive structures reward the act of depositing, not the work of curating.

    A FAIR dataset is a digital research object — data or its metadata — that satisfies the Findable, Accessible, Interoperable and Reusable principles first formalised by the FORCE11 community and published in Scientific Data in March 2016. The principles were designed to be applied in degrees, not as a pass/fail gate, which is precisely where funder policy and researcher practice have diverged.

    What does a FAIR dataset actually require?

    The FAIR principles set out four categories of requirement, each broken into specific sub-criteria. They are deliberately conceptual rather than prescriptive, which is a strength for cross-disciplinary adoption and a weakness for enforcement.

    • Findable — data and metadata carry a globally unique, persistent identifier and are indexed in a searchable resource.
    • Accessible — retrieval uses a standardised, open protocol, with metadata remaining accessible even when the underlying data cannot be.
    • Interoperable — data and metadata use a shared, formal language and vocabularies that follow FAIR principles themselves.
    • Reusable — data carry a clear licence, detailed provenance, and conform to domain-relevant community standards.

    The Research Data Alliance’s FAIR Data Maturity Model, published in 2020, decomposes these four principles into 41 discrete indicators covering both data and metadata. That granularity matters: a dataset can satisfy some indicators and fail most others while still being described, loosely, as “FAIR.” A funder checking only for repository deposit is verifying perhaps one or two of the 41.
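
    A small Python sketch shows why per-indicator scoring tells a different story from a single "deposited: yes" flag. The indicator names below are hypothetical shorthand invented for illustration; they are not the RDA model's own identifiers, and only a handful are shown rather than all 41.

```python
from collections import Counter

PRINCIPLES = {"F": "Findable", "A": "Accessible", "I": "Interoperable", "R": "Reusable"}


def fair_coverage(results):
    """Return {principle: (passed, total)} from {indicator_name: bool}.

    The first letter of each indicator name selects its principle.
    """
    passed, total = Counter(), Counter()
    for indicator, ok in results.items():
        principle = PRINCIPLES[indicator[0]]
        total[principle] += 1
        passed[principle] += int(ok)
    return {p: (passed[p], total[p]) for p in PRINCIPLES.values() if total[p]}


# A made-up deposit that satisfies a funder's repository requirement and little else.
deposit_only = {
    "F-persistent-identifier": True,
    "F-indexed-in-catalogue": True,
    "A-open-protocol": True,
    "I-shared-vocabulary": False,
    "I-formal-language": False,
    "R-clear-licence": False,
    "R-provenance": False,
    "R-community-standard": False,
}
coverage = fair_coverage(deposit_only)
```

    In this invented case the deposit scores fully on the Findable and Accessible indicators listed and not at all on Interoperable or Reusable, which is the pattern the paragraph above describes.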

    Why do funder mandates default to minimal compliance?

    Funder FAIR requirements are typically operationalised as two things: a submitted data management plan and a deposit in a recognised repository at the end of the project. Neither step audits metadata richness, vocabulary use, or licensing clarity. The result is a policy that can be satisfied on paper without producing a dataset anyone outside the original team could actually reuse.

    Three structural gaps explain why:

    • Resourcing. Science Europe’s funders’ briefing on data management planning recommends budgeting roughly 5% of the total research budget for compliant curation, a figure rarely built into grant awards, which leaves curation as unfunded overhead.
    • Recognition. Data curation is not weighted in hiring, promotion or tenure decisions in most institutions, so time spent enriching metadata competes directly with time spent on publications that do count.
    • Standards gaps. Many disciplines still lack the domain-relevant community vocabularies that Interoperability and Reusability depend on, so even willing depositors have nothing FAIR-compliant to conform to.

    Horizon Europe requires that all data produced under the programme be FAIR “by default,” which is the strongest funder-level statement of intent currently in force. Yet the European Commission’s own guidance materials acknowledge that FAIRness is a spectrum, not a binary condition — an admission that sits uneasily alongside a compliance model built around a single deposit checkpoint.

    The maturity gap: from “FAIR start” to genuine reusability

    The European Commission’s Joint Research Centre published FAIR Data Guidelines in 2025 that organise the RDA’s 41 indicators into five progressive maturity levels. The framework is useful precisely because it makes visible how far “minimally compliant” sits from “genuinely reusable.”

    Maturity level | What it requires
    FAIR start | Published in a catalogue with mandatory metadata; data itself is not structured for machine reuse.
    FAIR play | Links added between datasets and related resources, with enriched provenance and cross-referencing.
    FAIR go | Data structured to community standards, with defined terminologies (not necessarily machine-readable).
    FAIR share | Machine-readable data models (JSON Schema, XML Schema, SHACL) with richly documented provenance.
    FAIRest of them all | Machine-readable model endorsed by the domain community; terms exposed via shared FAIR vocabularies.

    Most mandate-driven deposits land at “FAIR start” — indexed, licensed, discoverable, but not structured for genuine machine or cross-team reuse. The JRC guidelines are explicit that not every dataset needs the top tier, but they are equally explicit that FAIRness can degrade over time if metadata and platforms are not actively maintained. A one-off deposit satisfying a funder’s closeout requirement is not maintenance; it is a snapshot.
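
    The five levels can be read as a ladder, which the following Python sketch makes concrete. It assumes the levels are strictly cumulative (a dataset cannot reach a level without meeting every level below it), which is one reading of "progressive" and not something the guidelines are quoted as stating here. The flag names are hypothetical.

```python
from enum import IntEnum


class Maturity(IntEnum):
    NONE = 0
    FAIR_START = 1
    FAIR_PLAY = 2
    FAIR_GO = 3
    FAIR_SHARE = 4
    FAIREST = 5


# One hypothetical yes/no flag per level, lowest level first.
LEVEL_CHECKS = [
    (Maturity.FAIR_START, "catalogued_with_mandatory_metadata"),
    (Maturity.FAIR_PLAY, "linked_to_related_resources"),
    (Maturity.FAIR_GO, "structured_to_community_standard"),
    (Maturity.FAIR_SHARE, "machine_readable_model"),
    (Maturity.FAIREST, "community_endorsed_model"),
]


def maturity_level(dataset):
    """Return the highest level reached without skipping any lower one."""
    level = Maturity.NONE
    for candidate, flag in LEVEL_CHECKS:
        if not dataset.get(flag, False):
            break  # assumption: a missing rung caps the level here
        level = candidate
    return level


# A made-up closeout deposit: catalogued, nothing more.
closeout_deposit = {"catalogued_with_mandatory_metadata": True}

# Structured data without cross-links still stops at the first rung
# under the cumulative assumption.
unlinked_but_structured = {
    "catalogued_with_mandatory_metadata": True,
    "structured_to_community_standard": True,
}
```

    Using an `IntEnum` keeps the levels comparable, so a closeout review could express a threshold such as `maturity_level(d) >= Maturity.FAIR_GO` instead of a deposit yes/no.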

    Rebuilding incentives for genuine data stewardship

    Treating FAIR as a compliance checkbox is a governance failure, not a researcher failure. Three changes would shift the incentive structure toward genuine stewardship rather than deposit-and-forget behaviour.

    1. Credit the labour. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. “Data curation” is one of its fourteen roles, offering institutions an existing, citable mechanism to formally recognise stewardship work in author contribution statements — a mechanism that remains inconsistently applied in promotion and tenure review.
    2. Fund it explicitly. Grant budgets should ring-fence curation costs at the level Science Europe’s own guidance recommends, rather than treating data management plans as an unfunded compliance document.
    3. Audit maturity, not deposit. Funders and institutions should reference maturity models such as the RDA’s 41 indicators or the JRC’s five-level scale in closeout review, rather than accepting repository deposit as sufficient evidence of FAIR compliance.

    FAIR is also not a complete governance answer on its own. The CARE Principles for Indigenous Data Governance, released by the Global Indigenous Data Alliance in 2019, extend the framework to cover collective benefit, authority to control, responsibility and ethics — dimensions that a pure findability-and-format checklist does not touch. Institutions building data policy around FAIR alone are optimising for machine reuse while leaving governance and consent questions unaddressed.

    Frequently asked questions

    What is a FAIR dataset?

    A FAIR dataset satisfies the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016. It carries a persistent identifier, standardised access, shared vocabularies, and clear licensing and provenance — not merely a repository listing.

    What does FAIR stand for with data?

    FAIR stands for Findable, Accessible, Interoperable and Reusable. The acronym describes a framework for data stewardship, not a certification; the Research Data Alliance breaks it into 41 measurable indicators rather than a single pass condition.

    What does FAIR stand for in data management?

    In data management, FAIR describes the target state a data management plan should work toward: identifiers, rich metadata, open protocols and community-standard formats. It guides curation decisions throughout a project, not just the final deposit.

    Why does FAIR data matter?

    FAIR data matters because it lets both humans and machines discover, verify and reuse research outputs without contacting the original authors. Poorly curated “FAIR” deposits undermine reproducibility and waste the public investment funders intended the mandate to protect.

    Implications and outlook

    Funder FAIR mandates have succeeded in one respect: deposit rates have risen sharply since 2016. They have not, on current evidence, produced datasets that are reliably machine-actionable or cross-team reusable at scale. That gap will not close through stricter wording in policy documents; it requires funders to resource curation at realistic cost, institutions to credit it in career progression via mechanisms such as CRediT’s Data curation role, and disciplines to finish building the community standards that Interoperability depends on. Until those three conditions are met, “FAIR by default” will remain a policy aspiration rather than a description of the average deposited dataset.

  • Data Sharing Policy: A Research Office Template

    A data sharing policy is the institution-wide governance document that sets expectations for how researchers plan, deposit, and share research data — distinct from a data sharing agreement, which is the specific legal contract governing one data transfer. Research offices write policies to translate funder FAIR data mandates, such as the NIH’s 2023 Data Management and Sharing Policy, into consistent local practice.

    A data sharing policy is an institutional statement of principle and requirement: it tells every researcher, department, and grant applicant what the organisation expects of them before, during, and after a funded project, regardless of discipline or funder. It is not a substitute for a project-level data management plan (DMP), and it is not the same document as a data sharing agreement — the confusion between the two is the single most common drafting mistake research offices make.

    What is an institutional data sharing policy?

    An institutional data sharing policy is a governance document, usually owned jointly by the research office, library, and IT services, that sets baseline rules for how the organisation’s researchers manage and share the data underlying their published outputs. It applies across all disciplines and funders, rather than to a single grant.

    Published examples illustrate the range: the Office for National Statistics operates a data sharing policy governing record-level personal information, while Cancer Research UK’s data sharing and management policy sets FAIR-aligned requirements as a condition of every grant it awards. Both share a common shape (purpose, scope, principles, requirements, and named responsibilities), even though the first governs a public body’s statistical data and the second governs a funder’s grant conditions.

    For a research office, the policy is the document that makes funder requirements operational at institutional scale: instead of each principal investigator interpreting a funder’s data mandate independently, the institution issues one interpretation, one set of approved repositories, and one escalation route for exceptions.

    Why research offices need a data sharing policy now

    Research offices need a written policy because funders increasingly make data sharing a condition of funding, not a recommendation, and institutions without a policy leave researchers to interpret those conditions inconsistently — which creates compliance risk at renewal, audit, and publication stages.

    The mandate landscape has hardened over the past decade:

    • NIH’s 2023 Data Management and Sharing Policy took effect on 25 January 2023 and requires a data management and sharing plan for essentially all NIH-funded research, reviewed alongside the science.
    • UKRI is a signatory to the 2016 Concordat on Open Research Data, which commits funded institutions to making research data openly available with as few restrictions as possible.
    • Horizon Europe’s Model Grant Agreement requires a FAIR-aligned data management plan for participating projects, applying the “as open as possible, as closed as necessary” principle carried over from Horizon 2020.
    • ICMJE’s data sharing requirements oblige manuscripts reporting clinical trial results, submitted to ICMJE-following journals on or after 1 July 2018, to contain a data sharing statement, and oblige trials that began enrolling participants on or after 1 January 2019 to include a data sharing plan in the trial’s registration.

    Each of these mandates is written at the funder level. The institutional policy is what converts them into a single, consistent set of expectations that a research office can actually train staff on and audit against.

    Data sharing policy vs data sharing agreement

    A data sharing policy and a data sharing agreement solve different problems: the policy is a standing, institution-wide statement of expectations, while the agreement is a one-off legal contract governing a specific transfer of specific data between specific parties. Research offices need both, but they are drafted, owned, and reviewed differently.

    Aspect | Institutional data sharing policy | Data sharing agreement
    Scope | All researchers, all funded projects, ongoing | One dataset, one recipient, one purpose
    Trigger | Institutional governance cycle | A specific request or collaboration
    Legal status | Internal policy; not itself a contract | Binding contract, often referencing UK GDPR
    Typical owner | Research office, library, IT, ethics committee | Data protection officer, legal counsel
    Reviewed by | Institution, periodically | Both parties, per transfer

    A well-written policy should explicitly state this distinction and point researchers to the correct process for each: the policy for general expectations and deposit requirements, the agreement (or a data protection impact assessment) for any transfer involving personal, sensitive, or third-party data governed by UK GDPR.

    Template structure: what to include

    A usable institutional data sharing policy needs roughly eight components, moving from purpose through to enforcement, so that researchers and reviewers can find any given requirement in under a minute.

    1. Preamble and purpose — why the institution requires data sharing and its relationship to the FAIR principles, first published in Scientific Data in 2016.
    2. Scope — which staff, students, and data (all disciplines, all funders, or funder-specific) the policy covers.
    3. Definitions — research data, metadata, persistent identifier, data management plan, repository.
    4. Policy statements — the DMP requirement, repository and persistent-identifier expectations, metadata standards, data licensing, and minimum retention period.
    5. Data availability statements — a requirement that publications state how and where the underlying data can be accessed.
    6. Roles and responsibilities — what is expected of researchers, the research office, the library, IT, and departmental leadership.
    7. Exceptions and embargoes — the process for restricting access on ethical, legal, or commercial grounds.
    8. Review and implementation — the cycle on which the policy itself is revisited against evolving funder mandates.

    Section | What it should specify
    Data deposit | Named or criteria-based approved repositories, with a preference for those issuing DOIs via DataCite
    Persistent identifiers | ORCID for researchers; DOIs for datasets
    Contributor recognition | Use of Contributor Role Taxonomy (CRediT) statements so data curation and stewardship work is credited
    Retention | A specific minimum period (commonly ten years post-publication) rather than an open-ended commitment
    Sensitive data | A named route to ethics and data protection review before any exception is granted

    Note that CASRAI originated the CRediT contributor role taxonomy in 2014; the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, and institutional policies that reference it should cite NISO, not CASRAI, as the current maintaining body.
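
    A research office that wants to audit deposit records against the identifier and retention rows above could start from something like the Python sketch below. The DOI pattern is a loose syntactic check chosen for illustration, not an official grammar; the ORCID check uses the ISO 7064 MOD 11-2 check character that ORCID iDs carry; the ten-year default simply mirrors the "commonly ten years" figure in the table. The function names are hypothetical.

```python
import re
from datetime import date

# Loose check (assumption): "10.", a 4-9 digit registrant code, "/", any suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_plausible_doi(doi):
    """Syntactic check only; it does not confirm that the DOI resolves."""
    return bool(DOI_PATTERN.match(doi))


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X layout and the MOD 11-2 check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def retention_end(published, years=10):
    """Earliest date on which the minimum retention period has elapsed."""
    try:
        return published.replace(year=published.year + years)
    except ValueError:
        # 29 February mapped into a non-leap year: fall back to 28 February.
        return published.replace(year=published.year + years, day=28)


# 0000-0002-1825-0097 is the sample iD ORCID uses in its own documentation.
orcid_ok = is_valid_orcid("0000-0002-1825-0097")
doi_ok = is_plausible_doi("10.5072/example-dataset")  # made-up suffix
keep_until = retention_end(date(2023, 1, 25))
```

    None of these checks replaces a resolver lookup or a human review; they only catch malformed identifiers and open-ended retention entries before a record reaches one.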

    Frequently asked questions and next steps

    Is a data sharing agreement legally required?

    A data sharing agreement is not universally mandated by statute in the UK, but it is required in practice whenever personal or confidential data is transferred between organisations under UK GDPR, and it is frequently a condition set by funders or ethics committees. An institutional data sharing policy is separate and is typically a funder or institutional requirement rather than a legal one.

    What is the data sharing law in the UK?

    UK data sharing is governed primarily by the UK GDPR and the Data Protection Act 2018, which set the rules for handling personal data, alongside the common law of confidentiality. Research data policies must operate within this framework whenever datasets contain identifiable or sensitive personal information, in addition to meeting funder FAIR requirements.

    What are the six key data sharing principles?

    Widely cited data sharing principles hold that shared information should be necessary, proportionate, relevant, accurate, timely, and secure. Institutional research data policies should apply the same discipline alongside FAIR — findable, accessible, interoperable, reusable — so that openness and data protection obligations are handled together rather than in conflict.

    Once a first draft exists, research offices should route it through the same stakeholders named in the policy itself — library, IT, ethics, and legal — before it goes to institutional governance for sign-off, and set a firm review date rather than leaving the document to lapse.

    As funders continue tightening data mandates, from NIH’s 2023 policy to Horizon Europe’s FAIR requirements, institutions without a current, clearly scoped policy will increasingly find researchers improvising compliance at the point of grant application — precisely the risk a written data sharing policy is designed to remove. Research offices that keep the policy distinct from the data sharing agreement, and review it on a fixed cycle, are best placed to keep pace with the next round of funder requirements.

  • Data Management Plans for Qualitative Research: FAIR Meets Consent and Anonymisation

    A data management plan for qualitative research must do something generic STEM-oriented DMP templates rarely address: reconcile funder mandates for FAIR (Findable, Accessible, Interoperable, Reusable) data with an ethical and legal duty to restrict access to identifiable interview, focus-group and observational data. The two obligations are not opposites — a well-built plan satisfies both by specifying tiered access, consent-driven sharing limits and documented anonymisation, rather than treating “open data” and “protected data” as a binary choice.

    A data management plan (DMP) is a written document, created before data collection begins, that specifies how a project will collect, document, store, protect, share and preserve its research data throughout the data lifecycle. For qualitative and human-subjects research, the plan must additionally specify how consent scope, anonymisation depth and legal basis under data protection law will be recorded and enforced at each stage.

    What is a data management plan for qualitative research?

    A qualitative DMP is the same core document required for any funded project — covering data types, documentation, storage, sharing and preservation — but written around data that is, by design, difficult to de-identify. Interview transcripts, field notes, focus-group recordings and open-text survey responses carry contextual detail that resists the aggregation techniques used to anonymise quantitative datasets.

    Most UK funders require a DMP at application stage. The Economic and Social Research Council has required a data management and sharing statement since its 2010 Research Data Policy, and UK Research and Innovation’s common principles on data policy apply across its research councils. The plan is normally reviewed alongside the ethics application, since data handling and consent decisions are made together.

    Why do FAIR mandates and human-subjects ethics create tension?

    The FAIR principles — Findable, Accessible, Interoperable, Reusable, set out by Wilkinson et al. in Scientific Data (2016) — were written for structured, aggregatable datasets. Applied literally to identifiable human-subjects data, “Accessible” and “Reusable” can conflict with the duty to limit who can read a participant’s own words.

    Funders resolve this with a qualifier, not an exemption: data should be “as open as possible, as closed as necessary” — the formulation used in the European Commission’s Horizon Europe research data policy and echoed by UKRI. This does not excuse qualitative researchers from FAIR compliance; it changes what “Accessible” means in practice, from public download to documented, conditional access.

    FAIR principle | Qualitative-data constraint | Practical mitigation in the DMP
    Findable | Full metadata can itself be re-identifying (project title, participant demographics) | Publish a discoverable, de-identified metadata record with a persistent identifier even when the data itself stays closed
    Accessible | Transcripts/recordings contain direct identifiers and verbatim quotes | Deposit in a repository offering tiered or restricted access, not open download
    Interoperable | Coding schemes and qualitative software formats (e.g. NVivo, ATLAS.ti) are often proprietary | Document the coding frame and export a non-proprietary format alongside the proprietary project file
    Reusable | Reuse by unknown third parties was rarely covered by original consent | Use granular, re-use-specific consent wording that anticipates archiving and secondary analysis

    Under UK GDPR and the Data Protection Act 2018, personal data genuinely and irreversibly anonymised falls outside data protection law — but the Information Commissioner’s Office is explicit that this bar is high, and that pseudonymised data (a code replacing a name, with the key retained) remains personal data. The DMP must state, precisely, which version of the data at which stage is personal data and which is anonymised.

    Consent forms are the operative control, not an afterthought. A plan built for FAIR-GDPR reconciliation should specify:

    • Granular consent options separating participation, quotation in publications, and archiving of transcripts or recordings for secondary use
    • An explicit legal basis under UK GDPR Article 6 (and Article 9 condition where special category data — health, ethnicity, political opinion — is discussed)
    • A defined right-of-withdrawal window after which removal from an archived, de-identified dataset is no longer practicable
    • Named repository and access-control arrangements disclosed to participants at consent, not decided afterwards

    The UK Data Service — the Economic and Social Research Council’s designated data archive — operates a three-tier access model qualitative DMPs can cite directly: Open data (freely downloadable), Safeguarded data (registered users agree to an end-user licence), and Controlled data (approved researchers only, via a secure environment). Mapping each output to one of these tiers, rather than a vague “available on request” line, distinguishes a compliant plan from a defensive one.
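
    Mapping each output to a tier can be written down as an explicit rule and not left as a judgement made at deposit time. The Python sketch below is illustrative only: the three tier names come from the model described above, but the decision rules, the `residual_risk` ratings and the example outputs are assumptions made for this example, not UK Data Service criteria.

```python
from enum import Enum


class AccessTier(Enum):
    OPEN = "Open"
    SAFEGUARDED = "Safeguarded"
    CONTROLLED = "Controlled"


RISK_RATINGS = {"none", "low", "high"}  # hypothetical project-level ratings


def assign_tier(is_personal_data, residual_risk, consent_covers_archiving):
    """Return an AccessTier, or None when the output must not be archived."""
    if residual_risk not in RISK_RATINGS:
        raise ValueError(f"unknown risk rating: {residual_risk!r}")
    if not consent_covers_archiving:
        return None  # consent is the operative control
    if is_personal_data or residual_risk == "high":
        return AccessTier.CONTROLLED
    if residual_risk == "low":
        return AccessTier.SAFEGUARDED
    return AccessTier.OPEN


# Made-up outputs from an interview study.
plan = {
    "interview guide": assign_tier(False, "none", True),
    "de-identified transcripts": assign_tier(False, "low", True),
    "pseudonymised transcripts": assign_tier(True, "high", True),
    "audio recordings": assign_tier(True, "high", False),
}
```

    Pseudonymised transcripts are treated as personal data here, in line with the ICO position noted above, so they never fall below Controlled access under these rules.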

    What anonymisation techniques belong in the plan?

    Anonymisation of qualitative data is a layered process, not a single redaction pass. A robust DMP names the specific technique used at each stage:

    1. Pseudonymisation during analysis — replacing names with participant codes while a separate, access-restricted key file links code to identity
    2. De-identification for sharing — removing or generalising indirect identifiers: exact job titles, place names, dates, organisational affiliations
    3. Redaction of unavoidable identifiers — where context itself identifies a small or unique population (a single named institution, a rare occupation), replacing detail with a category description
    4. Access-tier assignment — deciding, output by output, whether the residual disclosure risk permits Safeguarded deposit or requires Controlled access only
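
    As a rough illustration of stages 1 to 3, the sketch below applies a key file and a list of generalisation rules to a transcript excerpt. All names, places and rules are invented for the example, and simple string replacement is an assumption made for brevity: it does not replace a manual read-through for contextual identifiers, and the key file would be stored separately under restricted access.

```python
import re

# Hypothetical key file content: real name -> participant code.
KEY = {"Alex Rivers": "P01", "Sam Holt": "P02"}

# Hypothetical generalisation rules for indirect identifiers.
RULES = {
    "Northfield Hospital": "[large teaching hospital]",
    "consultant paediatric cardiologist": "[senior clinician]",
}


def pseudonymise(text, key):
    """Stage 1: replace each known name with its participant code."""
    for name, code in key.items():
        text = text.replace(name, code)
    return text


def generalise(text, rules):
    """Stages 2-3: swap listed indirect identifiers for category descriptions."""
    for detail, category in rules.items():
        pattern = re.compile(re.escape(detail), flags=re.IGNORECASE)
        # A function replacement avoids any escape handling in the category text.
        text = pattern.sub(lambda _match, c=category: c, text)
    return text


def prepare_for_sharing(text, key, rules):
    """Apply pseudonymisation first, then generalisation."""
    return generalise(pseudonymise(text, key), rules)
```

    Stage 4, the access-tier decision, is left out deliberately: it is a judgement about residual disclosure risk rather than a text transformation.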

    The Qualitative Data Repository at Syracuse University and the Consortium of European Social Science Data Archives (CESSDA) both publish worked examples of this layered approach for interview and ethnographic data. The Qualitative Data Repository, or a national archive within the CESSDA network, is an appropriate repository to name in a DMP for social-science-led projects.

    When is “not applicable” a legitimate data availability statement?

    A data availability statement (DAS) reading “not applicable” is defensible only when it is reasoned, not default. Journals following ICMJE and COPE guidance expect a DAS for every submission, including qualitative studies; the acceptable move is not silence but a stated justification — for example, that full transcripts cannot be shared because de-identification would strip the interpretive detail the analysis depends on, while a de-identified excerpt corpus or the coding frame is deposited instead.

    Reviewers increasingly flag blanket “not applicable” statements as a data-quality signal, because most qualitative datasets have something shareable — a codebook, an interview guide, aggregated theme frequencies — even when raw transcripts cannot be released. A DMP that pre-commits to this reasoning avoids a weak DAS being drafted under publication-deadline pressure.

    Common questions on qualitative data management plans

    What should a data management plan for qualitative research include?

    It should cover data types collected (transcripts, recordings, field notes), consent scope, anonymisation method, storage and access controls, the repository and access tier for shared outputs, and a retention and deletion schedule for identifiable source files.

    How do you anonymise qualitative data to comply with GDPR?

    Apply pseudonymisation during analysis, then de-identify indirect identifiers (locations, job titles, dates) before sharing. Under UK GDPR, only data anonymised to the point that re-identification is not reasonably likely falls outside data protection law; pseudonymised data remains regulated personal data.

    Do FAIR principles require open data sharing for human-subjects research?

    No. FAIR requires data to be findable and accessible under stated conditions, not necessarily open. Funders including UKRI and the European Commission apply the “as open as possible, as closed as necessary” standard, which explicitly permits restricted or controlled access for identifiable qualitative data.

    Can a data availability statement say “not applicable” for qualitative research?

    Only with a stated reason, such as re-identification risk that de-identification cannot remove. Journals following ICMJE and COPE practice expect a justified statement — noting what, if anything, is shareable (a codebook or interview guide) — rather than a blanket refusal.

    Implications and outlook

    Institutional research offices and ethics committees increasingly review DMPs and consent forms as one package, because anonymisation and access-tier decisions in the DMP determine what the consent form must promise participants. Research administrators supporting qualitative and mixed-methods proposals should treat the FAIR-versus-consent tension as a design question resolved at the DMP stage — via tiered access, granular consent and named repositories — not a compliance problem deferred to publication.

    As funders tighten machine-actionable DMP requirements, qualitative projects that specify access tiers and anonymisation methods in structured, repository-mappable language will be better placed to meet FAIR audit expectations and data protection obligations, without defaulting to an unjustified “closed” or “not applicable” position.

    For related definitions and standards context, see CASRAI’s research data terminology dictionary and the research administration resource hub.

  • NIH Genomic Data Sharing Policy vs DMS Policy

    The NIH Genomic Data Sharing (GDS) Policy and the NIH Data Management and Sharing (DMS) Policy are two separate, still-active NIH policies with different effective dates, scopes and submission points. The GDS Policy (2015) governs consent and controlled access for large-scale genomic data, while the DMS Policy (2023) governs data management planning for all NIH-funded scientific data. Grantees who assume the 2023 policy absorbed the 2015 one risk missing a distinct compliance step.

    The NIH Genomic Data Sharing Policy is the funder requirement, effective since 25 January 2015 under Notice NOT-OD-14-124, that governs consent-based data use limitations, controlled-access repositories and data release timelines for large-scale human and non-human genomic data generated with NIH support.

    What Is the NIH Genomic Data Sharing (GDS) Policy?

    The GDS Policy replaced NIH’s 2007 Genome-Wide Association Studies (GWAS) data-sharing policy and extended its logic to a wider set of genomic technologies. It applies to studies that generate large-scale human or non-human genomic data, including genome-wide association studies, single nucleotide polymorphism (SNP) arrays, whole-genome and whole-exome sequence data, transcriptomic data and epigenomic data produced by array-based or high-throughput sequencing platforms.

    Two features distinguish it from a generic sharing mandate:

    • A two-tiered access model — unrestricted (open) data versus controlled-access data held in a repository such as dbGaP, the NIH database of Genotypes and Phenotypes.
    • A consent-based data use limitation system, under which informed consent documents must state what data types will be shared and whether access will be open or controlled, so that secondary users are legally and ethically bound to the participant’s original consent.

    The National Human Genome Research Institute (NHGRI) implements the policy operationally through Notices NOT-HG-15-038 and NOT-HG-20-011, and designates AnVIL alongside dbGaP as primary repositories for NHGRI-funded genomic data.

    How Does the GDS Policy Differ From the DMS Policy?

    The NIH Data Management and Sharing Policy, effective 25 January 2023 under Notice NOT-OD-21-013, is far broader in scope. It applies to essentially all NIH-funded research producing “scientific data” — any data commonly accepted in the field as sufficient to validate and replicate findings — not only genomic data. It requires a data management and sharing plan with every competing grant application, whereas the GDS Policy’s genomic-specific requirements historically attached at the Just-in-Time stage, after review but before award.

    NIH has since directed that the two policies be harmonised into a single submission: where a project is subject to both, the genomic-specific elements (consent language, data type, repository choice, controlled- versus open-access designation) are folded into one data management and sharing plan rather than filed as two separate documents. The table below sets out where the policies still diverge.

    Feature | GDS Policy (2015) | DMS Policy (2023)
    Governing notice | NOT-OD-14-124 | NOT-OD-21-013
    Effective date | 25 January 2015 | 25 January 2023
    Scope | Large-scale human and non-human genomic data | All NIH-funded scientific data, any type
    Core document | Genomic Data Sharing Plan + Institutional Certification | Data management and sharing plan
    Consent mechanism | Consent-based data use limitations, enforced via dbGaP Data Access Committees | General “justifiable limitations” language; no genomic-specific consent tiers
    Typical repository | dbGaP, AnVIL (controlled- or open-access) | Any NIH-designated or discipline-appropriate research data repository
    Budget provision | Not addressed directly | Explicitly allows data management and sharing costs in the budget

    Who Must Submit an Institutional Certification?

    An Institutional Certification is a GDS Policy-specific attestation — separate from the data management and sharing plan — that the institution has reviewed the consent language, IRB approval and data use limitations attached to the human genomic data before it is deposited in a controlled-access repository. It is not required by the DMS Policy for non-genomic data.

    Institutions must certify, among other things, that:

    • The data was collected in a manner consistent with 45 CFR 46 (the Common Rule) and applicable state and local laws.
    • Consent forms permit the specific type of data use requested (general research use versus disease-specific use).
    • Identifiers have been removed or the data otherwise meets the applicable de-identification standard.

    Because this certification is a distinct compliance artefact from the data management and sharing plan, research administrators who track only DMS Plan compliance can miss it entirely on genomic awards.

    How Does Controlled Access Work Under the GDS Policy?

    Controlled-access genomic data sits in dbGaP behind a Data Access Committee (DAC) review process. Secondary users submit a data access request describing their intended research use; the DAC checks that use against the consent-based data use limitation recorded for that dataset before granting access. This is materially different from the DMS Policy’s general expectation of “broadest appropriate sharing,” which does not itself impose a use-limitation enforcement layer — that enforcement mechanism is a GDS-specific feature.
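
    The check a DAC performs can be pictured as a comparison between a requested use and the limitation recorded for the dataset. The sketch below is a deliberately simplified model and not dbGaP's actual data structures or decision procedure: the class names, fields and the two limitation kinds (general versus disease-specific, as in the certification criteria) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UseLimitation:
    """Hypothetical, simplified record of a dataset's consent-based limitation."""
    kind: str                      # "general" or "disease_specific"
    disease: Optional[str] = None  # set only for disease-specific consent


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    research_area: str  # disease or topic the secondary user wants to study


def is_consistent(request: AccessRequest, limitation: UseLimitation) -> bool:
    """Return True if the requested use fits the recorded limitation."""
    if limitation.kind == "general":
        return True
    if limitation.kind == "disease_specific":
        return request.research_area.lower() == (limitation.disease or "").lower()
    # Unknown limitation types are refused rather than guessed at.
    return False
```

    A real review involves human judgement about the research use statement; the point of the sketch is only that the limitation travels with the dataset and is checked at request time.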

    Answer-First Q&A

    Does the 2023 DMS Policy Replace the 2015 GDS Policy?

    No. The DMS Policy did not replace or repeal the GDS Policy; both remain in force. NIH’s own guidance directs grantees generating large-scale genomic data to address the GDS-specific plan elements (informed consent language, data type, repository choice, controlled-access designation) within the single data management and sharing plan required by the DMS Policy, rather than in an independent document. The Institutional Certification remains a separate attestation.

    What Counts as “Large-Scale” Genomic Data Under the GDS Policy?

    NIH does not set one fixed threshold; NHGRI and other institutes assess scale case by case, typically referencing genome-wide association studies, whole-genome or whole-exome sequencing, and array-based platforms as presumptively “large-scale.” Investigators with borderline projects should confirm applicability with their institute’s program officer before submission, since NHGRI also encourages voluntary sharing of smaller datasets.

    When Is the Institutional Certification Submitted?

    The Institutional Certification is submitted at the Just-in-Time stage — after peer review, once an application is being considered for funding — not with the initial application. This differs from the data management and sharing plan itself, which NIH requires as part of the competing application under the DMS Policy.

    Which Repository Satisfies the GDS Policy?

    NIH designates dbGaP for controlled-access human genomic data and, for NHGRI-funded work specifically, AnVIL as the primary repository accepting both controlled- and open-access data. Investigators may propose an alternative repository in the data management and sharing plan, subject to institute approval before funding.

    Implications for Research Administrators

    The practical risk is not policy conflict but a compliance gap: an office that maps its DMS Policy checklist to grant application review alone will miss the GDS Policy’s Just-in-Time Institutional Certification and its ongoing dbGaP registration obligations. Research administration offices supporting genomic PIs need two intake questions, not one — does this award generate large-scale genomic data, and if so, has the Institutional Certification been routed separately from the data management and sharing plan.
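
    The two intake questions can be encoded as a simple checklist function. This is a minimal sketch under stated assumptions: the `Award` fields and the item labels are hypothetical, and a real office would draw these values from its own grants system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Award:
    award_id: str                     # hypothetical internal identifier
    generates_scientific_data: bool   # DMS Policy applies
    large_scale_genomic: bool         # GDS Policy also applies
    dms_plan_received: bool = False
    certification_routed: bool = False


def outstanding_items(award: Award) -> List[str]:
    """List the compliance artefacts still missing for an award."""
    items = []
    if award.generates_scientific_data and not award.dms_plan_received:
        items.append("DMS Plan (with competing application)")
    if award.large_scale_genomic and not award.certification_routed:
        items.append("Institutional Certification (Just-in-Time)")
    return items
```

    Tracking the two artefacts as separate flags mirrors the point above: a received DMS Plan does not clear the certification.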

    As NIH continues to harmonise guidance across institutes, expect more sub-policies — clinical trials data sharing, foreign genomic data transfer rules — to layer onto rather than replace the DMS Policy’s baseline. Treating “DMS compliance” as a single checkbox will increasingly understate what a genomics-heavy award actually requires.

  • Data Availability Statement Not Applicable Rules

    A data availability statement (DAS) reading “not applicable” is defensible only in narrow, specific circumstances — chiefly when no new data were generated or analysed, when data are proprietary clinical or commercial records, or when a legal or ethical restriction genuinely blocks disclosure. Outside those cases, “not applicable” is increasingly flagged by editors and funders as a red flag rather than a compliant statement.

    A data availability statement is a mandatory or recommended manuscript section, usually placed before the references, that tells readers where the data underpinning a study’s findings can be found and under what conditions they can be accessed. Since most major publishers (Springer Nature, Wiley, Taylor & Francis, PLOS) now require a DAS on every research article, “not applicable” has become one of the most commonly misused entries in it — and one of the most commonly queried at copyediting or peer-review stage.

    When is “not applicable” a defensible data availability statement?

    “Not applicable” is defensible when it is factually true that no dataset exists to disclose. Taylor & Francis’s author-services template lists this explicitly as one option among many, with the standard wording: “Data sharing is not applicable to this article as no new data were created or analyzed in this study.” Springer Nature uses near-identical phrasing for theoretical and mathematical papers that involve no empirical dataset.

    Three case types consistently pass editorial and funder scrutiny:

    • No new data generated. Review articles, theoretical papers, editorials, commentaries, book reviews, and hypothesis or proposal papers that synthesise existing literature rather than produce new datasets.
    • Genuinely proprietary or clinical data under contractual control. Data held by a third-party sponsor, clinical trial data governed by a data-use agreement the author cannot unilaterally waive, or commercially embargoed findings pending patent filing.
    • Data restricted by law or binding ethics approval. National statistical agency microdata, patient-level clinical records where the original informed-consent language did not cover public sharing, or datasets covered by data-protection legislation such as UK GDPR.

    When does “not applicable” trigger an editorial or funder query?

    “Not applicable” triggers a query whenever a study plainly did generate or analyse data but the statement fails to say why access is restricted. PLOS’s data-availability policy, in force for all research articles submitted since March 2014, states that the “not applicable” exemption applies only to article types that structurally contain no dataset — not to empirical studies that simply prefer not to share.

    Cranfield University’s research-data-management guidance explicitly names “Availability of data and materials: ‘Not applicable’” as an example of an unclear statement when used on an empirical paper, because it gives the reader no route to verification. That is the core distinction editors are trained to apply: “not applicable” answers “does a dataset exist?”, not “will you share it?” Using it to avoid disclosing data that does exist — without stating a legal, ethical or commercial restriction — is what draws a production-stage or peer-review query.

    Statement pattern | Typically accepted? | Why
    “Not applicable: no new data generated” | Yes | Factually verifiable from article type
    “Not applicable” on an empirical/quantitative study | No (triggers query) | Data exists; statement misrepresents the situation
    “Data available on request from the corresponding author” | Conditional | Only under Basic or Share-Upon-Request publisher policies; must name the restriction
    “Data not available due to [named] ethical/legal/commercial restriction” | Yes | Restriction is stated and attributable
    Silence / statement omitted entirely | No (triggers query) | Most publishers now mandate a DAS on every submission
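
    The patterns in the table can be expressed as a small pre-submission check. The sketch below is an assumption-laden simplification: the category labels (`not_applicable`, `on_request`, `not_available`) and the three outcomes are invented for illustration, and no publisher is known to apply exactly this logic.

```python
from typing import Optional


def review_das(kind: Optional[str], new_data: bool,
               restriction: Optional[str] = None) -> str:
    """Classify a data availability statement as accept, conditional or query."""
    if not kind:
        # Silence or an omitted statement.
        return "query"
    if kind == "not_applicable":
        # Only true when no dataset exists to disclose.
        return "query" if new_data else "accept"
    if kind == "on_request":
        # Depends on the journal's policy tier; the restriction must be named.
        return "conditional" if restriction else "query"
    if kind == "not_available":
        return "accept" if restriction else "query"
    # Statement types outside the table are not assessed here.
    return "unclassified"
```

    For example, `review_das("not_applicable", new_data=True)` returns `"query"`, matching the second row of the table.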

    “Available on request” versus “not applicable”: are they the same thing?

    No. They answer different questions and are not interchangeable. An “available on request” statement concedes that a dataset exists but sets a conditional access route (typically via the corresponding author), whereas “not applicable” asserts that no dataset exists at all. Taylor & Francis restricts “available on request” wording to journals operating under its Basic or Share Upon Reasonable Request policies; it is not a universal fallback.

    Editors increasingly scrutinise “available on request” statements too, following widely reported non-responsiveness rates in follow-up author contact — a dynamic documented in reproducibility literature and discussed on researcher forums such as Reddit’s r/AskAcademia. A defensible “on request” statement names the corresponding author’s role, the reason data are not openly deposited (privacy, participant consent, third-party licence), and — where a repository embargo applies — the release date.

    How do funder data-sharing mandates change the calculus?

    Funder policy increasingly overrides publisher-level flexibility on “not applicable.” Under the NIH Data Management and Sharing Policy, effective for all applications submitted on or after 25 January 2023, NIH-funded research that generates scientific data must include a Data Management and Sharing Plan — “not applicable” is only accepted where the award genuinely produces no scientific data (e.g. some career-development or infrastructure awards).

    In the UK, UKRI’s Common Principles on Data Policy and the underlying Concordat on Open Research Data set an expectation that publicly funded research data be made as open as possible, as restricted as necessary. A “not applicable” statement on a UKRI-funded empirical study should therefore be paired with a funder-facing data management plan explaining the exemption, not left to stand alone. The ICMJE requirements work similarly: manuscripts reporting clinical trial results submitted to ICMJE journals on or after 1 July 2018 must contain a data-sharing statement, and trials that began enrolment on or after 1 January 2019 must include a data-sharing plan in the trial registration. A bare “not applicable” satisfies neither requirement.

    • Check the specific funder mandate before defaulting to “not applicable” — publisher policy and funder policy are separate compliance layers.
    • Where a funder plan exists (e.g. an NIH DMS Plan or a data management plan required under a Horizon Europe grant agreement), reference it rather than repeating a bare exemption.
    • For systematic reviews specifically, the data availability statement should confirm whether extracted data tables, search strategies, or code are available, even though no primary dataset was generated. “Not applicable” covers only the absence of new primary data, not the review’s own extraction materials.

    Answer-first Q&A

    What do you write in a data availability statement?

    A compliant data availability statement names where the data live (repository, supplementary file, or “not applicable” with a reason), includes a DOI or accession number where one exists, and states any access conditions. Reviews, theoretical papers, and studies with no new dataset should say so explicitly rather than leaving the section blank.
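
    The elements listed above can be assembled mechanically, which makes omissions easy to spot. The function below is a hypothetical helper, not a publisher template: the sentence wording for the repository case is an assumption, the “not applicable” wording follows the Taylor & Francis phrasing quoted earlier, and the DOI in the usage note is a placeholder.

```python
from typing import Optional


def build_das(location: Optional[str],
              identifier: Optional[str] = None,
              conditions: Optional[str] = None,
              reason: Optional[str] = None) -> str:
    """Compose a data availability statement from its component parts."""
    if location is None:
        # No dataset location: a reason is mandatory, never a bare exemption.
        if not reason:
            raise ValueError("A statement without a data location needs a stated reason.")
        return f"Data sharing is not applicable to this article as {reason}."
    sentence = f"The data supporting this study are available in {location}"
    if identifier:
        sentence += f" ({identifier})"
    sentence += "."
    if conditions:
        sentence += f" Access conditions: {conditions}."
    return sentence
```

    Calling `build_das("the UK Data Service", identifier="https://doi.org/10.5555/example", conditions="Safeguarded, end-user licence required")` yields a statement with location, identifier and conditions; calling it with no location and no reason raises an error instead of producing a bare “not applicable”.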

    What does a “data not available” statement mean?

    A “data not available” statement means the underlying data exist but access is restricted — for ethical, legal, or commercial reasons — and the restriction must be named. This differs from “not applicable,” which asserts no dataset was ever created. Conflating the two is the single most common cause of an editorial query at submission or production stage.

    What does data availability mean?

    Data availability describes whether, and how, the dataset behind a study’s findings can be accessed by readers and reproducibility auditors. Publishers including Springer Nature and PLOS treat the statement as a mandatory element of the peer-review record, on equal footing with author contributions and conflict-of-interest disclosures.

    Implications for research administrators

    Research offices and library data-management teams are best placed to catch a misapplied “not applicable” before submission, because they hold institutional visibility across a researcher’s funder obligations that a single-article editor does not. A pre-submission check against the relevant funder’s data policy — UKRI, NIH, or a Horizon Europe grant agreement — will catch the majority of cases where “not applicable” would otherwise be accepted by a publisher’s automated submission system but later queried by a funder compliance audit.

    As funder data-sharing mandates tighten and publishers add automated DAS-completeness checks at submission, the margin for a generic “not applicable” will keep narrowing. Authors and administrators who document the specific reason — no new data, named legal restriction, or named commercial embargo — will clear both editorial and funder review; those who use it as a default will increasingly find it queried, not accepted.

    For related terminology, see the CASRAI Research Glossary and the CASRAI-originated CRediT contributor role taxonomy, now stewarded by NISO as ANSI/NISO Z39.104-2022, which governs how data-curation and formal-analysis contributions are credited alongside data availability disclosures.