Tag: research data management plan

  • Data Management Plans for Qualitative Research: FAIR Meets Consent and Anonymisation

    A data management plan for qualitative research must do something that generic, STEM-oriented DMP templates rarely address: reconcile funder mandates for FAIR (Findable, Accessible, Interoperable, Reusable) data with an ethical and legal duty to restrict access to identifiable interview, focus-group and observational data. The two obligations are not opposites: a well-built plan satisfies both by specifying tiered access, consent-driven sharing limits and documented anonymisation, rather than treating “open data” and “protected data” as a binary choice.

    A data management plan (DMP) is a written document, created before data collection begins, that specifies how a project will collect, document, store, protect, share and preserve its research data throughout the data lifecycle. For qualitative and human-subjects research, the plan must additionally specify how consent scope, anonymisation depth and legal basis under data protection law will be recorded and enforced at each stage.

    What is a data management plan for qualitative research?

    A qualitative DMP is the same core document required for any funded project — covering data types, documentation, storage, sharing and preservation — but written around data that is, by design, difficult to de-identify. Interview transcripts, field notes, focus-group recordings and open-text survey responses carry contextual detail that resists the aggregation techniques used to anonymise quantitative datasets.

    Most UK funders require a DMP at application stage. The Economic and Social Research Council has required a data management and sharing statement since its 2010 Research Data Policy, and UK Research and Innovation’s common principles on data policy apply across its research councils. The plan is normally reviewed alongside the ethics application, since data handling and consent decisions are made together.

    Why do FAIR mandates and human-subjects ethics create tension?

    The FAIR principles — Findable, Accessible, Interoperable, Reusable, set out by Wilkinson et al. in Scientific Data (2016) — were written for structured, aggregatable datasets. Applied literally to identifiable human-subjects data, “Accessible” and “Reusable” can conflict with the duty to limit who can read a participant’s own words.

    Funders resolve this with a qualifier, not an exemption: data should be “as open as possible, as closed as necessary” — the formulation used in the European Commission’s Horizon Europe research data policy and echoed by UKRI. This does not excuse qualitative researchers from FAIR compliance; it changes what “Accessible” means in practice, from public download to documented, conditional access.

    FAIR principle | Qualitative-data constraint | Practical mitigation in the DMP
    ---------------|-----------------------------|--------------------------------
    Findable | Full metadata can itself be re-identifying (project title, participant demographics) | Publish a discoverable, de-identified metadata record with a persistent identifier even when the data itself stays closed
    Accessible | Transcripts/recordings contain direct identifiers and verbatim quotes | Deposit in a repository offering tiered or restricted access, not open download
    Interoperable | Coding schemes and qualitative software formats (e.g. NVivo, ATLAS.ti) are often proprietary | Document the coding frame and export a non-proprietary format alongside the proprietary project file
    Reusable | Reuse by unknown third parties was rarely covered by original consent | Use granular, re-use-specific consent wording that anticipates archiving and secondary analysis

    Under UK GDPR and the Data Protection Act 2018, personal data that has been genuinely and irreversibly anonymised falls outside data protection law. The Information Commissioner’s Office is explicit, however, that this bar is high, and that pseudonymised data (a code replacing a name, with the key retained) remains personal data. The DMP must therefore state precisely which version of the data, at which stage, is personal data and which is anonymised.

    Consent forms are the operative control, not an afterthought. A plan built for FAIR-GDPR reconciliation should specify:

    • Granular consent options separating participation, quotation in publications, and archiving of transcripts or recordings for secondary use
    • An explicit legal basis under UK GDPR Article 6 (and Article 9 condition where special category data — health, ethnicity, political opinion — is discussed)
    • A defined right-of-withdrawal window after which removal from an archived, de-identified dataset is no longer practicable
    • Named repository and access-control arrangements disclosed to participants at consent, not decided afterwards
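
    The granular consent options above can be recorded as structured data, so that later archiving and quotation decisions are checked against what each participant actually agreed to. The following is a minimal sketch rather than a prescribed format: the class name, the field names and the 30-day withdrawal window are all illustrative assumptions, and the real window should be whatever the consent form states.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ConsentRecord:
    """One participant's consent choices (hypothetical structure)."""
    participant_code: str       # pseudonymous code, never the real name
    participation: bool         # agreed to take part at all
    quotation: bool             # agreed to verbatim quotes in publications
    archiving: bool             # agreed to deposit for secondary use
    signed_on: date
    withdrawal_days: int = 30   # assumed window; set from the consent form

    def withdrawal_deadline(self) -> date:
        return self.signed_on + timedelta(days=self.withdrawal_days)

    def can_withdraw(self, today: date) -> bool:
        # Withdrawal is honoured up to and including the deadline.
        return today <= self.withdrawal_deadline()

    def may_quote(self) -> bool:
        return self.participation and self.quotation

    def may_archive(self) -> bool:
        return self.participation and self.archiving


record = ConsentRecord("P01", participation=True, quotation=True,
                       archiving=False, signed_on=date(2024, 1, 10))
print(record.may_quote(), record.may_archive())
```

    Keeping quotation and archiving as separate flags means a transcript from a participant who declined archiving can be excluded from deposit without losing their agreed quotations.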

    The UK Data Service, the Economic and Social Research Council’s designated data archive, operates a three-tier access model that qualitative DMPs can cite directly: Open data (freely downloadable), Safeguarded data (registered users agree to an end-user licence), and Controlled data (approved researchers only, via a secure environment). Mapping each output to one of these tiers, rather than relying on a vague “available on request” line, distinguishes a compliant plan from a defensive one.
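
    One way to enforce that mapping while drafting is to list every planned output against a tier and flag anything that falls outside the three named levels. The sketch below assumes a simple dictionary of output names to access statements; the function name and the example outputs are hypothetical.

```python
from enum import Enum


class AccessTier(Enum):
    OPEN = "open"
    SAFEGUARDED = "safeguarded"
    CONTROLLED = "controlled"


def unmapped_outputs(plan: dict[str, str]) -> list[str]:
    """Return the outputs whose access statement is not one of the three tiers."""
    valid = {tier.value for tier in AccessTier}
    return [name for name, access in plan.items()
            if access.strip().lower() not in valid]


# Hypothetical outputs for an interview study.
plan = {
    "interview guide": "Open",
    "de-identified transcripts": "Safeguarded",
    "audio recordings": "available on request",
}
print(unmapped_outputs(plan))  # ['audio recordings']
```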

    What anonymisation techniques belong in the plan?

    Anonymisation of qualitative data is a layered process, not a single redaction pass. A robust DMP names the specific technique used at each stage:

    1. Pseudonymisation during analysis — replacing names with participant codes while a separate, access-restricted key file links code to identity
    2. De-identification for sharing — removing or generalising indirect identifiers: exact job titles, place names, dates, organisational affiliations
    3. Redaction of unavoidable identifiers — where context itself identifies a small or unique population (a single named institution, a rare occupation), replacing detail with a category description
    4. Access-tier assignment — deciding, output by output, whether the residual disclosure risk permits Safeguarded deposit or requires Controlled access only
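
    The first three layers can be illustrated with a short script. This is a sketch under stated assumptions, not a complete anonymisation tool: the names, the code P07, the institution and the category labels are invented, and simple string matching will miss misspellings, nicknames and contextual clues, so manual review of every transcript is still needed.

```python
import re


def pseudonymise(text: str, key: dict[str, str]) -> str:
    """Step 1: replace each known name with its participant code."""
    for name, code in key.items():
        # A function replacement avoids backslash processing in the code string.
        text = re.sub(rf"\b{re.escape(name)}\b", lambda _m, c=code: c, text)
    return text


def generalise(text: str, categories: dict[str, str]) -> str:
    """Steps 2-3: replace indirect identifiers with a bracketed category."""
    for detail, category in categories.items():
        text = re.sub(rf"\b{re.escape(detail)}\b",
                      lambda _m, c=category: f"[{c}]",
                      text, flags=re.IGNORECASE)
    return text


# The key file is held separately under restricted access, never with the data.
key = {"Maria Lopez": "P07"}
categories = {
    "Northfield General Hospital": "regional hospital",
    "ward manager": "senior nurse",
}

raw = "Maria Lopez has been ward manager at Northfield General Hospital since 2019."
shared = generalise(pseudonymise(raw, key), categories)
print(shared)  # P07 has been [senior nurse] at [regional hospital] since 2019.
```

    The year is left in place here deliberately: whether a date needs generalising depends on how small the population is, which is the judgement the fourth step (access-tier assignment) then takes into account.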

    The Qualitative Data Repository (QDR) at Syracuse University and the Consortium of European Social Science Data Archives (CESSDA) both publish worked examples of this layered approach for interview and ethnographic data. QDR, or a national archive within the CESSDA network, is an appropriate repository to name in a DMP for social-science-led projects.

    When is “not applicable” a legitimate data availability statement?

    A data availability statement (DAS) reading “not applicable” is defensible only when it is reasoned, not default. Journals following ICMJE and COPE guidance expect a DAS for every submission, including qualitative studies; the acceptable move is not silence but a stated justification — for example, that full transcripts cannot be shared because de-identification would strip the interpretive detail the analysis depends on, while a de-identified excerpt corpus or the coding frame is deposited instead.

    Reviewers increasingly flag blanket “not applicable” statements as a data-quality signal, because most qualitative datasets have something shareable — a codebook, an interview guide, aggregated theme frequencies — even when raw transcripts cannot be released. A DMP that pre-commits to this reasoning avoids a weak DAS being drafted under publication-deadline pressure.

    Common questions on qualitative data management plans

    What should a data management plan for qualitative research include?

    It should cover data types collected (transcripts, recordings, field notes), consent scope, anonymisation method, storage and access controls, the repository and access tier for shared outputs, and a retention and deletion schedule for identifiable source files.

    How do you anonymise qualitative data to comply with GDPR?

    Apply pseudonymisation during analysis, then de-identify indirect identifiers (locations, job titles, dates) before sharing. Under UK GDPR, only data anonymised to the point that re-identification is not reasonably likely falls outside data protection law; pseudonymised data remains regulated personal data.

    Do FAIR principles require open data sharing for human-subjects research?

    No. FAIR requires data to be findable and accessible under stated conditions, not necessarily open. Funders including UKRI and the European Commission apply the “as open as possible, as closed as necessary” standard, which explicitly permits restricted or controlled access for identifiable qualitative data.

    Can a data availability statement say “not applicable” for qualitative research?

    Only with a stated reason, such as re-identification risk that de-identification cannot remove. Journals following ICMJE and COPE practice expect a justified statement — noting what, if anything, is shareable (a codebook or interview guide) — rather than a blanket refusal.

    Implications and outlook

    Institutional research offices and ethics committees increasingly review DMPs and consent forms as one package, because anonymisation and access-tier decisions in the DMP determine what the consent form must promise participants. Research administrators supporting qualitative and mixed-methods proposals should treat the FAIR-versus-consent tension as a design question resolved at the DMP stage — via tiered access, granular consent and named repositories — not a compliance problem deferred to publication.

    As funders tighten machine-actionable DMP requirements, qualitative projects that specify access tiers and anonymisation methods in structured, repository-mappable language will be better placed to meet FAIR audit expectations and data protection obligations, without defaulting to an unjustified “closed” or “not applicable” position.

    For related definitions and standards context, see CASRAI’s research data terminology dictionary and the research administration resource hub.

  • UKRI Data Management Plan Template Guide for Multi-Council Grants

    UKRI’s common data management plan template asks applicants to describe, section by section, how research data will be generated, documented, stored, shared and preserved — but the level of detail, word limit and submission requirement differ by council: MRC and BBSRC mandate a full plan, NERC requires only a one-page outline, and EPSRC does not require submission at all.

    A data management plan (DMP) is a structured document, submitted with or alongside a grant application, that specifies how research data will be collected, documented, stored, shared and preserved throughout and after a funded project. For UKRI-funded researchers, the practical difficulty is not knowing what a DMP is — it is knowing which version of the UKRI data management plan template applies to their council, how long it should be, and what each field is actually asking for. This walkthrough goes section by section across the four councils most research administrators handle together on multi-strand or interdisciplinary awards: MRC, BBSRC, NERC and EPSRC.

    What does UKRI’s common data management plan template cover?

    UKRI does not operate a single, mandatory template across all seven research councils. Instead, each council publishes its own guidance built around a common core of questions: what data will be produced, how it will be documented, where it will be stored, who can access it, and how long it will be retained. This shared structure is why researchers refer informally to a “UKRI data management plan template”, even though the actual document you complete depends on which council is funding the work.

    The starting point for most multi-council applicants is the MRC data management plan template, a Word document published via UKRI’s publications library, because several other library-hosted templates (including those used for NC3Rs-badged studies) reuse its structure. NERC, BBSRC and EPSRC each layer council-specific expectations, namely word limits, submission timing and retention periods, on top of that shared skeleton.

    How do requirements differ across MRC, BBSRC, NERC and EPSRC?

    The single biggest source of error in multi-council DMPs is applying one council’s rules to another council’s proposal. The table below sets out the four core differences research administrators need to check before drafting.

    Council | DMP required at application? | Template source | Length | Minimum data retention
    --------|------------------------------|-----------------|--------|-----------------------
    MRC | Yes: mandatory for all funding proposals | MRC Data Management Plan template (UKRI publications library) | 500–1,500 words; 1,500 words for longitudinal studies, population cohorts, genetic, omics, imaging data and biobanks | 10 years (20 years for population health and clinical studies)
    BBSRC | Yes: mandatory for grant applications | BBSRC template via DMPOnline (Digital Curation Centre) | Maximum 500 words (check individual grant-stream variation) | 10 years after project completion
    NERC | Yes: one-page outline at application; full plan later | NERC Outline Data Management Plan template and guidance (UKRI publications library) | One page at application; full plan agreed with the relevant NERC data centre within 3–6 months of award start | 10 years minimum
    EPSRC | No: not submitted with the application | No dedicated EPSRC council template; DMPOnline hosts an EPSRC-structured version for internal use | No fixed limit; proportionate to the project | 10 years from the end of any privileged-access period

    EPSRC is the outlier: it does not require a DMP to be submitted with the proposal, but most host institutions’ own research data policies still require one to exist internally so costs and storage needs are planned accurately. STFC sits closer to MRC and BBSRC — a DMP is mandatory for most schemes and capped at two sides of A4 — but, unlike MRC, STFC does not prescribe a fixed template.
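
    Research offices that handle several councils at once may find it useful to encode the table as a lookup and check each draft against it. The sketch below takes its limits from the table above; the function name, the dictionary layout and the whitespace-based word count are assumptions, and the figures should be re-checked against current call documentation before use.

```python
# Limits transcribed from the comparison table; None means no word limit applies
# (NERC's outline is limited by page, EPSRC has no fixed limit).
COUNCIL_RULES = {
    "MRC":   {"submit": True,  "max_words": 1500, "retention_years": 10},
    "BBSRC": {"submit": True,  "max_words": 500,  "retention_years": 10},
    "NERC":  {"submit": True,  "max_words": None, "retention_years": 10},
    "EPSRC": {"submit": False, "max_words": None, "retention_years": 10},
}


def check_draft(council: str, text: str, retention_years: int) -> list[str]:
    """Return a list of problems found in a draft DMP for the given council."""
    rules = COUNCIL_RULES[council]
    problems = []
    words = len(text.split())  # crude count; funder forms may count differently
    if rules["max_words"] is not None and words > rules["max_words"]:
        problems.append(
            f"{words} words exceeds the {council} limit of {rules['max_words']}"
        )
    if retention_years < rules["retention_years"]:
        problems.append(
            f"retention of {retention_years} years is below the "
            f"{council} minimum of {rules['retention_years']}"
        )
    return problems


draft = "word " * 600
print(check_draft("BBSRC", draft, retention_years=10))
print(check_draft("MRC", draft, retention_years=10))
```

    The same 600-word draft fails the BBSRC check and passes the MRC one, which is the multi-council error the table is meant to prevent.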

    Completing the template field by field

    Across MRC, BBSRC, NERC and EPSRC guidance, the same seven fields recur, even where wording and word allowances differ. Address each one in this order.

    • Data collection and generation. State the type of data (quantitative, qualitative, imaging, genomic, environmental sensor data, software), the format, the estimated volume, and whether it is newly generated or reused from an existing source.
    • Documentation and metadata. Name the metadata standard you will apply and describe accompanying documentation — a data dictionary, README file or laboratory notebook — needed for another researcher to interpret the dataset without you.
    • Ethics, consent and legal basis. Cover informed consent, anonymisation or pseudonymisation methods, and who holds intellectual property rights, particularly for MRC-funded clinical or population studies, where this field is scrutinised most closely.
    • Storage and security during the project. Specify where data will sit while the grant is active, backup frequency, and access controls — this is where EPSRC-funded teams should still document internal practice even though nothing is submitted to the council.
    • Long-term preservation. Name the repository (an institutional archive, a NERC environmental data centre, or the UK Data Service for ESRC-adjacent social science data) and confirm the retention period matches your council’s minimum from the table above.
    • Data sharing and access conditions. State which datasets will be shared openly, any embargo or proprietary period, and the justification if some data cannot be shared — commercial sensitivity, participant privacy or national security are the standard justifications UKRI accepts.
    • Responsibilities and resourcing. Name who owns data management delivery after the grant ends and itemise any storage, curation or specialist-staff costs, which can — and should — be included in the full economic cost of the proposal.
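
    The seven fields also lend themselves to a simple completeness check before a draft goes to review. This is an illustrative sketch only: the field keys and the list of placeholder strings are assumptions, not wording taken from any council template.

```python
# Hypothetical keys for the seven recurring fields, in the order listed above.
REQUIRED_FIELDS = (
    "data_collection",
    "documentation_metadata",
    "ethics_consent_legal",
    "storage_security",
    "long_term_preservation",
    "sharing_access",
    "responsibilities_resourcing",
)

# Answers treated as "not yet written" (assumed list; extend as needed).
PLACEHOLDERS = {"", "tbc", "tbd", "n/a"}


def incomplete_fields(draft: dict[str, str]) -> list[str]:
    """Return required fields that are missing or hold only a placeholder."""
    return [field for field in REQUIRED_FIELDS
            if draft.get(field, "").strip().lower() in PLACEHOLDERS]


draft = {
    "data_collection": "Environmental sensor data, CSV, newly generated",
    "storage_security": "TBC",
}
print(incomplete_fields(draft))
```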

    For MRC and NERC applications specifically, the plan text is typically copied directly into the Je-S or funding-service application form rather than uploaded as a separate attachment — check the individual call documentation, since attachment rules vary by scheme and change between funding rounds.

    Common questions about the UKRI data management plan

    How do you write a data management plan?

    Start from your funding council’s specific template rather than a generic one, then work through data collection, documentation, storage, sharing and retention in turn. Keep language concrete and proportionate to your project’s data volume, and justify any decision not to share data rather than leaving it unexplained.

    What is included in a data management plan?

    A complete plan covers the types of data produced, the metadata and documentation standards used, storage and security arrangements, the repository chosen for preservation, access and sharing conditions, and the retention period. UKRI councils also expect a statement of who is responsible for delivery and what resources this requires.

    Do you need a data management plan for a UKRI grant?

    It depends on the council. MRC, BBSRC, NERC and STFC require a DMP to be submitted with most funding proposals, while EPSRC does not require submission, and AHRC has no general DMP requirement at all. Always confirm the specific call documentation, since requirements can vary by scheme within a single council.

    What does a good data management plan look like?

    A strong plan is specific to the project rather than generic, stays within the council’s stated word or page limit, and answers every field with a concrete detail — a named repository, a defined retention period, a stated metadata standard — instead of a vague intention. Reviewers assess it alongside the rest of the proposal during peer review.

    What this means for multi-council applicants

    Institutions running interdisciplinary programmes — a BBSRC-MRC joint call, or a NERC-EPSRC environmental engineering award — cannot draft one DMP and submit it unchanged to both funders. Word limits alone range from 500 words (BBSRC) to 1,500 words (MRC’s most data-intensive study types), and only NERC requires a two-stage outline-then-full-plan process. Research administration teams supporting these awards should build a field-by-field checklist per council into their proposal workflow, rather than relying on a single house template.

    As UKRI continues to consolidate open-research expectations across its councils, researchers should expect incremental convergence on shared metadata and repository standards — but not, in the near term, a single mandatory cross-council template. Until that happens, matching the right template to the right council, at the right length, remains the determining factor in a compliant submission.

    For teams coordinating research administration workflows across funders and councils, see CASRAI’s research administration resources, and consult the CASRAI Dictionary for definitions of related research data terminology.

  • Clinical Data Management Plan vs Research Data Management Plan: What’s the Difference

    A clinical data management plan and a research data management plan are two of the most frequently conflated documents in the clinical trial lifecycle. Both use the acronym “DMP” in casual conversation, both get drafted before a study starts, and both concern “data” in the broadest sense — but they answer to different masters, cover different lifecycle stages, and are read by different audiences. Submitting the wrong one to the wrong reviewer is a recurring, avoidable compliance headache for trial units and research offices alike.

    What Is a Clinical Data Management Plan?

    A Clinical Data Management Plan (CDMP) is an operational, trial-specific document that describes exactly how data will move from case report form (CRF) to locked database. It is written by or with the clinical data management (CDM) function — not the principal investigator’s grants office — and it sits alongside the protocol as one of the working documents that Good Clinical Practice (GCP), per ICH E6, expects a sponsor to maintain and be able to produce on inspection.

    A CDMP typically specifies:

    • CRF or eCRF design and the electronic data capture (EDC) system to be used
    • Database build, edit-check specifications and data validation rules
    • Data entry conventions (single vs double entry, query turnaround)
    • Medical coding dictionaries and versions, such as MedDRA and the WHO Drug Dictionary
    • Discrepancy management and serious adverse event reconciliation procedures
    • Roles, responsibilities and sign-off authority for database lock

    Because it is inspected against GCP, a CDMP is a living, version-controlled document updated through the study rather than filed once and forgotten.
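
    To make "edit-check specifications" concrete, the sketch below shows one of the simplest kinds of check a CDMP might specify: a range check that raises a data query when a value is missing or implausible. It is a hypothetical illustration, not the behaviour of any particular EDC system; the field name and the 60 to 250 limits are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """A data query raised for site staff to resolve (hypothetical structure)."""
    subject_id: str
    field: str
    message: str


def range_check(subject_id: str, field: str, value: Optional[float],
                low: float, high: float) -> Optional[Query]:
    """Raise a query if the value is missing or outside [low, high]."""
    if value is None:
        return Query(subject_id, field, "missing value")
    if not (low <= value <= high):
        return Query(subject_id, field,
                     f"{value} outside expected range {low}-{high}")
    return None


# Assumed limits for a systolic blood pressure field, for illustration only.
for value in (120, 300, None):
    print(range_check("S001", "systolic_bp", value, low=60, high=250))
```

    In a real study the limits, the query wording and the turnaround expectations would be fixed in the CDMP's edit-check specification and version-controlled with it.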

    What Is a Research Data Management Plan?

    A Research Data Management Plan (RDMP) is a funder- or institution-facing document submitted at the grant proposal stage, well before a trial’s CDMP would even exist. Its job is compliance with funder and institutional data policy, not trial operations. UK Research and Innovation (UKRI) requires a data management plan for relevant grant applications, Horizon Europe applicants complete one using the programme’s Data Management Plan template, and the NIH Data Management and Sharing (DMS) Policy has required a DMS plan for NIH-funded research since January 2023.

    An RDMP typically covers:

    • What data types and volumes the project will generate or reuse
    • How data will be described, documented and made findable (metadata, identifiers)
    • Storage, security and access-control arrangements during the project
    • Ethical, consent and legal constraints on sharing (particularly for identifiable participant data)
    • Long-term preservation and repository plans, often with a DOI issued via DataCite
    • Alignment with the FAIR principles — Findable, Accessible, Interoperable, Reusable

    Unlike a CDMP, an RDMP is reviewed once (or at defined milestones) by a funder or research office, not audited line-by-line by a regulator during a GCP inspection.

    CDMP vs RDMP: Side-by-Side Comparison

    The table below sets out where the two documents genuinely diverge, so institutions running funded clinical trials know they usually need both — not one instead of the other.

    Dimension | Clinical Data Management Plan (CDMP) | Research Data Management Plan (RDMP)
    ----------|--------------------------------------|-------------------------------------
    Primary purpose | Ensure trial data is accurate, complete and audit-ready for database lock | Satisfy funder/institutional policy on data stewardship and sharing
    Governing framework | ICH E6 Good Clinical Practice; sponsor/CRO SOPs | Funder mandates (UKRI, NIH, Horizon Europe); institutional RDM policy
    Typical author | Data manager / clinical data management lead | Principal investigator, often with library or research office support
    Created at | Study set-up, before first patient enrolled | Grant proposal stage, before funding is awarded
    Primary audience | CDM team, biostatisticians, sponsor, regulatory inspectors | Funder, ethics/IRB reviewers, institutional research office
    Content focus | CRF design, edit checks, coding, database lock procedures | Data description, storage, ethics, sharing, long-term preservation
    Review cadence | Continuously updated through study conduct; inspected on audit | Reviewed at proposal and, for some funders, at defined milestones

    Common Questions Answered

    What does a clinical data management plan include?

    A clinical data management plan includes CRF or eCRF specification, database design, data entry and validation procedures, edit-check logic, medical coding dictionaries such as MedDRA, discrepancy and adverse-event reconciliation processes, and clearly defined roles and responsibilities through to database lock, all maintained as a living, version-controlled document inspected under Good Clinical Practice.

    What should a data management plan include?

    A funder-facing research data management plan should describe the data types and volumes a project will generate, how data will be documented and made findable through metadata, storage and security arrangements, ethical and consent constraints on sharing identifiable data, and the eventual repository and preservation route, typically aligned to the FAIR data principles.

    What are the three phases of clinical data management?

    Clinical data management is generally organised into three sequential phases: study set-up, covering database build and CRF design; study conduct, covering data entry, cleaning and query resolution; and study close-out, covering final reconciliation, coding sign-off and database lock ahead of statistical analysis.

    Why the Distinction Matters for Research Administrators

    Institutions running externally funded clinical trials almost always need both documents, produced by different teams on different timelines. A funder reviewer looking for a FAIR-aligned sharing and preservation strategy will not find it in a CDMP’s edit-check specification — and a GCP inspector auditing database lock will not accept an RDMP’s high-level data-sharing statement as evidence of query resolution procedure.

    This is precisely the coordination gap that research administration functions increasingly exist to close: aligning the pre-award compliance document (the RDMP, owned by the grants office) with the operational trial document (the CDMP, owned by clinical data management) so that neither is quietly missing when a funder audit or a regulatory inspection arrives. Institutions that treat the two as interchangeable risk both funder non-compliance and GCP findings — for two entirely separate reasons.

    Consistent terminology helps here. Reviewers, auditors and research offices benefit from a shared reference for what each document is called and what it covers; the CASRAI research administration dictionary maintains definitions for terms that span exactly this pre-award-to-conduct boundary.

    Looking Ahead

    The line between the two documents is not static. ICH’s ongoing revision of E6 Good Clinical Practice has pushed sponsors toward more explicit, risk-based data governance language inside the CDMP itself, while funders such as UKRI and the NIH continue to tighten expectations for FAIR-aligned sharing inside the RDMP. Institutions that keep the two plans distinct — but explicitly cross-referenced — will be best placed to satisfy both regulators and funders as each side’s requirements keep evolving.