Tag: research data management policy

  • What Is a Data Trust? Research Data Governance

    A data trust is a legal and technical framework in which an independent trustee, bound by fiduciary duty, makes decisions about a pool of data on behalf of the people or organisations who contributed it. For research data, this offers a genuine alternative to depositing datasets individually in a repository: instead of each contributor negotiating access terms alone, a trustee stewards shared data collectively, with accountability built into the governance structure itself.

    The Open Data Institute (ODI), which began developing the concept in 2018, defines a data trust more precisely: an independent steward that holds data under a formal duty of impartiality, prudence, transparency and undivided loyalty to the beneficiaries whose data it manages.

    What is a data trust?

    A data trust is a legal structure in which one party authorises an independent trustee to make decisions about data on their behalf, for the benefit of a defined group of stakeholders. The ODI, which published its first explainer on the concept in July 2018 and adopted a working definition later that year, models the idea on established asset trusts such as land trusts, transposing the same fiduciary logic onto data.

    The clearest working example is UK Biobank, established in 2006 as a charitable company with trustees to steward genetic data and biological samples from around 500,000 participants. The ODI itself trialled the concept in practice with the UK Government’s Office for AI in April 2019, testing whether fiduciary stewardship could work as applied governance rather than theory alone. Separately, the University of Cambridge’s Data Trusts Initiative has examined data trusts as a mechanism for pooling individuals’ legal data rights into a single negotiating and stewardship entity.

    How does a data trust govern research data differently from repository deposit?

    Under the standard deposit model, a researcher or institution submits a dataset to a repository, which applies institutional policy and a licence to govern reuse — the repository itself owes no fiduciary duty to depositors. Under a data trust, an independent trustee holds ongoing decision-making authority over the pooled data and is legally obliged to act in the beneficiaries’ interests, not merely to apply a static licence at the point of deposit.

    This distinction matters most for sensitive, re-identifiable, or commercially valuable research data, where a one-off licence cannot anticipate every future access request. A trust structure allows collective, ongoing renegotiation of terms as new uses arise, rather than requiring each depositor to individually vet every downstream request.

    | Feature | Data trust | Repository deposit |
    | --- | --- | --- |
    | Legal basis | Formal trust or fiduciary agreement | Institutional policy plus a data licence |
    | Decision-maker | Independent trustee(s) with ongoing authority | Depositor sets terms once, at submission |
    | Fiduciary duty | Yes: legally binding to beneficiaries | No: repository is a custodian, not a fiduciary |
    | Best suited to | Sensitive, re-identifiable, or contested data | Open, low-risk, citation-ready datasets |

    Data sharing agreement vs data processing agreement: where does a data trust fit?

    A data sharing agreement sets out the terms under which two or more parties exchange data they each control. A data processing agreement, required under UK GDPR Article 28 wherever a processor handles personal data on a controller’s behalf, covers a narrower relationship: the processor acts only on the controller’s documented instructions.

    A data trust does not replace either instrument; it changes who holds the authority to agree them. Rather than each institution separately negotiating a data sharing agreement for every new research collaboration, the trustee negotiates and monitors compliance centrally, on behalf of all contributors, reducing duplicated legal effort across a research consortium.

    What does a data trust mean for FAIR data stewardship?

    The FAIR Principles — Findable, Accessible, Interoperable, Reusable, formalised by Wilkinson and colleagues in Scientific Data in 2016 — govern how research data should be described and made available, but they do not specify who decides access terms. A data trust supplies exactly that missing governance layer.

    • Findability and interoperability metadata can still be maintained in a conventional repository even where the trust governs access rights.
    • Accessibility becomes a trustee decision rather than a fixed licence, allowing tiered or conditional access for sensitive datasets that would otherwise be withheld entirely.
    • Reusability is strengthened where beneficiaries trust the stewardship arrangement enough to contribute richer, less redacted data in the first place.
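
    The tiered or conditional access described in the list above can be pictured as a small decision rule. The sketch below is illustrative only: the tier names, the sensitivity labels and the `decide_access` function are hypothetical, and a real trustee decision would rest on human judgement against the trust’s own terms rather than on three flags.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    METADATA_ONLY = "metadata only"          # record stays findable, data withheld
    CONTROLLED = "controlled environment"    # analysis inside a secure setting
    DOWNLOAD = "download under licence"


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    purpose_approved: bool   # hypothetical flag: trustee accepted the stated purpose
    accredited: bool         # hypothetical flag: requester passed accreditation


def decide_access(sensitivity: str, request: AccessRequest) -> Tier:
    """Return the most permissive tier this illustrative rule allows.

    `sensitivity` is one of "low", "medium" or "high" (assumed labels).
    """
    if sensitivity not in {"low", "medium", "high"}:
        raise ValueError(f"unknown sensitivity: {sensitivity!r}")
    if not request.purpose_approved:
        return Tier.METADATA_ONLY
    if sensitivity == "low":
        return Tier.DOWNLOAD
    if sensitivity == "medium":
        return Tier.DOWNLOAD if request.accredited else Tier.CONTROLLED
    # High-sensitivity data never leaves the controlled environment.
    return Tier.CONTROLLED if request.accredited else Tier.METADATA_ONLY
```

    The point of the design is that a sensitive dataset has more outcomes than “open” or “withheld”: even the most restrictive result still leaves the metadata record discoverable.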

    Institutions bound by research data management policy obligations, including UKRI’s Common Principles on Research Data, can treat a data trust as a compliance mechanism that satisfies funder access requirements without forcing full open deposit of sensitive material.

    Indigenous data sovereignty and the CARE Principles

    The Global Indigenous Data Alliance published the CARE Principles — Collective Benefit, Authority to Control, Responsibility, and Ethics — in 2019, explicitly to complement FAIR by centring people and purpose rather than data alone. CARE was developed in direct response to concerns that FAIR-only stewardship could enable extraction of Indigenous data without consent or benefit-sharing.

    A data trust structure is one of the few governance mechanisms that can operationalise CARE’s “Authority to Control” principle in practice: it gives a defined community, rather than a repository operator, the standing to appoint trustees and set binding terms. Generic data-trust explainers rarely cover this point, because most address corporate or civic data rather than research data sovereignty.

    Common questions on data trusts

    What is a data trust?

    A data trust is a legal and technical structure that manages data on behalf of contributors through an independent trustee. The trustee holds a fiduciary duty — impartiality, prudence, transparency, and undivided loyalty — to the people or organisations whose data is pooled, rather than to any single commercial interest.

    What is the data trust structure?

    The structure places data under the control of a board of trustees who owe a fiduciary responsibility to the beneficiaries. Terms of access, use, and onward sharing are set collectively and can be renegotiated over time, unlike a fixed licence attached to a single dataset at deposit.

    What is a public data trust?

    A public data trust is governed by community, government, or non-profit board members committed to widening access to data affecting a defined population. In a research setting, this model supports population studies, public-health cohorts, and civic datasets where public benefit and consent are central governance concerns.

    What is the role of a data trustee?

    A data trustee manages, protects, and ensures the integrity and appropriate use of pooled data. Trustees identify sensitivity and risk, approve or decline access requests, and enforce the trust’s terms — a standing, ongoing role rather than a one-time licensing decision made at the point of deposit.

    Implications and outlook for research administrators

    For research administrators, the practical implication is that data trusts are not a substitute for repository infrastructure — findability, persistent identifiers, and metadata still depend on conventional deposit systems. What a trust adds is a governance layer above the infrastructure, suited to consortium data, population cohorts, and datasets involving Indigenous or otherwise sovereignty-sensitive communities.

    Institutions weighing a data trust model should expect higher upfront legal cost than a standard repository licence, offset against lower recurring negotiation cost across a multi-year, multi-partner project. As FAIR-compliant infrastructure matures and CARE-aligned governance expectations grow, data trusts are likely to remain a minority but increasingly cited option for exactly the categories of research data — sensitive, collectively owned, or community-governed — that pure open deposit handles least well.
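
    The cost trade-off above is simple arithmetic. The sketch below assumes, purely for illustration, that a trust carries a one-off setup cost plus a small cost per access request, while the bilateral route has no setup cost but a higher cost per negotiated agreement. The function name and all figures are hypothetical.

```python
import math
from typing import Optional


def break_even_requests(trust_setup: float,
                        trust_per_request: float,
                        bilateral_per_agreement: float) -> Optional[int]:
    """Smallest number of requests at which the trust is cheaper overall.

    Returns None when the trust never becomes cheaper, i.e. when its
    per-request cost is not lower than the cost of a bilateral agreement.
    """
    saving_per_request = bilateral_per_agreement - trust_per_request
    if saving_per_request <= 0:
        return None
    # The trust is strictly cheaper once n * saving exceeds the setup cost.
    return math.floor(trust_setup / saving_per_request) + 1
```

    With made-up figures of 40,000 for setup, 500 per request through the trust and 4,500 per bilateral agreement, `break_even_requests(40_000, 500, 4_500)` returns 11: at ten requests the two routes cost the same, and from the eleventh onward the trust is cheaper.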

  • Research Data Management Policy: €10.2bn Case

    A research data management policy that treats FAIR compliance as a line-item cost, rather than a reuse and reputation asset, is the wrong accounting model. PwC estimated in a 2018 study for the European Commission that the absence of FAIR (Findable, Accessible, Interoperable, Reusable) research data costs the European economy at least €10.2 billion a year, largely through duplicated data collection and wasted researcher time. That figure is the strongest evidence available that under-investment in research data management (RDM) infrastructure is a false economy, not a saving.

    A research data management policy is an institutional document setting out the responsibilities of researchers and the institution for planning, storing, securing, sharing and preserving research data across its lifecycle. Most UK universities — Southampton, Birmingham, Manchester, Edinburgh and others — already publish one. The argument here is narrower and more contentious: most are drafted, funded and governed as compliance paperwork, when the evidence says they should be funded as reuse and reputation infrastructure.

    Why RDM policy gets treated as a cost centre

    Institutional budgets typically classify research data management as overhead: storage costs, repository subscriptions, a data steward’s salary, training time. Each appears as a debit with no offsetting credit line, because savings from avoided duplication and faster reuse accrue diffusely, across future researchers and grants, not to the budget holder who paid for the infrastructure.

    This accounting mismatch is compounded by how the data management plan (DMP) requirement is handled in practice. Most funders now mandate one, but research offices frequently treat it as a box-ticking exercise completed at proposal stage and never revisited, rather than a live operational document. That framing under-serves the researcher, who gets no practical reuse benefit, and the institution, which under-recovers the true cost of good RDM from grants that would pay for it.

    UK Research and Innovation (UKRI) explicitly states that costs associated with research data management — storage, curation, repository deposit — are eligible for recovery under its funding. Institutions treating RDM as unfunded overhead are frequently leaving recoverable grant money unclaimed rather than avoiding a cost.

    What the evidence actually says about FAIR and avoided cost

    The FAIR data principles were formalised in 2016 by Wilkinson et al. in Scientific Data as a guide for making digital assets Findable, Accessible, Interoperable and Reusable by both humans and machines. FAIR data is not a compliance checkbox; it is a design standard for making data usable by someone who was not present when it was collected.

    The clearest attributed cost estimate comes from PwC’s 2018 cost-benefit analysis for the European Commission, which put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven by researcher time lost searching for data, recreation of data that already exists, and lost interdisciplinary reuse. A separate, frequently cited illustration is the University of Minnesota’s decades-long diet study, whose original data nearly disappeared into storage before being recovered and reanalysed — a reminder that data loss is a recurring, avoidable event when retention and documentation are afterthoughts.

    Three mechanisms explain where the savings actually come from:

    • Avoided duplication. Findable, well-described data lets a second researcher build on an existing dataset instead of re-running a costly collection exercise.
    • Faster reuse cycles. Interoperable data in standard formats with persistent identifiers can be integrated into new analyses without reformatting or re-negotiating access.
    • Preserved institutional memory. Deposit in a certified repository protects data against the single most common loss vector: staff turnover and undocumented local storage.

    None of this shows up as a saving on a university’s annual accounts, which is precisely why RDM investment is chronically under-prioritised relative to its documented return.

    How funder compliance requirements are changing the calculus

    Funder mandates are steadily converting FAIR data from voluntary good practice into a hard compliance gate, which changes the institutional risk calculus even for leaders unconvinced by the reuse argument. UKRI’s Common Principles on Research Data, and the underlying Concordat on Open Research Data, require a data management plan for funded research and state that data should be made openly available with as few restrictions as necessary. Horizon Europe applies comparable requirements, and cOAlition S’s Plan S pushes the same expectations into journal-level open-access policy.

    A comparison of how three major funders frame the requirement illustrates the convergence:

    | Funder / framework | Core RDM requirement | FAIR reference |
    | --- | --- | --- |
    | UKRI | Data management plan for funded research; RDM costs eligible for recovery | Endorses FAIR via the Concordat on Open Research Data |
    | Horizon Europe | DMP required within six months of project start, updated across lifecycle | “As open as possible, as closed as necessary,” explicitly FAIR-aligned |
    | cOAlition S (Plan S) | Underlying data should accompany open-access publications | References FAIR principles for supporting data |

    Institutions that fund RDM only to the minimum needed for a single grant’s DMP template are exposed twice: to duplicated administrative cost when infrastructure is rebuilt project by project, and to compliance risk as funders move toward auditing DMP adherence rather than merely requiring its submission.

    The case for investing in data stewardship, not just policy text

    A policy document alone does not create FAIR data. That requires people: a data steward function — a dedicated role, a network of disciplinary data champions, or a research data service embedded in the library — able to advise researchers on repository choice, metadata standards and licensing at the point where those decisions are actually made, not after the fact.

    Institutions that fund this role tend to route researchers toward standards-based infrastructure rather than ad hoc local storage: a research data repository registered in re3data.org, ideally holding CoreTrustSeal certification, with persistent identifiers (DOIs) and standard metadata attached to every deposit. This is the practical, unglamorous mechanism by which the €10.2 billion annual cost estimated above is actually avoided: not through a policy PDF, but through a person and a repository that make FAIR operational.
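
    A data steward’s check on “standard metadata attached to every deposit” can be as modest as confirming that the mandatory descriptive fields are present. The sketch below is loosely modelled on the properties the DataCite metadata schema treats as mandatory (identifier, creator, title, publisher, publication year, resource type). The field names are simplified stand-ins rather than DataCite’s exact keys, and the DOI shown is a made-up placeholder.

```python
# Simplified stand-ins for the mandatory descriptive properties of a deposit.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list:
    """List required fields that are absent or empty in a deposit record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical deposit record: the publisher is blank and the resource
# type was never filled in.
draft = {
    "identifier": "10.1234/example",   # placeholder DOI, not a real one
    "creators": ["Example, A."],
    "title": "Survey responses, wave 1",
    "publisher": "",
    "publication_year": 2024,
}

print(missing_fields(draft))
```

    Run as written, this reports `publisher` and `resource_type` as missing. The value lies less in the code than in when it runs: at the point of deposit, while the researcher who can supply the missing detail is still on the project.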

    CASRAI’s relevance here is provenance and interoperability, not ownership. CASRAI originated the CRediT contributor role taxonomy in 2014, now stewarded by NISO as ANSI/NISO Z39.104-2022. The underlying argument is the same in a different domain: standardising who-did-what reduces duplicated verification effort, just as standardising data description reduces duplicated data collection. Institutions weighing their research administration infrastructure should treat RDM policy, contributor attribution and open data reuse as one reputational and efficiency system, not separate obligations.

    Common questions on research data management policy

    What is a research data management policy?

    A research data management policy is an institutional document defining responsibilities for planning, storing, securing, sharing, and archiving research data across its lifecycle. UK universities including Edinburgh and Manchester publish theirs publicly, typically requiring a data management plan at proposal stage and deposit in an approved repository after project completion.

    What are the FAIR data principles?

    The FAIR data principles — Findable, Accessible, Interoperable, Reusable — were published by Wilkinson et al. in 2016 in Scientific Data as guidance for making digital research assets usable by both humans and machines, through persistent identifiers, standard metadata, and clear licensing.

    Do UK and EU funders require a data management plan?

    Yes. UKRI requires a data management plan for funded research and treats RDM costs as eligible for recovery, while Horizon Europe requires a DMP within six months of project start under its “as open as possible, as closed as necessary” principle.

    How much does poor research data management actually cost?

    PwC’s 2018 analysis for the European Commission put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven primarily by duplicated data collection and researcher time lost searching for data that already exists elsewhere.

    Implications for institutional leaders

    The practical implication is a reframing exercise, not necessarily a large new budget line. Research offices should cost RDM infrastructure — repositories, data steward time, metadata training — against the funder-eligible recovery already available through DMP-linked grants, rather than absorbing it as unfunded overhead. Leaders reviewing their research data management policy should ask whether it funds a data steward with real authority over repository choice and metadata quality, or whether it is a document that satisfies a compliance checklist and stops there.

    The evidence (a €10.2 billion EU-wide cost estimate, UKRI’s funding eligibility for RDM costs, and Horizon Europe’s escalating DMP requirements) points in one direction: institutions that keep treating FAIR compliance as a cost centre are choosing to keep paying the duplication tax FAIR data was designed to eliminate.

  • National Data Repository Mandates: UK, US, EU

    National data repository requirements now differ by jurisdiction. The UK coordinates through UKRI’s Concordat on Open Research Data and a planned National Data Library; the US relies on agency-specific mandates such as the NIH Data Management and Sharing Policy, layered on the OPEN Government Data Act; and the EU binds Horizon Europe funding to mandatory FAIR data management plans routed through the European Open Science Cloud. All three converge on the FAIR principles as the technical baseline, but they diverge on enforcement, centralisation and what “as open as possible” means in practice.

    A national data repository is a government- or funder-endorsed infrastructure (or federated network of infrastructures) for depositing, curating and providing persistent access to datasets produced by publicly funded research, so that they meet the FAIR standard of being Findable, Accessible, Interoperable and Reusable. No single global rulebook defines what such a repository must look like — which is precisely why the UK, US and EU have built three structurally different systems around the same FAIR foundation.

    What counts as a national data repository?

    A national data repository is infrastructure, endorsed at government or funder level, that stores research datasets with persistent identifiers, standardised metadata and defined reuse licences. The FAIR data principles — first formalised in Scientific Data in 2016 — define the technical bar: data and metadata must be findable via persistent identifiers, accessible over open protocols, interoperable through shared vocabularies, and reusable under clear provenance and licensing.

    Crucially, FAIR does not mean unconditionally open. The dominant policy language across all three jurisdictions is some variant of “as open as possible, as closed as necessary” — datasets with legitimate privacy, security or intellectual-property constraints can remain FAIR while access to the raw data itself stays restricted, provided the metadata is still discoverable.

    How does the UK mandate research data repositories?

    The UK’s approach is coordinated centrally through UK Research and Innovation (UKRI) rather than fragmented across individual funders. The Concordat on Open Research Data, agreed by UK funders and sector bodies, sets the expectation that publicly funded research data should be made openly available with as few restrictions as possible, in a timely and responsible manner.

    UKRI has been developing a harmonised open research data policy to replace the varying requirements previously set by its individual research councils, with a more explicit alignment to FAIR principles than the original Concordat text. The UK does not run one single mandatory repository for all disciplines; instead it combines a cross-disciplinary resource — the UK Data Service, holding the country’s largest collection of economic, population and social research data — with discipline-specific data centres. A National Data Library initiative is also under development. Enforcement runs through grant conditions rather than statute.

    How does the US enforce data-sharing requirements?

    The US combines a government-wide legal baseline with agency-specific enforcement, producing a federated rather than centralised system. The OPEN Government Data Act codifies the principle that federal government data — including federally funded research outputs captured by agencies — should be open and machine-readable by default, operationalised through the Data.gov catalogue.

    The sharpest enforcement sits with individual funding agencies. Under the NIH Data Management and Sharing (DMS) Policy, effective since January 2023, NIH-funded researchers must submit a DMS Plan describing how scientific data will be managed and shared, with FAIR principles strongly encouraged. The National Science Foundation requires a Data Management Plan for all proposals and supports deposit through disciplinary repositories and its own NSF Public Access Repository (NSF-PAR). This gives research communities the flexibility to choose a suitable repository, but it means the US has no single unified national research-data repository.

    How does the EU mandate FAIR data through Horizon Europe?

    The EU operates the most centrally binding framework of the three. The Directive on open data and the re-use of public sector information requires member states to establish national policies for open access to publicly funded research data on an “open by default” basis, explicitly aligned with FAIR principles. For research funded under Horizon Europe, making data FAIR is a mandatory grant condition, not a recommendation: funded projects must produce a Data Management Plan and comply with FAIR requirements as a condition of the award, under the same “as open as possible, as closed as necessary” test used elsewhere.

    Infrastructure is built around the European Open Science Cloud (EOSC), described by the European Commission as a federated environment intended to become a “web of FAIR data and services” spanning all scientific disciplines. Within that federation, researchers commonly deposit through the general-purpose repository Zenodo, built and operated by CERN, while the Community Research and Development Information Service (CORDIS) serves as the EU’s public repository of record for funded project information.

    Where do the three approaches converge and diverge?

    All three jurisdictions treat FAIR as the technical baseline and all three qualify openness with a “necessary restriction” clause. The differences lie in enforcement mechanism, degree of centralisation, and whether a single flagship repository exists.

    | Feature | UK | US | EU |
    | --- | --- | --- | --- |
    | Primary instrument | UKRI Concordat on Open Research Data (evolving to a harmonised FAIR-explicit policy) | OPEN Government Data Act; NIH DMS Policy; NSF Public Access Policy | EU Open Data Directive; Horizon Europe grant conditions |
    | Legal basis | Funder policy condition | Federal statute plus agency policy | Legally binding directive plus grant condition |
    | FAIR status | Increasingly explicit in new UKRI policy | Encouraged, embedded in agency plans | Mandatory for Horizon Europe-funded projects |
    | Data management plan required | Yes, for UKRI funding | Yes, for NIH and NSF funding | Yes, mandatory for Horizon Europe |
    | Repository model | Centralised flagship (UK Data Service) plus disciplinary centres | Federated (Data.gov, NSF-PAR, disciplinary repositories) | Federated supranational (EOSC, Zenodo, CORDIS) |

    Common questions on national data repository mandates

    What are the FAIR data principles required by UKRI?

    UKRI expects funded researchers to make outputs Findable, Accessible, Interoperable and Reusable, in line with its Concordat on Open Research Data, and its harmonised policy is making that expectation more explicit. UKRI councils frame this as maximising the impact, visibility and citation of research while applying the “as open as possible, as closed as necessary” test to data with legitimate sensitivities.

    Does the NIH require a data management and sharing plan?

    Yes. Since 25 January 2023, the NIH Data Management and Sharing (DMS) Policy requires funded researchers to submit a DMS Plan describing how scientific data will be preserved and shared. NIH strongly encourages applying FAIR principles when selecting repositories and structuring metadata for that plan.

    Is FAIR data mandatory under Horizon Europe?

    Yes. Unlike the UK’s evolving policy and the US’s encouraged-but-agency-specific approach, Horizon Europe makes FAIR data management a binding grant condition. Funded projects must submit a Data Management Plan and comply with FAIR requirements, subject to the same necessary-restriction exceptions used across all three jurisdictions.

    Is there one single national data repository researchers must use?

    No jurisdiction mandates a single universal repository. The UK combines a flagship service (UK Data Service) with disciplinary centres; the US runs a federated system across Data.gov and agency repositories such as NSF-PAR; the EU federates access through EOSC, Zenodo and CORDIS. Researchers typically choose the repository matching their discipline and funder requirements.

    What this means for institutions and researchers

    For research administrators managing multi-jurisdictional funding, a single data management plan template cannot satisfy all three regimes. Compliance teams must map deposit requirements per funder rather than assume FAIR-labelled data automatically meets every mandate’s specific repository, licensing and metadata conditions.
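
    Mapping requirements per funder, rather than per project, can start as a plain lookup table. The sketch below transcribes values from the comparison in this article into a hypothetical structure. The key names and both helper functions are assumptions for illustration, and the entries are a summary to be checked against each funder’s current policy text, not a substitute for it.

```python
# Values transcribed from the comparison above; the structure is hypothetical.
FUNDER_REQUIREMENTS = {
    "UKRI": {
        "jurisdiction": "UK",
        "plan": "data management plan",
        "fair": "increasingly explicit",
        "basis": "funder policy condition",
    },
    "NIH": {
        "jurisdiction": "US",
        "plan": "Data Management and Sharing (DMS) Plan",
        "fair": "strongly encouraged",
        "basis": "agency policy",
    },
    "NSF": {
        "jurisdiction": "US",
        "plan": "Data Management Plan",
        "fair": "encouraged",
        "basis": "agency policy",
    },
    "Horizon Europe": {
        "jurisdiction": "EU",
        "plan": "Data Management Plan",
        "fair": "mandatory",
        "basis": "grant condition",
    },
}


def requirements_for(funders: list) -> dict:
    """Look up each funder separately; fail loudly on an unmapped funder."""
    unknown = [name for name in funders if name not in FUNDER_REQUIREMENTS]
    if unknown:
        raise KeyError(f"no requirements mapped for: {', '.join(unknown)}")
    return {name: FUNDER_REQUIREMENTS[name] for name in funders}


def fair_is_mandatory(funders: list) -> bool:
    """True if any funder on the project makes FAIR a binding condition."""
    return any(
        entry["fair"] == "mandatory"
        for entry in requirements_for(funders).values()
    )
```

    Raising an error for an unmapped funder is deliberate: the failure this section warns against is a team silently assuming that a generic template covers a regime nobody has actually checked.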

    The trend line points toward convergence. The UK’s move to a harmonised, more explicitly FAIR-aligned UKRI policy and the EU’s EOSC federation both signal a shift from fragmented rules toward unified infrastructure. The US remains the outlier: its federal open-data statute operates largely independently of agency-specific mandates from NIH and NSF.

    Institutions should treat “FAIR” and “open” as related but distinct compliance targets. A dataset can be fully FAIR — persistently identified, well-described, licensed — while remaining access-restricted for legitimate reasons in every jurisdiction covered here. Repository choice and data management plan content should be checked against the specific funder mandate, not a generic FAIR checklist.

  • Indigenous Data Sovereignty: Why FAIR Needs CARE

    Indigenous data sovereignty is the right of Indigenous peoples and nations to govern the collection, ownership, interpretation, and application of data about their own communities, lands, and knowledge. Blanket “open by default” research-data mandates built on the FAIR Data Principles can override that right when they treat findability and accessibility as unconditional. The fix is not to abandon FAIR, but to add a CARE-informed consent layer — tiered access controls, negotiated data-sharing agreements, and governance authority held by the originating community — that sits inside FAIR’s own accessibility principle rather than outside it.

    As funders push open-data compliance deeper into grant conditions, research offices must increasingly reconcile a mandate to publish with a community’s right to say no, say later, or say “only under these conditions.”

    What is Indigenous data sovereignty?

    Indigenous data sovereignty describes the inherent right of Indigenous peoples to govern data about their own communities, resources, and lands — a right that derives from tribal and national self-determination rather than from any single data-protection statute. The Global Indigenous Data Alliance (GIDA) traces the movement’s institutional roots to country-specific networks: the Aotearoa New Zealand-based Te Mana Raraunga (Māori Data Sovereignty Network, formed 2015), Australia’s Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Collective (2017), Canada’s First Nations Information Governance Centre, and the US Indigenous Data Sovereignty Network.

    These networks converged on a shared position: data collected about Indigenous peoples should remain subject to the governance of the nation or community it describes — including tribal law — not solely the policies of the funder, institution, or repository that hosts it. This is a governance claim, not merely a privacy preference, and it applies whether the data in question is health records, environmental monitoring, ceremonial knowledge, or genomic samples.

    How do CARE principles relate to FAIR data principles?

    The CARE Principles for Indigenous Data Governance — Collective Benefit, Authority to Control, Responsibility, and Ethics — were developed specifically to sit alongside the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), not to replace them. The Research Data Alliance’s International Indigenous Data Sovereignty Interest Group formalised CARE in 2019 to address what FAIR, on its own, does not: who benefits, who decides, and under what ethical obligations data circulates.

    | Principle set | Primary question it answers | Governing focus |
    | --- | --- | --- |
    | FAIR (Findable, Accessible, Interoperable, Reusable) | How usable is the data, technically? | Data as an object |
    | CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) | Who benefits, and who decides? | Data as a relationship |

    Framing these as rivals misreads FAIR’s own text. FAIR principle A1.2 explicitly states that the accessibility protocol must “allow for an authentication and authorisation procedure, where necessary” — meaning FAIR was never a synonym for unconditional open access. Data can be fully findable, with rich metadata, a persistent identifier, and a documented access route, while the underlying content sits behind a governed permission gate. That gap between “discoverable” and “downloadable” is precisely where a CARE-informed consent layer belongs.

    Do open data mandates override Indigenous data sovereignty?

    Open data mandates do not automatically override Indigenous data sovereignty, but poorly designed ones can function that way in practice. Funder policies such as UKRI’s research data policy and cOAlition S’s Plan S commitments frame data availability as “as open as possible, as closed as necessary”. That formulation already anticipates legitimate restriction, yet institutions frequently implement it as a default push toward maximal openness.

    PLOS’s own editorial position, published in its EveryONE blog in October 2023, states plainly that Indigenous Data Sovereignty is the right of Indigenous peoples to own and govern data about their communities, resources, and lands — and that open-access publishing policies must accommodate, not override, that right through mechanisms such as data-access statements that explain restrictions rather than force disclosure. The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Code of Ethics for Aboriginal and Torres Strait Islander Research similarly requires researcher agreements on data ownership, access, and storage to be negotiated with communities before collection begins, not retrofitted at publication.

    • Where mandates and sovereignty align: both frameworks require documented data-management plans, clear provenance, and persistent identifiers.
    • Where friction emerges: “open by default” clauses that treat non-disclosure as an exception requiring justification, rather than a governance decision requiring respect.
    • The resolvable middle: metadata and access statements can be fully open even when the underlying dataset is access-controlled.

    A consent layer is a set of governance and technical controls — inserted between data creation and data reuse — that lets a community set the terms under which its data is discovered, accessed, and re-used, without removing that data from the research record entirely. In practice this combines four elements research administrators already have tools for:

    1. Tiered metadata: a public, FAIR-compliant record (title, abstract, provenance, persistent identifier via DataCite or Crossref) that is fully findable even when the dataset itself is restricted.
    2. Governance-holder sign-off: a named Indigenous governance body (tribal council, iwi authority, data sovereignty collective) with authority to approve, condition, or decline each reuse request — not a one-time blanket consent captured at initial collection.
    3. A trusted research environment (TRE): a controlled-access computing environment where approved researchers can analyse restricted data without exporting raw records, satisfying reusability without unconditional distribution.
    4. Biocultural or Traditional Knowledge labels: machine-readable metadata tags (the Local Contexts initiative’s TK and BC Labels) that travel with a dataset to signal provenance, cultural protocols, and permitted uses wherever it is indexed or mirrored.

    None of these four elements block findability. They condition access — which is exactly what FAIR’s accessible principle already permits.
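
    As a rough illustration of how the four elements fit together, the sketch below models a dataset record whose descriptive metadata is always public, while the data itself is reachable only inside a trusted research environment after a per-request decision by the governance body. Every name in it (DatasetRecord, ReuseRequest, access_route, the label strings) is hypothetical and is not drawn from any repository platform, the DataCite schema or the Local Contexts API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    APPROVED = "approved"
    CONDITIONAL = "conditional"
    DECLINED = "declined"


@dataclass
class DatasetRecord:
    # Element 1: public, findable metadata (values are placeholders).
    identifier: str
    title: str
    abstract: str
    # Element 2: the body that decides each reuse request.
    governance_body: str
    # Element 4: labels that travel with the record (placeholder strings).
    labels: List[str] = field(default_factory=list)

    def public_metadata(self) -> dict:
        """Returned to anyone, whatever the access decision is."""
        return {
            "identifier": self.identifier,
            "title": self.title,
            "abstract": self.abstract,
            "governance_body": self.governance_body,
            "labels": list(self.labels),
        }


@dataclass
class ReuseRequest:
    requester: str
    purpose: str
    decision: Optional[Decision] = None  # set by the governance body
    conditions: List[str] = field(default_factory=list)


def access_route(record: DatasetRecord, request: ReuseRequest) -> str:
    """Map a governance decision to an access route for the data itself."""
    if request.decision is None:
        return "pending"   # no blanket consent: each request waits for sign-off
    if request.decision is Decision.DECLINED:
        return "declined"
    # Element 3: approved or conditional access happens inside a TRE only.
    return "tre_only"
```

    In this sketch a declined or pending request changes nothing about what public_metadata() returns, which is the point: the record stays findable while access is conditioned.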

    Data sharing agreement vs data processing agreement — which applies?

    A data sharing agreement (DSA) and a data processing agreement (DPA) serve different legal functions, and conflating them is a common source of failure in Indigenous data governance. A DSA governs the transfer of data between two parties who each have independent authority over how it is subsequently used — the correct instrument for Indigenous data sovereignty, because it lets the originating community retain and exercise ongoing authority to control, per CARE’s second principle.

    A DPA, by contrast, is used when one party (a processor) handles data strictly on behalf of another (the controller) with no independent decision-making rights. This is the controller-processor model for which Article 28 of the UK GDPR requires a written contract, and it is the model built into standard contract templates. Using a DPA where a DSA is required strips the originating community of ongoing authority.

    Instrument | Who holds decision authority | Fit for Indigenous data sovereignty
    Data Sharing Agreement (DSA) | Both parties, independently | Appropriate: preserves community authority to control
    Data Processing Agreement (DPA) | Controller only; processor has none | Inappropriate as a standalone instrument: reduces community to data subject

    Implications for research administrators

    Research data management (RDM) policy templates written purely around funder compliance checklists will systematically under-serve Indigenous data governance unless they build in a consent layer as a standard clause, not an exception process. Institutions should require, at the data-management-plan stage, an explicit question: does this dataset describe an Indigenous community, and if so, has a governance body with authority to control been identified and consulted before collection?

    Research data repositories that host Indigenous-derived datasets should support tiered access controls and TK/BC Label metadata natively, rather than treating restricted access as a bespoke workaround bolted onto an open-by-default platform. Institutions building or procuring a trusted research environment for sensitive data should evaluate whether it can enforce community-set reuse conditions per dataset, not merely per project.

    Conclusion: consent is compatible with findability

    Indigenous data sovereignty and the FAIR Data Principles are not opposed frameworks competing for the same ground — FAIR governs how data is described and discovered, while CARE and a CARE-informed consent layer govern who decides what happens next. A research data management policy that hard-codes this distinction, uses the right agreement type for the right relationship, and gives Indigenous governance bodies a standing role rather than a one-off consultation, satisfies funder open-data requirements and Indigenous data sovereignty at the same time. The two are compatible by design; the mandates just need to stop assuming otherwise.

  • Research Data Management Policy: Not Just a DMP

    A research data management policy is an institution-wide governance document that sets ownership, retention, storage and researcher-responsibility rules for all research data an organisation produces — distinct from a data management plan (DMP), which is a project-specific document written for a single grant. Confusing the two leaves institutions with fragmented practice: strong per-grant DMPs but no consistent rule for what happens to data once a project, or a researcher, moves on.

    A research data management policy is the institutional framework; the DMP is one project’s implementation of it. This article sets out the structural difference and gives a template for writing the institutional-level document, covering ownership, retention tiers, storage classes and researcher obligations.

    What is a research data management policy?

    A research data management (RDM) policy is a formally approved institutional document — typically ratified by a university executive, senate or research committee — that defines how all research data created, collected or reused at that institution must be handled across its lifecycle: creation, active use, retention, sharing and disposal.

    Unlike guidance notes or web pages, a policy carries institutional authority: it assigns accountability, sets minimum retention periods, and states what happens by default when a researcher leaves or a grant closes. The UKRI Concordat on Open Research Data (2016, updated 2020), signed by UK Research and Innovation, Universities UK and the Wellcome Trust among others, sets out common principles — including that research data are a public good and that costs of good data management are legitimate, fundable research costs. Most UK institutional RDM policies, including those at Edinburgh, Southampton and Manchester, cite the Concordat directly as their basis.

    Research data management policy vs a data management plan

    The policy and the DMP operate at different scopes and answer different questions. The policy answers “what does this institution require of everyone, always?” The DMP answers “how will this specific project handle its specific data?” A DMP written for a UKRI or Horizon Europe grant should reference and comply with the institutional policy, not substitute for it.

    Dimension | Institutional RDM policy | Data management plan (DMP)
    Scope | Whole institution, all research | Single project or grant
    Author | Research office, library, IT, governance committee | Principal investigator / research team
    Trigger | Approved once, reviewed periodically | Written at proposal stage, revised through project life
    Contains | Ownership defaults, retention minimums, storage tiers, roles | Dataset types, volumes, specific repositories, embargo dates
    Enforcement | Institutional compliance / disciplinary framework | Funder compliance check at reporting/audit
    Review cycle | Every 3-5 years (Edinburgh’s policy specifies five) | Reviewed and updated within the life of one project

    A well-run institution needs both, in that order: the policy first, so every subsequent DMP inherits a consistent set of defaults — retention minimums, approved repositories, data protection procedures — rather than each research team inventing its own.
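
    One way to picture this inheritance is a project DMP that starts from the institutional defaults and may only tighten them. The sketch below is illustrative: the field names, the example repository names and the rule that a DMP may lengthen but never shorten retention are assumptions chosen for the example, not taken from any specific institution's policy.

```python
# Hypothetical institutional defaults; real values come from the approved policy.
POLICY_DEFAULTS = {
    "retention_years_min": 3,
    "approved_repositories": {"institutional-repository", "discipline-repository"},
}


def build_dmp(project: dict, policy: dict = POLICY_DEFAULTS) -> dict:
    """Fill a project DMP from policy defaults and reject anything that undercuts them."""
    # Project-specific values override the inherited default where present.
    dmp = {"retention_years": policy["retention_years_min"], **project}

    if dmp["retention_years"] < policy["retention_years_min"]:
        raise ValueError("DMP retention is shorter than the institutional minimum")

    repository = dmp.get("repository")
    if repository is not None and repository not in policy["approved_repositories"]:
        raise ValueError(f"repository {repository!r} is not on the approved list")

    return dmp
```

    A project that says nothing about retention inherits the institutional minimum; one that names a longer funder period keeps it; one that tries to go below the minimum is rejected at planning stage rather than discovered at audit.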

    Template structure for an institutional RDM policy

    Reviewing current UK institutional policies (Edinburgh, Southampton, Manchester, Birmingham, Cambridge) shows a consistent structural skeleton. A new or revised policy should include, in order:

    • Purpose and scope — why the policy exists, and which staff, students and data types it covers.
    • Definition of research data — the institution’s own working definition (the UKRI Concordat’s is a common starting point: digital or analogue information collected, observed or created to validate research findings).
    • Roles and responsibilities — who is the data owner by default (usually the institution), who is the data steward (usually the principal investigator), and what the research office, IT services and library each provide.
    • Data management planning requirement — a mandate that a DMP must exist for every funded (and, ideally, every unfunded) research project, and where that requirement sits relative to ethics approval.
    • Storage and security tiers — approved storage classes mapped to data sensitivity.
    • Retention and disposal — minimum retention period, and the trigger for review or deletion.
    • Sharing, access and FAIR compliance — the institution’s default position on open data, exceptions for confidentiality, and adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable), as defined by Wilkinson et al. in Scientific Data (2016).
    • Legal and ethical compliance — UK GDPR and Data Protection Act 2018 obligations for personal data, plus any sector-specific requirements.
    • Review cycle and ownership of the policy itself — who revises it and how often.

    This ordering matters: policies that lead with storage and IT detail before establishing roles tend to read as IT documents rather than governance ones, which weakens researcher buy-in.

    Retention, ownership and storage tiers

    Retention should be set as a minimum, not a target. A commonly cited UK baseline is three years from project end or publication, with the caveat that funder, sponsor or disciplinary requirements specifying longer periods take precedence. Clinical and health-related data, for example, routinely require retention of 10 to 15 years under separate regulatory regimes.
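
    The precedence rule reduces to taking the longest applicable period. The sketch below works that through under two assumptions that a real policy would need to state explicitly: the clock starts from the later of project end and publication, and the function and parameter names are invented for the example.

```python
from datetime import date
from typing import Iterable, Optional


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping 29 February to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_review_date(
    project_end: date,
    publication: Optional[date] = None,
    baseline_years: int = 3,
    longer_requirements: Iterable[int] = (),
) -> date:
    """Earliest date on which retention may be reviewed; never earlier than the baseline."""
    # Assumption: retention runs from whichever event happened later.
    start = project_end if publication is None else max(project_end, publication)
    # Funder, sponsor or regulatory periods only ever extend the minimum.
    years = max([baseline_years, *longer_requirements])
    return add_years(start, years)
```

    For a project ending on 30 June 2024 with no longer requirement, this gives 30 June 2027; with a publication on 15 January 2025 and a 10-year sponsor requirement, it gives 15 January 2035.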

    Ownership defaults matter because researchers move institutions far more often than data does. Most UK institutional policies assign underlying ownership of research data to the institution as the legal entity that employed the researcher and typically held the grant, while the principal investigator retains stewardship responsibility — the practical duty of care — during and after the project. This split must be stated explicitly, not left implicit, because it is the clause institutions rely on when a departing researcher wants to take data with them.

    Storage tiers should be mapped to data sensitivity rather than treated as one undifferentiated pool. A workable minimum is three tiers:

    • Tier 1 — open/shareable: deposited in a Re3data-listed, CoreTrustSeal-certified repository with a DOI via DataCite.
    • Tier 2 — restricted/sensitive: access-controlled institutional storage under a data sharing agreement.
    • Tier 3 — confidential/personal: encrypted storage meeting UK GDPR requirements, with a Data Protection Impact Assessment on file.
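
    A tier assignment like this can be expressed as a short decision rule. The following is a minimal sketch, assuming three yes/no questions are enough to classify a dataset and that the most restrictive answer wins; the function name, flags and requirement strings are illustrative summaries of the tiers above, not an institutional standard.

```python
# Requirements per tier, summarised from the three tiers described above.
TIER_REQUIREMENTS = {
    1: ["deposit in a listed, certified repository", "assign a DOI"],
    2: ["access-controlled institutional storage", "data sharing agreement"],
    3: ["encrypted storage", "Data Protection Impact Assessment on file"],
}


def storage_tier(contains_personal_data: bool, is_confidential: bool, is_restricted: bool) -> int:
    """Return the storage tier for a dataset; the most restrictive condition wins."""
    if contains_personal_data or is_confidential:
        return 3
    if is_restricted:
        return 2
    return 1


def requirements_for(contains_personal_data: bool, is_confidential: bool, is_restricted: bool) -> list:
    """Look up what the assigned tier obliges the project to put in place."""
    tier = storage_tier(contains_personal_data, is_confidential, is_restricted)
    return TIER_REQUIREMENTS[tier]
```

    Checking the confidential and personal-data flags first means a dataset that is both restricted and personal lands in Tier 3, never Tier 2.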

    Researcher obligations and governance roles

    The policy should state researcher obligations as directives, not suggestions. At minimum, researchers are required to: complete a DMP before data collection begins; store active data only in institutionally approved systems; register externally held datasets with the institution; and provide a data access statement or citation in any publication when the underlying data are not directly deposited.
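
    Because these obligations are stated as directives, they can be checked mechanically. The sketch below is one hypothetical way to do that: ProjectStatus and its field names are invented for illustration, and a real implementation would draw these values from whatever research information system the institution uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectStatus:
    dmp_completed_before_collection: bool
    active_storage_approved: bool
    external_datasets_registered: bool
    data_deposited: bool
    has_data_access_statement: bool


def unmet_obligations(p: ProjectStatus) -> List[str]:
    """List the minimum researcher obligations a project has not yet met."""
    unmet = []
    if not p.dmp_completed_before_collection:
        unmet.append("DMP not completed before data collection")
    if not p.active_storage_approved:
        unmet.append("active data held outside approved systems")
    if not p.external_datasets_registered:
        unmet.append("externally held datasets not registered")
    # A statement is only required when the data are not directly deposited.
    if not p.data_deposited and not p.has_data_access_statement:
        unmet.append("no data access statement for undeposited data")
    return unmet
```

    An empty list means the project meets the minimum; anything else gives the research office a concrete item to follow up.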

    Governance sits across three functions the policy must name individually: the research office (grant compliance, costing RDM into proposals — UKRI states that RDM costs are eligible under its funding), IT services (approved storage infrastructure and security), and the library or research data service (repository operation, metadata standards, researcher training). ARMA and INORMS provide sector benchmarking for how these research administration roles are typically distributed across institutions.

    Common questions

    What is the difference between a research data management policy and a data management plan?

    A research data management policy is an institution-wide governance document setting defaults for ownership, retention and storage. A data management plan is a project-specific document, usually required by a funder at proposal stage, that details how one project’s data will be collected, stored and shared within those institutional defaults.

    Who is responsible for research data management at an institution?

    Responsibility is shared but must be explicitly assigned. The principal investigator is typically the data steward for a given project; the institution holds underlying ownership; and the research office, IT services and library provide the supporting infrastructure, costing advice and repository services the policy commits to.

    How long should institutions retain research data?

    Most UK institutional policies set a minimum retention period of three years from project end or publication, deferring to longer funder-, sponsor- or discipline-specific requirements where they apply — for example, clinical research data typically requires substantially longer retention under separate regulatory regimes.

    What does FAIR data mean in a research data management policy?

    FAIR stands for Findable, Accessible, Interoperable and Reusable — principles defined by Wilkinson et al. (2016) that a policy should require researchers to apply when depositing data, typically through persistent identifiers, standard metadata and appropriate licensing. See the CASRAI research data dictionary for related term definitions.

    Implications for research administrators

    Institutions that only mandate DMPs at grant stage, without an underlying institutional policy, end up with inconsistent retention practice, ambiguous ownership when staff leave, and duplicated storage costs across departments running incompatible systems. Writing the institutional policy first — using the structure above — gives every subsequent DMP a consistent, auditable baseline, and gives research offices a defensible answer when a funder, ethics committee, or departing researcher asks who owns what and for how long.

    As RDM costs are increasingly built into grants and UK institutions face growing FOI and audit scrutiny of data retention, the institutional policy is the operational backbone that per-project DMPs are supposed to inherit from, not replace.