Category: Guides & Explainers

Practical how-to guides, templates, checklists, and career pathways for research administrators, authors, and institutional teams.

  • Data Sharing Agreement vs Data Processing Agreement: What Research Offices Get Wrong

    A data sharing agreement governs an exchange of personal data between two or more independent data controllers, while a data processing agreement is the contract that Article 28 UK GDPR makes mandatory whenever a controller instructs a processor to handle data on its behalf. Research offices most often need the former for multi-institution collaborations and the latter for any third-party processor, such as a survey platform, transcription service, or cloud host.

    A data sharing agreement is a contract between two or more data controllers who each independently decide how they will use a shared dataset. A data processing agreement is the contract GDPR Article 28 requires whenever a controller engages a processor that acts only on documented instructions and has no independent decision-making power over the data.

    What Is the Difference Between a Data Sharing Agreement and a Data Processing Agreement?

    The confusion research offices run into is structural, not semantic. The choice between a data sharing agreement and a data processing agreement comes down to one question: who controls the data, and who merely acts on it? A data sharing agreement documents a controller-to-controller relationship. A data processing agreement documents a controller-to-processor relationship. Everything else (which clauses are mandatory, what liability attaches, whether the ICO expects to see it) follows from that single distinction.

    Two universities pooling pseudonymised cohort data for a joint publication are both controllers; they need a data sharing agreement. (Truly anonymised data is not personal data and falls outside the UK GDPR altogether.) A university engaging a transcription service to convert interview recordings is the controller, and the vendor is the processor; that relationship needs a data processing agreement. The two documents are not interchangeable.

    Feature | Data Sharing Agreement (DSA) | Data Processing Agreement (DPA)
    Relationship | Controller to controller | Controller to processor
    Decision-making | Each party decides its own purposes and means | Processor acts only on the controller’s documented instructions
    Legal mandate | Not mandatory in itself, but the ICO’s statutory code treats it as expected good practice | Mandatory under Article 28 UK GDPR whenever a processor is engaged
    Typical research use | Multi-institution consortia, joint publications, shared registries | Survey platforms, transcription services, cloud hosting, statistical consultancies
    Governing source | ICO Data Sharing Code of Practice (statutory, under s.121 Data Protection Act 2018) | Article 28, UK GDPR / EU GDPR

    When Is a Data Sharing Agreement Required for a Research Collaboration?

    A data sharing agreement becomes necessary the moment two or more organisations — for example, two universities, a university and an NHS trust, or a university and an industry partner — each intend to use a shared dataset for their own research purposes. Under the ICO’s Data Sharing Code of Practice, a statutory code issued under section 121 of the Data Protection Act 2018, a formal agreement is not an absolute legal requirement, but the ICO expects one wherever routine or systematic sharing occurs between controllers, treating it as evidence of accountability under the UK GDPR.

    In practice, most collaborative research grants involving identifiable participant data — clinical cohorts, survey respondents, student records — should have a data sharing agreement in place before data changes hands, regardless of whether the grant terms mention one.

    When Is a Data Processing Agreement Legally Mandatory?

    Unlike a data sharing agreement, a data processing agreement is not discretionary. Article 28 of the UK GDPR requires a written contract wherever a controller uses a processor, and that extends down the chain: if a processor sub-contracts further, another written agreement is needed there too. For a research office, this covers any external service handling personal data on the institution’s instructions without deciding why or how it is used — a data-collection tool, a statistical analysis contractor, or a transcription vendor.

    A data processing agreement must specify the subject matter, duration, nature and purpose of processing, the categories of personal data and data subjects involved, each party’s rights and obligations, and the security and breach-notification terms the processor must meet. Missing any of these terms is a compliance gap, not a drafting preference.
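
    As a rough illustration, the terms listed above can be turned into a completeness check that a research office runs against a draft before signature. This is a minimal sketch: the key names are hypothetical labels paraphrasing the list above (not statutory wording), and a passing check says nothing about whether the clauses themselves are adequate.

```python
# Terms the section above lists as required in a data processing agreement.
# The key names are illustrative labels chosen for this sketch.
REQUIRED_DPA_TERMS = (
    "subject_matter",
    "duration",
    "nature_and_purpose",
    "data_categories",
    "data_subject_categories",
    "rights_and_obligations",
    "security_measures",
    "breach_notification",
)


def missing_dpa_terms(draft: dict[str, str]) -> list[str]:
    """Return the required terms that are absent or blank in a draft."""
    return [
        term
        for term in REQUIRED_DPA_TERMS
        if not draft.get(term, "").strip()
    ]


# Hypothetical draft for a transcription vendor.
draft = {
    "subject_matter": "Transcription of interview recordings",
    "duration": "Until project close-out",
    "nature_and_purpose": "Produce transcripts for qualitative analysis",
    "data_categories": "Audio recordings, names, opinions",
    "data_subject_categories": "Interview participants",
    "rights_and_obligations": "",  # left blank in the draft
    "security_measures": "Encrypted storage and transfer",
    # breach_notification clause not yet drafted
}

print(missing_dpa_terms(draft))
```

    A check like this only flags empty headings; whether each clause is sufficient remains a matter for the institution’s legal or information governance team.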

    Where Do Joint Controller Arrangements Fit?

    The case research offices most commonly mishandle is not DSA-versus-DPA at all — it is where two institutions jointly determine the purposes and means of processing one dataset, rather than each independently using their own copy. That relationship is governed by Article 26 UK GDPR, which requires a joint controller arrangement setting out each party’s responsibilities, particularly around data subject rights, and requires that the “essence” of that arrangement be made available to data subjects.

    This distinction matters for consortium research funded through instruments such as Horizon Europe, where the Model Consortium Agreement typically sits alongside — not instead of — any joint-controller documentation for personal data. UKRI-funded projects carry a parallel obligation: an approved data management plan is a standard grant condition, but it is a research-governance document, not a substitute for the GDPR-compliant contract.

    A data sharing agreement and a joint controller agreement are frequently confused because both involve multiple controllers. The dividing line is independence of purpose: if each party uses the data for its own separate research question, a data sharing agreement applies; if the parties jointly decide the purpose and means of one processing activity, Article 26 applies instead.

    Frequently Asked Questions

    What Is the Difference Between a DPA and a DSA?

    A DPA (data processing agreement) governs a controller-to-processor relationship and is mandatory under Article 28 UK GDPR. A DSA (data sharing agreement) governs a controller-to-controller relationship and is not strictly mandatory, but is expected best practice under the ICO’s statutory code wherever personal data moves between independent organisations.

    Is a DPA the Same as an NDA?

    No. A data processing agreement specifically governs how personal data is processed under GDPR, including security measures and sub-processor rules. An NDA (non-disclosure agreement) protects confidential information generally — trade secrets, unpublished results, commercial terms — and carries no GDPR obligations of its own. Research collaborations frequently need both, for different purposes.

    Does the UK Use GDPR or DPA?

    Both, and the shared acronym is itself a source of confusion. The UK operates the UK GDPR alongside the Data Protection Act 2018 (DPA 2018), which supplements it domestically. Research offices should note that “DPA” means something different in each context: the Data Protection Act 2018 is UK legislation, while a data processing agreement is a contract required by Article 28 of the UK GDPR.

    What Is the Difference Between a DPA and a Data Sharing Agreement?

    The same core distinction applies: a data processing agreement binds a processor acting on a controller’s instructions, while a data sharing agreement binds two or more controllers each pursuing their own purposes. Signing the wrong one leaves a research office either over-contracting a simple vendor relationship or under-documenting a genuine controller-to-controller data exchange.

    A Decision Checklist for Research Offices

    Before drafting either document, a research office should establish:

    • Is the other party deciding independently how to use the data, or only following our instructions? Independent use points to a data sharing agreement; instruction-only use points to a data processing agreement.
    • Are two or more institutions jointly deciding the purpose and means of a single processing activity? If so, Article 26 UK GDPR joint controller terms apply, not a standard data sharing agreement.
    • Does the collaboration involve a funder-mandated data management plan (for example, under UKRI or Horizon Europe terms)? A data management plan complements but does not replace the GDPR-compliant contract.
    • Is any processor sub-contracting further processing? Each link in that chain needs its own written data processing agreement under Article 28.
    • Does the exchange involve special category data — health records, genetic data, criminal offence data? These generally raise the bar for documented lawful basis and security terms in either agreement type.
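
    The first two questions in the checklist can be sketched as a small decision function. This is an illustrative simplification, not legal advice: the function and parameter names are hypothetical, and real classification depends on the facts of who actually determines purposes and means.

```python
from enum import Enum


class Agreement(Enum):
    DATA_SHARING = "data sharing agreement (controller to controller)"
    DATA_PROCESSING = "data processing agreement (Article 28 UK GDPR)"
    JOINT_CONTROLLER = "joint controller arrangement (Article 26 UK GDPR)"


def classify(other_party_decides_own_use: bool,
             purposes_decided_jointly: bool) -> Agreement:
    """Map the checklist answers to the agreement type the text describes."""
    # Joint determination of one processing activity takes priority.
    if purposes_decided_jointly:
        return Agreement.JOINT_CONTROLLER
    # Independent use by the other party points to controller-to-controller.
    if other_party_decides_own_use:
        return Agreement.DATA_SHARING
    # Otherwise the other party only follows instructions: a processor.
    return Agreement.DATA_PROCESSING


# Transcription vendor acting only on the university's instructions.
print(classify(other_party_decides_own_use=False,
               purposes_decided_jointly=False).value)

# Two universities each pursuing a separate research question.
print(classify(other_party_decides_own_use=True,
               purposes_decided_jointly=False).value)
```

    The remaining checklist items (data management plans, sub-processors, special category data) affect what the chosen contract must contain rather than which contract applies, so they are left out of the sketch.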

    The Bottom Line for Research Administration

    Research offices that treat data sharing agreements and data processing agreements as interchangeable paperwork expose their institutions to two distinct risks: an unmet Article 28 obligation with a processor, or an undocumented controller-to-controller exchange that the ICO’s statutory code expects to see evidenced. Getting the classification right (controller-to-controller, controller-to-processor, or joint controller) determines which contract is legally required, which is merely good practice, and what each must contain. As multi-institution, multi-funder consortia become the norm, that classification step belongs at the front of every research office’s data governance workflow, alongside the project’s data management plan.

  • Data Papers Explained: Making Datasets Citable

    A data paper is a peer-reviewed journal article whose sole purpose is to describe a dataset — its collection methods, structure, quality controls and reuse potential — so the dataset itself becomes a citable, discoverable research output. This is fundamentally different from a data availability statement (DAS), which is only a short paragraph inside a conventional research article pointing to where supporting data can be found. Understanding the distinction matters for anyone trying to get formal academic credit for data curation work, rather than a passing mention buried in someone else’s paper.

    A data paper is best defined this way: it is a searchable, citable metadata document, published as a standalone peer-reviewed article, whose primary content is the dataset’s provenance, structure and quality rather than a hypothesis or a set of conclusions.

    What is a data paper?

    A data paper is a document describing a dataset, published as an article in a peer-reviewed journal rather than as an appendix to a conventional study. It concentrates on the “what, why and how” of the data itself (collection methodology, processing steps, structure and known limitations) rather than on testing a hypothesis.

    The format is also known as a data article, data report, data brief or data note, but the function is consistent: it converts curation effort into an indexed, citable scholarly output that gives dataset creators formal academic credit.

    How is a data paper different from a data availability statement?

    A data availability statement is a short, mandatory paragraph within a conventional research article that tells readers where and how to access the data underpinning that paper’s findings. It exists to support transparency and reproducibility of one specific study — it is not a publication in its own right and it is not independently peer reviewed as a scholarly document.

    A data paper, by contrast, is a full standalone publication. It undergoes its own peer review, receives its own DOI, and is indexed and cited independently of any related research article. The table below sets out the practical differences.

    Feature | Data paper | Data availability statement
    Nature | Standalone, peer-reviewed journal article | A short section inside another article
    Peer review | Independently peer reviewed as a scholarly work | Not separately reviewed
    Citability | Has its own DOI and citation record | Not citable as a discrete work
    Purpose | Describe and credit a dataset in depth | Point readers to where data for one study lives
    Typical length | Several pages, structured like a journal article | One to three sentences

    Since 2018, the International Committee of Medical Journal Editors (ICMJE) has required a data sharing statement in reports of clinical trials, and many funders, including UKRI, expect a data access statement in any grant output. Neither requirement is a substitute for a data paper: a DAS satisfies a transparency mandate, while a data paper is the route to scholarly recognition and independent citation of the dataset itself.

    Which journals publish data papers?

    Dedicated data journals have grown substantially since the mid-2010s. According to the Global Biodiversity Information Facility (GBIF), which tracks outlets accepting data papers, article processing charges and impact metrics vary widely by publisher.

    • Scientific Data (Nature Portfolio): an open-access, online-only journal dedicated to descriptions of scientifically valuable datasets, with a 2024 Journal Impact Factor of 6.9 and an article processing charge of approximately EUR 1,790, per GBIF’s June 2026 tracked figures.
    • Data in Brief (Elsevier): a multidisciplinary, open-access journal publishing short data articles that describe and give context to datasets, with a 2024 Journal Impact Factor of 1.4 and an article processing charge of approximately USD 1,010.
    • GigaByte (GigaScience Press, BGI): a CC BY open-access journal for “big data” descriptions across the life, biomedical and environmental sciences, with a 2024 Journal Impact Factor of 1.2, a Scopus CiteScore of 3.2, and an article processing charge of approximately USD 350, the lowest of the three.

    Discipline-specific alternatives exist too: Earth System Science Data (Copernicus) carries a 2024 CiteScore of 20.6, and Biodiversity Data Journal (Pensoft) charges from around EUR 650. Choice of outlet should follow disciplinary norms, not price alone.

    How do you publish a data paper?

    Publishing a data paper follows a broadly consistent workflow across data journals:

    1. Deposit the dataset first. Upload the data to a recognised repository (for example Dryad, Zenodo or a domain-specific archive) so it receives a persistent identifier before the manuscript is submitted.
    2. Draft the manuscript around the metadata. Describe collection methods, instrumentation, processing pipelines, quality-control steps and known limitations — some tools, such as GBIF’s Integrated Publishing Toolkit, can auto-generate a manuscript draft directly from dataset metadata.
    3. Select a journal matched to the dataset’s discipline. Compare scope, licence terms, and article processing charge against outlets such as Scientific Data, Data in Brief or GigaByte.
    4. Submit for peer review. Reviewers assess the completeness and reusability of the description, not novel findings or conclusions.
    5. Publish and cross-link. On acceptance, the data paper’s DOI should be cross-referenced with the dataset’s own DOI in the repository record, so citation tools can connect the two.
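
    Step 5 can be made concrete with a sketch of the cross-link as it might appear in the dataset’s repository metadata. The field names below follow the related-identifier property of the DataCite Metadata Schema as commonly serialised in JSON; the helper function and both DOIs are hypothetical placeholders, and individual repositories may expose this through a form rather than raw metadata.

```python
import json


def describe_link(paper_doi: str) -> dict[str, list[dict[str, str]]]:
    """Build the related-identifier entry for a dataset record.

    The entry states that the dataset "IsDescribedBy" the data paper.
    """
    return {
        "relatedIdentifiers": [
            {
                "relatedIdentifier": paper_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsDescribedBy",
            }
        ]
    }


# Placeholder DOI for illustration only; not a real record.
fragment = describe_link("10.5555/example-data-paper")
print(json.dumps(fragment, indent=2))
```

    The data paper itself normally carries the reverse pointer, citing the dataset DOI in its reference list or data availability section, which is handled by the journal rather than the repository.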

    Why do data papers matter for FAIR data and citation?

    The FAIR Guiding Principles — Findable, Accessible, Interoperable, Reusable — were formalised by Wilkinson and colleagues in a 2016 Scientific Data paper and now underpin funder and repository policy internationally. A data paper operationalises FAIR by attaching a structured, human- and machine-readable description to a dataset that would otherwise carry only minimal repository metadata.

    Dataset citation is governed by the Joint Declaration of Data Citation Principles, published by FORCE11 in 2014, which holds that data merits the same importance, persistence and formal citation treatment as literature. Registration agencies such as DataCite assign the DOIs that make this mechanically possible; a data paper gives readers the narrative context a bare DOI record cannot.

    Frequently asked questions

    What is a data paper?

    A data paper is a peer-reviewed journal article whose primary purpose is describing a dataset’s collection, structure and quality, rather than reporting findings. It gives dataset creators an indexed, independently citable scholarly output.

    How do you publish a data paper?

    Deposit the dataset in a recognised repository, draft a manuscript describing its methodology, choose a journal such as Scientific Data, Data in Brief or GigaByte, then submit for peer review that assesses completeness rather than novel conclusions.

    Do you have to pay to publish a data paper?

    Most data journals are open access and charge an article processing charge, ranging from roughly USD 350 at GigaByte to around EUR 1,790 at Scientific Data. Some outlets, including several Pensoft and Copernicus titles, waive or reduce this fee.

    Implications for institutions and funders

    For research administrators, the data paper format offers a concrete way to evidence data-curation effort in tenure, promotion and grant-reporting processes, where a bare data availability statement provides none. Recording named contributions to data creation, curation and description alongside the CRediT contributor role taxonomy gives institutions a fuller, auditable account of who did the data work, distinct from who wrote up the findings.

    Funders increasingly expect both: a data availability statement in the primary research article to satisfy transparency mandates, and — where a dataset has independent reuse value — a data paper to secure its long-term discoverability. Research administrators managing compliance across these overlapping requirements may find it useful to consult a dictionary of research administration terms when mapping funder policy language to practical author guidance.

    Conclusion

    A data paper and a data availability statement solve different problems: one creates a citable, peer-reviewed scholarly record of a dataset; the other simply discloses where supporting data for a specific study can be found. As funders tighten open-data expectations and repositories mature their DOI infrastructure, treating dataset description as a first-class, citable publication — not an afterthought bolted onto a results paper — will matter more, not less, for institutions seeking to demonstrate the full value of the research data they steward.

  • Research Data Manager Job Description, Skills and Career Path

    A research data manager plans, organises and safeguards the data a research project produces — from collection through documentation, storage, sharing and long-term archiving — and is distinct from a data steward (governance-focused) or a research administrator (grants and compliance-focused). The role sits at the intersection of research support, information management and IT, typically inside a university’s library, research office or a funded project team.

    This guide sets out the research data manager job description, the skills and qualifications employers ask for, how the role differs from adjacent titles, and the realistic career path from entry-level data support through to strategic data leadership.

    What is a research data manager?

    A research data manager is the named individual responsible for a project’s or department’s data management plan, metadata standards and repository deposits. The role exists because funders increasingly require a documented, reusable dataset alongside every publication, not just the paper itself.

    The task is not new: it maps closely to the Data Curation contributor role in the CRediT taxonomy, defined as “management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later re-use.” CASRAI originated the CRediT contributor role taxonomy in 2014; the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, and Data Curation remains one of its 14 defined roles, evidence that the function research data managers perform has been formally recognised in scholarly attribution for over a decade.

    What does a research data manager do day to day?

    Day-to-day work centres on making a project’s data findable, well-documented and safely stored, and on making that process repeatable for the next study. Typical duties, drawn from published UK university and NHS job descriptions, include:

    • Drafting and reviewing data management plans (DMPs) for grant applications
    • Setting up and maintaining databases, spreadsheets and case report forms for a study
    • Applying metadata standards so datasets are discoverable in institutional or subject repositories
    • Coordinating deposit of datasets with DataCite-registered DOIs for citation and reuse
    • Running data quality checks, version control and access permissions across a research team
    • Training researchers and doctoral students in good data management practice
    • Advising on compliance with funder data policies and data protection legislation

    Research data manager vs data steward vs research administrator

    These three titles are frequently confused in job adverts because responsibilities overlap, but their primary focus and reporting line differ. The table below distinguishes the three roles as they typically appear in UK higher education and research institutions.

    Dimension | Research Data Manager | Data Steward | Research Administrator
    Primary focus | Lifecycle management of a specific project’s or department’s datasets | Institution-wide data governance, quality rules and ownership policy | Grant administration, compliance and researcher support
    Typical base | Research office, library or funded project team | IT services, information governance or central data office | Research office, faculty or funder-facing team
    Core output | Data management plans, metadata, repository deposits | Data policies, classification schemes, access controls | Grant applications, contracts, financial and ethics reporting
    Professional body | Often affiliated with library/data-curation networks | Information governance and data protection networks | ARMA (UK/Ireland), EARMA, INORMS, NCURA
    Typical entry route | Data science, library/information studies, life sciences degree | IT governance, information management background | Any discipline plus research administration training

    What skills, qualifications and training are required?

    Employers combine technical data skills with domain and communication skills, since the role requires translating funder and disciplinary requirements into practical workflows researchers will actually follow.

    • Data handling: spreadsheet and database competence; SQL, Python or R are increasingly listed as desirable
    • Standards knowledge: metadata schemas, DataCite, ORCID identifiers, and repository deposit workflows
    • Policy literacy: UK GDPR, funder data policies, and institutional research governance frameworks
    • Communication: training researchers, writing plain-English guidance, negotiating with study sponsors
    • Project management: running parallel studies to funder deadlines with limited resource

    Formal training routes include postgraduate qualifications in library and information science or data science, plus shorter dedicated courses. The Digital Curation Centre (DCC), established in 2004 with Jisc funding, has provided UK universities with research data management guidance and training resources ever since and remains the primary UK reference point for RDM practice. Institutional RDM obligations trace back to funder policy: EPSRC’s research data expectations, effective from 1 May 2015, require UK institutions receiving its funding to publish a research data management policy and a roadmap for compliance. The 2016 Concordat on Open Research Data, jointly published by Research Councils UK, Universities UK, Wellcome Trust and HEFCE, set out ten principles establishing that data management planning should be integral to research design, reinforcing why institutions now hire dedicated staff for this function rather than leaving it to individual researchers.

    What is the typical career path and salary range?

    Entry typically begins in a data assistant or data curator post supporting a research team’s day-to-day data handling, often on a fixed-term contract tied to a specific study. Real UK job postings illustrate the entry tier clearly: an NHS Research Data Manager post advertised in May 2025 by Midlands Partnership NHS Foundation Trust was graded at Agenda for Change Band 4, with a salary of £26,530 to £29,114 a year.

    Progression moves through Research Data Manager (owning DMPs and repository workflows for a department or portfolio of studies) to Senior/Lead Research Data Manager, where the postholder sets institutional RDM policy and may supervise a small team. The most senior tier (Director of Research Data Services or equivalent) sets strategic direction for an institution’s entire research data infrastructure and reports into the research office or library leadership. Unlike academic research posts, a PhD is not a standard requirement at any tier, though it is common among staff who progress from a research role into data management.

    Common questions about the role

    What are the responsibilities of a data manager?

    A data manager is responsible for the entire data lifecycle: collection, quality control, storage, security, documentation and eventual archiving or disposal. In a research context this extends to writing data management plans, applying metadata standards, and coordinating repository deposit so datasets remain reusable after a project ends.

    What does a research data manager do?

    A research data manager develops and implements the policies, workflows and documentation that keep a project’s or department’s datasets organised, secure and discoverable. Duties include drafting data management plans, training researchers, running quality checks, and depositing data with persistent identifiers such as DataCite DOIs for citation and reuse.

    What is the salary of a data manager?

    Salaries vary widely by sector and seniority. A UK NHS-graded entry-level research data manager post advertised in 2025 sat at Agenda for Change Band 4, paying £26,530–£29,114 a year; senior and director-level research data roles in universities and industry command substantially higher salaries, reflecting added strategic and line-management responsibility.

    What are the 4 types of research data?

    Research data is commonly grouped into four types along two axes: by origin, primary data (collected directly for the study) and secondary data (reused from existing sources); and by form, quantitative and qualitative data. A research data manager must apply appropriate metadata, storage and sharing rules to each type, since funder and ethical requirements differ across them.

    What this means for institutions and job seekers

    For institutions, the job description confusion between research data manager, data steward and research administrator is itself a risk: unclear scoping leads to duplicated effort or gaps in funder compliance. Writing role descriptions that reference recognised frameworks — the CRediT Data Curation role, DCC guidance, and funder RDM policy — gives hiring managers a defensible, standards-aligned specification rather than an ad hoc list of duties.

    For job seekers, the clearest differentiator to lead with on an application is lifecycle ownership of data, not general IT or administrative competence. As funders continue tightening open-data mandates, demand for staff who can demonstrate metadata standards knowledge, repository deposit experience and DMP authorship is likely to keep outpacing supply, making this one of the more durable specialisms within the broader research administration and support ecosystem.

    For related roles and standards context, see CASRAI’s CRediT contributor roles hub, the research administration dictionary, and the research administration pillar.

  • ADR UK Explained: Administrative Data Access for Social Scientists

    ADR UK (Administrative Data Research UK) is a UK-wide partnership that gives accredited researchers secure access to de-identified, linked government administrative data — held not in a conventional downloadable repository, but inside supervised Trusted Research Environments (TREs). For social scientists, this matters because it is a distinct access route: the data never leaves government custody, and the researcher, not the dataset, is what gets vetted and admitted.

    ADR UK is a partnership of four national bodies — ADR England, ADR Scotland, ADR Wales and ADR Northern Ireland — together with the Office for National Statistics (ONS), coordinated by a UK-wide Strategic Hub and funded by the Economic and Social Research Council (ESRC), part of UK Research and Innovation (UKRI).

    What is ADR UK?

    ADR UK is the mechanism by which public sector administrative data — records originally collected for tax, benefits, education, health or justice administration, not for research — is linked, de-identified and made available for social science research in the public interest. It commissions flagship linked datasets, funds research using them, and maintains a public data catalogue describing what is available and to whom.

    The partnership operates under the Digital Economy Act 2017, which created the legal gateway allowing UK government bodies to share de-identified data with accredited researchers for statistical research purposes. This is the statutory basis that distinguishes ADR UK access from a voluntary data-sharing agreement between two universities.

    How does ADR UK access differ from conventional repository deposit?

    Most research data infrastructure — repositories, DataCite-indexed archives, institutional data stores — is built around deposit and download: a dataset is prepared, described with metadata, and released for reuse under a licence. ADR UK’s model inverts this. The data is never released to the researcher’s own machine; instead, the researcher is admitted into a controlled environment where the data already resides.

    This is best understood as “FAIR-adjacent” rather than FAIR-compliant in the open-repository sense: the data is findable (via the catalogue) and, under approval, accessible, but interoperability and reusability are deliberately constrained by design, because the underlying records are personal and sensitive at source. The table below maps the three routes UK researchers commonly encounter.

    Route | Access model | Typical data | Governing framework
    ADR UK | Supervised Trusted Research Environment (TRE); no download | Linked cross-government administrative data (education, benefits, justice, tax) | Digital Economy Act 2017; Five Safes
    NHS Secure Data Environments | Supervised SDE; “dissemination by exception” | NHS health and social care records | NHS England’s 2022 Secure Data Environment policy
    UK Data Service | Deposit/download under end-user licence | Social surveys, census, cross-national socioeconomic data | ESRC-funded repository terms

    The practical consequence for a social scientist: an application to ADR UK is an application for supervised admission to a workspace, not a request for a file transfer.

    What is the Five Safes model and what is a Trusted Research Environment?

    ADR UK access is governed by the Five Safes model, a risk-management framework originally developed by the ONS and now used across UK administrative data infrastructure, including NHS Secure Data Environments. It manages disclosure risk across five dimensions rather than relying on a single control.

    • Safe people — only accredited, trained researchers gain access.
    • Safe projects — proposals are approved for public benefit and ethical soundness.
    • Safe data — records are de-identified before linkage.
    • Safe settings — analysis happens only inside a Trusted Research Environment, a monitored, non-internet-connected computing environment.
    • Safe outputs — every result is disclosure-checked before it can leave the TRE.
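
    As a rough illustration of "five dimensions rather than a single control", the checklist above can be modelled as five independent flags that must all hold. This is a minimal sketch: the class and field names are hypothetical and do not come from any ADR UK or ONS system.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiveSafesCheck:
    """One boolean per Five Safes dimension (hypothetical field names)."""
    safe_people: bool    # accredited, trained researcher
    safe_projects: bool  # project approved for public benefit and ethics
    safe_data: bool      # records de-identified before linkage
    safe_settings: bool  # analysis happens inside a TRE
    safe_outputs: bool   # results disclosure-checked before release


def unmet_dimensions(check: FiveSafesCheck) -> list[str]:
    """Return the names of every dimension that is not yet satisfied."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def all_safes_met(check: FiveSafesCheck) -> bool:
    """No single control is sufficient: every dimension must hold."""
    return not unmet_dimensions(check)
```

    A project with everything in place except output checking would report `["safe_outputs"]` from `unmet_dimensions`, which is the point of the model: a gap in any one dimension blocks release.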

    Each of the four UK nations operates its own TRE, accessed in person at a designated safe location or via a secure remote connection, using approved statistical software such as R, Python, SPSS or Stata.

    Who is eligible, and how does accreditation work?

    Eligibility runs through the researcher, not the institution. Under the Digital Economy Act 2017 accreditation process, an applicant must complete Safe Researcher Training and pass an assessment before an accreditation panel will approve them; this status is valid for five years. Accreditation alone does not grant data access — a specific research project must then be separately approved against public-benefit, feasibility and ethics criteria before a TRE account is issued.

    For institutions supporting early-career or interdisciplinary social scientists, this two-stage gate (accredit the person, then approve the project) is the single most common point of delay administrators should plan for, since neither step can be skipped or run in parallel with data linkage preparation.
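
    For planning purposes, the two-stage gate can be sketched as a simple check. The names below are hypothetical, and treating the five-year validity as five calendar years from the accreditation date is an assumption made for illustration, not a statement of how the accreditation panel calculates expiry.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ACCREDITATION_YEARS = 5  # validity period stated in the text


@dataclass
class Researcher:
    name: str
    accredited_on: Optional[date] = None  # None means not yet accredited


def accreditation_expiry(accredited_on: date) -> date:
    """Assumed expiry: same calendar day, five years later."""
    target_year = accredited_on.year + ACCREDITATION_YEARS
    try:
        return accredited_on.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return accredited_on.replace(year=target_year, day=28)


def can_issue_tre_account(researcher: Researcher,
                          project_approved: bool,
                          today: date) -> bool:
    """Stage 1: the person is accredited. Stage 2: the project is approved."""
    if researcher.accredited_on is None:
        return False
    if today >= accreditation_expiry(researcher.accredited_on):
        return False
    return project_approved
```

    The function returns `False` for an accredited researcher whose project is still pending, which mirrors the text: accreditation alone does not grant data access.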

    How is ADR UK funded and governed?

    ADR UK began as an ESRC investment running from July 2018. In September 2020, UKRI, the Department for Business, Energy and Industrial Strategy and HM Treasury approved £15.3 million for the 2021/22 financial year — the first year of a planned five-year investment. In September 2021, the remaining £90.12 million of that investment was secured from UK government to extend the programme to March 2026. In July 2025, UKRI confirmed a further £168 million investment to continue the programme beyond 2026, securing its next phase.

    Governance sits with the UK-wide Strategic Hub, which coordinates the four national partnerships, engages with government departments to secure data access agreements, and administers the dedicated research grant fund — distinct from the accreditation function, which remains with the statutory panel under the Digital Economy Act 2017.

    Frequently asked questions

    Is ADR UK the same thing as “alternative dispute resolution”?

    No. ADR UK in a research-administration context refers exclusively to Administrative Data Research UK, the government-data access partnership described here. “ADR” also commonly abbreviates alternative dispute resolution in a legal context — an unrelated field covering mediation and arbitration — and searchers should check context before assuming which meaning applies.

    What kind of data does ADR UK provide access to?

    ADR UK provides access to linked, de-identified administrative data generated by government departments — including education records, benefits and employment data, and justice-system data — rather than data collected specifically for research, such as surveys. Its public data catalogue and flagship datasets list what is currently available to accredited researchers.

    Is ADR UK data FAIR or open access?

    ADR UK data is not open access and is only FAIR-adjacent: it is findable through the catalogue and accessible to accredited, approved researchers, but it cannot be freely downloaded, reused or redistributed, because the source records are personal and disclosive. Outputs, not raw data, are what eventually leave the Trusted Research Environment.

    How long does the ADR UK access process take?

    Timelines vary, but researchers should expect two sequential approval stages: Safe Researcher Training and accreditation first, then a separate project-specific approval before a Trusted Research Environment account is issued. Institutions should budget for both stages when planning grant timelines, since data linkage itself begins only after project approval.

    What this means for research administrators and institutions

    For institutions supporting quantitative social science, ADR UK access is a compliance and planning question as much as a technical one. Research offices should treat Safe Researcher Training and accreditation as a standing institutional capability, built into PhD and postdoctoral training pipelines, rather than a one-off hurdle discovered mid-grant. Because accreditation is personal, portable and valid for five years, institutions that pre-accredit staff gain a durable advantage in bidding for ADR UK-linked funding calls.

    The broader signal is that “FAIR-adjacent” access, governed by statute and a risk framework rather than a licence, is becoming a parallel track alongside conventional repository deposit — one that other data-holding sectors, including health, are converging on through NHS Secure Data Environments. Research administrators who understand both tracks are better placed to route projects to the correct infrastructure the first time.

  • UK Data Service vs ICPSR: Choosing an Archive

    The UK Data Service and ICPSR are the two largest social-science data archives in the English-speaking research world, and the right choice usually depends on jurisdiction and funder mandate rather than feature parity. The UK Data Service is the ESRC-funded national repository for UK social, economic and population data, while ICPSR is a US-based, membership-funded consortium archive at the University of Michigan. Researchers outside the biomedical repository ecosystem — where PubMed-linked mandates dominate — need to weigh deposit workflow, restricted-access tiers and citation practice before picking either as a home for a dataset.

    The UK Data Service is the largest digital repository for quantitative and qualitative social science and humanities research data in the United Kingdom, formed in October 2012 when the Economic and Social Research Council (ESRC) consolidated the UK Data Archive — established at the University of Essex in 1967 — with several university partners. ICPSR, by contrast, is a membership consortium of academic and research institutions that has archived social and behavioural science data since 1962. Both are listed in re3data.org, the global Registry of Research Data Repositories, and both hold CoreTrustSeal certification for trustworthy digital repositories.

    What Are the UK Data Service and ICPSR?

    The UK Data Service is a national data repository funded through UKRI’s Economic and Social Research Council (ESRC) and led by the UK Data Archive at the University of Essex, in partnership with the University of Manchester, Jisc, EDINA and University College London. It holds more than 6,000 datasets, including UK Census data, the Labour Force Survey, the Millennium Cohort Study and cross-national surveys such as the European Social Survey.

    ICPSR — the Inter-university Consortium for Political and Social Research — is a membership-funded archive based at the University of Michigan, serving several hundred member institutions worldwide alongside non-member depositors and users. Its holdings span large-scale US and international surveys, criminal justice, education and ageing data, and it runs openICPSR as a self-publishing companion repository for rapid dissemination.

    How Do Deposit Workflows Compare?

    Both archives run a curated deposit model rather than a bare upload box: staff review documentation, check disclosure risk and enhance metadata before release. The UK Data Service’s ESRC funding creates a contractual hook that ICPSR’s membership model does not replicate for non-US funders: grant holders are required to offer their data for archiving as a condition of the ESRC Research Data Policy.

    • UK Data Service: two routes — the main curated collection for large, complex or sensitive studies, and ReShare, a lighter self-deposit repository for smaller datasets, code and syntax files.
    • ICPSR: two routes — the standard curated deposit process, and openICPSR, a self-publishing repository for researchers who want faster turnaround with lighter-touch review.

    Depositors submitting to either service should expect a documentation checklist covering variable-level metadata, consent and ethics evidence, and a data management plan — the same categories UKRI and NSF grant terms typically require regardless of which archive receives the deposit.

    How Do Restricted-Access Tiers Differ?

    Access tiering is where the two services diverge most for researchers working with confidential or disclosive social-science data. The UK Data Service operates a published three-tier model; ICPSR uses a comparable but differently named structure built around its Virtual Data Enclave.

    | Access dimension | UK Data Service | ICPSR |
    |---|---|---|
    | Open tier | No registration; Open Government Licence data | Public-use files via free MyData account |
    | Standard tier | Safeguarded: registration plus End User Licence | Member-institution access under consortium terms |
    | Restricted tier | Controlled: SecureLab, requiring accredited-researcher training under the Five Safes Framework | Restricted-use data via secure Virtual Data Enclave or encrypted physical media, subject to a data security plan |
    | Governance standard | Accredited under the Digital Economy Act 2017 by the UK Statistics Authority (2020) | Institutional Review Board and data-use-agreement based review |

    The UK Data Service’s Five Safes Framework — safe people, projects, settings, data and outputs — was developed with HMRC DataLab and the Office for National Statistics Secure Research Services, and now underpins the SafePod Network launched in 2021 for wider geographical access to sensitive data. ICPSR’s restricted-data pathway achieves an equivalent security outcome through its enclave model but does not use the Five Safes terminology, which matters for UK researchers writing data management plans against ESRC or UKRI templates that reference it explicitly.

    How Do Citation Practices Compare?

    Both archives assign persistent identifiers and expect formal data citation, but their machinery differs. The UK Data Service works with DataCite and the British Library to issue DOIs and promotes an easy-to-use citation tool, framing its approach around the FAIR data principles — Findable, Accessible, Interoperable, Reusable — and its open-source QAMyData tool, which gives depositors a health check for numeric data before release.

    ICPSR similarly issues persistent identifiers for deposited studies and expects citation in publications that reuse its data, but its emphasis sits more on bibliography-style study citations tied to its own numbering system than on a dedicated public FAIR-compliance tool. For researchers publishing in journals that enforce data-availability statements — a growing requirement under funder open-science mandates — the practical difference is smaller than the access-tier gap: both produce a citable, resolvable record, but only the UK Data Service publishes a named QA tool for pre-citation data quality.

    Which Archive Should Researchers Outside Biomedicine Choose?

    For most projects the decision is jurisdictional rather than qualitative. Letting the funder mandate drive the repository choice removes ambiguity immediately: ESRC-funded UK researchers must offer data to the UK Data Service, while NSF- or NIH-adjacent US social-science grants more commonly point toward ICPSR or openICPSR.

    • Choose the UK Data Service if your funder is UKRI/ESRC, your data concerns UK administrative, census or longitudinal panel data, or you need SecureLab/Five Safes access to controlled government microdata.
    • Choose ICPSR if your institution is a consortium member, your data is US-focused or cross-national with US partners, or you want the faster openICPSR self-publishing route.
    • Consult both catalogues before depositing internationally comparable survey data (e.g. European Social Survey, Eurobarometer) — coverage overlaps, and the UK Data Service can facilitate UK-based access to ICPSR holdings.
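
    The three rules above can be expressed as a small routing function for a data-management-plan template. This is a simplified sketch, not an official decision tool: the function name, the parameters and the order in which the rules are applied (funder mandate first) are all assumptions made for illustration.

```python
def suggest_archive(funder: str, *,
                    uk_data: bool = False,
                    needs_securelab: bool = False,
                    icpsr_member: bool = False,
                    us_focused: bool = False,
                    cross_national_survey: bool = False) -> str:
    """Map the bullet-point rules to a suggested archive (hypothetical helper)."""
    # A UKRI/ESRC mandate settles the question before anything else.
    if funder.strip().upper() in {"ESRC", "UKRI"}:
        return "UK Data Service"
    # Internationally comparable surveys: coverage overlaps, so check both.
    if cross_national_survey:
        return "consult both catalogues"
    if uk_data or needs_securelab:
        return "UK Data Service"
    if icpsr_member or us_focused:
        return "ICPSR"
    return "no clear default; check the funder mandate"
```

    For example, `suggest_archive("NSF", icpsr_member=True)` returns `"ICPSR"`, while any ESRC-funded project returns `"UK Data Service"` regardless of the other flags.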

    Institutions building or reviewing a data management plan should treat repository choice as a compliance question first and a discoverability question second: a technically excellent dataset deposited in the wrong repository for its funder mandate creates avoidable rework at grant closeout.

    Answer-First Questions Researchers Ask

    What Is the UK Data Service?

    The UK Data Service is the ESRC-funded national repository for UK economic, population and social research data, led by the UK Data Archive at the University of Essex. It holds over 6,000 datasets, including census, survey and longitudinal study data, and operates under the OAIS digital-preservation reference model.

    How Do You Access Data on the UK Data Service?

    Access runs through three published tiers: Open data requiring no registration, Safeguarded data requiring registration and an End User Licence, and Controlled data requiring SecureLab accreditation under the Five Safes Framework. Most researchers start with the free data catalogue and register once they identify a specific study.

    Is the UK Data Service Free?

    Yes — the service is free to data owners depositing studies and free at the point of use for non-commercial research and teaching. Commercial users may incur administrative fees, and controlled-tier access requires accredited-researcher training rather than a monetary charge.

    Implications for Research Administrators

    Data management plans reviewed by institutional research offices, ARMA and INORMS-aligned research administrators, and funder compliance teams increasingly treat repository choice as an auditable field, not a footnote. A UK-funded study archived outside the UK Data Service without documented justification can trigger ESRC compliance queries at final reporting; a US consortium study left undeposited with ICPSR can weaken an institution’s case for renewed membership funding. Neither archive competes with domain-specific biomedical repositories governed by NISO, ICMJE or COPE norms — this comparison sits squarely in the national data repository space for social science, distinct from that ecosystem.

    As open-science mandates from UKRI, cOAlition S and equivalent US funders converge on FAIR-by-default expectations, the operational gap between the UK Data Service and ICPSR is narrowing to jurisdiction, access-tier terminology and citation tooling rather than underlying trustworthiness — both hold CoreTrustSeal certification and both sit inside the CESSDA/re3data recognised-repository landscape that funders now check by default.

  • CC0 for Data: CC-BY and Custom Licence Guide

    CC0 for data means dedicating a dataset to the public domain with no attribution requirement, while CC-BY permits free reuse conditional on credit. For structured databases, neither Creative Commons tool may be the legally correct choice. Under the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), a dataset counts as reusable only if its licence imposes minimal friction on machine and human reuse; CC0 is the tool most repositories recommend by default, CC-BY is acceptable where attribution norms are strong, and bespoke institutional terms are usually a liability, not a safeguard.

    CC0 (Creative Commons Zero) is a public domain dedication published by Creative Commons that waives copyright and related rights “to the fullest extent permitted by law”, allowing copying, modification, and commercial reuse without permission or credit.

    Why a data licence matters for FAIR reuse

    The FAIR Data Principles — Findable, Accessible, Interoperable, Reusable — treat licensing as a core reusability criterion, not an afterthought. A dataset can be technically accessible and still fail FAIR if its licence is ambiguous, restrictive, or silent on reuse conditions.

    Without an explicit licence, the default legal position in most jurisdictions is “all rights reserved”, deterring reuse even when the depositor intended openness. Data repositories such as Dryad require a clear waiver precisely to remove this ambiguity.

    • Findability is unaffected by licence choice, but reusability collapses without one.
    • Interoperability depends on whether the licence allows combination with other datasets under different terms.
    • Reusability is maximised when the licence imposes the fewest conditions consistent with the depositor’s actual requirements.

    CC0 vs CC-BY: what actually differs

    CC0 removes all conditions, including attribution; CC-BY keeps commercial and derivative reuse rights but makes crediting the source a licence condition rather than a courtesy. The practical consequences are larger for data than for text or images.

    | Aspect | CC0 | CC-BY 4.0 |
    |---|---|---|
    | Attribution required | No (legally); expected as scholarly norm | Yes, legally enforceable |
    | Commercial reuse | Permitted | Permitted |
    | Combining with other datasets | Frictionless | Can trigger “attribution stacking” |
    | Recommended by | Dryad, GBIF, most genomics/biodiversity repositories | European Commission for some research data categories |
    | Applies cleanly to non-copyrightable facts | Yes; designed for this case | Ambiguous; CC-BY presumes a copyright interest that may not exist in raw data |

    The CESSDA Data Management Expert Guide notes that CC0 prevents attribution stacking — the compounding burden of citing every upstream source when a new dataset merges dozens of others. This is the strongest technical argument for CC0 over CC-BY in aggregated or long-tail scientific data. Dryad’s data-services team has explained that CC0 was “crafted specifically to reduce any legal and technical impediments… to the reuse of data” — a rationale FAIR later formalised as a reusability requirement.
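
    To make attribution stacking concrete, the sketch below walks the provenance of a merged dataset and collects every upstream source that carries a CC-BY licence. It is an illustration only: the `Dataset` class is hypothetical, licences are identified by prefix matching on SPDX-style strings, and real attribution duties depend on the licence text and jurisdiction rather than on this simplified rule.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    licence: str  # SPDX-style identifier, e.g. "CC0-1.0" or "CC-BY-4.0"
    sources: list["Dataset"] = field(default_factory=list)


def required_attributions(merged: Dataset) -> set[str]:
    """Collect every upstream CC-BY dataset, however deep in the provenance."""
    credits: set[str] = set()
    stack = list(merged.sources)
    while stack:
        current = stack.pop()
        if current.licence.startswith("CC-BY"):
            credits.add(current.name)
        stack.extend(current.sources)  # follow sources of sources
    return credits
```

    If every upstream dataset is CC0, the function returns an empty set; each CC-BY source anywhere in the chain adds one more credit that downstream users must carry forward.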

    Does attribution disappear entirely under CC0?

    No. CC0 removes the legal obligation to cite, but citation remains a scholarly and professional norm enforced through peer review, journal policy, and disciplinary ethics rather than licence terms. Most researchers continue citing CC0 datasets exactly as they would any other source, because academic integrity — not copyright law — is what drives the practice.

    Why custom institutional terms usually backfire

    Some institutions draft bespoke data-sharing agreements instead of adopting a standard licence, adding restrictions such as “non-commercial use only” or “notify us before reuse”. This creates three recurring problems.

    • Machine unreadability: standard CC and Open Data Commons licences carry machine-readable metadata that repositories, indexers, and rights-clearance tools recognise automatically; bespoke legal text does not.
    • Interoperability failure: a custom clause requiring prior notification or a specific attribution format is often legally incompatible with the standard licences used by the other datasets a researcher wants to combine it with.
    • Enforcement uncertainty: institutions rarely have the resources to monitor or enforce bespoke terms, so the restriction deters legitimate reuse without stopping the misuse it was meant to prevent.

    The University of California’s Office of Scholarly Communication has argued that CC-BY is “not always a good fit” for data, since its legal machinery was designed for copyrightable creative works rather than mixed factual content — and a custom clause layered on top compounds that mismatch rather than resolving it.

    Databases are a special case: ODC-By and ODbL

    Raw facts are generally not copyrightable, but a database’s structure can attract separate rights, including the EU’s sui generis database right. This is a genuine gap in most CC0-vs-CC-BY explainers: Creative Commons licences were not written for database rights, and the Open Knowledge Foundation’s Open Data Commons suite exists specifically to cover them.

    • ODC-By (Open Data Commons Attribution License): permits copying, distribution, and commercial use of a database with attribution — the database-rights equivalent of CC-BY.
    • ODbL (Open Database License): adds a share-alike condition, so derived databases must carry the same licence — the database-rights equivalent of CC-BY-SA.
    • CC0 can still be applied to a database to waive both copyright and any sui generis database right simultaneously, which is why several major repositories default to it rather than layering ODC-By on top.

    Joint guidance from Kehl University of Applied Sciences and IP specialists Maucher Jenkins explicitly separates content, software, and databases into three categories, rather than treating “data licensing” as one undifferentiated choice — a distinction most generic CC0-vs-CC-BY articles omit.

    A decision framework for choosing a licence

    Choosing correctly requires matching the licence to the data type and the reuse goal, not defaulting to whichever licence a template happens to include.

    1. Default to CC0 for raw observational data, measurements, or any dataset likely to be combined with others — this is the position taken by repositories including Dryad and GBIF and referenced in OpenAIRE’s data-sharing guidance.
    2. Use CC-BY where the deposited content includes substantial original creative or analytical framing (for example, a curated data paper’s narrative sections) and attribution is central to the scholarly reward system.
    3. Use ODC-By or ODbL where the artefact is genuinely a structured database and jurisdiction-specific database rights are a live concern, particularly for depositors working under EU law.
    4. Avoid bespoke terms unless a named legal, ethical, or funder requirement (such as personal or sensitive data restrictions) makes a standard open licence genuinely unsuitable — and even then, prefer a recognised restricted-access framework over ad hoc legal drafting.

    Whichever licence is chosen, it must be declared unambiguously in the dataset’s metadata and in any accompanying data paper, since automated harvesters and data repository platforms increasingly reject or flag submissions with missing or non-standard licence fields.
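
    The four-step framework and the metadata requirement can be sketched together as follows. The function and parameter names are hypothetical, the `licence` metadata key is an assumption (repositories use different field names), and the returned strings are SPDX licence identifiers used here as a convenient machine-readable convention.

```python
from typing import Optional

# SPDX identifiers for the four standard licences discussed in this guide.
STANDARD_LICENCES = {"CC0-1.0", "CC-BY-4.0", "ODC-By-1.0", "ODbL-1.0"}


def choose_licence(*,
                   restricted: bool = False,
                   database_rights_concern: bool = False,
                   share_alike: bool = False,
                   creative_framing: bool = False) -> Optional[str]:
    """Return an SPDX identifier, or None when no open licence fits."""
    if restricted:
        # Step 4: personal or sensitive data; use a restricted-access
        # framework instead of an open licence.
        return None
    if database_rights_concern:
        # Step 3: structured database with live database-rights issues.
        return "ODbL-1.0" if share_alike else "ODC-By-1.0"
    if creative_framing:
        # Step 2: substantial original creative or analytical framing.
        return "CC-BY-4.0"
    # Step 1: default for raw observational data.
    return "CC0-1.0"


def licence_field_ok(metadata: dict) -> bool:
    """True only when the metadata declares one of the standard licences."""
    return metadata.get("licence") in STANDARD_LICENCES
```

    A record with a missing licence, or with free text such as "contact the authors", fails `licence_field_ok`, which is the kind of submission the paragraph above describes harvesters flagging.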

    Answer-first Q&A

    Is CC0 free for commercial use?

    Yes. CC0 places a work in the public domain, so there is no restriction on commercial exploitation, modification, or redistribution. Any user — including a company building a commercial product — may use CC0 data without seeking permission, paying a fee, or providing credit, though citing the source remains good scholarly practice.

    Are CC0 and public domain the same?

    Not exactly. The Public Domain Mark is an informational label applied when a work is already believed to be out of copyright, while CC0 is an active legal waiver used by a rightsholder to voluntarily place their own work in the public domain. CC0 changes legal status; the Public Domain Mark only describes an existing one.

    Do I have to cite CC0 data?

    Legally, no — CC0 imposes no attribution requirement. In practice, researchers should still cite the original dataset because academic norms, journal policies, and reproducibility standards expect source attribution regardless of what the licence legally mandates.

    Can data or databases be copyrighted?

    Raw facts generally cannot be copyrighted, but a database’s original selection, arrangement, or structure can attract copyright or, in the EU, a separate sui generis database right. This is precisely why database-specific licences such as ODC-By and ODbL exist alongside Creative Commons tools.

    Implications for repositories and institutions

    Repositories that mandate CC0 by default see fewer downstream reuse disputes and cleaner automated harvesting, because ambiguity is removed at the point of deposit. Institutions drafting data-management plans should specify the licence at policy level rather than per-project, and funders increasingly expect this decision documented, not left as “to be determined”.

    Looking ahead

    As FAIR compliance becomes a formal funder and publisher requirement rather than a voluntary aspiration, licence choice will keep moving from an afterthought to a mandatory, auditable field in data-management plans. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022 — a reminder that clear, jointly governed standards, rather than bespoke institutional terms, are what let research infrastructure scale across disciplines and borders.

  • Genomic Data Repository Guide: ENA vs GEO vs SRA

    Choosing a genomic data repository comes down to three questions: what type of data you have, whether it is identifiable human data, and what your funder or journal mandates. Raw sequencing reads generally go to the European Nucleotide Archive (ENA) or the Sequence Read Archive (SRA) — two mirrored nodes of the same international collaboration — while processed gene-expression data belongs in the Gene Expression Omnibus (GEO). A genomic data repository is a persistent, publicly accessible database that assigns stable identifiers to deposited sequence or expression datasets so they can be cited, retrieved and reused under FAIR data principles.

    ENA, GEO and SRA are the three repositories researchers encounter most often when funder or journal data-sharing policies require deposition of sequencing output. They are not interchangeable: each has a different primary data type, a different metadata standard, and a different position in the international data-sharing infrastructure. This guide compares them on deposit requirements, metadata standards and journal acceptance so research administrators and authors can make a defensible, mandate-compliant choice.

    What is a genomic data repository?

    A genomic data repository is a curated, publicly accessible database that archives DNA or RNA sequence data — raw reads, assembled genomes, or processed expression tables — and assigns each dataset a stable accession number for permanent citation. Repositories exist because journals and funders increasingly require that sequence data underlying a publication be deposited somewhere reviewers, readers and future researchers can retrieve it, rather than held privately by the authors.

    The three most consulted repositories for sequencing output are the European Nucleotide Archive (ENA), the Sequence Read Archive (SRA), and the Gene Expression Omnibus (GEO). ENA and SRA are both part of the International Nucleotide Sequence Database Collaboration (INSDC), alongside the DNA Data Bank of Japan (DDBJ); records submitted to any one of the three INSDC archives are mirrored across all of them, typically within 24-48 hours.

    ENA vs GEO vs SRA: how do they differ?

    The single biggest distinction is data type: ENA and SRA hold raw sequence reads (FASTQ, BAM, CRAM), while GEO holds processed functional genomics results — expression matrices, normalised counts and the experimental metadata describing them — and links out to SRA for the underlying raw reads. Geography and stewardship differ too: ENA is maintained by EMBL-EBI in the UK/Europe, while SRA and GEO are both maintained by the US National Center for Biotechnology Information (NCBI).

    | Feature | ENA | GEO | SRA |
    |---|---|---|---|
    | Steward | EMBL-EBI (Europe) | NCBI (US) | NCBI (US) |
    | Primary data type | Raw reads, assemblies, annotated sequences | Processed expression data + metadata | Raw sequencing reads |
    | INSDC member | Yes | No (links to SRA) | Yes |
    | Metadata standard | ENA checklists | MINSEQE / MIAME | INSDC submission schema |
    | Access model | Open (controlled tier via EGA for identifiable human data) | Open | Open (controlled tier via dbGaP) |

    A frequently overlooked distinction is access control. None of ENA, SRA or GEO is designed to hold identifiable human genomic or phenotypic data. That category of data belongs in a controlled-access archive — the European Genome-phenome Archive (EGA), jointly run by EMBL-EBI and the CRG, or NCBI’s database of Genotypes and Phenotypes (dbGaP) — where access is granted through a data access committee rather than opened to the public. Depositing identifiable clinical genomic data in an open repository such as ENA or SRA would breach both the repositories’ own policies and, in most jurisdictions, data protection law.

    What are the deposit requirements for each repository?

    Each repository sets its own submission checklist, but all three require a structured description of the experiment alongside the sequence files themselves.

    • ENA requires a study, sample, experiment and run object for each submission, described against one of ENA’s checklist templates (for example, the pathogen or invertebrate checklists), plus the raw read files.
    • SRA requires equivalent BioProject and BioSample records, submitted through NCBI’s submission portal, with reads in FASTQ or BAM/CRAM format.
    • GEO requires a MINSEQE-compliant description of the experimental design (samples, protocols, processed data matrix) and will route the corresponding raw reads to SRA as part of the same submission, generating a linked SRA accession automatically.

    Because ENA and SRA mirror each other, a dataset submitted to one is not normally resubmitted to the other — submitting twice creates duplicate, unlinked accessions rather than better coverage.

    Which metadata standards apply?

    Metadata quality, not just file deposition, is what makes a dataset FAIR — Findable, Accessible, Interoperable and Reusable, per the FAIR data principles first published by Wilkinson et al. in 2016. GEO submissions are assessed against MIAME (Minimum Information About a Microarray Experiment) for array data and MINSEQE (Minimum Information about a high-throughput Nucleotide Sequencing Experiment) for sequencing-based expression studies. ENA and SRA submissions follow INSDC’s shared sample and experiment metadata schema, supplemented by checklist-specific fields for the sample type in question.

    Consistent metadata is also what allows a dataset to be discovered through cross-repository registries such as re3data and FAIRsharing, both of which index genomic repositories alongside thousands of other subject and generalist repositories.

    Do journals and funders accept all three equally?

    Most journal data-availability policies name an INSDC-compliant repository — ENA, SRA or DDBJ — as the acceptable destination for raw sequence data, and GEO or ArrayExpress for expression data. PLOS, for example, states that authors should select field-appropriate repositories and lists ENA, SRA, GEO and DDBJ among its recommended sequencing repositories, while also pointing authors to re3data and FAIRsharing when no field-specific option exists.

    Funder policy is generally repository-agnostic within the INSDC family: the NIH Genomic Data Sharing Policy and the 2023 NIH Data Management and Sharing Policy both accept SRA, dbGaP or an equivalent controlled-access archive for human data, without mandating SRA specifically over ENA. UK and European funders operating under UKRI or Horizon Europe open-science requirements similarly accept any INSDC-affiliated repository, reflecting the FAIR data principles rather than naming a single preferred database.

    Frequently asked questions

    What is the difference between ENA, GEO and SRA?

    ENA and SRA both archive raw sequencing reads and mirror each other as INSDC members, differing mainly in which institution — EMBL-EBI or NCBI — hosts the submission. GEO instead archives processed gene-expression results and metadata, forwarding the associated raw reads to SRA automatically during submission.

    Do I need to submit data to both GEO and SRA?

    Not separately. When you submit a gene-expression study to GEO, the platform generates a linked SRA accession for the raw reads as part of the same workflow, so a single submission satisfies both repositories without duplicate uploads.

    Is ENA the same as SRA?

    No — they are separate databases run by different organisations that mirror the same underlying INSDC data. A dataset submitted to ENA in Europe becomes visible through SRA in the US within roughly one to two days, and vice versa, so researchers choose one, not both.

    Which repository do funders require for genomic data?

    Most funder policies, including NIH’s Genomic Data Sharing Policy and UKRI’s open research requirements, accept any INSDC-affiliated repository — ENA, SRA or DDBJ — for raw sequence data, plus GEO for expression data, rather than mandating one specific database.

    What this means for research administrators

    For institutions building data-management-plan templates or compliance checklists, the practical rule is to map deposition guidance to data type and access sensitivity rather than to a single named repository: raw non-identifiable reads to ENA or SRA, expression matrices to GEO, and any identifiable human genomic or clinical data to a controlled-access archive such as EGA or dbGaP. Framing repository choice this way keeps research administration guidance aligned with funder and journal policy regardless of which INSDC node an individual researcher prefers to use.
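
    The mapping rule above can be written down as a small lookup for a deposit checklist. This is a minimal sketch, not an official decision tool: the `Dataset` class, the `data_type` labels and the function name are hypothetical, and the repository lists simply restate the ones named in the paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    data_type: str            # hypothetical labels: "raw_reads" or "expression"
    identifiable_human: bool  # True if the data could identify a person


def suggest_repositories(ds: Dataset) -> list[str]:
    """Return candidate repositories by data type and access sensitivity."""
    # Access sensitivity is checked first: identifiable human data goes to
    # a controlled-access archive whatever its data type.
    if ds.identifiable_human:
        return ["EGA", "dbGaP"]
    if ds.data_type == "raw_reads":
        return ["ENA", "SRA"]
    if ds.data_type == "expression":
        return ["GEO"]
    raise ValueError(f"No deposition rule for data type: {ds.data_type!r}")
```

    Checking sensitivity before data type is a deliberate ordering, so that an identifiable dataset can never fall through to an open repository.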

    As funder mandates increasingly cite FAIR data principles explicitly rather than naming individual repositories, the durable compliance strategy is to select any INSDC-affiliated repository appropriate to the data type, document the accession number in the manuscript, and reserve controlled-access archives strictly for identifiable human data. Research offices that build this decision logic into deposit checklists now will need far less rework as funder policy language continues to converge on FAIR terminology rather than named databases.

  • EU-US Data Privacy Framework for Research Data

    The EU-US Data Privacy Framework (DPF) is the adequacy mechanism that lets EU research institutions, and UK institutions through the separate UK Extension, send personal data to self-certified US collaborators without signing Standard Contractual Clauses, provided the US recipient holds active DPF status covering the right data category. Where a collaboration involves health, genetic or other sensitive research data, extra labelling duties apply before the transfer can rely on the Framework at all.

    The EU-US Data Privacy Framework is a voluntary self-certification scheme, administered by the US Department of Commerce and underpinned by the European Commission’s 10 July 2023 adequacy decision, that recognises participating US organisations as offering GDPR-equivalent protection for personal data received from the EEA. A parallel UK adequacy instrument extends the same recognition to transfers made under UK GDPR. For research offices coordinating cross-border studies, biobanks, consortium agreements or collaborative datasets with US partners post-Brexit, choosing correctly between the DPF, the UK Extension and Standard Contractual Clauses (SCCs) determines whether a transfer is lawful on day one or exposed to later challenge.

    What is the EU-US Data Privacy Framework?

    The EU-US Data Privacy Framework replaced the invalidated EU-US Privacy Shield after the Court of Justice of the European Union’s 2020 Schrems II ruling found US surveillance law did not offer equivalent protection. The European Commission’s adequacy decision of 10 July 2023 concluded that the DPF ensures an adequate level of protection for personal data transferred to certified US organisations, removing the need for Standard Contractual Clauses on covered transfers.

    Eligibility is narrower than it first appears. Only US organisations regulated by the Federal Trade Commission or the Department of Transportation may self-certify, which excludes many non-profits, banks, insurers and telecoms — categories that include some university-affiliated research foundations and repositories. Institutions must verify a partner’s active status on the official DPF list before relying on it, and confirm the certification covers the specific data category (HR or non-HR) being shared.

    How does the UK Extension (Data Bridge) work post-Brexit?

    Since Brexit, UK organisations cannot rely on the EU adequacy decision directly. The Data Protection (Adequacy) (United States of America) Regulations 2023 created a separate UK Extension, commonly called the UK-US Data Bridge, which came into force on 12 October 2023. It lets organisations in the UK and Gibraltar, including universities, make restricted transfers to US organisations that have separately self-certified to the UK Extension.

    Per the Information Commissioner’s Office, a UK institution relying on the Data Bridge must confirm the US recipient has active status on the DPF list, has specifically opted into the UK Extension (not only the EU-US DPF), and that its registration covers the correct data type. Periodic re-checks are required, since a US partner can lose or withdraw certification at any point during a live research project.
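
    The three checks, plus the periodic re-check, can be recorded as a simple pre-transfer gate. The sketch below assumes a reviewer has looked up the partner on the DPF list by hand and entered the result. `DpfListEntry`, `data_bridge_problems` and the 90-day re-check interval are all hypothetical choices for illustration, not part of any ICO or Department of Commerce interface.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DpfListEntry:
    """What a reviewer recorded from the official DPF list (hypothetical record)."""
    active: bool                   # participant status shown as active
    uk_extension: bool             # opted into the UK Extension, not only the EU-US DPF
    covered_data_types: frozenset  # e.g. frozenset({"HR", "non-HR"})
    checked_on: date               # date the list was last consulted


def data_bridge_problems(entry: DpfListEntry, data_type: str, today: date,
                         max_age_days: int = 90) -> list[str]:
    """Return reasons the Data Bridge cannot be relied on; an empty list means none found."""
    problems = []
    if not entry.active:
        problems.append("recipient is not active on the DPF list")
    if not entry.uk_extension:
        problems.append("recipient has not opted into the UK Extension")
    if data_type not in entry.covered_data_types:
        problems.append(f"certification does not cover {data_type} data")
    # The re-check interval is an assumed local policy choice, not an ICO figure.
    if today - entry.checked_on > timedelta(days=max_age_days):
        problems.append("status check is out of date; re-verify on the DPF list")
    return problems
```

    Returning a list of reasons rather than a single yes/no keeps an audit trail of why a transfer was held back.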

    EU-US Data Privacy Framework vs Standard Contractual Clauses for research data

    Where a US collaborator is not DPF-certified — common among smaller labs, non-profits and public bodies outside FTC/DoT jurisdiction — Standard Contractual Clauses remain the fallback transfer mechanism. UK exporters use the International Data Transfer Agreement (IDTA) or the UK Addendum to the EU’s SCCs, and, following Schrems II, must complete a Transfer Risk Assessment (TRA) examining whether US law could undermine the contractual protections.

    | Feature | DPF / UK Extension (Adequacy) | Standard Contractual Clauses (SCCs) |
    | --- | --- | --- |
    | Legal basis | Adequacy decision (EU) / adequacy regulations (UK) | Contractual safeguard under UK GDPR Art. 46 / EU GDPR Art. 46 |
    | Recipient eligibility | Limited to self-certified, FTC/DoT-regulated US organisations | Any US recipient, regardless of sector |
    | Transfer Risk Assessment required | No | Yes, mandatory since Schrems II |
    | Sensitive/special category data | Must be explicitly flagged as “sensitive” to the recipient | Protections negotiated within the contract and TRA |
    | Ongoing obligation | Periodic verification of active DPF/UK Extension status | Periodic review of the TRA and supplementary measures |

    Many research offices now adopt a “belt and braces” approach: relying on the Data Bridge where a partner is certified, while keeping SCCs signed as a fallback in case certification lapses mid-project — a real risk, since a US partner can be forcibly removed from the DPF list by the Department of Commerce.

    Data sharing agreement vs data processing agreement: which applies?

    A data sharing agreement (DSA) and a data processing agreement (DPA) serve different roles in a research collaboration, and confusing them is a common compliance gap. A DSA is used when two institutions each act as independent or joint controllers: for example, two universities pooling pseudonymised survey results for a shared analysis. A DPA (required under UK GDPR Article 28) is used when one party processes data solely on the instructions of another, such as a US cloud vendor hosting a UK institution’s research dataset.

    • Use a DSA when both parties determine the purposes of processing (joint or independent controllers).
    • Use a DPA when one party is a processor acting only on the controller’s documented instructions.
    • Either document sits alongside, not instead of, the transfer mechanism (DPF, UK Extension or SCCs) — the agreement governs the relationship; the mechanism governs the lawfulness of the cross-border movement itself.
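
    The three points above describe two independent choices: which agreement governs the relationship, and which mechanism makes the transfer lawful. A minimal sketch of that separation follows. The function and parameter names are hypothetical, and real cases (joint controllership in particular) need legal review rather than a two-flag lookup.

```python
def required_documents(recipient_decides_purposes: bool,
                       recipient_certified: bool) -> tuple[str, str]:
    """Return (agreement, transfer mechanism) for a transfer to a US partner."""
    # Relationship: a controller that decides purposes needs a DSA,
    # a processor acting only on instructions needs a DPA.
    agreement = ("data sharing agreement" if recipient_decides_purposes
                 else "data processing agreement")
    # Mechanism: chosen separately, based on certification status only.
    mechanism = ("DPF / UK Extension" if recipient_certified
                 else "SCCs (IDTA or UK Addendum) with a Transfer Risk Assessment")
    return agreement, mechanism
```

    For example, a certified US cloud host acting on instructions yields a data processing agreement alongside the DPF or UK Extension, while an uncertified collaborating lab yields a data sharing agreement alongside SCCs and a Transfer Risk Assessment.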

    What special rules apply to sensitive research data?

    Research data frequently includes health records, genetic material or biobank samples, much of which UK GDPR treats as special category data. The DPF’s definition of “sensitive information” does not map exactly onto that list. It expressly names health, racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership and sex life, but not genetic data, biometric data used for unique identification, sexual orientation data or criminal offence data. Those unnamed categories receive the DPF’s sensitive-data protections only if the UK or EU sender proactively identifies and marks them as sensitive before transfer.

    This is a frequently overlooked gap for research consortia: genomic, biometric and similar datasets are special category data under UK GDPR but are not automatically treated as sensitive under the DPF unless explicitly flagged. Institutions transferring such data should apply a persistent classification (metadata tags or labelling) that survives onward sharing by the US recipient, and document this step in the study’s data management plan.
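
    One way to make the classification persistent is to attach a label to every field in the dataset’s data dictionary before export. The sketch below takes a deliberately conservative approach and flags every UK GDPR special category, plus criminal offence data, rather than only the categories the DPF leaves unnamed. The category labels, `label_fields` and the `dpf_sensitive` key are hypothetical names, not terms from the Framework.

```python
# Categories flagged as sensitive before transfer (conservative assumption:
# flag all UK GDPR special categories plus criminal offence data).
FLAG_AS_SENSITIVE = frozenset({
    "racial_or_ethnic_origin", "political_opinions",
    "religious_or_philosophical_beliefs", "trade_union_membership",
    "genetic", "biometric_identification", "health",
    "sex_life", "sexual_orientation", "criminal_offence",
})


def label_fields(field_categories: dict[str, str]) -> dict[str, dict]:
    """Attach a sensitivity label to each field of a data dictionary."""
    return {
        name: {"category": category,
               "dpf_sensitive": category in FLAG_AS_SENSITIVE}
        for name, category in field_categories.items()
    }
```

    Shipping the labelled dictionary with the dataset, rather than keeping it in a separate spreadsheet, is what lets the label survive onward sharing.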

    Frequently asked questions

    What is the EU-U.S. Data Privacy Framework?

    The EU-U.S. Data Privacy Framework is a self-certification scheme allowing US organisations to receive personal data from the EEA under an EU adequacy decision. It replaced the invalidated Privacy Shield and removes the need for Standard Contractual Clauses for covered, certified transfers.

    What happened to the EU-US Privacy Shield?

    The Privacy Shield was invalidated in July 2020 by the Court of Justice of the EU in Schrems II, which found US surveillance access to personal data was not sufficiently limited. The Data Privacy Framework was negotiated as its successor and adopted in 2023.

    What is the status of the EU-U.S. Data Privacy Framework?

    As of mid-2026 the DPF remains in force, with the EU adequacy decision, the UK Extension and the Swiss-US DPF all active, though the mechanism continues to face legal challenges in the European courts, as its predecessors did.

    Implications for research institutions

    For research administrators managing international collaborations, the practical task is procedural discipline: verify DPF or UK Extension status before every transfer, not just at project setup; classify sensitive data explicitly; and keep SCCs and a completed Transfer Risk Assessment on file as a contingency. Given the DPF’s contested legal history, institutions that treat adequacy as a convenience rather than a permanent guarantee will be best placed to keep collaborations lawful if the Framework is narrowed or challenged again.

    These obligations sit within the broader compliance landscape that research administration teams increasingly own alongside funders, ethics committees and legal counsel — making data transfer literacy as core to running an international study as the science itself.

  • Research Data Management Policy: Not Just a DMP

    A research data management policy is an institution-wide governance document that sets ownership, retention, storage and researcher-responsibility rules for all research data an organisation produces — distinct from a data management plan (DMP), which is a project-specific document written for a single grant. Confusing the two leaves institutions with fragmented practice: strong per-grant DMPs but no consistent rule for what happens to data once a project, or a researcher, moves on.

    A research data management policy is the institutional framework; the DMP is one project’s implementation of it. This article sets out the structural difference and gives a template for writing the institutional-level document, covering ownership, retention tiers, storage classes and researcher obligations.

    What is a research data management policy?

    A research data management (RDM) policy is a formally approved institutional document — typically ratified by a university executive, senate or research committee — that defines how all research data created, collected or reused at that institution must be handled across its lifecycle: creation, active use, retention, sharing and disposal.

    Unlike guidance notes or web pages, a policy carries institutional authority: it assigns accountability, sets minimum retention periods, and states what happens by default when a researcher leaves or a grant closes. The UKRI Concordat on Open Research Data (2016, updated 2020), signed by UK Research and Innovation, Universities UK and the Wellcome Trust among others, sets out common principles — including that research data are a public good and that costs of good data management are legitimate, fundable research costs. Most UK institutional RDM policies, including those at Edinburgh, Southampton and Manchester, cite the Concordat directly as their basis.

    Research data management policy vs a data management plan

    The policy and the DMP operate at different scopes and answer different questions. The policy answers “what does this institution require of everyone, always?” The DMP answers “how will this specific project handle its specific data?” A DMP written for a UKRI or Horizon Europe grant should reference and comply with the institutional policy, not substitute for it.

    | Dimension | Institutional RDM policy | Data management plan (DMP) |
    | --- | --- | --- |
    | Scope | Whole institution, all research | Single project or grant |
    | Author | Research office, library, IT, governance committee | Principal investigator / research team |
    | Trigger | Approved once, reviewed periodically | Written at proposal stage, revised through project life |
    | Contains | Ownership defaults, retention minimums, storage tiers, roles | Dataset types, volumes, specific repositories, embargo dates |
    | Enforcement | Institutional compliance / disciplinary framework | Funder compliance check at reporting/audit |
    | Review cycle | Every 3-5 years (Edinburgh’s policy specifies five) | Reviewed and updated within the life of one project |

    A well-run institution needs both, in that order: the policy first, so every subsequent DMP inherits a consistent set of defaults — retention minimums, approved repositories, data protection procedures — rather than each research team inventing its own.

    Template structure for an institutional RDM policy

    Reviewing current UK institutional policies (Edinburgh, Southampton, Manchester, Birmingham, Cambridge) shows a consistent structural skeleton. A new or revised policy should include, in order:

    • Purpose and scope — why the policy exists, and which staff, students and data types it covers.
    • Definition of research data — the institution’s own working definition (the UKRI Concordat’s is a common starting point: digital or analogue information collected, observed or created to validate research findings).
    • Roles and responsibilities — who is the data owner by default (usually the institution), who is the data steward (usually the principal investigator), and what the research office, IT services and library each provide.
    • Data management planning requirement — a mandate that a DMP must exist for every funded (and, ideally, every unfunded) research project, and where that requirement sits relative to ethics approval.
    • Storage and security tiers — approved storage classes mapped to data sensitivity.
    • Retention and disposal — minimum retention period, and the trigger for review or deletion.
    • Sharing, access and FAIR compliance — the institution’s default position on open data, exceptions for confidentiality, and adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable), as defined by Wilkinson et al. in Scientific Data (2016).
    • Legal and ethical compliance — UK GDPR and Data Protection Act 2018 obligations for personal data, plus any sector-specific requirements.
    • Review cycle and ownership of the policy itself — who revises it and how often.

    This ordering matters: policies that lead with storage and IT detail before establishing roles tend to read as IT documents rather than governance ones, which weakens researcher buy-in.

    Retention, ownership and storage tiers

    Retention should be set as a minimum, not a target. A commonly cited UK baseline is three years from project end or publication, with the caveat that funder, sponsor or disciplinary requirements specifying longer periods take precedence — clinical and health-related data, for example, routinely requires 10-15 year retention under separate regulatory regimes.
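
    The “minimum, with longer requirements taking precedence” rule is a simple maximum, which a records system could compute as follows. This is a sketch under stated assumptions: the text does not say whether the period runs from project end or publication when both exist, so the later of the two is assumed here, and `earliest_disposal_date` and `external_years` are hypothetical names.

```python
from datetime import date
from typing import Optional

BASELINE_YEARS = 3  # the baseline quoted in the text


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(project_end: date,
                           publication: Optional[date] = None,
                           external_years: Optional[int] = None) -> date:
    """Earliest date on which disposal review may begin."""
    # Assumption: the clock starts at the later of project end and publication.
    start = max(project_end, publication) if publication else project_end
    # A longer funder, sponsor or regulatory period overrides the baseline.
    years = max(BASELINE_YEARS, external_years or 0)
    return add_years(start, years)
```

    A project ending on 30 June 2024 with no external requirement gives 30 June 2027, while the same project under a 15-year clinical requirement gives 30 June 2039.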

    Ownership defaults matter because researchers move institutions far more often than data does. Most UK institutional policies assign underlying ownership of research data to the institution as the legal entity that employed the researcher and typically held the grant, while the principal investigator retains stewardship responsibility — the practical duty of care — during and after the project. This split must be stated explicitly, not left implicit, because it is the clause institutions rely on when a departing researcher wants to take data with them.

    Storage tiers should be mapped to data sensitivity rather than treated as one undifferentiated pool. A workable minimum is three tiers:

    • Tier 1 (open/shareable): deposited in a re3data-listed, CoreTrustSeal-certified repository with a DOI via DataCite.
    • Tier 2 (restricted/sensitive): access-controlled institutional storage under a data sharing agreement.
    • Tier 3 (confidential/personal): encrypted storage meeting UK GDPR requirements, with a Data Protection Impact Assessment on file.
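
    The three tiers above amount to a classification rule that can be applied at the point a researcher registers a dataset. The following is an illustrative sketch only: the two yes/no questions, the `Tier` enum and `storage_tier` are hypothetical simplifications, and an institution would likely ask more than two questions.

```python
from enum import Enum


class Tier(Enum):
    OPEN = 1          # open/shareable
    RESTRICTED = 2    # restricted/sensitive
    CONFIDENTIAL = 3  # confidential/personal


def storage_tier(contains_personal_data: bool, access_restricted: bool) -> Tier:
    """Assign the most protective tier that applies to a dataset."""
    # Personal data takes precedence over any other consideration.
    if contains_personal_data:
        return Tier.CONFIDENTIAL
    if access_restricted:
        return Tier.RESTRICTED
    return Tier.OPEN
```

    Testing for the most protective tier first means a dataset that is both personal and restricted lands in Tier 3, never Tier 2.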

    Researcher obligations and governance roles

    The policy should state researcher obligations as directives, not suggestions. At minimum, researchers are required to: complete a DMP before data collection begins; store active data only in institutionally approved systems; register externally held datasets with the institution; and provide a data access statement or citation in any publication when the underlying data are not directly deposited.

    Governance sits across three functions the policy must name individually: the research office (grant compliance, costing RDM into proposals — UKRI states that RDM costs are eligible under its funding), IT services (approved storage infrastructure and security), and the library or research data service (repository operation, metadata standards, researcher training). ARMA and INORMS provide sector benchmarking for how these research administration roles are typically distributed across institutions.

    Common questions

    What is the difference between a research data management policy and a data management plan?

    A research data management policy is an institution-wide governance document setting defaults for ownership, retention and storage. A data management plan is a project-specific document, usually required by a funder at proposal stage, that details how one project’s data will be collected, stored and shared within those institutional defaults.

    Who is responsible for research data management at an institution?

    Responsibility is shared but must be explicitly assigned. The principal investigator is typically the data steward for a given project; the institution holds underlying ownership; and the research office, IT services and library provide the supporting infrastructure, costing advice and repository services the policy commits to.

    How long should institutions retain research data?

    Most UK institutional policies set a minimum retention period of three years from project end or publication, deferring to longer funder-, sponsor- or discipline-specific requirements where they apply — for example, clinical research data typically requires substantially longer retention under separate regulatory regimes.

    What does FAIR data mean in a research data management policy?

    FAIR stands for Findable, Accessible, Interoperable and Reusable — principles defined by Wilkinson et al. (2016) that a policy should require researchers to apply when depositing data, typically through persistent identifiers, standard metadata and appropriate licensing. See the CASRAI research data dictionary for related term definitions.

    Implications for research administrators

    Institutions that only mandate DMPs at grant stage, without an underlying institutional policy, end up with inconsistent retention practice, ambiguous ownership when staff leave, and duplicated storage costs across departments running incompatible systems. Writing the institutional policy first — using the structure above — gives every subsequent DMP a consistent, auditable baseline, and gives research offices a defensible answer when a funder, ethics committee, or departing researcher asks who owns what and for how long.

    As RDM costs are increasingly built into grants and UK institutions face growing FOI and audit scrutiny of data retention, the institutional policy is the operational backbone that per-project DMPs are supposed to inherit from, not replace.

  • DMPonline vs DMPTool vs Argos: DMP Tool Guide

    DMPonline, DMPTool and Argos are the three leading platforms for writing a data management plan (DMP): DMPonline (Digital Curation Centre, UK) and DMPTool (California Digital Library, US) share the same open-source DMP Roadmap codebase, while Argos (OpenAIRE) is built for machine-actionable, European open-science workflows. The right choice depends on your funder’s templates, whether your institution offers a branded instance, and whether you need structured API export.

    A data management plan tool is software that walks a researcher through funder- and institution-specific questions, stores the resulting answers as a structured document, and — increasingly — exports that document in a machine-readable format rather than as static prose. DMPonline is the Digital Curation Centre’s web-based DMP-writing service, built on the open-source DMP Roadmap platform it co-develops with the California Digital Library. This guide compares it against DMPTool and Argos on the three factors that actually decide adoption: funder-template coverage, institutional branding, and API export.

    What is DMPonline, and who runs it?

    DMPonline is a free web application, developed and hosted by the Digital Curation Centre (DCC), based at the University of Edinburgh. It supports researchers in producing a data management plan against a specific funder or institutional template, with embedded guidance text at each question. It is the standard reference tool for UK Research and Innovation (UKRI) grant-holders and is widely adopted across UK and European universities.

    Many institutions run their own branded instance rather than sending researchers to the generic service — the University of Manchester, University of Sheffield, University of Plymouth and University of Exeter all operate dedicated DMPonline subdomains with local templates and guidance layered on top of the shared DCC platform.

    DMPonline vs DMPTool: same codebase, different communities

    DMPonline and DMPTool are not separate products built by rival teams — they run on the same open-source codebase, DMP Roadmap, jointly developed by the DCC and the California Digital Library (CDL). The practical difference is community and funder coverage, not underlying functionality.

    DMPTool, operated by the CDL (part of the University of California system), is the default choice for US-based researchers, carrying templates for agencies such as the National Science Foundation (NSF) and National Institutes of Health (NIH). DMPonline carries the equivalent depth for UK and European funders, including UKRI’s constituent research councils and Wellcome Trust. Because both draw on the same codebase, a plan exported from either tool follows a broadly comparable data model — the divergence sits in which templates, guidance text and institutional branding are pre-loaded, not in the software itself.

    What is Argos, and how does it differ?

    Argos is a DMP-writing platform developed within OpenAIRE, the European open-science infrastructure, rather than from the DMP Roadmap lineage. Argos was designed from the outset around machine-actionable output, producing plans as structured objects intended to connect into the wider European research-information graph rather than sit as a standalone PDF.

    Its templates lean towards Horizon Europe and European Research Council (ERC) requirements, and its architecture emphasises linking a DMP’s contents — datasets, repositories, funders, organisations — to persistent identifiers already circulating in the OpenAIRE Research Graph. For a European-funded project embedded in that ecosystem, this integration is a genuine functional difference, not just a branding one.

    Funder-template coverage: which tool fits your funder

    Template coverage is usually the deciding factor, since a funder-specific template determines exactly which questions a plan must answer. The table below summarises where each platform’s template strength lies.

    | Platform | Steward | Strongest funder coverage | Typical user base |
    | --- | --- | --- | --- |
    | DMPonline | Digital Curation Centre | UKRI councils, Wellcome Trust, UK institutional templates | UK and European universities |
    | DMPTool | California Digital Library | NSF, NIH, US federal agency templates | US universities and research institutes |
    | Argos | OpenAIRE | Horizon Europe, ERC, EOSC-aligned funders | European open-science projects |

    None of the three restricts researchers to their “home” funder templates — DMPonline hosts non-UK institutional templates, and DMPTool lists non-US funders too — but the depth of guidance and the freshness of template maintenance concentrate where each tool’s steward organisation has direct funder relationships.

    Institutional branding and API export compared

    Beyond templates, two practical factors distinguish the tools for an institution deciding which one to adopt.

    • Institutional branding. Both DMPonline and DMPTool support institution-specific branded sub-sites — a university can present its own logo, guidance text and curated template list under its own subdomain while the underlying platform remains centrally maintained. Argos, built for the OpenAIRE/EOSC ecosystem, is more typically deployed as a shared service with organisation profiles rather than fully white-labelled institutional instances.
    • API and machine-actionable export. All three platforms are converging on the RDA DMP Common Standard, developed by the Research Data Alliance’s working group on machine-actionable DMPs, which defines a shared JSON structure for exporting plan content. This is what allows a plan written in one tool to be read, in principle, by a funder system, a repository, or a research-information system rather than only by a human reader.
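
    To make “a shared JSON structure” concrete, the sketch below builds a very small plan object of the kind a machine-actionable export contains. It is not the output of DMPonline, DMPTool or Argos. The field names are written from memory of the RDA DMP Common Standard and should be checked against the published schema before use, and `minimal_madmp` and all example values are hypothetical.

```python
import json
from datetime import datetime, timezone


def minimal_madmp(title: str, dmp_url: str, contact_name: str,
                  contact_email: str, dataset_titles: list[str]) -> dict:
    """Build a small machine-actionable DMP as a plain dictionary."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "dmp": {
            "title": title,
            "created": now,
            "modified": now,
            "language": "eng",
            "ethical_issues_exist": "unknown",
            "dmp_id": {"identifier": dmp_url, "type": "url"},
            "contact": {
                "name": contact_name,
                "mbox": contact_email,
                # Placeholder identifier; a real plan would use e.g. an ORCID iD.
                "contact_id": {"identifier": contact_email, "type": "other"},
            },
            "dataset": [
                {
                    "title": t,
                    "dataset_id": {"identifier": f"{dmp_url}#dataset-{i}", "type": "other"},
                    "personal_data": "unknown",
                    "sensitive_data": "unknown",
                }
                for i, t in enumerate(dataset_titles, start=1)
            ],
        }
    }


plan = minimal_madmp("Example study", "https://example.org/dmp/1",
                     "A. Researcher", "a.researcher@example.org",
                     ["Interview transcripts"])
print(json.dumps(plan, indent=2))
```

    Because the result is ordinary JSON, a repository or research-information system can read the dataset list directly instead of parsing a PDF.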

    For research administrators evaluating tools as part of broader research administration workflows, the practical question is less “which tool is best” and more “which tool’s export format and branding options integrate with our existing repository, CRIS and grants-management systems”.

    Common questions about choosing a DMP tool

    Do I need a data management plan?

    Most major funders, including UKRI, Wellcome Trust, the NSF, the NIH and Horizon Europe, require a data management plan as a condition of funding. If your grant application names one of these funders, expect to need a DMP. DMPonline, DMPTool and Argos each supply the relevant funder template with embedded guidance, which is usually the quickest route to a compliant plan.

    How do I write a data management plan?

    Writing a DMP means working through a funder-specific template — covering what data you will create, how it will be documented, where it will be stored, and how it will be shared or preserved. DMPonline, DMPTool and Argos each provide the relevant template with embedded guidance, rather than requiring you to draft one from a blank page.

    What is included in a data management plan?

    A DMP typically covers the types of data to be produced, the metadata and documentation standards used, access and sharing policies, and the plan for long-term archiving and preservation. Machine-actionable tools structure these elements so they can be exported and reused by other systems, not just read once.

    Choosing a tool: what the decision actually hinges on

    Because DMPonline, DMPTool and Argos are all converging on the same RDA DMP Common Standard for export, the choice between them is rarely a compatibility question. It comes down to fit: which platform already carries deep templates for your funder, whether your institution operates a branded instance you are expected to use, and whether your downstream systems consume RDA-conformant JSON export.

    For a UK or European researcher working with UKRI or Wellcome funding, DMPonline is the default starting point. For a US researcher working with NSF or NIH funding, DMPTool serves the equivalent role. For a Horizon Europe or ERC-funded project deeply embedded in the EOSC ecosystem, Argos’s machine-actionability and graph integration make it the stronger fit. As the RDA Common Standard matures further, expect the practical differences between the three to narrow to templates and branding alone, with export interoperability becoming a solved problem rather than a selection criterion.
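
    The starting points above reduce to a short lookup that a research office could place in its guidance pages. This is a sketch of the rule of thumb in this guide, not a recommendation engine: the funder keys and `starting_tool` are hypothetical, and the choice to let an institutional instance override the funder default is an assumption drawn from the selection factors discussed earlier.

```python
from typing import Optional

# Default starting point per funder, restating the guidance in the text.
DEFAULT_TOOL_BY_FUNDER = {
    "UKRI": "DMPonline",
    "Wellcome": "DMPonline",
    "NSF": "DMPTool",
    "NIH": "DMPTool",
    "Horizon Europe": "Argos",
    "ERC": "Argos",
}


def starting_tool(funder: str,
                  institutional_instance: Optional[str] = None) -> Optional[str]:
    """Suggest a DMP tool; returns None when the funder is not in the table."""
    # Assumption: a branded institutional instance, where one exists,
    # takes priority over the funder default.
    if institutional_instance:
        return institutional_instance
    return DEFAULT_TOOL_BY_FUNDER.get(funder)
```

    Returning `None` for an unlisted funder is intentional, so the caller has to check template coverage by hand rather than receive a guess.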