  • Trusted Research Environments and the Five Safes: working safely with sensitive data

    Some of the most valuable research data in existence — linked health records, administrative data about whole populations, tax and benefit records, detailed information about individuals’ lives — is also some of the most sensitive. It can answer questions nothing else can, yet it cannot responsibly be copied, emailed or downloaded to an analyst’s machine, because doing so would scatter highly personal information across uncontrolled devices and betray the trust of the people it describes. The dominant answer to this dilemma is to invert the usual model: instead of bringing the data to the researcher, bring the researcher to the data. This is what a Trusted Research Environment does, and the Five Safes framework is the structure that lets everyone reason about whether such an arrangement is genuinely safe. Both sit within the research security domain of the CASRAI Dictionary.

    What a Trusted Research Environment is

    A Trusted Research Environment (TRE), also called a Secure Data Environment or secure data enclave, is a controlled computing setting in which approved researchers can analyse sensitive data without being able to remove it. The data stays inside; the analyst logs in remotely and works on it through the environment’s own tools. Code and queries run against the data within the secure walls, and only checked, aggregated results — never the raw records — are permitted to leave. The shift is profound. In the old model, access meant possession; once you held a copy, the custodian had lost control of it. In the TRE model, access is separated from possession: researchers can do everything they need to do with the data while never holding it. That separation is what makes it possible to grant meaningful access to genuinely sensitive material without accepting the risk that it leaks.
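
    The separation of access from possession can be sketched in a few lines. This is a toy illustration only, not how any real TRE is built: the class name ToyEnclave, the min_group_size threshold and the scalar-only release rule are all assumptions made for the example, and a Python attribute is of course not a security boundary.

```python
from statistics import mean
from typing import Callable, Sequence


class ToyEnclave:
    """Hypothetical stand-in for a TRE: holds records, releases only aggregates."""

    def __init__(self, records: Sequence[dict], min_group_size: int = 10):
        self._records = list(records)          # never handed back to the caller
        self._min_group_size = min_group_size  # assumed release threshold

    def run(self, analysis: Callable[[Sequence[dict]], float]) -> float:
        """Run analyst-supplied code against the data inside the enclave."""
        if len(self._records) < self._min_group_size:
            raise PermissionError("too few records to release an aggregate")
        result = analysis(self._records)
        # Only a single number may leave; anything row-shaped is refused.
        if isinstance(result, bool) or not isinstance(result, (int, float)):
            raise PermissionError("only scalar aggregates may leave the enclave")
        return float(result)


# The analyst sends code to the data and receives one aggregated number back.
enclave = ToyEnclave([{"age": a} for a in range(30, 50)])
mean_age = enclave.run(lambda rows: mean(r["age"] for r in rows))
print(mean_age)  # 39.5
```

    A scalar can still leak information (an analysis could return one person's age), which is why real environments rely on output checking rather than on a type test like this one.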

    The Five Safes framework

    A secure environment is necessary but not sufficient. Safe use of sensitive data depends on far more than the technology, and the Five Safes framework, developed originally at the UK’s Office for National Statistics and now used internationally, captures the full set of dimensions that have to be managed together:

    • Safe people. Are the researchers trustworthy, trained and accountable? Access is granted to vetted, often accredited individuals who understand and accept their obligations.
    • Safe projects. Is the proposed use appropriate, lawful and in the public interest? Each project is assessed before access is granted, not waved through.
    • Safe settings. Does the environment itself prevent unauthorised access or removal of data? This is the TRE’s technical and physical security.
    • Safe data. Has the data been treated to reduce risk — minimised, de-identified or otherwise protected to the degree the project requires?
    • Safe outputs. Are the results that leave the environment checked to ensure they cannot reveal anything about an individual? This is statistical disclosure control on the way out.
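
    One common form of the output checking described under safe outputs is small-count suppression: any cell in a table that describes too few people is withheld. The sketch below is a minimal illustration; the function name suppressed_counts and the threshold of 10 are assumptions, and real statistical disclosure control also has to consider issues such as recovering a suppressed cell from totals.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def suppressed_counts(
    values: Iterable[str], threshold: int = 10
) -> Dict[str, Optional[int]]:
    """Count each category; replace counts below `threshold` with None."""
    counts = Counter(values)
    return {
        category: (n if n >= threshold else None)
        for category, n in counts.items()
    }


# Hypothetical diagnosis codes for 40 people.
codes = ["A"] * 25 + ["B"] * 12 + ["C"] * 3
table = suppressed_counts(codes)
print(table)  # {'A': 25, 'B': 12, 'C': None}
```

    The three people in category C are not reported, because a count that small could point to identifiable individuals.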

    The power of the framework is that it makes risk a property of the whole system rather than of any single control. A relaxation on one dimension can be balanced by tightening another, and because every dimension is assessed explicitly, a weakness on one cannot hide behind strength elsewhere. It gives data custodians, researchers and the public a shared language for asking, and answering, “is this safe?”

    Real environments in practice

    These ideas are not theoretical. Several established environments demonstrate the model at scale. The ONS Secure Research Service provides accredited researchers with secure access to de-identified data for projects serving the public good. The SAIL Databank in Wales links anonymised population data and makes it available within a trusted environment for health and population research. OpenSAFELY took the principle further during the COVID-19 pandemic: rather than moving records into an environment at all, it lets researchers run analysis code against electronic health records inside the secure systems where those records already live, with all the code published openly for scrutiny. Bodies such as Health Data Research UK (HDR UK) have worked to align practice across such environments so that they meet common expectations rather than each inventing its own rules. Together these show that the model works: society can extract enormous research value from sensitive data while keeping faith with the people behind it.

    Transparency as a safeguard

    One feature of the more advanced environments deserves emphasis, because it marks a real advance in trustworthiness: transparency of analysis. When the code that runs against sensitive data is itself published, anyone can see exactly what was done. This serves two ends at once. It makes the research reproducible and auditable, which is good scientific practice. And it provides public accountability for the use of data the public has entrusted to researchers — people can see what is being done with information about them. Transparency does not weaken security; it strengthens the social licence on which the whole enterprise depends. The most defensible position is not secrecy about what is done with sensitive data, but openness about the method combined with strict control of the data itself.

    How TREs relate to open data

    It would be a mistake to read TREs as a retreat from open research; they are better understood as the mechanism that lets sensitive data participate in good data practice at all. The metadata describing what a TRE holds can be openly published, so researchers know the data exists and can apply to use it; the analysis code can be open; the results, once disclosure-checked, are shared. What stays controlled is only the irreducible core — the personal records themselves. This is the familiar principle of being as open as possible and as closed as necessary, made operational. The wider questions of working with controlled material are explored in our writing on research administration.

    A consistent vocabulary for safe access

    For TREs to interoperate and for the Five Safes to be applied consistently, the terms involved (access conditions, accreditation status, output-checking requirements, data sensitivity categories) must mean the same thing across institutions and environments. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the governance information surrounding sensitive data is understood identically wherever it travels. And because analysing data within a TRE is a genuine, recognisable research contribution, the work can be described using the same framework as any other: the CRediT taxonomy and its full set of contribution roles. Trusted Research Environments and the Five Safes together show that protecting people and enabling discovery are not opposing goals but two halves of doing sensitive research well.
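
    In software terms, a shared vocabulary behaves like a closed set of permitted values that every system validates against. The sketch below illustrates that idea only: the category names, the AccessCondition record and the parse_sensitivity helper are hypothetical and are not taken from the CASRAI Dictionary.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical categories for illustration, not CASRAI terms.
    OPEN = "open"
    CONTROLLED = "controlled"
    HIGHLY_SENSITIVE = "highly_sensitive"


class Accreditation(Enum):
    # Hypothetical accreditation statuses.
    NONE = "none"
    ACCREDITED = "accredited"


@dataclass(frozen=True)
class AccessCondition:
    """Governance metadata that travels with a dataset description."""
    sensitivity: Sensitivity
    required_accreditation: Accreditation
    output_checking_required: bool


def parse_sensitivity(label: str) -> Sensitivity:
    """Map a label onto the shared vocabulary; unknown terms raise ValueError."""
    return Sensitivity(label.strip().lower())


condition = AccessCondition(
    sensitivity=parse_sensitivity(" Controlled "),
    required_accreditation=Accreditation.ACCREDITED,
    output_checking_required=True,
)
print(condition.sensitivity.value)  # controlled
```

    Because unknown labels are rejected rather than guessed at, two institutions exchanging such records either agree on a term or find out immediately that they do not.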