Tag: lawful basis

  • GDPR and research data: lawful bases, consent and pseudonymisation

    An enormous amount of research depends on data about people — their health, their behaviour, their genetics, their opinions, their lives. Wherever such data identify or could identify individuals, they fall within data protection law, and in Europe and the United Kingdom that law is the General Data Protection Regulation (GDPR), supplemented in the UK by the UK GDPR and the Data Protection Act 2018. For researchers the GDPR is sometimes experienced as a thicket of obligations. But its core ideas are coherent, and it contains specific provisions designed to enable responsible research rather than obstruct it. Understanding lawful bases, the special rules for sensitive data, the research exemptions, and the distinction between anonymisation and pseudonymisation is part of doing data-driven research properly. This article offers an orientation, drawing on the compliance and regulatory domain of the CASRAI Dictionary. It is general guidance, not legal advice.

    You need a lawful basis

    The first principle is that processing personal data is not permitted by default; it requires a lawful basis. Article 6 of the GDPR sets out the possible bases, several of which can be relevant to research. Many researchers assume the answer is always consent, but for research by public institutions a basis such as the performance of a task carried out in the public interest is often more appropriate. The choice matters because different bases carry different consequences for the rights individuals can exercise. The key point is that a researcher must be able to identify and justify the lawful basis on which they process personal data — good intentions and scientific value do not by themselves make processing lawful.

    Special category data and Article 9

    Much research data is not merely personal but sensitive — data about health, genetics, ethnicity, sexual life, religious or political beliefs, and so on. The GDPR calls these special categories and gives them extra protection under Article 9, which prohibits their processing unless a specific additional condition is met. Among those conditions are explicit consent and, importantly for research, processing necessary for scientific research purposes subject to appropriate safeguards. This means that to process sensitive data lawfully, a researcher must satisfy both a lawful basis under Article 6 and a condition under Article 9. The heightened protection reflects the heightened risk: misuse of health or genetic data can cause serious harm, and the law accordingly demands a stronger justification and stronger safeguards before such data may be used.

    The research provisions

    The GDPR explicitly recognises the value of research and contains provisions, centred on Article 89, intended to facilitate it while protecting individuals. These measures allow certain flexibilities under conditions — for example, data collected for one purpose may in some circumstances be further processed for scientific research without that being treated as incompatible with the original purpose, and certain individual rights may be adjusted where they would seriously impair research objectives. Crucially, these provisions are not a free pass. They are conditioned on appropriate safeguards for the rights and freedoms of individuals — safeguards that the regulation specifically associates with techniques such as data minimisation and, prominently, pseudonymisation. The research exemptions, in other words, come bundled with the expectation that researchers will take concrete measures to protect the people in their data.

    Anonymisation versus pseudonymisation

    One distinction does more practical work in research than almost any other, and it is frequently misunderstood: the difference between anonymisation and pseudonymisation.

    • Anonymisation means rendering data such that individuals are no longer identifiable, by anyone, taking account of all means reasonably likely to be used. Genuinely anonymous data falls outside the scope of the GDPR altogether, because it is no longer personal data. Achieving true anonymisation is harder than it sounds, because seemingly innocuous combinations of fields can re-identify people.
    • Pseudonymisation means processing data so that it can no longer be attributed to an individual without additional information — for example, replacing names with a code, while keeping the key that links code to identity separate and secure. Pseudonymised data remains personal data and remains within the GDPR’s scope, because re-identification is still possible with the key.

    The error to avoid is treating pseudonymised data as if it were anonymous and therefore outside the law. Pseudonymisation is a valuable safeguard — indeed the GDPR commends it — but it reduces risk rather than removing the data from regulation. Knowing which one you have done determines what obligations still apply.

    Accountability and impact assessments

    The GDPR is built on accountability: it is not enough to comply, one must be able to demonstrate compliance. For research using personal data this brings practical obligations — documenting the lawful basis and Article 9 condition, being transparent with participants, applying data minimisation, and securing the data. Where processing is likely to result in a high risk to individuals — as large-scale processing of sensitive data often will — a data protection impact assessment (DPIA) may be required, identifying the risks and planning mitigations before processing begins. The DPIA is not merely a form to file; it is the moment at which a team thinks systematically about how its use of personal data could affect people and how to reduce that effect.

    A consistent vocabulary for compliance

    Data protection touches institutions, funders, ethics committees and repositories alike, and for the relevant information to be handled consistently across them, the terms involved — lawful basis, consent type, special category, pseudonymised, anonymised, retention — must mean the same thing everywhere. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the compliance metadata describing how personal data may be used is understood identically wherever it appears, supporting the broader machinery of research administration. And because stewarding personal data responsibly is genuine contribution, that work can be described within the same framework as any other — the CRediT taxonomy and its full set of contribution roles. The GDPR is not the enemy of research; properly understood, it is the framework within which research that depends on people’s data can be done in a way that keeps faith with them.