CASRAI Dictionary

GDPR Enforcement 2025: How DPAs Applied the Rules
The EU General Data Protection Regulation (GDPR) has applied since May 2018, and its enforcement is carried out by independent national data-protection authorities (DPAs) across the EU and EEA, coordinated through the European Data Protection Board (EDPB). This article offers a neutral, aggregate recap of the themes that characterised GDPR enforcement through 2025. It deliberately discusses patterns and principles rather than naming particular organisations or framing specific outcomes as accusations, and it is not legal advice.

How GDPR enforcement is structured

GDPR is enforced primarily by national DPAs, each supervising organisations within its jurisdiction. For cross-border processing, the regulation uses a one-stop-shop mechanism: a lead supervisory authority, usually where the organisation has its main establishment, coordinates with other concerned authorities. Where authorities disagree, the EDPB can issue binding decisions to ensure consistent application. For the underlying framework, see our overview of the GDPR.

This structure matters because it shapes how enforcement unfolds: many significant cross-border matters involve coordination between a lead authority and others, and EDPB consistency mechanisms help align interpretation across countries.

Recurring themes in enforcement

Across the body of enforcement activity, several themes recur as areas where authorities have focused. Described in aggregate, these include:
- Lawful basis and transparency: whether organisations correctly identify and communicate the legal basis for processing, and whether privacy information is clear and accessible.
- Consent: whether consent, where relied upon, is freely given, specific, informed and unambiguous, and as easy to withdraw as to give.
- Data-subject rights: how organisations handle requests for access, erasure, rectification and objection within required timeframes.
- Security and breach handling: whether appropriate technical and organisational measures are in place, and whether breaches are notified appropriately. See our explainer on data breaches.
- International transfers: the safeguards applied when personal data move outside the EEA.
These themes reflect the GDPR’s core principles — lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity and confidentiality; and accountability — and enforcement activity tends to cluster around them.

The role of the EDPB and consistency

A defining feature of recent years has been the EDPB’s role in promoting consistent interpretation. Through guidelines, opinions and, where necessary, binding decisions in dispute-resolution procedures, the Board has helped align how authorities approach questions such as the calculation of administrative fines and the assessment of cross-border cases. The EDPB has, for example, issued guidance intended to harmonise the methodology authorities use when determining the level of fines, supporting a more consistent approach across the bloc.

This coordination is significant for organisations operating in multiple member states, because it reduces — though does not eliminate — divergence in how the same rules are applied in different countries.

Tools beyond fines

Administrative fines attract the most attention, but DPAs have a wider toolkit. Authorities can issue warnings and reprimands, order an organisation to bring processing into compliance, impose temporary or definitive limitations on processing (including bans), and order the rectification or erasure of data. In many matters, corrective orders — requiring changes to how data are handled — are as consequential as monetary penalties, because they directly alter business practices. Describing enforcement only in terms of fine totals therefore understates the range of regulatory action.

What organisations took from it

In aggregate, the enforcement picture through 2025 reinforced the importance of demonstrable accountability: maintaining records of processing, conducting data-protection impact assessments where required, ensuring a valid lawful basis, honouring data-subject rights promptly, and being able to evidence appropriate security measures. The accountability principle — being able to show compliance, not merely assert it — runs through the regulation and through how authorities assess organisations.

For those seeking to understand the rules themselves rather than commentary on outcomes, the authoritative sources are the regulation’s own text, national DPA guidance, and EDPB materials published at edpb.europa.eu. Neutral definitions of related privacy terms are collected in our standards dictionary.

Reading enforcement data carefully

A further note concerns how enforcement statistics should be read. Aggregate figures (numbers of decisions, total penalty amounts, or counts of complaints) circulate widely, but they require context. A high total in one period may reflect a small number of large matters rather than a broad pattern; a low total may reflect a focus on corrective orders rather than fines. Differences between member states can stem from caseload, the nature of the organisations established in a jurisdiction, or procedural timing rather than from differing strictness. For this reason, responsible analysis treats enforcement data as one input among several and avoids inferring conclusions about any individual organisation from aggregate trends. The constructive takeaway for organisations is forward-looking: align practices with the regulation’s principles and maintain the documentation needed to demonstrate that alignment.
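
The point about totals can be made concrete with a little arithmetic. The sketch below uses invented fine amounts for two hypothetical periods; the figures and the `summarise` helper are assumptions for illustration and describe no real decisions.

```python
from statistics import median

# Invented fine amounts in EUR for two hypothetical reporting periods.
# They illustrate the arithmetic only and describe no real decisions.
period_a = [20_000, 35_000, 50_000, 15_000, 900_000_000]
period_b = [400_000, 650_000, 500_000, 700_000, 550_000]


def summarise(fines):
    """Return the total, the median, and the largest fine's share of the total."""
    total = sum(fines)
    return {
        "total": total,
        "median": median(fines),
        "largest_share": max(fines) / total,
    }


summary_a = summarise(period_a)
summary_b = summarise(period_b)
```

In this made-up example, period A has by far the larger total, yet almost all of it comes from one matter and its typical (median) fine is lower than in period B. A total on its own would hide that.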

The accountability principle in focus

If a single idea characterises how authorities approach assessment, it is accountability. The GDPR does not merely require organisations to comply; it requires them to be able to demonstrate compliance. In practice this means maintaining a record of processing activities, documenting the lawful basis for each processing purpose, conducting and recording data-protection impact assessments for higher-risk processing, and keeping evidence of the technical and organisational measures in place. When authorities examine an organisation, the ability to produce this documentation is often as important as the underlying practices themselves.

Accountability also shapes governance. Many organisations are required to designate a data-protection officer, and the regulation encourages structured governance such as data-protection-by-design and by-default, where privacy considerations are built into systems from the outset. These structural expectations recur across enforcement themes because they underpin every other obligation — a lawful basis, honoured rights and adequate security all depend on having the governance to manage them.

A neutral bottom line

GDPR enforcement in 2025 is best understood not through individual headline cases but through the patterns: sustained attention to lawful basis, transparency, consent, data-subject rights, security and international transfers; growing consistency driven by the EDPB; and a corrective toolkit that extends well beyond fines. The regulation’s principles remained the constant reference point against which authorities assessed organisations.
June 22, 2026
Anonymising research data: k-anonymity, differential privacy and the re-identification risk
Much of the most valuable research data is also the most sensitive: health records, survey responses, administrative data about individuals. Sharing it advances science, but sharing it carelessly can expose the very people it describes. The discipline that sits between these two goods, anonymisation, is more technical and more fragile than the word suggests. Done well, it allows safe reuse; done casually, it offers a false reassurance that data is protected when in fact individuals can be picked back out.

Anonymisation is not pseudonymisation

The first distinction is legal and practical. Pseudonymisation replaces direct identifiers, such as names, with a key or token, but the link back to the individual still exists, held separately. Under data-protection law, including the UK GDPR, pseudonymised data remains personal data, because re-identification is possible by anyone with access to the key. It is a valuable security measure, but it does not remove a record from the scope of data-protection obligations.
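
One common way to produce such tokens is a keyed hash. The following is a minimal sketch, assuming an HMAC-SHA256 token and invented records; the `pseudonymise` function, the field names and the truncation to 16 characters are illustrative choices, not a prescribed method.

```python
import hashlib
import hmac
import secrets

# The secret key is the link back to individuals. In this sketch it lives
# in a variable; in practice it would be held separately from the data.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed token (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Invented records for illustration.
records = [
    {"name": "Alice Example", "age": 34, "diagnosis": "asthma"},
    {"name": "Bob Example", "age": 51, "diagnosis": "diabetes"},
]

pseudonymised = [
    {
        "token": pseudonymise(r["name"], SECRET_KEY),
        "age": r["age"],
        "diagnosis": r["diagnosis"],
    }
    for r in records
]

# Anyone holding the key can recompute the token for a known name and so
# re-link the record, which is why the data remains personal data.
relinked = [
    p for p in pseudonymised
    if p["token"] == pseudonymise("Alice Example", SECRET_KEY)
]
```

The names are gone from the shared table, but the key holder can still find Alice's row. That is the sense in which the link back to the individual still exists.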

True anonymisation aims to render data no longer personal at all, such that an individual cannot be identified by any party reasonably likely to try, taking account of other information that may be available. If genuinely achieved, anonymised data falls outside the core of data-protection law. The catch is in the words “reasonably likely”: anonymisation is not a binary state achieved by deleting a name, but a judgement about residual risk in a specific context, which is why it is hard to get right and easy to overstate.

The privacy models

Researchers draw on a small family of formal models to reason about that residual risk.
- k-anonymity. A dataset is k-anonymous if every record is indistinguishable from at least k minus one others with respect to the quasi-identifiers, the attributes such as age, postcode or occupation that, in combination, could single someone out. Achieving it usually means generalising values, for example reporting an age band instead of an exact age, or suppressing rare values. k-anonymity guards against picking out a single individual, but it has a known weakness: if all the records in a group share the same sensitive value, an attacker learns that value without needing to identify the specific person.
- l-diversity. This extends k-anonymity to address that weakness by requiring that each group of indistinguishable records contains a diversity of sensitive values, so that membership of a group does not reveal a sensitive attribute. It is a refinement aimed squarely at the homogeneity problem that k-anonymity alone does not solve.
- Differential privacy. A fundamentally different and more rigorous approach, differential privacy adds carefully calibrated statistical noise to results or data so that the presence or absence of any single individual makes almost no difference to what is released. Its formal guarantee is about the mechanism, not just the output: it bounds how much can be learned about any one person regardless of what auxiliary information an attacker holds. This makes it powerful for releasing aggregate statistics, though the added noise trades some accuracy for that protection.
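
The first two models can be checked mechanically on a table. Below is a minimal sketch, assuming invented and already generalised records; the function names are hypothetical, and `l_diversity` implements only the simplest variant (a count of distinct sensitive values per group).

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on every quasi-identifier."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of indistinguishable rows."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return l: the fewest distinct sensitive values found in any group."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


# Invented, already generalised records (age bands, partial postcodes).
table = [
    {"age_band": "30-39", "postcode": "AB1", "condition": "asthma"},
    {"age_band": "30-39", "postcode": "AB1", "condition": "diabetes"},
    {"age_band": "40-49", "postcode": "CD2", "condition": "asthma"},
    {"age_band": "40-49", "postcode": "CD2", "condition": "asthma"},
]

k_value = k_anonymity(table, ["age_band", "postcode"])
l_value = l_diversity(table, ["age_band", "postcode"], "condition")
```

Here every record shares its quasi-identifiers with one other, so the table is 2-anonymous. The second group holds only one condition, so the table is only 1-diverse: knowing someone is in that group reveals their condition, which is the homogeneity weakness described above.
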
These models are complementary rather than competing. k-anonymity and l-diversity reason about the structure of a released microdata table; differential privacy reasons about the process that generates released figures. Choosing among them depends on what is being shared and to whom.
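
For differential privacy, the textbook starting point is the Laplace mechanism applied to a count. The sketch below is illustrative rather than production-grade: `private_count` and the sample ages are assumptions, and the noise sampling is not hardened against the floating-point issues that real deployments have to handle.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count using the Laplace mechanism."""
    true_count = sum(1 for value in values if predicate(value))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # seeded only to make the sketch repeatable
ages = [23, 35, 41, 58, 62, 29, 47]  # invented values
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means more noise and stronger protection; a larger one means a more accurate but less protective release. That is the accuracy trade-off the list above describes.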

UKAN and the ICO code

Formal models need to be translated into practice, and in the United Kingdom two sources do that work. The UK Anonymisation Network (UKAN) provides practical guidance, training and a structured way of thinking about anonymisation as a context-dependent risk-management activity rather than a one-off technical fix. Its framework stresses that the same data can be safe to share in one environment and unsafe in another, so decisions must consider the data, the recipients and the controls around access together.

The Information Commissioner’s Office (ICO), the UK data-protection regulator, has likewise produced guidance on anonymisation and pseudonymisation that explains the legal status of each and what organisations must consider. Both sources make the same point: anonymisation is a spectrum of risk, judged against who might reasonably try to re-identify and what else they could bring to bear, not a switch that is simply flipped.

The re-identification risk

The reason all this caution is warranted is that re-identification has repeatedly proved easier than data holders expected. Datasets stripped of obvious identifiers have been re-identified by linking them to other available information, because the combination of a few seemingly innocuous attributes, a date, a location, a rare characteristic, can be unique to one person. This is the linkage attack, and it is why quasi-identifiers, not just direct identifiers, must be managed. The lesson is that data does not become safe simply because the names are gone; safety depends on how unique the remaining combinations are and on what an adversary could plausibly match them against.
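
A linkage attack is, mechanically, just a join on quasi-identifiers. The following is a minimal sketch with invented records; `released`, `public` and the `link` function are hypothetical names used for illustration.

```python
# Invented records. "released" has had names removed; "public" is a
# hypothetical auxiliary source, such as a register, that includes names.
released = [
    {"birth_year": 1984, "postcode": "AB1 2CD", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1984, "postcode": "AB1 2CD", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": 1990, "postcode": "EF3 4GH", "sex": "F", "diagnosis": "migraine"},
    {"birth_year": 1990, "postcode": "EF3 4GH", "sex": "F", "diagnosis": "asthma"},
]
public = [
    {"name": "Alice Example", "birth_year": 1984, "postcode": "AB1 2CD", "sex": "F"},
    {"name": "Carol Example", "birth_year": 1990, "postcode": "EF3 4GH", "sex": "F"},
]
QUASI_IDENTIFIERS = ("birth_year", "postcode", "sex")


def link(released_rows, auxiliary_rows, keys):
    """Map each name to a released row when exactly one row matches on all keys."""
    matches = {}
    for person in auxiliary_rows:
        candidates = [
            row for row in released_rows
            if all(row[k] == person[k] for k in keys)
        ]
        if len(candidates) == 1:  # a unique combination singles the person out
            matches[person["name"]] = candidates[0]
    return matches


reidentified = link(released, public, QUASI_IDENTIFIERS)
```

In this example one person is re-identified because her combination of attributes is unique in the released table, while the other is not because two released rows share her combination. Generalising or suppressing quasi-identifiers works by making such unique matches rarer.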

For researchers, the practical implications are clear. Treat anonymisation as a risk assessment specific to the data and the sharing context, not a checkbox. Prefer formally grounded methods, choosing k-anonymity and l-diversity for microdata releases and differential privacy where strong, attacker-agnostic guarantees on aggregate outputs are needed. Combine technical measures with controls on who can access the data and under what terms, in the spirit of safe-environment approaches. And remember that pseudonymised data remains personal data with all the obligations that entails. Handled this way, sensitive data can be shared responsibly, supporting the reuse goals of FAIR data without trading away the privacy of the individuals whose lives the data describes. Consistent definitions, of the kind a CASRAI data dictionary promotes, help ensure that everyone in the chain means the same thing by “anonymised”, “pseudonymised” and “identifiable”.
June 21, 2026
Sensitive and controlled-access data: FAIR for data that cannot be fully open

The push for open research data has been one of the defining movements in scholarly practice, and rightly so: openly available data is easier to verify, reuse and build upon. But an unqualified call to make all data open runs into an immovable obstacle. A great deal of research data is sensitive — patient records, genetic information, data about vulnerable people, commercially confidential material, data whose release could cause harm — and such data cannot simply be posted on the open web without breaching the law, betraying participants’ trust, or endangering people. The challenge is not to choose between openness and protection but to honour both: to make sensitive data as accessible as it responsibly can be while keeping it as protected as it must be. This article looks at how that balance is struck, drawing on the compliance and regulatory domain of the CASRAI Dictionary.

As open as possible, as closed as necessary

The principle that has come to govern this territory is captured in a single phrase: data should be “as open as possible, as closed as necessary”. The phrase does real work. It establishes openness as the default and the goal — the burden falls on reasons to restrict, not on reasons to share. But it also acknowledges, plainly, that necessity sometimes requires closure, and that protecting people and honouring legal and ethical obligations is not a failure of openness but a condition of doing research responsibly. The aim, then, is not a binary of open versus closed but a spectrum of access arrangements, each calibrated to what a particular dataset requires. Sensitive data does not fall off the map of good data practice; it occupies a different, carefully governed part of it.

FAIR does not mean open

A common misconception is that the FAIR principles — Findable, Accessible, Interoperable, Reusable — are a synonym for “open”. They are not, and the distinction matters most for sensitive data. FAIR is about good stewardship and discoverability, not unconditional availability. Sensitive data can and should be made findable: its existence, described by rich metadata, can be advertised openly even when the data itself is restricted, so that researchers know it exists and could request it. It can be made accessible in the FAIR sense — meaning that the procedure for obtaining access is clearly defined and the conditions are transparent — even when access is granted only to approved requesters under controlled conditions. And it can be made interoperable and reusable through standardised description and clear licensing. The key move is to separate the metadata, which can be fully open, from the data, whose access is controlled. Open metadata over protected data is the architecture that lets sensitive data participate in the FAIR ecosystem without being exposed.
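
The separation of open metadata from protected data can be sketched as a record with a public view. Everything here is an assumption for illustration: the `DatasetRecord` fields, the placeholder values and the `public_metadata` helper do not follow any particular repository's schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative only."""
    identifier: str        # persistent identifier used for citation
    title: str
    description: str
    access_level: str      # for example "open" or "controlled"
    access_procedure: str  # how a researcher requests the data
    licence: str
    data_location: str     # internal storage reference, never published


def public_metadata(record: DatasetRecord) -> dict:
    """Return the openly publishable view: everything except where the data lives."""
    view = asdict(record)
    view.pop("data_location")
    return view


record = DatasetRecord(
    identifier="example-dataset-001",
    title="Example patient survey",
    description="Placeholder description of a sensitive dataset.",
    access_level="controlled",
    access_procedure="Apply to the data-access committee.",
    licence="Placeholder data-use agreement",
    data_location="secure-store/example-dataset-001",
)
catalogue_entry = public_metadata(record)
```

The catalogue entry tells a researcher that the dataset exists, what it contains and how to apply, while the data itself stays behind the access procedure.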

Controlled access and data-access committees

The mechanism that delivers this is controlled access. Rather than downloading the data freely, a researcher applies for it, stating who they are, what they intend to do, and agreeing to conditions on use. The application is assessed — often by a data-access committee, a body charged with deciding whether a proposed use is legitimate, ethical, and consistent with the consent under which the data were collected. Approved access typically comes with safeguards: data-use agreements that bind the recipient, restrictions on re-identification and onward sharing, and increasingly the requirement to analyse the data within a secure environment rather than taking a copy away. These arrangements let valuable data be reused while keeping the people behind it protected and the original consent respected. The committee and the agreement are not bureaucratic obstacles for their own sake; they are the means by which trust is maintained between research and the people whose data make it possible.

Synthetic data as a bridge

One increasingly important technique deserves attention: synthetic data. Synthetic data is artificially generated to resemble a real dataset’s structure and statistical properties without containing any real individual’s information. Because it contains no real records, it can often be shared far more openly than the sensitive data it mirrors. Its value is practical: researchers can develop and test their analysis code against synthetic data, others can understand a dataset’s shape before applying for the real thing, and methods can be demonstrated without exposing anyone. Synthetic data is not a perfect substitute — conclusions must ultimately be drawn from real data, and a poorly generated synthetic set can mislead — but as a bridge between the need to share and the duty to protect, it is a genuinely useful addition to the toolkit.
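
To show the idea at its simplest, the sketch below fits a summary of each column and samples new rows from it. This is a deliberately naive generator under stated assumptions: the `fit` and `sample` functions and the records are invented, columns are modelled independently (so relationships between them are lost), and the approach carries no formal privacy guarantee.

```python
import random
from collections import Counter
from statistics import mean, stdev


def fit(rows):
    """Summarise each column: mean and stdev if numeric, value counts otherwise."""
    model = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            model[column] = ("numeric", mean(values), stdev(values))
        else:
            counts = Counter(values)
            model[column] = ("categorical", list(counts), list(counts.values()))
    return model


def sample(model, n, rng):
    """Generate n synthetic rows from the fitted per-column summaries."""
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in model.items():
            if spec[0] == "numeric":
                row[column] = rng.gauss(spec[1], spec[2])
            else:
                row[column] = rng.choices(spec[1], weights=spec[2])[0]
        rows.append(row)
    return rows


# Invented records standing in for a sensitive dataset.
real = [
    {"age": 34, "region": "north"},
    {"age": 51, "region": "south"},
    {"age": 45, "region": "north"},
    {"age": 29, "region": "north"},
]
model = fit(real)
synthetic = sample(model, n=100, rng=random.Random(7))
```

The synthetic rows have the same columns and roughly the same per-column distributions as the original, which is enough to develop and test analysis code. Because the columns are sampled independently, any result about how age relates to region would be an artefact, which illustrates why conclusions must come from the real data.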

The role of secure infrastructure

Making controlled access work at scale depends on the infrastructure that supports it: trusted repositories that hold sensitive data securely, secure analysis environments where data can be worked on without being copied out, and the identifier and metadata systems that let restricted data be described openly and cited when used. This is the territory of the data infrastructure domain, and it is what turns the principle of controlled access from an aspiration into a practical reality. Without secure places to hold the data and clear ways to describe it, the careful balance of access and protection cannot be maintained.

A consistent vocabulary for access and protection

For all of this to function across institutions, funders and repositories, the terms involved must mean the same thing everywhere. Access conditions, consent categories, licence terms and protection requirements have to be described consistently, or a dataset marked as controlled-access in one system will be misunderstood in another, with real consequences when the data are sensitive. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the metadata describing how sensitive data may be accessed and reused is understood identically wherever it appears. And because enabling the reuse of controlled-access data is a genuine, recognisable contribution, the work of curating and stewarding it can be described using the same framework as any other: the CRediT taxonomy and its full set of contribution roles. Sensitive data is not a problem to be hidden but a resource to be governed; done well, governance is what lets research honour both openness and the people it serves.

June 11, 2026