v2026.1714 entries · CC-BY 4.0
CASRAI

Definition · Plain-language

De-identification

De-identification is the process of removing or obscuring identifiers from a dataset so that it no longer identifies the individuals it describes.


The two HIPAA methods

US HIPAA describes two routes to de-identification. The Safe Harbor method requires removing 18 specified categories of identifiers — names, geographic detail below state level, most dates, contact details, and various account and device numbers — and having no actual knowledge that the remainder could identify someone. Expert Determination instead relies on a person with appropriate statistical or scientific expertise certifying that the risk of re-identification is very small. Both aim at the same outcome: data that no longer reasonably identifies individuals.
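As a rough illustration of the Safe Harbor idea, the sketch below drops fields that hold listed identifiers and reduces dates to the year. The field names (name, zip_code, birth_date and so on) are hypothetical, the list is deliberately incomplete rather than a mapping of all 18 categories, and keeping only the year is an assumed way of handling dates. It does not address the "no actual knowledge" condition, which is a judgement about the remaining data rather than a code step.

```python
from datetime import date

# Hypothetical field names chosen for illustration. A real dataset needs its
# own mapping to the 18 Safe Harbor categories; this set is incomplete.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "zip_code",
    "phone", "email", "account_number", "device_serial",
}
# Hypothetical date fields that will be reduced to a year.
DATE_FIELDS = {"birth_date", "admission_date"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of record without listed identifiers, with dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            # Assumption: keep only the year, renamed e.g. birth_date -> birth_year.
            cleaned[field.replace("_date", "_year")] = value.year
            continue
        cleaned[field] = value
    return cleaned


record = {
    "name": "Jane Example",
    "zip_code": "10027",
    "state": "NY",
    "birth_date": date(1980, 5, 17),
    "diagnosis": "asthma",
}
cleaned_record = strip_identifiers(record)
```

Here cleaned_record keeps only the state, the birth year and the diagnosis. Whether those remaining fields are safe to share still depends on the rest of the dataset.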

De-identification, anonymisation and pseudonymisation

These related terms are often confused. De-identification is the broad process of reducing identifiability. Anonymisation generally implies that re-identification is no longer reasonably possible, while pseudonymisation replaces identifiers with a code or pseudonym that can be reversed using separately held information. De-identified data may sit anywhere on this spectrum depending on the method and the residual risk. The distinction matters because different regimes attach different obligations to data depending on how strongly it has been protected.
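To make the reversibility of pseudonymisation concrete, here is a minimal sketch that swaps each identifier for a random token and keeps the link in a lookup table. The class and method names are hypothetical. In practice the table would be stored separately from the shared data and under stricter access control, which this in-memory example only gestures at.

```python
import secrets


class Pseudonymiser:
    """Maps identifiers to random tokens; the mapping is the 'separate information'."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}  # identifier -> token
        self._reverse: dict[str, str] = {}  # token -> identifier

    def pseudonymise(self, identifier: str) -> str:
        """Return a stable random token for the identifier, creating one if needed."""
        if identifier not in self._forward:
            token = secrets.token_hex(8)  # 16 hex characters, unrelated to the input
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token: str) -> str:
        """Reverse a token; only possible for whoever holds the lookup table."""
        return self._reverse[token]


pseudonymiser = Pseudonymiser()
token = pseudonymiser.pseudonymise("participant-001")
original = pseudonymiser.reidentify(token)
```

Anyone holding only the tokens cannot recover "participant-001" from them, while the holder of the lookup table can. That asymmetry is why pseudonymised data is usually treated as still linkable to a person.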

Why de-identification supports data sharing

De-identification is central to responsible research-data sharing. By lowering identifiability, it lets teams publish or reuse datasets while limiting privacy risk to participants, supporting open and FAIR data practices. It is rarely a one-off step: residual re-identification risk depends on what other data exists, so de-identification is best understood as risk management rather than a guarantee. Documenting the method used helps downstream users understand how the data may responsibly be handled.
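One way to put a number on residual risk, not mentioned in the text above, is a k-anonymity style check: count how many records share each combination of indirectly identifying fields and look at the smallest group. The sketch below assumes hypothetical field names and treats birth_year and state as the quasi-identifiers; it is an illustration of the idea, not a substitute for a formal risk assessment.

```python
from collections import Counter


def smallest_group_size(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one record is unique on those fields.
    Returns 0 for an empty dataset.
    """
    if not records:
        return 0
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(counts.values())


# Hypothetical records that have already had direct identifiers removed.
records = [
    {"birth_year": 1980, "state": "NY", "diagnosis": "asthma"},
    {"birth_year": 1980, "state": "NY", "diagnosis": "influenza"},
    {"birth_year": 1975, "state": "CA", "diagnosis": "asthma"},
]
k = smallest_group_size(records, ["birth_year", "state"])
```

In this sample k is 1, because the 1975 California record is unique on those two fields. Someone with another dataset containing birth year and state could potentially single that record out, even though no name or address remains.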

Key facts

At a glance

  • Definition: removing or obscuring identifiers so data no longer identifies a person
  • HIPAA methods: Safe Harbor and Expert Determination
  • Safe Harbor: removes 18 identifier categories
  • Expert Determination: a qualified expert certifies that re-identification risk is very small
  • Related: anonymisation and pseudonymisation sit on the same spectrum
  • Purpose: enables responsible data sharing and reuse

Common misconceptions

What people often get wrong

Often heard: De-identification and anonymisation mean exactly the same thing.

Actually: De-identification is the broad process of reducing identifiability; anonymisation specifically implies re-identification is no longer reasonably possible. De-identified data may still carry some residual risk, depending on the method used.

Often heard: Safe Harbor just means deleting the name and address.

Actually: Safe Harbor under HIPAA requires removing 18 specified identifier categories — including most dates, geographic detail and device or account numbers — plus having no actual knowledge that the rest could identify someone.

Often heard: Once data is de-identified, re-identification is impossible.

Actually: De-identification reduces risk rather than eliminating it. Re-identification can remain possible when other linkable data exists, which is why methods are framed around a very small risk, not zero.
