v2026.1714 entries · CC-BY 4.0
CASRAI

Definition · Plain-language

Personally identifiable information (PII)

Personally identifiable information (PII) is any data that can identify a specific individual, either on its own or when combined with other information.

Direct versus indirect identifiers

PII is usually split into direct and indirect identifiers. Direct identifiers, such as a full name, passport number or email address, point to one person without further information. Indirect identifiers, also called quasi-identifiers (postcode, date of birth, job title, gender), may not single out a person on their own but can do so in combination. A frequently cited illustration is Latanya Sweeney's estimate, based on 1990 US census data, that about 87% of the US population could be uniquely identified from five-digit ZIP code, date of birth and sex together. Because identifiability is contextual, whether a given field counts as PII can depend on what other data is available alongside it.
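The combination effect can be checked on any table by counting how many rows are unique on a chosen set of fields. The following is a minimal sketch, assuming a toy dataset; the records, field names and the helper `unique_share` are all hypothetical and are not part of any CASRAI or NIST tooling.

```python
from collections import Counter

# Hypothetical toy records: no names, only quasi-identifiers.
records = [
    {"postcode": "02138", "dob": "1961-07-15", "sex": "F"},
    {"postcode": "02138", "dob": "1961-07-15", "sex": "M"},
    {"postcode": "02139", "dob": "1984-03-02", "sex": "F"},
    {"postcode": "02139", "dob": "1984-03-02", "sex": "F"},
    {"postcode": "02140", "dob": "1990-11-30", "sex": "M"},
]


def unique_share(rows, fields):
    """Return the fraction of rows whose values for `fields` occur exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# One field alone singles out nobody in this toy table...
share_sex = unique_share(records, ["sex"])
# ...but the three fields together single out most rows.
share_all = unique_share(records, ["postcode", "dob", "sex"])

print(f"unique on sex alone: {share_sex:.0%}")
print(f"unique on postcode + dob + sex: {share_all:.0%}")
```

A high unique share on a field combination suggests those fields should be treated as identifying together, even when each looks harmless alone.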

PII in US guidance

PII is principally a United States concept. The National Institute of Standards and Technology defines it in Special Publication 800-122 as information that can distinguish or trace an individual’s identity, alone or combined with other linkable data. US agencies and sector rules (for example, financial and education statutes) each scope PII slightly differently, so the precise boundary varies by context. The unifying idea is identifiability: data is PII when it can reasonably be tied back to a particular human being rather than treated as anonymous.

Why PII matters for research data

Research datasets routinely contain PII in participant records, consent logs and survey responses. Recognising which fields are identifying drives decisions about access controls, de-identification and data sharing. Standards-aligned data management — for example, FAIR principles applied to sensitive data — treats identifiability as a governance question, not just a technical one. Removing or obscuring PII through de-identification or anonymisation is what allows datasets to be shared more openly while protecting the people they describe.
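As a rough illustration of what de-identification can involve, the sketch below drops direct identifiers and coarsens quasi-identifiers, then reports the smallest group of records sharing the same quasi-identifier values (the figure usually called k in k-anonymity, a measure named here for illustration and not drawn from the text above). The field names, the generalisation rules and the sample participants are assumptions; real projects should follow their own governance and a vetted method.

```python
from collections import Counter

# Assumed names of the direct-identifier columns in this hypothetical dataset.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(row):
    """Drop direct identifiers and coarsen quasi-identifiers (illustrative rules)."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    out["postcode"] = out["postcode"][:3] + "**"  # keep only a 3-character prefix
    out["birth_year"] = out.pop("dob")[:4]        # ISO date -> year only
    return out


def smallest_group(rows, fields):
    """Size of the smallest set of rows sharing the same values for `fields`."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(counts.values())


participants = [
    {"name": "A. Example", "email": "a@example.org", "postcode": "02138",
     "dob": "1961-07-15", "sex": "F", "score": 12},
    {"name": "B. Example", "email": "b@example.org", "postcode": "02139",
     "dob": "1961-02-01", "sex": "F", "score": 17},
    {"name": "C. Example", "email": "c@example.org", "postcode": "02140",
     "dob": "1984-03-02", "sex": "M", "score": 9},
    {"name": "D. Example", "email": "d@example.org", "postcode": "02141",
     "dob": "1984-11-30", "sex": "M", "score": 14},
]

shared = [deidentify(p) for p in participants]

k_before = smallest_group(participants, ["postcode", "dob", "sex"])
k_after = smallest_group(shared, ["postcode", "birth_year", "sex"])
print(f"smallest group before: {k_before}, after: {k_after}")
```

Coarsening trades analytical detail for a larger smallest group, which is one reason the choice is a governance decision as much as a technical one.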

Key facts

At a glance

  • Definition: data that can identify a specific individual, alone or combined with other information
  • Region: US-originated term (contrast EU “personal data”)
  • Primary source: NIST SP 800-122
  • Two types: direct identifiers and indirect/quasi-identifiers
  • Context-dependent: identifiability shifts with linkable data available
  • Mitigation: de-identification and anonymisation reduce identifiability

Common misconceptions

What people often get wrong

Often heard: PII only means obvious identifiers like name and national insurance number.

Actually: PII also includes indirect identifiers — postcode, date of birth, gender and similar attributes — that can identify someone when combined, even though none singles a person out on its own.

Often heard: PII and the GDPR term “personal data” are exactly the same thing.

Actually: They overlap heavily but are not identical. PII is the US-originated term, while “personal data” is the broader EU GDPR concept covering any information relating to an identified or identifiable person.

Often heard: Once you delete the name column, a dataset no longer contains PII.

Actually: Removing names alone is rarely sufficient. Remaining quasi-identifiers can still allow re-identification, which is why formal de-identification or anonymisation methods exist.
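To make this concrete, here is a minimal sketch of a linkage check, assuming a study table whose name column was deleted and a separate, hypothetical public register that still carries names. All records, field names and the `link` helper are invented for illustration.

```python
# Study data with the name column deleted and nothing else changed.
study = [
    {"postcode": "02138", "dob": "1961-07-15", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "02139", "dob": "1984-03-02", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public register that includes names.
register = [
    {"name": "A. Example", "postcode": "02138", "dob": "1961-07-15", "sex": "F"},
    {"name": "B. Example", "postcode": "02139", "dob": "1990-01-01", "sex": "M"},
    {"name": "C. Example", "postcode": "02139", "dob": "1984-03-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "dob", "sex")


def link(study_rows, register_rows, fields=QUASI_IDENTIFIERS):
    """Attach a name to each study row that matches exactly one register row."""
    index = {}
    for person in register_rows:
        key = tuple(person[f] for f in fields)
        index.setdefault(key, []).append(person["name"])

    linked = []
    for row in study_rows:
        names = index.get(tuple(row[f] for f in fields), [])
        if len(names) == 1:  # an unambiguous match re-identifies the row
            linked.append((names[0], row["diagnosis"]))
    return linked


reidentified = link(study, register)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy case both study rows are re-identified even though neither table on its own pairs a name with a diagnosis.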