Definition · Plain-language
Personally identifiable information (PII)
Personally identifiable information (PII) is any data that can identify a specific individual, either on its own or when combined with other information.
Direct versus indirect identifiers
PII is usually split into direct and indirect identifiers. Direct identifiers, such as a full name, passport number or email address, point to one person without further information. Indirect (or quasi-) identifiers, such as postcode, date of birth, job title and gender, may not single out a person alone, but can do so in combination. A frequently cited illustration is Latanya Sweeney's analysis of 1990 US census data, which estimated that about 87% of the US population could be uniquely identified by five-digit ZIP code, date of birth and sex together. Because identifiability is contextual, whether a given field counts as PII can depend on what other data is available alongside it.
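To make the combination effect concrete, here is a minimal Python sketch over a handful of invented records. The data, the column order and the `unique_share` helper are hypothetical and come from no real dataset or library. The sketch counts what fraction of records are unique on each attribute alone, then on all three together.

```python
from collections import Counter

# Hypothetical toy records: (postcode, date_of_birth, sex). Invented for illustration only.
records = [
    ("AB1 2CD", "1985-03-14", "F"),
    ("AB1 2CD", "1990-07-01", "M"),
    ("AB1 2CD", "1985-03-14", "M"),
    ("EF3 4GH", "1985-03-14", "F"),
    ("EF3 4GH", "1990-07-01", "F"),
    ("EF3 4GH", "1990-07-01", "F"),
]

def unique_share(rows, columns):
    """Fraction of rows whose values on `columns` occur exactly once in the dataset."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)

names = ["postcode", "date_of_birth", "sex"]
for i, name in enumerate(names):
    print(name, unique_share(records, [i]))       # each prints 0.0
print("all three", round(unique_share(records, [0, 1, 2]), 2))  # prints 0.67
```

With this toy data no single column is unique to anyone, yet four of the six records are unique once the three fields are combined.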
PII in US guidance
PII is principally a United States concept. The National Institute of Standards and Technology defines it in Special Publication 800-122 as information that can distinguish or trace an individual’s identity, alone or combined with other linkable data. US agencies and sector rules (for example, financial and education statutes) each scope PII slightly differently, so the precise boundary varies by context. The unifying idea is identifiability: data is PII when it can reasonably be tied back to a particular human being rather than treated as anonymous.
Why PII matters for research data
Research datasets routinely contain PII in participant records, consent logs and survey responses. Recognising which fields are identifying drives decisions about access controls, de-identification and data sharing. Standards-aligned data management — for example, FAIR principles applied to sensitive data — treats identifiability as a governance question, not just a technical one. Removing or obscuring PII through de-identification or anonymisation is what allows datasets to be shared more openly while protecting the people they describe.
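One common way to obscure quasi-identifiers is generalisation, often assessed with k-anonymity: every record should share its quasi-identifier values with at least k-1 others. Neither technique is named in the text above, so treat this as a minimal sketch under stated assumptions. The records and the `generalise` and `k_anonymity` helpers are hypothetical, and generalisation on its own does not guarantee anonymity.

```python
from collections import Counter

# Hypothetical toy records (postcode, date_of_birth, sex), invented for illustration.
records = [
    ("AB1 2CD", "1985-03-14", "F"),
    ("AB1 5XY", "1985-11-02", "F"),
    ("AB1 9QR", "1990-07-01", "M"),
    ("AB1 2CD", "1990-01-20", "M"),
    ("EF3 4GH", "1985-06-30", "F"),
    ("EF3 7JK", "1985-09-09", "F"),
]

def generalise(row):
    """Coarsen quasi-identifiers: keep only the outward postcode and the birth year."""
    postcode, dob, sex = row
    return (postcode.split()[0], dob[:4], sex)

def k_anonymity(rows):
    """Size of the smallest group of rows sharing identical quasi-identifier values."""
    return min(Counter(rows).values())

print(k_anonymity(records))                           # raw data: 1
print(k_anonymity([generalise(r) for r in records]))  # generalised data: 2
```

Here every raw record is unique (k = 1), while keeping only the outward postcode and birth year puts each record in a group of at least two (k = 2). How much coarsening is enough remains a governance decision rather than a fixed technical threshold.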
Key facts
At a glance
- Definition: data that can identify a specific individual, alone or combined
- Region: US-originated term (contrast EU “personal data”)
- Primary source: NIST SP 800-122
- Two types: direct identifiers and indirect/quasi-identifiers
- Context-dependent: identifiability shifts with linkable data available
- Mitigation: de-identification and anonymisation reduce identifiability
Common misconceptions
What people often get wrong
Often heard: PII only means obvious identifiers like name and national insurance number.
Actually: PII also includes indirect identifiers — postcode, date of birth, gender and similar attributes — that can identify someone when combined, even though none singles a person out on its own.
Often heard: PII and the GDPR term “personal data” are exactly the same thing.
Actually: They overlap heavily but are not identical. PII is the US-originated term, while “personal data” is the broader EU GDPR concept covering any information relating to an identified or identifiable person.
Often heard: Once you delete the name column, a dataset no longer contains PII.
Actually: Removing direct names is rarely sufficient. Remaining quasi-identifiers can still allow re-identification, which is why formal de-identification or anonymisation methods exist.
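The last point can be illustrated with a minimal linkage sketch: an invented study extract with the name column removed is joined, on its remaining quasi-identifiers, to an invented list that still carries names. All records, field names and the `link` helper are hypothetical.

```python
# Hypothetical "de-identified" study extract: names removed, quasi-identifiers kept.
study = [
    {"postcode": "AB1 2CD", "dob": "1985-03-14", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "EF3 4GH", "dob": "1990-07-01", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical external list (for example, a public register) that still includes names.
public = [
    {"name": "A. Smith", "postcode": "AB1 2CD", "dob": "1985-03-14", "sex": "F"},
    {"name": "B. Jones", "postcode": "EF3 4GH", "dob": "1990-07-01", "sex": "M"},
    {"name": "C. Brown", "postcode": "EF3 4GH", "dob": "1972-02-11", "sex": "F"},
]

QUASI = ("postcode", "dob", "sex")

def link(study_rows, public_rows):
    """Return (name, diagnosis) pairs where exactly one named record matches on all quasi-identifiers."""
    matches = []
    for s in study_rows:
        key = tuple(s[q] for q in QUASI)
        candidates = [p for p in public_rows if tuple(p[q] for q in QUASI) == key]
        if len(candidates) == 1:  # a unique match re-identifies the study participant
            matches.append((candidates[0]["name"], s["diagnosis"]))
    return matches

print(link(study, public))  # [('A. Smith', 'asthma'), ('B. Jones', 'diabetes')]
```

Both participants are re-identified even though the extract contains no names, because each quasi-identifier combination matches exactly one named record.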