Definition · Plain-language
Data classification
Data classification is the practice of categorising data by its sensitivity or type, for example into levels such as public, internal, confidential and restricted, so that appropriate security, privacy and handling controls can be applied.
How data is classified
Data classification assigns each dataset to a category reflecting how sensitive it is and how it must be handled. A common scheme uses tiers such as public, internal, confidential and restricted, with each level carrying defined controls for access, encryption, storage, sharing and retention. Some schemes classify by type as well — for example flagging personal or regulated data. The aim is to make protection proportionate: the most sensitive data receives the strongest controls, while open data is not burdened unnecessarily.
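The tiered scheme described above can be sketched as a small lookup table. This is a minimal illustration in Python, not a standard: the four level names come from the text, while the `Controls` fields, the `POLICY` table and every value in it are assumptions chosen only to show how handling can scale with sensitivity.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Sensitivity tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    """Handling rules attached to one level (illustrative fields only)."""
    access: str              # who may read the data
    encrypt_at_rest: bool    # whether stored copies must be encrypted
    external_sharing: bool   # whether the data may leave the organisation


# Hypothetical policy table: each organisation defines its own values.
POLICY = {
    Level.PUBLIC: Controls("anyone", False, True),
    Level.INTERNAL: Controls("all staff", False, False),
    Level.CONFIDENTIAL: Controls("named roles", True, False),
    Level.RESTRICTED: Controls("named individuals", True, False),
}


def controls_for(level: Level) -> Controls:
    """Look up the handling rules that apply to a classification level."""
    return POLICY[level]
```

Using an ordered enum keeps the levels comparable, so a rule such as "confidential and above must be encrypted" can be written as `level >= Level.CONFIDENTIAL` rather than listing levels by hand.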
Why classification matters
Without classification, an organisation cannot apply consistent protection, because it does not reliably know which data is sensitive. Classification is what lets access controls, encryption requirements and retention rules be applied automatically and consistently. It is central to regulatory compliance, since data-protection regimes impose specific obligations on personal and sensitive data. It also reduces risk by ensuring restricted information is not accidentally exposed, shared externally or stored in inappropriate locations.
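To illustrate how an assigned level lets a rule be applied automatically, the sketch below checks whether a dataset may be shared externally. The function names and the `EXTERNALLY_SHAREABLE` rule are hypothetical, and treating missing or unrecognised labels as the strictest level is a conservative default assumed here, not something the text prescribes.

```python
from typing import Optional

# Levels from the text, ordered least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]

# Hypothetical rule: only these levels may leave the organisation.
EXTERNALLY_SHAREABLE = {"public"}


def effective_level(label: Optional[str]) -> str:
    """Return the level to enforce; missing or unknown labels fall back to the strictest."""
    if label in LEVELS:
        return label
    return LEVELS[-1]


def may_share_externally(label: Optional[str]) -> bool:
    """Apply the sharing rule based solely on the assigned classification."""
    return effective_level(label) in EXTERNALLY_SHAREABLE
```

Under these assumptions, an unlabelled dataset is handled as restricted, which mirrors the point that data of unknown sensitivity cannot be protected consistently.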
Classification in governance
Classification is a governance control that connects policy to practice. Governance defines the classification scheme and the handling rules for each level; data stewards apply classifications to their assets, often recorded as metadata in the data catalogue so that controls follow the data wherever it goes. Done well, classification is largely automated and consistent; done informally, it becomes inconsistent and unreliable. Keeping classifications current as data and regulations change is itself an ongoing stewardship responsibility.
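Recording the classification as catalogue metadata also makes the ongoing review duty checkable. The following is a rough sketch only: the `CatalogueEntry` fields, the sample datasets and steward names, and the one-year `REVIEW_INTERVAL` are all assumptions for illustration, not features of any particular catalogue product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record; field names are illustrative."""
    dataset: str
    classification: str
    steward: str
    last_reviewed: date


# Assumed review interval; real policies set their own.
REVIEW_INTERVAL = timedelta(days=365)


def overdue_for_review(entries, today):
    """Return names of datasets whose last classification review is older than the interval."""
    return [e.dataset for e in entries if today - e.last_reviewed > REVIEW_INTERVAL]


# Made-up sample entries for demonstration.
entries = [
    CatalogueEntry("hr_payroll", "restricted", "a.steward", date(2023, 1, 10)),
    CatalogueEntry("press_releases", "public", "b.steward", date(2024, 11, 1)),
]
print(overdue_for_review(entries, today=date(2025, 1, 1)))  # ['hr_payroll']
```

A steward could run a check like this on a schedule so that stale classifications are surfaced rather than discovered after an incident.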
Key facts
At a glance
- Definition: categorising data by sensitivity or type
- Common levels: public, internal, confidential, restricted
- Purpose: apply proportionate security and handling controls
- Drives: access control, encryption, retention, sharing rules
- Recorded as: metadata, often in the data catalogue
- Key driver: data-protection and regulatory compliance
Common misconceptions
What people often get wrong
Often heard: Data classification is purely an information-security task.
Actually: Security applies the controls, but classification is a governance activity. The business and data stewards decide sensitivity and handling rules; security enforces them based on the assigned level.
Often heard: Once data is classified, the classification never needs revisiting.
Actually: Sensitivity changes as data is combined, regulations evolve and uses shift. Classifications must be reviewed and updated, which is an ongoing stewardship responsibility.
Often heard: More classification levels always mean better protection.
Actually: Overly complex schemes confuse users and get misapplied. A small set of clear, well-understood levels with consistent handling rules usually protects data more reliably than many fine-grained tiers.