Data classification

Data classification is the practice of categorising data by its sensitivity (for example public, internal, confidential or restricted) or by its type (for example personal or regulated data), so that appropriate security, privacy and handling controls can be applied.

How data is classified

Data classification assigns each dataset to a category reflecting how sensitive it is and how it must be handled. A common scheme uses tiers such as public, internal, confidential and restricted, with each level carrying defined controls for access, encryption, storage, sharing and retention. Some schemes classify by type as well — for example flagging personal or regulated data. The aim is to make protection proportionate: the most sensitive data receives the strongest controls, while open data is not burdened unnecessarily.
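
As a rough illustration of such a scheme, the sketch below models the four tiers as an ordered enumeration and attaches handling controls to each. The `Controls` fields, the `POLICY` table and every value in it are hypothetical, chosen only to show the shape of a scheme; a real one would take them from the organisation's own policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Sensitivity tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    """Handling rules for one tier (illustrative fields, not a standard)."""
    access: str                # who may read the data
    encrypt_at_rest: bool      # must stored copies be encrypted?
    external_sharing: bool     # may the data leave the organisation?


# Hypothetical policy table: the values are assumptions for illustration.
POLICY = {
    Level.PUBLIC: Controls("anyone", False, True),
    Level.INTERNAL: Controls("all staff", False, False),
    Level.CONFIDENTIAL: Controls("need-to-know", True, False),
    Level.RESTRICTED: Controls("named individuals", True, False),
}


def controls_for(level: Level) -> Controls:
    """Look up the handling rules that apply to a given tier."""
    return POLICY[level]


def may_share_externally(level: Level) -> bool:
    """True if the tier's rules permit sharing outside the organisation."""
    return controls_for(level).external_sharing
```

Using an ordered enumeration means levels can be compared directly (for instance `Level.RESTRICTED > Level.CONFIDENTIAL`), which keeps proportionate handling easy to express.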

Why classification matters

Without classification, an organisation cannot apply consistent protection, because it does not reliably know which data is sensitive. Classification is what lets access controls, encryption requirements and retention rules be applied automatically and consistently. It is central to regulatory compliance, since data-protection regimes impose specific obligations on personal and sensitive data. It also reduces risk by ensuring restricted information is not accidentally exposed, shared externally or stored in inappropriate locations.

Classification in governance

Classification is a governance control that connects policy to practice. Governance defines the classification scheme and the handling rules for each level; data stewards apply classifications to their assets, often recorded as metadata in the data catalogue so that controls follow the data wherever it goes. Done well, classification is largely automated and consistent; done informally, it becomes inconsistent and unreliable. Keeping classifications current as data and regulations change is itself an ongoing stewardship responsibility.
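
One way to picture classification recorded as catalogue metadata is the sketch below. It assumes a hypothetical `CatalogEntry` record, a simple clearance-style access check and a yearly review interval; none of these come from a specific catalogue product or standard, and the dataset and steward names are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import IntEnum


class Level(IntEnum):
    """Sensitivity tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class CatalogEntry:
    """One dataset's record in a hypothetical data catalogue."""
    name: str
    level: Level          # classification assigned by the data steward
    steward: str          # who is responsible for keeping it current
    last_reviewed: date   # when the classification was last confirmed


# Assumed policy: classifications are re-confirmed at least once a year.
REVIEW_INTERVAL = timedelta(days=365)


def review_due(entry: CatalogEntry, today: date) -> bool:
    """True if the classification is older than the review interval."""
    return today - entry.last_reviewed > REVIEW_INTERVAL


def can_access(user_clearance: Level, entry: CatalogEntry) -> bool:
    """Enforcement reads the level from the catalogue, not from the user."""
    return user_clearance >= entry.level


entry = CatalogEntry(
    name="grant-applications",      # hypothetical dataset
    level=Level.CONFIDENTIAL,
    steward="research-office",      # hypothetical steward
    last_reviewed=date(2023, 1, 1),
)
```

Because the access check reads the level from the entry, changing a classification in the catalogue changes the control everywhere it is enforced, which is the sense in which controls follow the data.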

Key facts

At a glance

Definition: categorising data by sensitivity or type
Common levels: public, internal, confidential, restricted
Purpose: apply proportionate security and handling controls
Drives: access control, encryption, retention, sharing rules
Recorded as: metadata, often in the data catalogue
Key driver: data-protection and regulatory compliance

Common misconceptions

What people often get wrong

Often heard: Data classification is purely an information-security task.

Actually: Security applies the controls, but classification is a governance activity. The business and data stewards decide sensitivity and handling rules; security enforces them based on the assigned level.

Often heard: Once data is classified, the classification never needs revisiting.

Actually: Sensitivity changes as data is combined, regulations evolve and uses shift. Classifications must be reviewed and updated, which is an ongoing stewardship responsibility.

Often heard: More classification levels always mean better protection.

Actually: Overly complex schemes confuse users and get misapplied. A small set of clear, well-understood levels with consistent handling rules usually protects data more reliably than many fine-grained tiers.
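
The second point above, that sensitivity changes as data is combined, can be sketched with a simple rule of thumb: a combined dataset should be classified at least as high as its most sensitive source. The function name is hypothetical, and the result is only a floor, since a steward may judge that the combination deserves a higher level than any single source.

```python
from enum import IntEnum
from typing import Iterable


class Level(IntEnum):
    """Sensitivity tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def minimum_level_for_combination(source_levels: Iterable[Level]) -> Level:
    """Return the lowest level a combined dataset may be given.

    This is a floor, not a final answer: combining sources can reveal
    more than any one of them, so a steward may need to raise it.
    """
    levels = list(source_levels)
    if not levels:
        raise ValueError("at least one source level is required")
    return max(levels)


floor = minimum_level_for_combination([Level.PUBLIC, Level.CONFIDENTIAL])
```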

Going deeper

Related CASRAI guidance

Data governance
Data stewardship
Data catalog
Data governance framework
Standards dictionary
Plain-language explainers