Definition · Plain-language

Data quality

Data quality is the degree to which data is fit for its intended purpose, assessed across dimensions such as accuracy, completeness, consistency, timeliness, validity and uniqueness.

The dimensions of data quality

Data quality is conventionally broken into measurable dimensions. Accuracy is whether data correctly describes the real world; completeness is whether required values are present; consistency is whether data agrees across systems; timeliness is whether it is current enough for its use; validity is whether it conforms to defined formats and rules; and uniqueness is whether the same entity is recorded only once. Assessing data against these dimensions turns a vague notion of good data into something that can be measured and improved.
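As a minimal sketch of how three of these dimensions can be turned into numbers, the Python below scores completeness, validity and uniqueness on a small list of records. The sample records, field names, the simplified email rule and the function names are all illustrative assumptions, not part of any standard. Accuracy, consistency and timeliness are left out because they need a reference source, a second system or timestamps to compare against.

```python
import re

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ana@example.org", "country": "CA"},
    {"id": 2, "email": "", "country": "CA"},
    {"id": 3, "email": "not-an-email", "country": "GB"},
    {"id": 3, "email": "li@example.org", "country": "GB"},
]

# Deliberately simple format rule, assumed for this example.
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows in which the required field has a non-empty value."""
    if not rows:
        return 1.0
    present = sum(1 for r in rows if r.get(field) not in (None, ""))
    return present / len(rows)


def validity(rows, field, pattern):
    """Share of non-empty values that conform to the format rule."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def uniqueness(rows, key):
    """Ratio of distinct key values to rows; 1.0 means no duplicates."""
    if not rows:
        return 1.0
    keys = [r[key] for r in rows]
    return len(set(keys)) / len(keys)


scores = {
    "completeness(email)": completeness(records, "email"),
    "validity(email)": validity(records, "email", EMAIL_RULE),
    "uniqueness(id)": uniqueness(records, "id"),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each score is a ratio between 0 and 1, so it can be tracked over time and compared with an agreed threshold.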

Fitness for purpose

A defining principle of data quality is that it is relative to use. The same dataset may be perfectly adequate for a high-level trend report yet unfit for individual customer billing. Quality is therefore judged against the requirements of the task, not against a single absolute standard. Data quality work therefore begins by agreeing the requirements and acceptable thresholds for each use, then measuring how well the data meets them. That work depends on clear definitions from the data dictionary.
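The idea that one dataset can be fit for one use and unfit for another can be sketched in a few lines of Python. The measured scores, the two uses and every threshold value below are hypothetical figures chosen for illustration; real thresholds would come from the requirements agreed for each task.

```python
# Hypothetical measured scores for one dataset (0.0 to 1.0 per dimension).
measured = {"completeness": 0.96, "accuracy": 0.97, "timeliness": 0.90}

# Hypothetical minimum scores agreed for each intended use.
requirements = {
    "trend_report": {"completeness": 0.90, "accuracy": 0.95},
    "customer_billing": {
        "completeness": 0.999,
        "accuracy": 0.999,
        "timeliness": 0.99,
    },
}


def shortfalls(measured_scores, thresholds):
    """Return {dimension: (measured, required)} for each unmet threshold."""
    return {
        dim: (measured_scores.get(dim, 0.0), minimum)
        for dim, minimum in thresholds.items()
        if measured_scores.get(dim, 0.0) < minimum
    }


for use, thresholds in requirements.items():
    gaps = shortfalls(measured, thresholds)
    verdict = "fit" if not gaps else "unfit"
    print(f"{use}: {verdict} {gaps}")
```

With these assumed numbers the same scores pass every requirement for the trend report and fail all three for billing, which is the sense in which quality is relative to use.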

Managing data quality

Sustaining data quality is a continuous discipline, not a one-off cleanse. It involves profiling data to find issues, defining rules and thresholds, monitoring quality over time and remediating problems at source where possible. Data stewards typically own quality for their domain, supported by governance policies and quality dashboards. Because errors often originate upstream, lasting improvement comes from fixing root causes in how data is captured, rather than repeatedly cleaning the same downstream defects.
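The cycle of profiling, rule definition, monitoring and remediation at source might look like the following sketch. The rule names, thresholds, record fields and the `source` field that identifies the capture channel are assumptions made for this example, not a prescribed schema.

```python
from collections import Counter

# Hypothetical rules: name -> (check function, minimum acceptable pass rate).
RULES = {
    "postcode_present": (lambda r: bool(r.get("postcode")), 0.95),
    "amount_non_negative": (lambda r: r.get("amount", 0) >= 0, 1.0),
}


def profile(batch):
    """Pass rate per rule for one non-empty batch of records."""
    return {
        name: sum(1 for r in batch if check(r)) / len(batch)
        for name, (check, _) in RULES.items()
    }


def breaches(batch):
    """Rules whose pass rate falls below the agreed threshold."""
    rates = profile(batch)
    return {
        name: rates[name]
        for name, (_, minimum) in RULES.items()
        if rates[name] < minimum
    }


def failures_by_source(batch, rule_name):
    """Count failing records per capture channel to help locate the root cause."""
    check, _ = RULES[rule_name]
    return Counter(r["source"] for r in batch if not check(r))


# Hypothetical batch; in practice this would run on each new load.
batch = [
    {"postcode": "K1A 0B1", "amount": 10, "source": "web_form"},
    {"postcode": "", "amount": 5, "source": "call_centre"},
    {"postcode": "", "amount": 7, "source": "call_centre"},
    {"postcode": "SW1A 1AA", "amount": 3, "source": "web_form"},
]

for rule, rate in breaches(batch).items():
    print(f"{rule}: pass rate {rate:.2f}, failures {dict(failures_by_source(batch, rule))}")
```

Running the same checks on every batch gives the monitoring over time, and grouping failures by capture channel points remediation at where the data is entered rather than at the downstream copy.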

Key facts

At a glance

Definition: the degree to which data is fit for its intended purpose
Dimensions: accuracy, completeness, consistency, timeliness, validity, uniqueness
Key principle: quality is relative to the intended use
Owned by: data stewards, under governance policy
Approach: profile, define rules, monitor, remediate at source
Related standard: ISO 8000 (data quality)

Common misconceptions

What people often get wrong

Often heard: Data quality just means the data has no obvious errors.

Actually: It is multidimensional. Data can be error-free yet incomplete, out of date or inconsistent across systems. Quality is assessed across several dimensions, including timeliness, consistency and uniqueness.

Often heard: There is one absolute standard of good-quality data.

Actually: Quality is fitness for purpose, so it is relative to use. Data adequate for a trend report may be unfit for billing; thresholds are set against the requirements of each task.

Often heard: A one-off data cleanse fixes data quality permanently.

Actually: Quality degrades as new data arrives and systems change. Lasting improvement requires continuous monitoring and fixing root causes at the point of capture, not repeated downstream cleansing.

Going deeper

Related CASRAI guidance

Data stewardship, Master data management, Data lineage, Data governance, Standards dictionary, FAIR data