Reference data

Reference data is the set of permitted values and code sets — such as country codes, currency codes and status lists — used to classify and standardise other data across an organisation. It is a closely managed subset of master data.

What reference data is

Reference data is the controlled set of values used to classify other data — the code lists and lookup values that keep records consistent. Common examples include country codes, currency codes, units of measure, language codes and status or category lists. It changes relatively slowly and is shared widely across systems. Its defining feature is that it constrains what other data may contain: a country field, for instance, should hold only a value from the agreed country code list, not free text.
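As a minimal sketch of that constraint, the check below validates a country field against a code list. It assumes a small hand-picked sample of ISO 3166-1 alpha-2 codes; the names COUNTRY_CODES and validate_country are hypothetical, and a real list would be loaded from the governed source rather than hard-coded.

```python
# Illustrative subset of ISO 3166-1 alpha-2 codes (an assumption for this
# sketch); a real code list would come from the authoritative source.
COUNTRY_CODES = frozenset({"CA", "DE", "FR", "GB", "JP", "US"})


def validate_country(value: str) -> str:
    """Return the normalised code, or raise ValueError if it is not permitted."""
    code = value.strip().upper()
    if code not in COUNTRY_CODES:
        raise ValueError(f"{value!r} is not in the agreed country code list")
    return code


record = {"name": "Example Ltd", "country": "gb"}
record["country"] = validate_country(record["country"])  # stored as "GB"
```

Free text such as "United Kingdom" is rejected rather than stored, which is the point: the field can only ever contain a value the code list permits.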

Reference data and master data

Reference data is generally treated as a subset of master data. Both are stable, widely shared data that describe things rather than record transactions, but reference data specifically provides the permitted values that classify other data, whereas master data covers the core business entities themselves, such as customers and products. Because reference data underpins consistency everywhere it is used, errors or divergence in it propagate widely, which is why it is governed and version-controlled carefully.
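One way to catch divergence before it spreads is to compare each system's local copy of a code list with the authoritative one. The sketch below is illustrative only: the function name, the sample currency codes and the "billing system" are assumptions made for the example, not part of any standard tooling.

```python
def compare_code_sets(authoritative: set[str], local: set[str]) -> dict[str, set[str]]:
    """Report codes a local copy is missing, and codes it holds without authority."""
    return {
        "missing_locally": authoritative - local,
        "not_authorised": local - authoritative,
    }


# Hypothetical sample values: "UKP" stands in for a locally invented code.
authoritative_currencies = {"CAD", "EUR", "GBP", "USD"}
billing_system_currencies = {"CAD", "EUR", "UKP", "USD"}

report = compare_code_sets(authoritative_currencies, billing_system_currencies)
```

Here the report would flag "GBP" as missing from the local copy and "UKP" as a value with no authoritative counterpart.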

Managing reference data

Good reference-data management means defining each code set authoritatively, basing it on recognised external standards where they exist, and controlling changes through governance so that all systems use the same values. Using established standards — such as international codes for countries, currencies and languages — improves interoperability with partners and avoids reinventing well-defined vocabularies. Reference data is often documented in the data dictionary and distributed centrally so that downstream systems share a single authoritative version.
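A rough sketch of what controlled change can look like in practice: each published version of a code set is immutable, and a change produces a new version rather than editing the old one. The CodeSet class, its fields and the version labels are hypothetical, and the language codes are an illustrative subset only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSet:
    """One published, immutable version of a code set (hypothetical structure)."""
    name: str
    version: str
    source: str  # the external standard the values are based on
    codes: frozenset

    def revise(self, new_version, add=(), retire=()):
        """Return a new version; the published version is never edited in place."""
        new_codes = (self.codes | frozenset(add)) - frozenset(retire)
        return CodeSet(self.name, new_version, self.source, new_codes)


languages_v1 = CodeSet("language", "1.0", "ISO 639-1 (illustrative subset)",
                       frozenset({"en", "fr", "de"}))
languages_v2 = languages_v1.revise("1.1", add={"es"})
```

Because version 1.0 stays untouched, downstream systems can state exactly which version they use, and a move to 1.1 becomes an explicit, reviewable step.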

Key facts

At a glance

Definition: permitted values and code sets used to classify other data
Examples: country codes, currency codes, units, status lists
Relationship: a subset of master data
Key property: constrains the values other data may contain
Best practice: base on recognised external standards
Managed via: governance, version control and the data dictionary

Common misconceptions

What people often get wrong

Often heard: Reference data and master data are completely separate categories.

Actually: Reference data is generally a subset of master data. Both are stable, shared and descriptive; reference data specifically supplies the permitted values that classify other data.

Often heard: Reference data is trivial and does not need governance.

Actually: Because reference data constrains values across many systems, divergence in it propagates widely. It must be defined authoritatively and changed under control, or inconsistencies spread everywhere it is used.

Often heard: It is best to invent your own code lists for each system.

Actually: Recognised external standards exist for countries, currencies, languages and more. Basing reference data on them improves interoperability and avoids the cost and divergence of bespoke, system-specific lists.
