Census and Population Data: Sources and Standards

A census is a complete enumeration of a population at a defined moment, the backbone of official population statistics. This guide explains how the data are collected and standardised, the roles of national statistics offices and the UN, and the shift toward register-based methods.

By CASRAI Editorial Board

Published 18 Jun 2026 · 6 minute read

A census is the official, complete enumeration of a population within a defined territory at a defined point in time, recording counts and key characteristics such as age, sex and location. It is the foundational data source for population statistics, supplying the denominators used in rates across health, social and economic research. A census aims for total coverage rather than a sample, which is what distinguishes it from surveys and gives it a unique role in the data infrastructure of a country.

Almost every population-based measure ultimately depends on a credible count of who lives where. When that count is accurate and well documented, the rates built on top of it can be trusted and compared; when it is not, every downstream statistic inherits the error. This is why census methodology and the standards around it receive so much attention.

How population data are collected

Traditional censuses gather data through field enumeration, postal or online self-completion, or a combination of these. National statistics offices design questionnaires, define the reference moment, run extensive field operations, and then process, edit and impute the returns. Editing resolves inconsistent responses, while imputation fills gaps where information is missing; both follow documented rules so that the adjustments are reproducible. Because coverage is rarely complete, many offices run a post-enumeration survey, an independent re-count of a sample of areas, to estimate undercount and overcount and adjust the published figures accordingly.
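To make the coverage adjustment concrete, the sketch below uses dual-system (capture-recapture) estimation, one common way of combining census and post-enumeration survey counts, though not necessarily the method any particular office uses. It assumes the census and the survey capture people independently and with equal probability within the area; real coverage surveys relax these assumptions through stratification and adjustments for erroneous enumerations. The function names and figures are hypothetical.

```python
def dual_system_estimate(census_count: int, pes_count: int, matched: int) -> float:
    """Lincoln-Petersen dual-system estimate of the true population in an area.

    census_count: people the census counted in the survey areas
    pes_count:    people the independent post-enumeration survey counted there
    matched:      people found in both sources after record matching
    Assumes independent capture and equal capture probabilities (hypothetical setup).
    """
    if matched <= 0:
        raise ValueError("matched count must be positive")
    return census_count * pes_count / matched


def net_undercount_rate(census_count: int, estimate: float) -> float:
    """Share of the estimated population missed by the census (negative means overcount)."""
    return (estimate - census_count) / estimate


estimate = dual_system_estimate(census_count=950, pes_count=900, matched=870)
print(round(estimate), round(net_undercount_rate(950, estimate), 4))
```

With 950 census records, 900 survey records and 870 matches, the estimate is about 983 people, implying a net undercount of roughly 3.3 per cent.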

Two counting concepts shape the results and must be stated explicitly.

Concept	Who is counted	Typical use
De jure	People at their usual place of residence	Resident population, service planning
De facto	People physically present on census night	Presence-based counts, some operational needs

The choice between de jure and de facto counts affects comparability, so metadata must record which basis was used and how groups such as students, visitors and people with multiple residences were treated. This kind of definitional clarity is exactly what the CASRAI dictionary exists to support, and it prevents two figures that look comparable from quietly measuring different populations.
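A minimal sketch of how the two bases can diverge on the same set of records follows. The `Person` record and its field names are hypothetical, and the rules are deliberately simplified: real census processing applies detailed residence rules for students, visitors and people with multiple addresses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Person:
    person_id: str
    usual_residence: Optional[str]        # area of usual residence; None if outside the territory
    census_night_location: Optional[str]  # area where present on census night; None if abroad


def de_jure_counts(people: Iterable[Person]) -> Counter:
    """Count each person at their usual place of residence."""
    return Counter(p.usual_residence for p in people if p.usual_residence is not None)


def de_facto_counts(people: Iterable[Person]) -> Counter:
    """Count each person where they were physically present on census night."""
    return Counter(p.census_night_location for p in people if p.census_night_location is not None)


people = [
    Person("p1", "A", "A"),   # at home in area A
    Person("p2", "A", "B"),   # resident of A, staying in B on census night
    Person("p3", "B", "B"),   # at home in area B
    Person("p4", None, "B"),  # overseas visitor present in B
]
print(de_jure_counts(people), de_facto_counts(people))
```

The same four records give area A two residents but only one person present, and area B one resident but three people present, which is why the counting basis has to travel with the figures.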

The role of national statistics offices and the UN

National statistics offices, such as the UK Office for National Statistics and the US Census Bureau, design and run censuses within their territories and publish the official population figures. They are responsible for the methodology, the confidentiality protections applied to individual records, and the quality assurance that gives the results authority. International comparability is supported by the United Nations, whose Principles and Recommendations for Population and Housing Censuses set out recommended concepts, classifications and topics so that national outputs can be aligned and compared across borders.

Standardisation matters because researchers frequently combine population data across regions and years. Shared definitions for residence, age reporting, household composition and geography reduce the risk of comparing inconsistent populations, a recurring theme across data infrastructure work. Without that common framework, a cross-country analysis can be derailed by differences in how each country defined a basic concept rather than by any real difference in the populations themselves.
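As an example of harmonising one definition, the sketch below maps single-year ages onto five-year bands with an open-ended top group, so that sources reporting age at different levels of detail can be aggregated to a common classification. The band width and the 85+ cutoff are illustrative assumptions rather than a prescribed standard, and the mapping only works in one direction: data already published in broad bands cannot be split back into single years.

```python
from collections import Counter
from typing import Mapping


def five_year_band(age: int, top: int = 85) -> str:
    """Map a single year of age to a five-year band, with an open-ended top band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if top % 5:
        raise ValueError("top band must start at a multiple of 5")
    if age >= top:
        return f"{top}+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def harmonise(single_year_counts: Mapping[int, int], top: int = 85) -> Counter:
    """Aggregate counts by single year of age into the common five-year bands."""
    bands: Counter = Counter()
    for age, count in single_year_counts.items():
        bands[five_year_band(age, top)] += count
    return bands


print(harmonise({3: 10, 4: 5, 7: 2, 90: 1}))
```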

Geography and small-area estimation

One of the distinctive strengths of a census is that it can produce statistics for very small geographic areas, because it aims to count everyone rather than a sample. This fine geographic detail underpins the allocation of resources, the design of electoral boundaries and the study of local variation in health and social conditions.

Researchers rely on consistent geographic standards, stable area boundaries and clear hierarchies of nested areas, so that data can be aggregated upward and compared over time. When boundaries change between censuses, statistics offices publish lookups so that older data can be re-expressed on current geographies.

Between censuses, small-area population estimates are produced by updating the last census base with administrative indicators of births, deaths and migration. These estimates carry more uncertainty the further they are from the census year, which users should keep in mind when interpreting recent small-area rates.
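The between-census update rests on the demographic balancing equation. The sketch below applies it to a single area total; production methods usually work by age and sex (the cohort-component method) and draw on many more data sources. The function name and figures are hypothetical.

```python
from typing import Iterable, List, Tuple

# One year of components: (births, deaths, immigration, emigration)
Components = Tuple[int, int, int, int]


def roll_forward(census_base: int, yearly_components: Iterable[Components]) -> List[int]:
    """Update a census base year by year with the demographic balancing equation.

    P(t+1) = P(t) + births - deaths + immigration - emigration
    Returns the estimate at the end of each successive year.
    """
    estimates = []
    population = census_base
    for births, deaths, immigration, emigration in yearly_components:
        population = population + births - deaths + immigration - emigration
        estimates.append(population)
    return estimates


print(roll_forward(10_000, [(120, 90, 300, 250), (115, 95, 280, 260)]))
```

Because each year's estimate is built on the previous one, errors in the components, migration in particular, accumulate, which is one reason uncertainty grows with distance from the census year.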

Uses in research

Census data provide the population-at-risk denominators behind most epidemiological and demographic measures. They underpin life expectancy calculations and often supply the standard populations used to compute age-standardised death rates. They also supply the denominators for incidence and prevalence and for a wide range of social indicators. Without an accurate population base, rates derived from event counts cannot be interpreted reliably, because the same number of events can imply very different risks depending on the size and structure of the population it is measured against.
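Direct age standardisation shows why the population base matters. The sketch below uses hypothetical figures and a two-group standard population chosen only for readability (published standards use many more age bands) to compare two areas whose age-specific death rates are identical but whose age structures differ.

```python
from typing import Sequence


def crude_rate(deaths: Sequence[int], population: Sequence[int], per: int = 100_000) -> float:
    """Total events divided by total population, ignoring age structure."""
    return sum(deaths) / sum(population) * per


def directly_standardised_rate(deaths: Sequence[int], population: Sequence[int],
                               standard: Sequence[int], per: int = 100_000) -> float:
    """Weight each age-specific rate by the standard population's share in that age group."""
    if not (len(deaths) == len(population) == len(standard)):
        raise ValueError("deaths, population and standard must have the same age groups")
    expected = sum(d / p * s for d, p, s in zip(deaths, population, standard))
    return expected / sum(standard) * per


standard = [5_000, 5_000]                 # hypothetical standard: [younger, older]
area_x = ([8, 20], [8_000, 2_000])        # young population
area_y = ([2, 80], [2_000, 8_000])        # old population, same age-specific rates
for deaths, pop in (area_x, area_y):
    print(round(crude_rate(deaths, pop)), round(directly_standardised_rate(deaths, pop, standard)))
```

Both areas have a directly standardised rate of 550 per 100,000, yet their crude rates are 280 and 820 per 100,000, a difference driven entirely by population structure.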

Confidentiality and disclosure control

Because a census records information about identifiable individuals and households, statistics offices apply statistical disclosure control before releasing detailed tables. The risk is that a combination of characteristics in a small geographic area could single out a person even without a name attached. Techniques to manage this include aggregating small areas, rounding or perturbing cell counts, and limiting the level of detail published for small populations. These protections are a legal and ethical obligation, and they shape what census outputs researchers can obtain: highly detailed cross-tabulations for tiny areas may be unavailable or only accessible through secure environments. Documenting which disclosure-control methods were applied is part of responsible metadata, because perturbation can affect very small counts and analysts need to know when figures have been adjusted for confidentiality rather than measured directly.
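To illustrate two of the techniques mentioned, the sketch below suppresses cells below a threshold and applies unbiased random rounding to base 3 to the rest. The threshold, the base and the function names are hypothetical; production schemes also need secondary suppression or consistent totals so that suppressed cells cannot be recovered by subtraction, and many offices use other methods such as record swapping or cell-key perturbation.

```python
import random
from typing import Dict, Optional


def random_round(count: int, rng: random.Random, base: int = 3) -> int:
    """Unbiased random rounding to a neighbouring multiple of `base`.

    A count with remainder r is rounded up with probability r / base and down
    otherwise, so the expected published value equals the true count.
    """
    if count < 0 or base <= 0:
        raise ValueError("count must be non-negative and base positive")
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower


def protect_table(cells: Dict[str, int], threshold: int = 10, base: int = 3,
                  seed: Optional[int] = None) -> Dict[str, Optional[int]]:
    """Suppress small cells (None) and randomly round the rest (hypothetical rules)."""
    rng = random.Random(seed)
    protected: Dict[str, Optional[int]] = {}
    for key, count in cells.items():
        protected[key] = None if count < threshold else random_round(count, rng, base)
    return protected


print(protect_table({"area A": 4, "area B": 27, "area C": 50}, seed=1))
```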

The move to register-based and administrative data

Several countries are shifting from the decennial field census toward register-based and administrative data approaches. Instead of a single large enumeration, population estimates are assembled from continuously maintained administrative sources such as population, tax and health registers, sometimes combined with targeted surveys to capture characteristics the registers do not hold. The aim is more frequent, lower-burden and potentially more timely estimates, though the approach introduces challenges around data linkage quality, register coverage, and the governance and legal basis for combining administrative sources.
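The linkage step can be sketched as a simple inclusion rule: keep a person from the population register only if they also show activity in at least one other administrative source during the reference period. This is a hypothetical, simplified rule in the spirit of activity-based approaches; it assumes a shared, clean person identifier across sources, which in practice is exactly where linkage quality problems arise.

```python
from typing import Dict, Set


def admin_based_population(population_register: Set[str],
                           activity: Dict[str, Set[str]],
                           min_sources: int = 1) -> Set[str]:
    """Keep register entries seen in at least `min_sources` other administrative sources."""
    included = set()
    for person_id in population_register:
        hits = sum(person_id in ids for ids in activity.values())
        if hits >= min_sources:
            included.add(person_id)
    return included


def activity_outside_register(population_register: Set[str],
                              activity: Dict[str, Set[str]]) -> Set[str]:
    """Identifiers active in some source but missing from the register (possible undercoverage)."""
    return set().union(*activity.values()) - population_register


register = {"a", "b", "c", "d"}
activity = {"tax": {"a", "b", "x"}, "health": {"a", "c"}}
print(sorted(admin_based_population(register, activity)))
print(sorted(activity_outside_register(register, activity)))
```

Here "d" is dropped because it shows no activity anywhere (perhaps someone who has left but was never deregistered), while "x" appears in the tax source but not the register, flagging a possible gap in register coverage.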

This transition reinforces the need for transparent metadata and documented methods, so that users understand how a published population figure was produced and which sources contributed to it. Researchers describing population sources in their work should follow good reporting practice, including the guidance for authors, and should state clearly whether figures derive from a traditional census, a register-based system, or a hybrid of the two.
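One way to make that documentation machine-readable is a small metadata record that cannot be created without the essentials. The schema below is a hypothetical illustration, not a CASRAI or UN standard: it captures the source type, the counting basis, the reference date and any disclosure control applied, and requires register-based and hybrid figures to name their contributing sources.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class SourceType(Enum):
    TRADITIONAL_CENSUS = "traditional census"
    REGISTER_BASED = "register-based"
    HYBRID = "hybrid"


class CountingBasis(Enum):
    DE_JURE = "de jure"
    DE_FACTO = "de facto"


@dataclass
class PopulationFigureMetadata:
    """Hypothetical metadata record describing how a population figure was produced."""
    source_type: SourceType
    counting_basis: CountingBasis
    reference_date: date
    contributing_sources: List[str] = field(default_factory=list)
    disclosure_control: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Register-based and hybrid figures must say which sources were combined.
        if self.source_type is not SourceType.TRADITIONAL_CENSUS and not self.contributing_sources:
            raise ValueError("register-based and hybrid figures must list contributing sources")


meta = PopulationFigureMetadata(
    source_type=SourceType.HYBRID,
    counting_basis=CountingBasis.DE_JURE,
    reference_date=date(2026, 3, 1),  # hypothetical reference date
    contributing_sources=["population register", "coverage survey"],
    disclosure_control=["random rounding (base 3)"],
)
print(meta.source_type.value, meta.counting_basis.value)
```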

Frequently asked questions

What is the difference between a census and a survey?

A census attempts to enumerate the entire population, while a survey collects data from a sample and generalises to the whole. Censuses provide complete-coverage denominators with detailed geography; surveys provide richer or more frequent estimates at lower cost but carry sampling uncertainty.

Why does de jure versus de facto matter?

The two concepts count different groups: usual residents versus people present on census night. Mixing them produces inconsistent population bases, so the counting basis must be recorded as metadata for any valid comparison across places or over time.

What is a register-based census?

It is a method that derives population statistics from continuously maintained administrative registers rather than a single field enumeration. It allows more frequent updates and lower respondent burden, but depends on the coverage, quality and lawful linkage of the underlying administrative sources.
