Tag: demography

  • Census and Population Data: Sources and Standards

    A census is the official, complete enumeration of a population within a defined territory at a defined point in time, recording counts and key characteristics such as age, sex and location. It is the foundational data source for population statistics, supplying the denominators used in rates across health, social and economic research. A census aims for total coverage rather than a sample, which is what distinguishes it from surveys and gives it a unique role in the data infrastructure of a country.

    Almost every population-based measure ultimately depends on a credible count of who lives where. When that count is accurate and well documented, the rates built on top of it can be trusted and compared; when it is not, every downstream statistic inherits the error. This is why census methodology and the standards around it receive so much attention.

    How population data are collected

    Traditional censuses gather data through field enumeration, postal or online self-completion, or a combination of these. National statistics offices design questionnaires, define the reference moment, run extensive field operations, and then process, edit and impute the returns. Editing resolves inconsistent responses, while imputation fills gaps where information is missing, both following documented rules so the adjustments are reproducible. Because total coverage is rarely perfect, a post-enumeration survey is often used to estimate undercount and overcount and to adjust the published figures accordingly.
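
    The text above does not say how a post-enumeration survey is turned into a coverage estimate. One widely used approach is dual-system (capture-recapture) estimation, so the sketch below assumes that method. The function names and counts are illustrative only, and overcount, which real adjustments handle separately, is ignored.

```python
def dual_system_estimate(census_count: int, pes_count: int, matched: int) -> float:
    """Lincoln-Petersen estimate of the true population in the sampled areas.

    census_count: people counted by the census in those areas
    pes_count:    people counted there by the independent post-enumeration survey
    matched:      people found in both sources
    """
    if matched <= 0:
        raise ValueError("need at least one matched record")
    return census_count * pes_count / matched


def net_undercount_rate(census_count: int, estimate: float) -> float:
    """Share of the estimated true population that the census missed."""
    return (estimate - census_count) / estimate


# Illustrative numbers only, not real census results.
estimate = dual_system_estimate(census_count=9_500, pes_count=1_000, matched=950)
print(round(estimate))                                  # 10000
print(round(net_undercount_rate(9_500, estimate), 3))   # 0.05
```

    The estimator relies on the survey being independent of the census and on accurate matching between the two; if either assumption fails, the adjustment is biased.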

    Two counting concepts shape the results and must be stated explicitly.

    Concept  | Who is counted                             | Typical use
    De jure  | People at their usual place of residence   | Resident population, service planning
    De facto | People physically present on census night  | Presence-based counts, some operational needs

    The choice between de jure and de facto counts affects comparability, so metadata must record which basis was used and how groups such as students, visitors and people with multiple residences were treated. This kind of definitional clarity is exactly what the CASRAI dictionary exists to support, and it prevents two figures that look comparable from quietly measuring different populations.
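
    To make the two counting bases concrete, here is a minimal sketch with a handful of made-up records. The `Person` class, its field names and the area codes are hypothetical and not taken from any real census schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    label: str
    usual_residence: str  # area where the person usually lives
    present_in: str       # area where the person was on census night


def count_population(people, area, basis):
    """Count people in `area` on a de jure or de facto basis."""
    if basis == "de_jure":
        return sum(p.usual_residence == area for p in people)
    if basis == "de_facto":
        return sum(p.present_in == area for p in people)
    raise ValueError(f"unknown counting basis: {basis!r}")


people = [
    Person("resident at home", "A", "A"),
    Person("student away at university", "A", "B"),
    Person("visitor from elsewhere", "B", "A"),
    Person("second visitor", "B", "A"),
]

print(count_population(people, "A", "de_jure"))   # 2
print(count_population(people, "A", "de_facto"))  # 3
```

    The same four records give two different populations for area A, which is why the counting basis has to travel with the figure as metadata.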

    The role of national statistics offices and the UN

    National statistics offices, such as the UK Office for National Statistics and the US Census Bureau, design and run censuses within their territories and publish the official population figures. They are responsible for the methodology, the confidentiality protections applied to individual records, and the quality assurance that gives the results authority. International comparability is supported by the United Nations, whose statistical guidance on population and housing censuses sets out recommended concepts, classifications and topics so that national outputs can be aligned and compared across borders.

    Standardisation matters because researchers frequently combine population data across regions and years. Shared definitions for residence, age reporting, household composition and geography reduce the risk of comparing inconsistent populations, a recurring theme across data infrastructure work. Without that common framework, a cross-country analysis can be derailed by differences in how each country defined a basic concept rather than by any real difference in the populations themselves.

    Geography and small-area estimation

    One of the distinctive strengths of a census is that it can produce statistics for very small geographic areas, because it aims to count everyone rather than a sample. This fine geographic detail underpins the allocation of resources, the design of electoral boundaries and the study of local variation in health and social conditions. Researchers rely on consistent geographic standards, stable area boundaries and clear hierarchies of nested areas, so that data can be aggregated upward and compared over time. When boundaries change between censuses, statistics offices publish lookups so that older data can be re-expressed on current geographies.

    Between censuses, small-area population estimates are produced by updating the last census base with administrative indicators of births, deaths and migration. These estimates carry more uncertainty the further they are from the census year, which users should keep in mind when interpreting recent small-area rates.
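
    The two operations described above, re-expressing old counts on new boundaries and rolling a census base forward, can be sketched as follows. The lookup format (a share of each old area assigned to each new area) and all area names and numbers are assumptions made for illustration; published lookups vary in structure.

```python
def rebase_to_new_areas(old_counts, lookup):
    """Re-express counts for old areas on new areas.

    lookup maps old_area -> {new_area: share}; the shares for each
    old area must sum to 1 so that no population is lost or duplicated.
    """
    new_counts = {}
    for old_area, count in old_counts.items():
        shares = lookup[old_area]
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {old_area} do not sum to 1")
        for new_area, share in shares.items():
            new_counts[new_area] = new_counts.get(new_area, 0.0) + count * share
    return new_counts


def roll_forward(base, births, deaths, net_migration):
    """Component update: base population plus births, minus deaths, plus net migration."""
    return base + births - deaths + net_migration


old_counts = {"old_1": 1000, "old_2": 400}
lookup = {
    "old_1": {"new_A": 0.75, "new_B": 0.25},  # old_1 was split between two new areas
    "old_2": {"new_B": 1.0},                  # old_2 was absorbed whole into new_B
}

rebased = rebase_to_new_areas(old_counts, lookup)
print(rebased)  # {'new_A': 750.0, 'new_B': 650.0}

estimate = roll_forward(rebased["new_A"], births=12, deaths=9, net_migration=-5)
print(estimate)  # 748.0
```

    Each further year of roll-forward adds the error in that year's component data, which is one way to see why uncertainty grows with distance from the census.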

    Uses in research

    Census data provide the population-at-risk denominators behind most epidemiological and demographic measures. They underpin life expectancy calculations and the standard populations used to compute age-standardised death rates. They also supply the denominators for incidence and prevalence and for a wide range of social indicators. Without an accurate population base, rates derived from event counts cannot be interpreted reliably, because the same number of events can imply very different risks depending on the size and structure of the population it is measured against.
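
    A small worked example shows how much the denominator's age structure matters. The sketch below uses direct age standardisation with made-up figures for two populations and a hypothetical three-band standard population; none of the numbers come from a real source.

```python
def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, per `per` people."""
    return sum(deaths) / sum(population) * per


def directly_standardised_rate(deaths, population, standard, per=100_000):
    """Apply each age band's observed rate to a standard population."""
    expected = sum(d / p * s for d, p, s in zip(deaths, population, standard))
    return expected / sum(standard) * per


# Age bands: young, middle, old. All figures are illustrative.
standard = [40_000, 40_000, 20_000]

deaths_a, pop_a = [10, 40, 300], [10_000, 10_000, 10_000]  # older population
deaths_b, pop_b = [20, 80, 150], [20_000, 20_000, 5_000]   # younger population

print(round(crude_rate(deaths_a, pop_a), 1))  # 1166.7
print(round(crude_rate(deaths_b, pop_b), 1))  # 555.6
print(round(directly_standardised_rate(deaths_a, pop_a, standard), 1))  # 800.0
print(round(directly_standardised_rate(deaths_b, pop_b, standard), 1))  # 800.0
```

    Both populations face identical age-specific death rates, yet their crude rates differ by a factor of about two purely because of age structure. Standardising against a common population removes that difference, and it is only possible when reliable age-specific denominators exist.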

    Confidentiality and disclosure control

    Because a census records information about identifiable individuals and households, statistics offices apply statistical disclosure control before releasing detailed tables. The risk is that a combination of characteristics in a small geographic area could single out a person even without a name attached. Techniques to manage this include aggregating small areas, rounding or perturbing cell counts, and limiting the level of detail published for small populations. These protections are a legal and ethical obligation, and they shape what census outputs researchers can obtain: highly detailed cross-tabulations for tiny areas may be unavailable or only accessible through secure environments. Documenting which disclosure-control methods were applied is part of responsible metadata, because perturbation can affect very small counts and analysts need to know when figures have been adjusted for confidentiality rather than measured directly.
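
    As an illustration of two of the techniques named above, the sketch below rounds cell counts to a fixed base and aggregates small areas, and records the method alongside the output. The base of 5, the area and cell labels, and the metadata keys are assumptions for the example; statistics offices choose their own rules and often use randomised perturbation instead.

```python
def round_to_base(count: int, base: int = 5) -> int:
    """Round a non-negative count to the nearest multiple of `base`."""
    return base * ((count + base // 2) // base)


def aggregate_areas(tables):
    """Sum the same cells across several small areas into one combined table."""
    combined = {}
    for table in tables:
        for cell, count in table.items():
            combined[cell] = combined.get(cell, 0) + count
    return combined


def protect(table, base: int = 5):
    """Return the rounded table together with metadata describing the adjustment."""
    rounded = {cell: round_to_base(count, base) for cell, count in table.items()}
    metadata = {"disclosure_control": "rounding", "base": base}
    return rounded, metadata


area_1 = {"age 0-15": 2, "age 16-64": 13, "age 65+": 7}
area_2 = {"age 0-15": 4, "age 16-64": 21, "age 65+": 1}

rounded, metadata = protect(area_1)
print(rounded)   # {'age 0-15': 0, 'age 16-64': 15, 'age 65+': 5}
print(metadata)  # {'disclosure_control': 'rounding', 'base': 5}

print(aggregate_areas([area_1, area_2]))
# {'age 0-15': 6, 'age 16-64': 34, 'age 65+': 8}
```

    The rounded total for area 1 is 20 rather than the true 22, which shows why analysts need the metadata: small counts and the sums built from them may have been adjusted rather than measured directly.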

    The move to register-based and administrative data

    Several countries are shifting from the decennial field census toward register-based and administrative data approaches. Instead of a single large enumeration, population estimates are assembled from continuously maintained administrative sources such as population, tax and health registers, sometimes combined with targeted surveys to capture characteristics the registers do not hold. The aim is more frequent, lower-burden and potentially more timely estimates, though the approach introduces challenges around data linkage quality, register coverage, and the governance and legal basis for combining administrative sources.

    This transition reinforces the need for transparent metadata and documented methods, so that users understand how a published population figure was produced and which sources contributed to it. Researchers describing population sources in their work should follow good reporting practice, including the guidance for authors, and should state clearly whether figures derive from a traditional census, a register-based system, or a hybrid of the two.

    Frequently asked questions

    What is the difference between a census and a survey?

    A census attempts to enumerate the entire population, while a survey collects data from a sample and generalises to the whole. Censuses provide complete-coverage denominators with detailed geography; surveys provide richer or more frequent estimates at lower cost but carry sampling uncertainty.

    Why does de jure versus de facto matter?

    The two concepts count different groups: usual residents versus people present on census night. Mixing them produces inconsistent population bases, so the counting basis must be recorded as metadata for any valid comparison across places or over time.

    What is a register-based census?

    It is a method that derives population statistics from continuously maintained administrative registers rather than a single field enumeration. It allows more frequent updates and lower respondent burden, but depends on the coverage, quality and lawful linkage of the underlying administrative sources.

  • Life Expectancy: How It Is Calculated and the Data Behind It

    Life expectancy is a summary measure of mortality that expresses the average number of additional years a person of a given age could expect to live if current age-specific death rates remained unchanged. It is calculated from a life table, a statistical model that converts observed mortality rates into survival probabilities across the lifespan. Life expectancy is a methodological construct describing a hypothetical population, not a prediction for any individual.

    Because it condenses a population’s entire mortality experience into a single, comparable number, life expectancy is one of the most widely cited indicators in demography and public health. Understanding it correctly means understanding the life table that produces it and the data that feed that table.

    How a life table works

    A life table is the engine behind life expectancy. It takes age-specific mortality rates for a population and applies them to a hypothetical cohort, conventionally 100,000 people, that is followed from birth until its last member has died. For each age interval the table records the probability of dying, the number surviving to the start of the interval, the number of deaths and the years of life lived within the interval. Summing the years lived in that interval and all later ones, then dividing by the survivors at the start of the interval, yields life expectancy at that age.

    The key inputs are age-specific death rates, usually derived by dividing recorded deaths in an age band by the corresponding population at risk. These observed rates are converted into probabilities of dying for each interval, and those probabilities drive the survivorship column. Because the calculation uses death rates at every age, life expectancy at birth is sensitive to mortality across the whole lifespan, not only in old age; a fall in deaths among the very young, for example, can raise life expectancy at birth substantially. Statistical agencies publish the full life table so the calculation is transparent and reproducible, allowing analysts to inspect every column rather than trusting a single headline figure.

    The intervals are usually single years of age or five-year age groups, and an open-ended final interval covers the oldest ages, where special methods are used because deaths there are few and the population at risk is small. Abridged life tables, which use grouped age intervals, are common when detailed single-year data are unavailable; they yield results very close to those of complete single-year tables while being simpler to compute and publish.
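
    The calculation can be sketched in a few lines. This is a simplified abridged life table, not any agency's published method: it assumes deaths fall on average at the midpoint of each interval (a poor approximation in infancy), closes the open-ended interval by dividing survivors by the final death rate, and uses made-up death rates.

```python
def life_table(starts, rates, radix=100_000):
    """Build a life table from age-specific death rates.

    starts: start age of each interval; the last interval is open-ended
    rates:  death rate (deaths per person-year) in each interval
    Returns one row per interval with q (probability of dying), l (survivors
    at the start), d (deaths), L (person-years lived) and e (life expectancy).
    """
    rows = []
    survivors = float(radix)
    last = len(starts) - 1
    for i, (age, m) in enumerate(zip(starts, rates)):
        if i == last:
            q = 1.0                       # everyone dies in the open interval
            deaths = survivors
            years = survivors / m         # person-years implied by the final rate
        else:
            n = starts[i + 1] - age       # interval width in years
            q = n * m / (1 + 0.5 * n * m) # rate -> probability, midpoint assumption
            deaths = survivors * q
            years = n * (survivors - deaths) + 0.5 * n * deaths
        rows.append({"age": age, "q": q, "l": survivors, "d": deaths, "L": years})
        survivors -= deaths

    # Accumulate person-years from the oldest interval downward, then divide
    # by survivors at the start of each interval to get life expectancy.
    remaining = 0.0
    for row in reversed(rows):
        remaining += row["L"]
        row["e"] = remaining / row["l"]
    return rows


# Hypothetical death rates for illustration only.
starts = [0, 1, 5, 15, 45, 65, 85]
rates = [0.005, 0.0003, 0.0002, 0.001, 0.006, 0.03, 0.15]

for row in life_table(starts, rates):
    print(f"age {row['age']:>2}  q={row['q']:.4f}  l={row['l']:>9.0f}  e={row['e']:.1f}")
```

    Lowering only the first rate in `rates` and rerunning is a quick way to see how strongly early-life mortality moves life expectancy at birth.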

    Period versus cohort life expectancy

    The most important distinction is between two ways of assembling the death rates that feed the table.

    Measure                | Death rates used                                          | Interpretation
    Period life expectancy | Age-specific rates observed in a single reference period  | A snapshot assuming current mortality conditions hold for life
    Cohort life expectancy | Rates a real birth cohort actually experiences as it ages | Reflects mortality change over the cohort’s lifetime, partly projected

    Period life expectancy is the figure most national statistics offices report routinely, because it requires only recent data: it asks what would happen to a hypothetical group exposed for life to the death rates of a single period. Cohort life expectancy instead tracks an actual generation, using the rates each age group truly experiences as the years pass. It therefore accounts for expected future improvements or deteriorations in mortality, but it depends on projections for the years not yet lived, which introduces assumptions. The two measures can differ substantially, especially when mortality is changing quickly, which is one reason cross-source comparisons require care and explicit labelling of which measure is being used.
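
    One way to picture the difference is as two ways of reading a grid of death probabilities indexed by calendar year and age: the period measure reads one year's row, while the cohort measure reads along a diagonal. The sketch below uses a deliberately tiny toy grid with three ages and made-up, improving probabilities; in practice the later years on the diagonal would be projections.

```python
def select_period(q_surface, year):
    """Death probabilities by age observed in a single calendar year."""
    return q_surface[year]


def select_cohort(q_surface, birth_year):
    """Death probabilities a birth cohort meets as it ages, one year per age."""
    n_ages = len(q_surface[birth_year])
    return [q_surface[birth_year + age][age] for age in range(n_ages)]


def expectancy_at_birth(q_by_age):
    """Expected years lived, assuming deaths fall midway through each year of age."""
    alive, years = 1.0, 0.0
    for q in q_by_age:
        dying = alive * q
        years += (alive - dying) + 0.5 * dying
        alive -= dying
    return years


# Toy grid: q_surface[year][age]; everyone dies by the end of age 2.
q_surface = {
    2000: [0.5, 0.5, 1.0],
    2001: [0.25, 0.25, 1.0],
    2002: [0.125, 0.125, 1.0],
}

period = expectancy_at_birth(select_period(q_surface, 2000))
cohort = expectancy_at_birth(select_cohort(q_surface, 2000))
print(period)  # 1.25
print(cohort)  # 1.375
```

    Because mortality improves in each later year of the toy grid, the cohort born in 2000 outlives the period figure for 2000, matching the pattern described above.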

    Data sources

    Life-table construction draws on two foundational data streams. The numerator comes from mortality records in civil or vital registration systems, ideally capturing every death with age and, where available, cause. The denominator is the population at risk, estimated from a census or a continuously maintained population register and updated for births, deaths and migration between census points.

    The quality of a life expectancy estimate therefore depends on the completeness of death registration and the accuracy of population estimates. Where registration is incomplete, or where population estimates drift between census years, statisticians apply documented adjustment methods rather than leaving the gaps unaddressed. This makes data provenance central: a life expectancy figure is only as trustworthy as the death counts and population denominators behind it. Treating data lineage as first-class metadata is consistent with the principles in the CASRAI dictionary and with broader work on data infrastructure.

    Healthy life expectancy and related measures

    Life expectancy is sometimes extended into measures that combine length of life with quality of life. Healthy life expectancy partitions the years a population can expect to live into those spent in good health and those spent in poor health, by combining the life table with survey data on health status. This requires an additional data source describing the prevalence of ill health by age, layered onto the same mortality-based life table. The result answers a different question from life expectancy alone: not only how long people live, but how much of that time is lived in good health. Because such measures depend on subjective or survey-based definitions of health, the definitions used must be documented even more carefully than for the underlying life table, and they are not directly comparable across populations that defined health differently.
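
    The partition described above is commonly carried out with what is known as the Sullivan method, which weights the years lived in each age interval by the share of people in good health. The text does not name a method, so treating it as Sullivan's is an assumption here, and the life-table values and prevalence figures below are invented for illustration.

```python
def sullivan_split(person_years, prevalence_ill, survivors_at_start):
    """Split life expectancy into healthy and unhealthy years.

    person_years:       years lived in each age interval (life table L column)
    prevalence_ill:     survey-based share in poor health in each interval
    survivors_at_start: life table survivors at the first age considered
    """
    total = sum(person_years) / survivors_at_start
    healthy = sum(
        years * (1 - ill) for years, ill in zip(person_years, prevalence_ill)
    ) / survivors_at_start
    return {"total": total, "healthy": healthy, "unhealthy": total - healthy}


# Made-up inputs: three broad age intervals and a cohort of 10,000 at the start.
person_years = [400_000, 300_000, 100_000]
prevalence_ill = [0.1, 0.2, 0.5]

result = sullivan_split(person_years, prevalence_ill, survivors_at_start=10_000)
print({key: round(value, 1) for key, value in result.items()})
# {'total': 80.0, 'healthy': 65.0, 'unhealthy': 15.0}
```

    Changing only `prevalence_ill` changes the healthy share while leaving total life expectancy untouched, which is why differing survey definitions of health undermine comparisons of healthy life expectancy even when the mortality data are comparable.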

    Common misinterpretations

    Several misreadings recur and are worth naming explicitly. Period life expectancy at birth is not a forecast that today’s newborns will live exactly that long; it describes a hypothetical cohort under fixed current rates that no real generation will ever experience unchanged. A rise or fall between periods reflects changes in age-specific mortality across the population, not a guaranteed individual outcome. Historical life expectancy figures that look low are also often misread, because they were heavily pulled down by mortality in infancy and early childhood rather than implying that few people reached older ages.

    Comparing life expectancy across populations requires consistent age-specific inputs, because differences in data collection, registration completeness or population estimation can produce artefactual gaps that have nothing to do with real differences in survival. Because life expectancy depends on the population at risk, it connects directly to census and population data sources, and because it is built from mortality rates it is closely related to death rate and mortality statistics. Researchers reporting derived measures should document their methods and data sources clearly, as encouraged by guidance for authors, so that readers can see exactly which life table and which assumptions produced a given figure.

    Frequently asked questions

    Is life expectancy a prediction for an individual?

    No. It is a population-level average produced by a life table under stated assumptions. It does not forecast how long any specific person will live, because individual longevity depends on many factors the model does not represent, and because the period version assumes rates that will not stay constant.

    Why do period and cohort figures differ?

    Period life expectancy assumes today’s death rates persist unchanged, while cohort life expectancy follows a real generation and incorporates expected future changes in mortality. When mortality is improving over time, cohort figures are typically higher than period figures for the same reference year.

    What data are needed to calculate life expectancy?

    Two streams are required: age-specific mortality records from vital registration, and population-at-risk estimates from a census or population register. The ratio of deaths to population at each age produces the death rates that drive the life table, so both streams must be complete and consistent.