v2026.1714 entries · CC-BY 4.0
CASRAI
Dictionary term · Track A · Stable · v2026.2

Synthetic data

Data generated by a model or algorithm rather than collected from real-world observations or experiments, designed to mimic the statistical structure of real data for purposes such as augmentation, privacy-preservation, or model training.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    GAN-generated chest X-rays used to augment a limited real training set

  • Is an instance

    LLM-generated patient narratives used to test a triage classifier

Counter-examples

Looks similar, but isn't

  • Not an instance

    Bootstrap resamples of an empirical dataset are not synthetic data in this sense (they are resamples of real observations)
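The distinction drawn in this counter-example can be illustrated with a small sketch. It is a toy contrast, not a recommended generator: the function names, the normal model, and the sample values are all assumptions made for illustration. A bootstrap resample can only return values that were actually observed, whereas a model-based generator draws new values from a fitted distribution.

```python
import random
import statistics


def bootstrap_resample(observations, rng):
    """Resample with replacement: every returned value is a real observation."""
    return [rng.choice(observations) for _ in observations]


def synthetic_sample(observations, n, rng):
    """Fit a normal model to the data, then draw n new values from that model.

    The normal model is an illustrative assumption; real generators
    (GANs, LLMs, simulators) are far more elaborate.
    """
    mu = statistics.mean(observations)
    sigma = statistics.stdev(observations)
    return [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)
real = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9]  # made-up values standing in for measurements

boot = bootstrap_resample(real, rng)
synth = synthetic_sample(real, len(real), rng)

# Every bootstrap value traces back to a real observation.
print(all(x in real for x in boot))  # True

# Model-drawn values are new numbers that merely follow the fitted distribution.
print(synth)
```

Under this reading, only `synth` would count as synthetic data in the sense defined above; `boot` remains a rearrangement of empirical observations.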

Editorial commentary

Synthetic data must be disclosed as such in any analysis where the distinction matters (e.g., training-set composition, statistical inference, claims of empirical support). The generating model, its parameters, and the validation procedure used to demonstrate fidelity to real data should be reported.
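One way to make the reporting items above checkable is to hold them in a small structured record. The following is a minimal sketch only: the class name, field names, and example values are hypothetical and are not a CASRAI schema. It captures the three items named in the commentary (generating model, parameters, validation procedure) and flags any that are left empty.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SyntheticDataDisclosure:
    """Hypothetical disclosure record; field names are illustrative assumptions."""
    generating_model: str
    model_parameters: dict = field(default_factory=dict)
    validation_procedure: str = ""

    def missing_items(self):
        """Return the names of reporting items that are still empty."""
        missing = []
        if not self.generating_model.strip():
            missing.append("generating_model")
        if not self.model_parameters:
            missing.append("model_parameters")
        if not self.validation_procedure.strip():
            missing.append("validation_procedure")
        return missing


record = SyntheticDataDisclosure(
    generating_model="example-gan",               # placeholder name
    model_parameters={"latent_dim": 128, "seed": 0},
    validation_procedure="",                      # deliberately left blank
)

print(record.missing_items())  # ['validation_procedure']
print(json.dumps(asdict(record), indent=2))
```

A record like this could accompany a dataset or manuscript so that a reviewer can see at a glance which disclosure items are still outstanding.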

References

  • Jordon et al. 2022 ‘Synthetic Data — what, why and how?’ Royal Society
  • OECD Synthetic Data Guidance (2023)

Also known as

Generated data · Simulated data

Machine-readable encodings

Use in your systems

JATS XML <role> element
xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Synthetic data"
      vocab-term-identifier="https://casrai.org/dictionary/term/synthetic-data" />
Schema.org DefinedTerm (JSON-LD)
json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Synthetic data",
  "identifier": "https://casrai.org/dictionary/term/synthetic-data",
  "description": "Data generated by a model or algorithm rather than collected from real-world observations or experiments, designed to mimic the statistical structure of real data for purposes such as augmentation, privacy-preservation, or model training.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/generative-ai-use-and-disclosure/",
  "url": "https://casrai.org/dictionary/term/synthetic-data",
  "alternateName": [
    "Generated data",
    "Simulated data"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}