v2026.1714 entries · CC-BY 4.0
Dictionary term · Track A · Stable · v2026.2

Synthetic data

Data generated by a model or algorithm rather than collected from real-world observations or experiments, designed to mimic the statistical structure of real data for purposes such as augmentation, privacy-preservation, or model training.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    GAN-generated chest X-rays used to augment a limited real training set

  • Is an instance

    LLM-generated patient narratives used to test a triage classifier

Counter-examples

Looks similar, but isn't

  • Not an instance

    Bootstrap resamples of an empirical dataset are not synthetic data in this sense (they are resamples of real observations)
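The distinction drawn in this counter-example can be sketched in a few lines of Python. This is a minimal illustration, not a recommended generator: the measurements are made up, the function names are hypothetical, and a fitted normal distribution stands in for whatever generating model an analysis might actually use.

```python
import random
import statistics

def bootstrap_resample(observations, rng):
    """Draw len(observations) values from the real observations, with replacement."""
    return [rng.choice(observations) for _ in observations]

def synthetic_sample(observations, n, rng):
    """Fit a normal model to the observations, then draw n new values from the model."""
    mu = statistics.mean(observations)
    sigma = statistics.stdev(observations)
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)
real = [4.1, 5.3, 4.8, 6.0, 5.5]  # made-up measurements, for illustration only

resample = bootstrap_resample(real, rng)    # every value is a real observation
generated = synthetic_sample(real, 5, rng)  # every value comes from the fitted model

print("bootstrap:", resample)
print("synthetic:", generated)
```

Every value in `resample` was actually observed, so the resample is not synthetic data in the sense defined here. The values in `generated` are produced by the fitted model and mimic only the mean and spread of the real data.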

Editorial commentary

Synthetic data must be disclosed as such in any analysis where the distinction matters (e.g., training-set composition, statistical inference, claims of empirical support). The generating model, its parameters, and the validation procedure used to demonstrate fidelity to real data should be reported.
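One way to make these reporting items checkable is a small structured record. The sketch below is an assumption throughout: the class name, field names, and example values are hypothetical and are not part of any CASRAI schema. It only shows how the three items named above (generating model, parameters, validation procedure) might be captured and checked for completeness.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class SyntheticDataDisclosure:
    """Hypothetical disclosure record; field names are illustrative only."""
    is_synthetic: bool
    generating_model: str
    model_parameters: dict
    validation_procedure: str

    def missing_fields(self):
        """Return the names of reporting items that were left empty."""
        missing = []
        if not self.generating_model.strip():
            missing.append("generating_model")
        if not self.model_parameters:
            missing.append("model_parameters")
        if not self.validation_procedure.strip():
            missing.append("validation_procedure")
        return missing

disclosure = SyntheticDataDisclosure(
    is_synthetic=True,
    generating_model="example-gan",        # placeholder name
    model_parameters={"latent_dim": 128},  # placeholder values
    validation_procedure="",               # left blank on purpose
)

print(disclosure.missing_fields())
print(json.dumps(asdict(disclosure), indent=2))
```

Here the validation procedure is deliberately left blank, so the check reports it as the one item still to be supplied before the record is complete.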

References

  • Jordon et al. 2022 ‘Synthetic Data — what, why and how?’ Royal Society
  • OECD Synthetic Data Guidance (2023)

Also known as

Generated data · Simulated data

Machine-readable encodings

Use in your systems

JATS XML <role> element
xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Synthetic data"
      vocab-term-identifier="https://casrai.org/dictionary/term/synthetic-data" />
Schema.org DefinedTerm (JSON-LD)
json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Synthetic data",
  "identifier": "https://casrai.org/dictionary/term/synthetic-data",
  "description": "Data generated by a model or algorithm rather than collected from real-world observations or experiments, designed to mimic the statistical structure of real data for purposes such as augmentation, privacy-preservation, or model training.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/generative-ai-use-and-disclosure/",
  "url": "https://casrai.org/dictionary/term/synthetic-data",
  "alternateName": [
    "Generated data",
    "Simulated data"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
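
A consuming system might read the JSON-LD record as sketched below. This is an illustration under stated assumptions: the embedded record is a trimmed copy holding only the keys the sketch reads, and the helper name `term_tag` is hypothetical.

```python
import json

# Trimmed copy of the record above; only the keys this sketch reads are kept.
record_text = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Synthetic data",
  "identifier": "https://casrai.org/dictionary/term/synthetic-data",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/generative-ai-use-and-disclosure/",
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
"""

def term_tag(text):
    """Return (name, identifier) if the record is a schema.org DefinedTerm."""
    record = json.loads(text)
    if record.get("@type") != "DefinedTerm":
        raise ValueError("not a DefinedTerm record")
    return record["name"], record["identifier"]

name, identifier = term_tag(record_text)
print(name, identifier)
```

Storing the `identifier` URL rather than the label alone keeps a dataset's tag tied to the dictionary entry even if the display name is later revised.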

Adopted by research universities worldwide

  • University of Cambridge
  • Columbia University
  • University of Edinburgh
  • Harvard University
  • Massachusetts Institute of Technology
  • University of Oxford
  • Princeton University
  • Stanford School of Medicine
  • University College London
