v2026.1714 entries · CC-BY 4.0
Dictionary term · Track C · Stable · v2026.2

Datasheet for datasets

A structured document accompanying a machine-learning dataset that records its motivation, composition, collection process, pre-processing, intended uses, distribution, and maintenance, modelled on electronic-component datasheets.

By CASRAI Editorial Board
· Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    A face-recognition benchmark distributed with a datasheet listing demographic composition, consent procedures, and recommended uses.

  • Is an instance

    An NLP corpus accompanied by a datasheet documenting source domains and crawling rules.

Counter-examples

Looks similar, but isn't

  • Not an instance

    A dataset README containing only file format and column descriptions.

  • Not an instance

    A model card (covers the model, not the dataset).

Editorial commentary

Gebru et al. (2021) proposed datasheets to make dataset provenance and limitations visible to downstream model builders. Topics covered include subject consent and data licensing, sampling and labelling procedures, demographic composition, known biases, and recommended and cautioned uses. Datasheets complement model cards, which document the model rather than the dataset.
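
As a rough illustration of the seven sections named in the definition, a datasheet can be modelled as a record with one field per section, plus a check for sections left empty. This is a minimal sketch, not a published schema: the names `Datasheet` and `missing_sections` are hypothetical, and treating a README's file format and column descriptions as "composition" is an assumption made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    """One free-text field per section listed in the definition (illustrative only)."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    intended_uses: str = ""
    distribution: str = ""
    maintenance: str = ""


def missing_sections(sheet: Datasheet) -> list[str]:
    """Return the names of sections that are empty or whitespace-only."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]


# Hypothetical README-style record: only file format and columns are described.
readme_only = Datasheet(composition="CSV file; columns: id, text, label")

# Lists every section the README leaves undocumented.
print(missing_sections(readme_only))
```

Under these assumptions the README-only record reports six of the seven sections as missing, which is one way to see why it is listed as a counter-example rather than an instance.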

References

  • Gebru et al., 'Datasheets for datasets' (Communications of the ACM, 2021).

Also known as

dataset datasheet

Machine-readable encodings

Use in your systems

JATS XML <role> element
xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Datasheet for datasets"
      vocab-term-identifier="https://casrai.org/dictionary/term/datasheet-for-datasets" />
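
A consumer might read the attributes of the <role> element above with Python's standard library. The following is a minimal sketch under that assumption; the variable names are illustrative, and the prefix check is only a plausibility test chosen for this example, not a rule taken from JATS.

```python
import xml.etree.ElementTree as ET

# The <role> element as shown above, embedded as a string for the sketch.
ROLE_XML = """<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Datasheet for datasets"
      vocab-term-identifier="https://casrai.org/dictionary/term/datasheet-for-datasets" />"""

role = ET.fromstring(ROLE_XML)

# Attribute names are read as plain strings, hyphens included.
term = role.get("vocab-term")
term_id = role.get("vocab-term-identifier")

# Illustrative check: the term URL should sit under the vocabulary URL.
in_vocab = term_id.startswith(role.get("vocab-identifier"))

print(term, term_id, in_vocab)
```
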
Schema.org DefinedTerm (JSON-LD)
json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Datasheet for datasets",
  "identifier": "https://casrai.org/dictionary/term/datasheet-for-datasets",
  "description": "A structured document accompanying a machine-learning dataset that records its motivation, composition, collection process, pre-processing, intended uses, distribution, and maintenance, modelled on electronic-component datasheets.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
  "url": "https://casrai.org/dictionary/term/datasheet-for-datasets",
  "alternateName": [
    "dataset datasheet"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
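
One way to use such a record in a local system is to load it and index terms by name. The sketch below assumes a trimmed copy of the record containing only the fields it reads; the function name `index_terms` is hypothetical.

```python
import json

# Trimmed copy of the record above; only the fields this sketch reads are kept.
RECORD_JSON = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Datasheet for datasets",
  "identifier": "https://casrai.org/dictionary/term/datasheet-for-datasets",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/"
}
"""


def index_terms(records: list[dict]) -> dict[str, str]:
    """Map each DefinedTerm's case-folded name to its identifier URL."""
    return {
        r["name"].casefold(): r["identifier"]
        for r in records
        if r.get("@type") == "DefinedTerm"
    }


index = index_terms([json.loads(RECORD_JSON)])

# Lookup is case-insensitive because keys are case-folded.
print(index["Datasheet for Datasets".casefold()])
```

Case-folding the key is a design choice for tolerant lookup; a system that needs exact matching could key on the identifier URL instead.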

Adopted by research universities worldwide

  • University of Cambridge
  • Columbia University
  • University of Edinburgh
  • Harvard University
  • Massachusetts Institute of Technology
  • University of Oxford
  • Princeton University
  • Stanford School of Medicine
  • University College London
