Examples
Worked examples
- Is an instance
A face-recognition benchmark distributed with a datasheet listing demographic composition, consent procedures, and recommended uses.
- Is an instance
An NLP corpus accompanied by a datasheet documenting source domains and crawling rules.
Counter-examples
Looks similar, but isn't
- Not an instance
A dataset README containing only file format and column descriptions.
- Not an instance
A model card (covers the model, not the dataset).
Editorial commentary
Gebru et al. (2021) proposed datasheets to make a dataset's provenance and limitations visible to downstream model builders. A datasheet typically covers the consent of data subjects, licensing, sampling and labelling procedures, demographic composition, known biases, and both recommended uses and uses to approach with caution. Datasheets complement model cards, which document a trained model rather than the data behind it.
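The distinction between a datasheet and a bare README can be sketched in code. The following is a minimal illustration, not a standard schema: the `Datasheet` class and the `missing_sections` helper are hypothetical names, and the seven fields simply mirror the sections named in this term's definition (motivation, composition, collection process, pre-processing, intended uses, distribution, maintenance).

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Datasheet:
    """Hypothetical container; one free-text field per section in the definition."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    intended_uses: str = ""
    distribution: str = ""
    maintenance: str = ""


def missing_sections(sheet: Datasheet) -> List[str]:
    """Return the names of sections that are empty or whitespace-only."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]


# A README that only describes file format and columns fills one section at most.
readme_only = Datasheet(composition="CSV files; columns: id, text, label")

# A document that addresses every section (contents are placeholder text).
full = Datasheet(
    motivation="Benchmark for sentiment classification research.",
    composition="50,000 product reviews; label distribution documented.",
    collection_process="Crawled from public pages under stated crawling rules.",
    preprocessing="Deduplicated; personal identifiers removed.",
    intended_uses="Research benchmarking; not for profiling individuals.",
    distribution="Released under CC BY 4.0.",
    maintenance="Errata tracked by the dataset maintainers.",
)

print(missing_sections(readme_only))
print(missing_sections(full))
```

Under these assumptions, the README-style record reports six missing sections while the full record reports none, which matches the counter-example above: format and column descriptions alone do not make a datasheet.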
References
- Gebru et al., 'Datasheets for datasets' (Communications of the ACM, 2021).
Also known as
dataset datasheet
Machine-readable encodings
Use in your systems
<role vocab="credit"
vocab-identifier="https://casrai.org/dictionary/"
vocab-term="Datasheet for datasets"
vocab-term-identifier="https://casrai.org/dictionary/term/datasheet-for-datasets" />

{
"@context": "https://schema.org",
"@type": "DefinedTerm",
"name": "Datasheet for datasets",
"identifier": "https://casrai.org/dictionary/term/datasheet-for-datasets",
"description": "A structured document accompanying a machine-learning dataset that records its motivation, composition, collection process, pre-processing, intended uses, distribution, and maintenance, modelled on electronic-component datasheets.",
"inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
"url": "https://casrai.org/dictionary/term/datasheet-for-datasets",
"sameAs": [
"dataset datasheet"
],
"license": "https://creativecommons.org/licenses/by/4.0/"
}
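As one possible way to consume the JSON-LD record above, the sketch below parses it with Python's standard `json` module. It assumes the record is available as a string (an abbreviated copy is inlined here, with the description and license omitted), and `labels_for` is a hypothetical helper, not part of any published API.

```python
import json

# Abbreviated copy of the JSON-LD record above (description and license omitted).
RECORD = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Datasheet for datasets",
  "identifier": "https://casrai.org/dictionary/term/datasheet-for-datasets",
  "sameAs": ["dataset datasheet"]
}
"""


def labels_for(record: dict) -> list:
    """Hypothetical helper: preferred name first, then any alternative names."""
    return [record["name"], *record.get("sameAs", [])]


term = json.loads(RECORD)
print(term["identifier"])
print(labels_for(term))
```

Keying on `identifier` rather than `name` is a deliberate choice in this sketch: the identifier is a stable URL, whereas labels may vary between the preferred term and its alternative names.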







