For most of the history of the scholarly record, the unit of documentation was the paper. A piece of empirical research was described, peer-reviewed, and cited as an article; the underlying data and code were, at best, supplementary. Machine-learning research has been quietly rewriting that assumption. A trained model and the dataset it learned from are research outputs in their own right, and the community has developed its own documentation conventions for them: the model card and the datasheet for datasets. This piece sets out what they are, where they came from, and why they belong in the formal research record that CASRAI’s AI/ML research outputs domain is designed to describe.

Model cards: a short, structured account of a model

The model card was proposed by Margaret Mitchell and colleagues in their 2019 paper Model Cards for Model Reporting. The idea is disarmingly simple: every trained model should ship with a short, structured document that answers the questions a responsible user would need to ask before relying on it. Who built it and when? What is it intended to do, and what is it explicitly not intended to do? What data was it trained on? How was it evaluated, and on which populations or subgroups? What are its known limitations, failure modes, and ethical considerations?

The motivating insight was that aggregate performance numbers conceal more than they reveal. A model that is 95% accurate overall can be 99% accurate for one group and 70% for another. A model card’s evaluation section is expected to report performance disaggregated across relevant factors, so that the user can see where the model works and where it does not. This is documentation in service of accountability, not marketing.
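The arithmetic behind that example can be made concrete. The sketch below is illustrative only: the group names and counts are invented so that they reproduce the 99%, 70%, and 95% figures above, and `accuracy_by_group` is a hypothetical helper rather than part of any model-card tooling.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return (overall_accuracy, {group: accuracy}) for (group, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group

# Invented evaluation results: 2,500 examples from group A, 400 from group B.
records = (
    [("group_a", True)] * 2475 + [("group_a", False)] * 25
    + [("group_b", True)] * 280 + [("group_b", False)] * 120
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2%}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2%}")
```

In this invented split the weaker group makes up about 14% of the evaluation set, which is why its 70% accuracy barely moves the aggregate figure.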

Model cards have since become near-ubiquitous in practice. The Hugging Face Hub, the dominant model registry, attaches a model card to every hosted model as its README, and the convention has spread to internal model registries across industry and academia. The format is loose enough to suit a small fine-tuned classifier or a large foundation model, but the core sections — intended use, training data, evaluation, limitations — are stable.

Datasheets for datasets: provenance for the data

The companion convention for data is the datasheet for datasets, proposed by Timnit Gebru and colleagues in 2018 (revised and published in Communications of the ACM in 2021). The analogy in the title is to the datasheets that accompany electronic components: a structured specification that lets an engineer decide whether a part is fit for their purpose.

A datasheet works through a dataset’s full lifecycle in a series of question prompts. Motivation: why was the dataset created, and by whom? Composition: what does each instance represent, are there labels, are there sensitive subpopulations? Collection process: how was the data acquired, was consent obtained, were people aware they were being recorded? Preprocessing and cleaning: what was done to the raw data, and is the raw data preserved? Uses: what has the dataset been used for, and what uses should be avoided? Distribution and maintenance: how is it licensed, who maintains it, and how will errors be corrected?
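One way to picture the prompts above is as a fixed set of sections that a draft either answers or leaves blank. The following is a minimal sketch under that assumption: the section names follow the paragraph above, the one-line questions are paraphrases, and `unanswered_sections` and the sample draft are hypothetical, not taken from the Gebru et al. template.

```python
# Section names follow the prompts listed above; the questions are paraphrases.
DATASHEET_SECTIONS = {
    "Motivation": "Why was the dataset created, and by whom?",
    "Composition": "What does each instance represent? Are there labels or sensitive subpopulations?",
    "Collection process": "How was the data acquired, and was consent obtained?",
    "Preprocessing and cleaning": "What was done to the raw data, and is the raw data preserved?",
    "Uses": "What has the dataset been used for, and what uses should be avoided?",
    "Distribution and maintenance": "How is it licensed, who maintains it, and how are errors corrected?",
}

def unanswered_sections(datasheet):
    """List the sections that are missing or contain only whitespace."""
    return [
        name for name in DATASHEET_SECTIONS
        if not datasheet.get(name, "").strip()
    ]

# A hypothetical, partly completed datasheet.
draft = {
    "Motivation": "Created by a university lab to study speech recognition in noisy rooms.",
    "Composition": "Each instance is a 10-second audio clip with a transcript.",
    "Collection process": "   ",
    "Uses": "Benchmarking only; not for speaker identification.",
}

for name in unanswered_sections(draft):
    print(f"Missing: {name} - {DATASHEET_SECTIONS[name]}")
```

A check like this only catches empty sections; whether an answer is honest and complete still needs a human reader.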

The point of the datasheet is to make the provenance and limitations of a dataset legible to people who did not collect it. Reusing a dataset without understanding its collection context is a well-documented source of downstream harm; the datasheet is the mechanism for transmitting that context with the data.

Why these belong in the research record

It is tempting to treat model cards and datasheets as engineering hygiene — useful, but not scholarly in the way a paper is. We think that view is mistaken, for three reasons.

First, they are how ML researchers are increasingly evaluated. A well-constructed datasheet or a rigorous disaggregated model card represents real intellectual labour: the careful articulation of provenance, intended use, and limitation. Under responsible-assessment regimes such as the narrative CV, this kind of output is exactly the contribution a researcher should be able to claim.
Second, they are the documentation layer that makes a model or dataset FAIR. A trained model with a DataCite DOI but no model card is findable and accessible but not meaningfully reusable. The card supplies the metadata that the FAIR principles require for reuse.
Third, they carry the accountability that the research record is supposed to preserve. When a model is later found to behave badly, the model card is the contemporaneous record of what its builders claimed and disclosed. That is precisely the function the published record has always served for empirical claims.

How persistent identifiers apply

For a model card or datasheet to function as a citable research output, it needs the same identifier infrastructure as any other output. The pattern that has emerged, and that CASRAI’s guidance on persistent identifiers recommends, is straightforward.

The dataset or model receives a DataCite DOI, minted by a generalist repository (Zenodo, Figshare) or a domain-specific one. The datasheet or model card is published as part of that deposit, so that resolving the DOI reaches both the artefact and its documentation. Where source code is involved, a Software Heritage identifier (SWHID) pins the exact code state. Contributors are identified by ORCID iD and institutions by ROR ID, so that the people and organisations behind the artefact are unambiguous. Where the model or dataset belongs to a larger project, a RAiD ties it to the project record. Ideally, the model card’s account of its training data cites the dataset’s DOI directly, closing the provenance loop between model and data.
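The pattern can be sketched as a small record type. This is an illustration, not a DataCite or repository schema: the `Deposit` and `Contributor` classes and the `cites` helper are hypothetical, and every identifier value is a made-up placeholder that will not resolve.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Contributor:
    name: str
    orcid: str  # ORCID iD of the person
    ror: str    # ROR ID of their institution

@dataclass
class Deposit:
    title: str
    doi: str                     # DataCite DOI covering artefact and documentation
    documentation: str           # text of the model card or datasheet
    contributors: List[Contributor] = field(default_factory=list)
    swhid: Optional[str] = None  # Software Heritage identifier for the code state
    raid: Optional[str] = None   # RAiD of the parent project, if any

def cites(model: Deposit, dataset: Deposit) -> bool:
    """True if the model's documentation mentions the dataset's DOI."""
    # DOI names are case-insensitive, so compare in lower case.
    return dataset.doi.lower() in model.documentation.lower()

# Every identifier below is a made-up placeholder and will not resolve.
author = Contributor("A. Researcher", orcid="0000-0000-0000-0000",
                     ror="https://ror.org/placeholder")

dataset = Deposit(
    title="Example speech corpus",
    doi="10.1234/example.dataset",
    documentation="Datasheet: motivation, composition, collection process ...",
    contributors=[author],
)
model = Deposit(
    title="Example speech model",
    doi="10.1234/example.model",
    documentation="Model card. Training data: Example speech corpus, "
                  "doi:10.1234/EXAMPLE.DATASET",
    contributors=[author],
    swhid="swh:1:rev:" + "0" * 40,
    raid="placeholder-raid",
)

print(cites(model, dataset))
```

The `cites` check is deliberately crude, a substring match on the DOI, but it captures the provenance loop: a model deposit whose card never mentions its training dataset's DOI would fail it.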

How CRediT applies

Contributorship for these outputs maps onto CRediT better than one might expect, though not perfectly. The person who designed the data-collection protocol is doing Methodology; the people who collected, cleaned, and annotated the data are doing Investigation and Data curation; the person who trained the model is doing Software and, where the training method is itself novel, Methodology; the person who built and ran the evaluation suite is doing Validation. We have written separately about the friction points in this mapping — the Software role in particular tends to absorb too much — but the basic correspondence holds, and a model or dataset deposit should carry a CRediT statement just as a paper does.
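As a rough illustration of that correspondence, the mapping can be written down as a lookup table and used to assemble a contributor statement. The task labels, the team, and the `credit_statement` helper are hypothetical; only the role names come from the CRediT taxonomy, and only the roles mentioned above are included.

```python
# Task-to-role mapping taken from the paragraph above.
TASK_TO_CREDIT = {
    "designed data-collection protocol": ["Methodology"],
    "collected, cleaned, and annotated data": ["Investigation", "Data curation"],
    "trained model": ["Software"],
    "developed novel training method": ["Software", "Methodology"],
    "built and ran evaluation suite": ["Validation"],
}

def credit_statement(assignments):
    """Build a 'Name: Role, Role; Name: Role' statement from {name: [tasks]}."""
    parts = []
    for name, tasks in assignments.items():
        roles = []
        for task in tasks:
            for role in TASK_TO_CREDIT[task]:
                if role not in roles:  # keep each role once, in first-seen order
                    roles.append(role)
        parts.append(f"{name}: {', '.join(roles)}")
    return "; ".join(parts)

# A hypothetical three-person team.
team = {
    "A. Researcher": ["designed data-collection protocol", "built and ran evaluation suite"],
    "B. Annotator": ["collected, cleaned, and annotated data"],
    "C. Engineer": ["developed novel training method"],
}
print(credit_statement(team))
```

Note how the engineer who both trains the model and devises the training method ends up with Software plus Methodology, which is one place where the Software role can start to absorb distinct kinds of work.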

Quality varies, and that matters

A note of realism. Because model cards and datasheets are not yet enforced by peer review in the way a methods section is, their quality varies enormously. A thorough datasheet that honestly documents consent gaps and known biases is a genuine contribution; a model card that lists only headline accuracy and a boilerplate licence is documentation theatre. The value of folding these artefacts into the formal research record — with identifiers, contributorship, and eventually review — is precisely that it creates the incentive and the scrutiny to make them good.

What to do now

For researchers releasing a model or dataset: write the model card or datasheet using the established Mitchell et al. and Gebru et al. templates; deposit it with the artefact under a DataCite DOI; attach a CRediT statement and ORCID iDs; and cite the dataset’s DOI from the model card where the model was trained on a citable dataset. For institutions and funders: recognise these outputs in CRIS systems and assessment processes as first-class, identifier-bearing research outputs, not as supplementary material.

Model cards and datasheets: documenting AI/ML research outputs