Every impressive machine-learning result rests on a foundation of human work that is almost never named. Before a model can be trained, someone has to gather, clean and organise the data; someone has to annotate and label examples, often by hand and at scale; and after training, someone has to evaluate what the model actually does, judging its outputs against criteria that only people can apply. This is real intellectual labour, demanding domain knowledge, careful judgement and sustained attention — and it is frequently invisible, performed by people whose names appear nowhere in the paper that depends on their work. As AI and machine learning become central to research, the question of how to recognise this hidden labour has become a matter of fairness and of accuracy in the scholarly record. This article examines it through the AI and ML research outputs domain of the CASRAI Dictionary.
The work that does not make the byline
It is worth being concrete about what this labour involves, because its invisibility stems partly from its being taken for granted. Data annotation and labelling (marking up images, transcribing audio, tagging text) is the painstaking process that gives supervised learning something to learn from, and the quality of a model is bounded by the quality of these labels. Data curation (selecting, cleaning, documenting and organising the data) shapes everything that follows and embeds countless consequential decisions. Evaluation (assessing model outputs, designing test sets, identifying failure modes) is where human expertise determines whether a system actually works. None of this is mechanical. Each requires judgement, and each materially affects the result. Yet the reward structures of research, organised around authorship and citation, have tended to treat all of it as plumbing rather than contribution.
Why recognition matters here
The case for recognising this work is partly about fairness to the people who do it, but it is also about the integrity of the record. When the labour behind a dataset or an evaluation is invisible, two things go wrong. First, the people responsible — often early-career researchers, students, or specialist data workers — are denied credit for substantial, skilled contributions, with real consequences for their careers. Second, the research itself becomes harder to understand and trust, because the decisions embedded in annotation and curation — which are exactly the decisions that determine bias, coverage and validity — are hidden from view. Recognising the labour and documenting the choices are two sides of the same coin: both bring into the open the human work that determines what a model is and does.
How CRediT captures these contributions
A structured account of who did what is the most direct route to making this labour visible, and the CRediT taxonomy already contains roles that fit it well. Data curation explicitly covers the management activities of annotating, scrubbing and maintaining research data — the very heart of the annotation and labelling work that machine learning depends on. Investigation covers conducting the research and data-collection process, which includes the hands-on work of producing labelled examples. Validation covers verifying results and assessing reproducibility, which maps onto the evaluation of model outputs. Software recognises those who build the tooling that makes annotation and evaluation possible at scale. The full set is described in our overview of the CRediT roles. The point is that the vocabulary for crediting this work largely already exists; what has often been missing is the will to apply it, and the recognition that annotation and evaluation are contributions worth naming rather than chores to be absorbed silently.
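As a rough illustration of how the roles named above could be applied in practice, the sketch below turns a simple log of who did which task into a contributor statement. The activity names, the `contributor_statement` helper and the people listed are all hypothetical, and the activity-to-role mapping simply follows this article's reading of CRediT rather than any official crosswalk.

```python
from collections import defaultdict

# Illustrative mapping from ML data-work activities to CRediT roles.
# This follows the article's reading of the taxonomy; it is an assumption,
# not an official CRediT or CASRAI mapping.
ACTIVITY_TO_CREDIT_ROLE = {
    "annotation": "Data curation",
    "labelling": "Data curation",
    "data cleaning": "Data curation",
    "data collection": "Investigation",
    "output evaluation": "Validation",
    "annotation tooling": "Software",
}


def contributor_statement(work_log):
    """Build a 'Role: names' statement from (person, activity) pairs."""
    roles = defaultdict(list)
    for person, activity in work_log:
        role = ACTIVITY_TO_CREDIT_ROLE[activity]
        # List each person once per role, in order of first appearance.
        if person not in roles[role]:
            roles[role].append(person)
    # Sort roles alphabetically so the statement is stable across runs.
    return "; ".join(
        f"{role}: {', '.join(names)}" for role, names in sorted(roles.items())
    )


# Hypothetical contributors and tasks, for illustration only.
work_log = [
    ("A. Rivera", "annotation"),
    ("B. Chen", "labelling"),
    ("A. Rivera", "output evaluation"),
    ("C. Okafor", "annotation tooling"),
]

statement = contributor_statement(work_log)
print(statement)
```

A statement of this kind costs little to produce once the work has been logged, which is part of the argument: the obstacle is rarely the vocabulary or the tooling, and more often the habit of not recording the work at all.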
Documenting the data and the decisions
Recognition of people goes hand in hand with documentation of process. The movement to document datasets properly — through structured records that describe how a dataset was created, by whom, with what labelling procedures, and with what known limitations — makes the hidden labour visible as part of the dataset’s own description. Approaches such as datasheets for datasets and data statements ask creators to record the provenance of the data, the annotation process, the people involved and the judgements made. This documentation serves recognition directly: a dataset that records who annotated it and how is one in which that labour is acknowledged rather than erased. It also serves responsible AI, because the same record that credits the annotators is the record that lets others understand the dataset’s biases and boundaries. Good documentation is thus both an ethical and a scientific instrument — it names the people and exposes the decisions in a single act.
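To make the idea of a structured dataset record concrete, here is a minimal sketch of what one might look like in code. The `DatasetRecord` fields and the `missing_fields` check are assumptions chosen to mirror the items mentioned above (provenance, annotation process, people involved, known limitations); they are not the actual question set of datasheets for datasets or data statements, and the example values are invented.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetRecord:
    """A hypothetical, simplified documentation record for a dataset."""
    name: str
    provenance: str = ""            # where the data came from
    annotation_process: str = ""    # how the labels were produced
    annotators: list = field(default_factory=list)         # who did the labelling
    known_limitations: list = field(default_factory=list)  # biases, coverage gaps


def missing_fields(record):
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Invented example values, for illustration only.
record = DatasetRecord(
    name="example-sentiment-corpus",
    provenance="Product reviews gathered by the project team (hypothetical)",
    annotation_process="Two annotators per item; disagreements adjudicated",
    annotators=["A. Rivera", "B. Chen"],
)

gaps = missing_fields(record)
print(gaps)
```

Here the check reports that `known_limitations` has not been filled in. Keeping the annotators and the limitations in the same record reflects the point made above: the documentation that credits the people is also the documentation that exposes the decisions.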
Data work as a labour question
There is a broader dimension the research community has had to confront. Much annotation and labelling is performed by data workers, whether inside research teams or contracted externally, who are often poorly paid and rarely credited, and whose conditions have become a focus of responsible-AI discussion. Recognising annotation as a genuine contribution is connected to recognising it as genuine work, with the dignity, fair treatment and acknowledgement that implies. The contributor metadata that records who did this work is not a clerical detail; it is a statement about whose labour the research is built on.
A consistent vocabulary for AI contributions
For the contributions behind AI and ML research to be recognised consistently — across institutions, publishers, dataset repositories and reporting systems — the way they are described must mean the same thing everywhere. Annotation, curation, evaluation and the roles that capture them have to travel without losing their meaning. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the hidden labour behind a model or a dataset, once made visible, is understood identically wherever it is recorded. And because this work is part of the wider research enterprise, recognising it well also serves the goals of fair research administration, explored in our research administration resources. The most advanced model is, at bottom, an artefact of human judgement applied to data; crediting the people who supply that judgement is simply an honest account of how the research was done.