Crediting contributions to AI/ML research: data, code, models and evaluation

Machine-learning research distributes its intellectual labour differently from a conventional empirical study. The work that determines whether a result is any good is spread across data collection and annotation, code, model training, and evaluation — and the people who do each of those things are often different people. So how well do the 14 roles of CRediT describe who did what on an AI/ML paper? Better than one might fear, with a few well-understood friction points. This article walks through the mapping, role by role, for the benefit of anyone writing a CRediT author statement for ML work.

Start from the lifecycle, not the role list

The cleanest way to assign CRediT roles to ML work is to walk the lifecycle and ask, at each stage, who contributed and which role names that contribution. A typical AI/ML project moves through: framing the problem and research goals; designing the method or model architecture; assembling, cleaning, and annotating data; implementing and training; evaluating; and writing it up. Each stage has a natural CRediT home.
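The lifecycle walk can be made partly mechanical as a first pass. The Python sketch below is hypothetical. Its stage labels and one-role-per-stage table are illustrative assumptions, not part of CRediT. Its output is a set of candidate roles for the authors to confirm, not a finished statement. Raw data collection and cleaning/annotation are deliberately separate stages, so that Data curation is not folded into Investigation.

```python
"""Sketch: derive candidate CRediT roles by walking an ML project's lifecycle."""

# Hypothetical, finer-grained split of the lifecycle stages, each paired with
# the CRediT role that usually names that work. The stage labels are
# illustrative assumptions; only the role names come from CRediT.
STAGE_TO_ROLE: dict[str, str] = {
    "problem framing": "Conceptualization",
    "method and protocol design": "Methodology",
    "raw data collection": "Investigation",
    "cleaning and annotation": "Data curation",
    "implementation": "Software",
    "training runs": "Investigation",
    "evaluation": "Validation",
    "figures": "Visualization",
    "compute provision": "Resources",
    "first draft": "Writing – original draft",
    "revision": "Writing – review & editing",
}


def candidate_roles(stage_contributors: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each contributor to the set of CRediT roles their stages suggest.

    Raises KeyError for a stage with no entry in STAGE_TO_ROLE, so that an
    unmapped piece of work is surfaced rather than silently dropped.
    """
    roles: dict[str, set[str]] = {}
    for stage, people in stage_contributors.items():
        role = STAGE_TO_ROLE[stage]
        for person in people:
            roles.setdefault(person, set()).add(role)
    return roles


if __name__ == "__main__":
    print(candidate_roles({
        "raw data collection": ["B. Lindqvist"],
        "cleaning and annotation": ["B. Lindqvist"],
        "implementation": ["C. Nakamura", "D. Rossi"],
        "evaluation": ["D. Rossi"],
    }))
```

Failing loudly on an unknown stage is a design choice. A contribution that fits no stage is exactly the kind that tends to go uncredited, so it is better to force a decision about it.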

Conceptualization and Methodology: the ideas and the design

The framing of the research question (what problem the model is meant to solve, and what would count as success) is Conceptualization, exactly as in any other field. The design of the method is where ML gets its own texture. A genuinely novel architecture, training objective, or learning algorithm is Methodology in the canonical sense: “development or design of methodology; creation of models.” The phrase “creation of models” sits slightly oddly here, because in ML “model” can mean either the conceptual method or the concrete trained weights. The CRediT definition is best read as the former. Producing the trained weights is implementation and experimentation, which Software and Investigation already cover. Designing the experimental protocol (what gets held out, how runs are seeded, which ablations are performed) is also Methodology.

Data curation and Investigation: the part that decides the result

In ML, data quality usually matters more than model cleverness, and the people who do data work are frequently undercredited. CRediT offers two relevant roles. Investigation covers “performing the experiments, or data/evidence collection” — the gathering of the raw data, the running of the training experiments themselves. Data curation covers “management activities to annotate (produce metadata), scrub data and maintain research data… for initial use and later re-use” — which is an almost exact description of dataset cleaning, labelling, deduplication, and the construction of the documented, reusable dataset.

The practical advice is to use both roles deliberately and not to let Investigation swallow everything. The person who designed the annotation scheme and produced the dataset’s metadata is doing Data curation. Saying so makes visible a contribution that is otherwise invisible and that, by the field’s own lights, often determines the outcome. A dataset’s datasheet is, in effect, the written artefact of that Data curation work.

Software: central, and overloaded

Almost all ML work involves code, so Software — “programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components” — is the most frequently assigned role. It is also the most overloaded. On a real project, “Software” can cover the researcher who implemented the novel method, the engineer who built the training pipeline, the person who wrote the data-loading code, and whoever maintains the evaluation harness. CRediT gives all of them the same role name.

This is the same limitation we have documented for software papers: the Software role lacks sub-roles for implementation, testing, infrastructure, and maintenance. The current best practice is to use the degree-of-contribution qualifier (lead / equal / supporting) to differentiate, and to carry finer-grained per-component contributorship in the repository’s own metadata — a CITATION.cff file or the model card’s authorship section — rather than trying to force it all into the paper’s CRediT statement.

Validation: evaluation is its own contribution

The single most useful point in this whole mapping is that Validation exists and should be used. Its definition — “verification… of the overall replication/reproducibility of results/experiments and other research outputs” — fits the work of building and running an evaluation suite almost perfectly. The person who designed the evaluation, guarded against test-set contamination, ran the baselines, and confirmed that the reported numbers reproduce is doing Validation, and in ML that is frequently the difference between a trustworthy result and a misleading one.

Because evaluation is so central to ML and so often distinct from the modelling work, assigning Validation as a lead role to the person who owned evaluation is one of the highest-value things a CRediT statement for ML can do. It is also under-used, because the habit of treating evaluation as an undifferentiated part of “the experiments” persists.

The remaining roles

The rest map without surprises. Producing figures, training curves, and visualisations is Visualization. Providing compute — “computing resources… or other analysis tools” is explicitly in the Resources definition — is Resources; on compute-intensive projects, the contribution of whoever secured and managed the GPU allocation is real and namable. Writing the paper is Writing – original draft and Writing – review & editing. Leading the project is Supervision and Project administration; securing the grant is Funding acquisition.

Where AI assistance fits, and where it does not

One thing CRediT does not, and should not, represent is the use of AI tools to do the work, such as an AI coding assistant that helped write the training code or a model that drafted prose. That is a disclosure matter, not a contributorship matter. AI systems are not contributors, a position now widely held across scholarly publishing, and the prevailing view is that AI use should be tracked as a separate dimension rather than as a CRediT role. CASRAI has written separately on authorship and AI. The short version is that a human who used an AI tool to discharge a role still gets that role, and the AI use is disclosed separately.

A worked statement

A. Okonkwo: Conceptualization, Methodology (lead), Writing – original draft. B. Lindqvist: Data curation (lead), Investigation. C. Nakamura: Software (lead), Methodology (supporting). D. Rossi: Validation (lead), Software (supporting). E. Mwangi: Visualization, Writing – review & editing. F. Schmidt: Resources, Supervision, Funding acquisition.

Read off, this says: someone designed the method and wrote the paper; someone else built the dataset; someone else implemented the system; someone else owned evaluation; someone made the figures and edited; and someone provided the compute and led the project. That is a far truer account of an ML project than “six authors,” and it is exactly what CRediT is for.
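Because the roles form a closed vocabulary of 14 names, a statement like the one above can be generated from structured data and checked before submission. The Python sketch below is a hypothetical helper, not a CASRAI or NISO tool. It accepts only the 14 CRediT role names and the three degree-of-contribution qualifiers, so an ML-flavoured label such as "Evaluation" is rejected in favour of Validation. It reproduces the "Name: Role (degree), Role." format of the worked statement.

```python
"""Sketch: render a CRediT author statement, rejecting non-CRediT role names."""
from dataclasses import dataclass
from typing import Optional

# The 14 CRediT roles, spelled as in the taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
})
# Optional degree-of-contribution qualifiers.
DEGREES = frozenset({"lead", "equal", "supporting"})


@dataclass(frozen=True)
class Contribution:
    role: str
    degree: Optional[str] = None  # None means no qualifier is stated

    def __post_init__(self) -> None:
        # Catch ML-flavoured synonyms such as "Evaluation" early.
        if self.role not in CREDIT_ROLES:
            raise ValueError(f"not a CRediT role: {self.role!r}")
        if self.degree is not None and self.degree not in DEGREES:
            raise ValueError(f"unknown degree of contribution: {self.degree!r}")

    def render(self) -> str:
        return f"{self.role} ({self.degree})" if self.degree else self.role


def render_statement(authors: dict[str, list[Contribution]]) -> str:
    """Render 'Name: Role (degree), Role.' entries in the dict's insertion order."""
    entries = []
    for name, contributions in authors.items():
        roles = ", ".join(c.render() for c in contributions)
        entries.append(f"{name}: {roles}.")
    return " ".join(entries)


if __name__ == "__main__":
    print(render_statement({
        "C. Nakamura": [Contribution("Software", "lead"),
                        Contribution("Methodology", "supporting")],
        "D. Rossi": [Contribution("Validation", "lead"),
                     Contribution("Software", "supporting")],
    }))
    # prints: C. Nakamura: Software (lead), Methodology (supporting). D. Rossi: Validation (lead), Software (supporting).
```

Keeping the role list closed is the point of the design. Free-text labels drift toward local jargon, while the fixed names keep statements comparable across papers.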

What to do now

Use the full role set, not just Software and Writing. Credit Data curation and Validation explicitly — they are where ML results are won or lost. Use the degree-of-contribution qualifier to differentiate within overloaded roles, and push fine-grained software contributorship into the repository’s own metadata. Disclose AI use separately from contributorship. CASRAI’s author-statement guidance has the templates.
