
  • Reporting Analytical Methods Reproducibly

    Reporting an analytical method reproducibly means describing the measurement in enough detail that an independent researcher could repeat it and obtain comparable results. A method that cannot be reproduced from its description undermines the findings built on it, however careful the original work. This article sets out what belongs in an analytical methods report: the instrument and its settings, calibration and reference materials, validation, and the reporting guidelines and protocol repositories that structure good practice. It is guidance on documentation, applicable across techniques.

    Why the methods section carries the weight

    Every measurement technique converts a property of a sample into a signal through a chain of physical steps, each governed by parameters the operator chooses. Whether the instrument is an MRI scanner, a mass spectrometer or a thermal cycler running a PCR, the result depends on settings that another laboratory cannot guess. The methods section is where those choices are recorded. If it omits them, the experiment is effectively unrepeatable, and the published result becomes a claim rather than a verifiable observation.

    What to record about the instrument and parameters

    A reproducible report identifies the instrument precisely and lists the settings that affect the measurement. The level of detail should be enough that a competent reader could configure their own equipment to match.

    Category              Examples of what to report
    Instrument            Make, model and relevant configuration of the apparatus
    Acquisition settings  The technique-specific parameters that govern signal generation
    Sample preparation    How the sample was prepared, stored and presented to the instrument
    Data processing       Software, versions, transforms and any filtering applied to raw data
    Environment           Conditions such as temperature that materially affect the result

    Reporting the data-processing chain matters as much as the acquisition. Many techniques apply substantial mathematical transformation between raw signal and reported value, and an undocumented processing step can change results as much as a hardware setting. Naming software and versions makes the analysis traceable.
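
    One way to keep these categories from being forgotten is to hold them in a structured record that can be exported alongside the manuscript. The following is a minimal sketch, not a standard schema: the class name MethodRecord, its field names, and every example value (ExampleCo, X-100, example-tool, the scan rate) are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class MethodRecord:
    """Hypothetical record mirroring the five reporting categories above."""
    instrument: dict            # make, model, relevant configuration
    acquisition_settings: dict  # technique-specific parameters
    sample_preparation: str     # preparation, storage, presentation
    data_processing: list       # software, versions, transforms, filtering
    environment: dict = field(default_factory=dict)  # e.g. temperature

    def missing_fields(self) -> list:
        """Return the names of categories that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# All values below are invented placeholders, not real instruments or tools.
record = MethodRecord(
    instrument={"make": "ExampleCo", "model": "X-100", "configuration": "standard"},
    acquisition_settings={"scan_rate_hz": 2.0},
    sample_preparation="Stored at 4 C; analysed within 24 h of collection.",
    data_processing=[{
        "software": "example-tool",
        "version": "1.4.2",
        "steps": ["baseline correction", "low-pass filter"],
    }],
)

print(record.missing_fields())               # the environment category is empty
print(json.dumps(asdict(record), indent=2))  # machine-readable methods record
```

    Here missing_fields() flags that no environmental conditions were recorded, which is the kind of omission a reader would otherwise have to guess at.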

    Calibration and reference materials

    An instrument’s raw output is meaningful only against a known scale. Calibration ties the measurement to a reference, and reporting how and when calibration was performed lets others judge and reproduce the accuracy. Where certified reference materials (samples of known composition or known value) exist, citing them anchors a method to a community-agreed standard and allows cross-laboratory comparison. A report should state what was used to calibrate, which reference materials were employed, and how often calibration was checked over the course of the work.
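
    As an illustration of what "tying the measurement to a reference" can mean in practice, the sketch below fits a straight-line calibration to a set of standards and uses it to convert a raw signal into a concentration. It assumes a linear response, which many methods do not have over their full range; the function names and the numbers are invented for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_concentration(signal, slope, intercept):
    """Invert the calibration line to turn a raw signal into a concentration."""
    return (signal - intercept) / slope


# Invented calibration standards (known concentrations) and their responses.
standards = [0.0, 1.0, 2.0, 4.0]
responses = [0.5, 2.5, 4.5, 8.5]

slope, intercept = fit_line(standards, responses)
unknown = to_concentration(6.5, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, unknown={unknown:.3f}")
```

    Reporting the standards, the fitted slope and intercept, and the date of the fit gives a reader what they need to judge the calibration.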

    Validation and uncertainty

    Validation establishes that a method does what it claims under the conditions of use. Depending on the technique this may include assessing the limit of detection, the range over which response is reliable, repeatability between runs, and the method’s sensitivity to small changes in conditions. Reporting these characteristics, together with an honest statement of measurement uncertainty, tells readers how much weight a number can bear. A value quoted without any indication of its uncertainty invites overinterpretation and is difficult to reproduce meaningfully.
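
    Two of these characteristics reduce to short calculations. The sketch below uses one common convention for the limit of detection, 3.3 times the standard deviation of blank measurements divided by the calibration slope, and expresses repeatability as a relative standard deviation. Other definitions exist and the appropriate one depends on the field, so a report should state which was used. The function names and data are assumptions made for illustration.

```python
import statistics


def limit_of_detection(blank_signals, slope):
    """LOD under the 3.3 * sigma / slope convention (one of several in use)."""
    sigma = statistics.stdev(blank_signals)  # sample standard deviation of blanks
    return 3.3 * sigma / slope


def relative_std_dev_percent(values):
    """Repeatability as relative standard deviation, in percent."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Invented data: repeated blank readings and replicate measurements of one sample.
blanks = [0.48, 0.52, 0.50, 0.49, 0.51]
replicates = [9.8, 10.1, 10.0, 10.2, 9.9]

lod = limit_of_detection(blanks, slope=2.0)
rsd = relative_std_dev_percent(replicates)
print(f"LOD = {lod:.4f} (concentration units), RSD = {rsd:.2f} %")
```

    Quoting the number of blanks and replicates alongside the resulting figures lets readers see how firmly each estimate is supported.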

    Reporting guidelines and protocol repositories

    Researchers rarely have to design a reporting structure from scratch. Many fields maintain community reporting guidelines that enumerate the minimum information a methods section should contain for a given type of study, reducing the risk of leaving out a critical parameter. Alongside these, protocol repositories such as protocols.io let authors publish a step-by-step procedure as a citable object, with a persistent identifier, separate from the constraints of a paper’s word limit. Linking a manuscript to a deposited protocol gives readers the full operational detail and a stable reference. Using a recognised guideline and depositing the detailed protocol together address the two failure modes of methods reporting: omission and lack of granularity.

    Consistent terminology supports all of this. The CASRAI dictionary standardises the vocabulary used to describe research outputs and processes, and our reproducibility coverage explores related practices. Practical advice for authors is collected in our guidance for authors.

    Frequently asked questions

    How much detail is enough?

    The working test is whether a competent independent researcher could repeat the measurement from the description alone and expect comparable results. If any parameter that materially affects the outcome would have to be guessed, the report is incomplete. Depositing a full protocol alongside the paper is a reliable way to reach that bar.

    Why report data processing as well as acquisition?

    Many techniques transform raw signal substantially before producing a reported value, through Fourier transforms, baseline corrections, filtering or thresholding. An undocumented processing step can alter results as much as a hardware change, so software, versions and transforms should be recorded as part of the method.

    What role do reference materials play?

    Certified reference materials provide a known value against which an instrument can be calibrated and across which laboratories can compare. Citing them anchors a method to a shared standard, which is central to making measurements comparable and reproducible across sites.

    Where do reporting guidelines and protocols.io fit?

    Reporting guidelines define the minimum information a methods section should contain, guarding against omission. Protocol repositories such as protocols.io let authors publish granular, citable, versioned procedures that exceed a paper’s space limits. Used together they cover both completeness and detail, as discussed across our research lifecycle coverage.

  • Crediting contributions to AI/ML research: data, code, models and evaluation

    Machine-learning research distributes its intellectual labour differently from a conventional empirical study. The work that determines whether a result is any good is spread across data collection and annotation, code, model training, and evaluation — and the people who do each of those things are often different people. So how well do the 14 roles of CRediT describe who did what on an AI/ML paper? Better than one might fear, with a few well-understood friction points. This article walks through the mapping, role by role, for the benefit of anyone writing a CRediT author statement for ML work.

    Start from the lifecycle, not the role list

    The cleanest way to assign CRediT roles to ML work is to walk the lifecycle and ask, at each stage, who contributed and which role names that contribution. A typical AI/ML project moves through: framing the problem and research goals; designing the method or model architecture; assembling, cleaning, and annotating data; implementing and training; evaluating; and writing it up. Each stage has a natural CRediT home.

    Conceptualization and Methodology: the ideas and the design

    The framing of the research question — what problem the model is meant to solve, what would count as success — is Conceptualization, exactly as in any other field. The design of the method is where ML gets its own texture. A genuinely novel architecture, training objective, or learning algorithm is Methodology in the canonical sense: “development or design of methodology; creation of models.” The phrase “creation of models” sits slightly oddly here, because in ML “model” can mean either the conceptual method or the concrete trained weights; the CRediT definition means the former. Designing the experimental protocol — what gets held out, how runs are seeded, what ablations are performed — is also Methodology.

    Data curation and Investigation: the part that decides the result

    In ML, data quality usually matters more than model cleverness, and the people who do data work are frequently undercredited. CRediT offers two relevant roles. Investigation covers “performing the experiments, or data/evidence collection” — the gathering of the raw data, the running of the training experiments themselves. Data curation covers “management activities to annotate (produce metadata), scrub data and maintain research data… for initial use and later re-use” — which is an almost exact description of dataset cleaning, labelling, deduplication, and the construction of the documented, reusable dataset.

    The practical advice is to use both roles deliberately and not to let Investigation swallow everything. The person who designed the annotation scheme and produced the dataset’s metadata is doing Data curation, and saying so makes visible a contribution that is otherwise invisible — and that, by the field’s own lights, often determines the outcome. The datasheet for the dataset is, in effect, a written artefact of that Data curation work.

    Software: central, and overloaded

    Almost all ML work involves code, so Software — “programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components” — is the most frequently assigned role. It is also the most overloaded. On a real project, “Software” can cover the researcher who implemented the novel method, the engineer who built the training pipeline, the person who wrote the data-loading code, and whoever maintains the evaluation harness. CRediT gives all of them the same role name.

    This is the same limitation we have documented for software papers: the Software role lacks sub-roles for implementation, testing, infrastructure, and maintenance. The current best practice is to use the degree-of-contribution qualifier (lead / equal / supporting) to differentiate, and to carry finer-grained per-component contributorship in the repository’s own metadata — a CITATION.cff file or the model card’s authorship section — rather than trying to force it all into the paper’s CRediT statement.

    Validation: evaluation is its own contribution

    The single most useful point in this whole mapping is that Validation exists and should be used. Its definition — “verification… of the overall replication/reproducibility of results/experiments and other research outputs” — fits the work of building and running an evaluation suite almost perfectly. The person who designed the evaluation, guarded against test-set contamination, ran the baselines, and confirmed that the reported numbers reproduce is doing Validation, and in ML that is frequently the difference between a trustworthy result and a misleading one.

    Because evaluation is so central to ML and so often distinct from the modelling work, assigning Validation as a lead role to the person who owned evaluation is one of the highest-value things a CRediT statement for ML can do. It is also under-used, because the habit of treating evaluation as an undifferentiated part of “the experiments” persists.

    The remaining roles

    The rest map without surprises. Producing figures, training curves, and visualisations is Visualization. Providing compute is Resources, whose definition explicitly includes “computing resources… or other analysis tools”; on compute-intensive projects, the contribution of whoever secured and managed the GPU allocation is real and nameable. Writing the paper is Writing – original draft and Writing – review & editing. Leading the project is Supervision and Project administration; securing the grant is Funding acquisition.

    Where AI assistance fits, and where it does not

    One thing CRediT deliberately does not represent is the use of AI tools to do the work, such as an AI coding assistant that helped write the training code or a model that drafted prose. That is a disclosure matter, not a contributorship matter. AI systems are not contributors, a position on which the community has settled, and the prevailing view is that AI use should be tracked as a separate dimension rather than as a CRediT role. CASRAI has written separately on authorship and AI; the short version is that a human who used an AI tool to discharge a role still gets that role, and the AI use is disclosed elsewhere.

    A worked statement

    A. Okonkwo: Conceptualization, Methodology (lead), Writing – original draft. B. Lindqvist: Data curation (lead), Investigation. C. Nakamura: Software (lead), Methodology (supporting). D. Rossi: Validation (lead), Software (supporting). E. Mwangi: Visualization, Writing – review & editing. F. Schmidt: Resources, Supervision, Funding acquisition.

    Read off, this says: someone designed the method and wrote the paper; someone else built the dataset; someone else implemented the system; someone else owned evaluation; someone made the figures and edited; and someone provided the compute and led the project. That is a far truer account of an ML project than “six authors,” and it is exactly what CRediT is for.
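
    A statement like this is easy to generate and check from structured data, which helps when the author list is long. The sketch below validates each entry against the 14 CRediT role names and the lead / equal / supporting qualifier, then renders the statement in the style used above. The function name format_statement and the input shape are assumptions for illustration, not part of CRediT itself.

```python
# The 14 CRediT role names.
CREDIT_ROLES = {
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
}
DEGREES = {"lead", "equal", "supporting"}


def format_statement(contributions):
    """Render {name: [(role, degree_or_None), ...]} as a CRediT statement."""
    parts = []
    for name, roles in contributions.items():
        rendered = []
        for role, degree in roles:
            if role not in CREDIT_ROLES:
                raise ValueError(f"Unknown CRediT role: {role!r}")
            if degree is not None and degree not in DEGREES:
                raise ValueError(f"Unknown degree of contribution: {degree!r}")
            rendered.append(f"{role} ({degree})" if degree else role)
        parts.append(f"{name}: {', '.join(rendered)}.")
    return " ".join(parts)


statement = format_statement({
    "B. Lindqvist": [("Data curation", "lead"), ("Investigation", None)],
    "D. Rossi": [("Validation", "lead"), ("Software", "supporting")],
})
print(statement)
```

    Rejecting unknown role names at this stage catches typos and invented roles before the statement reaches a submission system.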

    What to do now

    Use the full role set, not just Software and Writing. Credit Data curation and Validation explicitly — they are where ML results are won or lost. Use the degree-of-contribution qualifier to differentiate within overloaded roles, and push fine-grained software contributorship into the repository’s own metadata. Disclose AI use separately from contributorship. CASRAI’s author-statement guidance has the templates.
