Tag: provenance

  • Stem Cell Research Registries, Provenance and Reporting Governance

    Stem cells are cells capable of dividing to renew themselves and of giving rise to more specialised cell types, and in research they are tracked through cell-line registries that record provenance and reporting metadata. This article considers stem cells strictly in the context of research, registries and governance; it does not address therapies, treatments or clinical use.

    For research-data infrastructure, the key questions are definitional and administrative: what type of cell line is being used, where it came from, under what consent, and how its use is reported so that studies remain transparent and reproducible.

    Types of stem cells at a definitional level

    Stem cells are commonly grouped into three broad categories used in research. The distinctions matter for registries because provenance and governance requirements differ by type.

    Type                                   | Definitional description
    Embryonic stem cells                   | Derived from early-stage embryos in a research setting; broad capacity to give rise to many cell types
    Induced pluripotent stem cells (iPSCs) | Adult cells reprogrammed in the laboratory to a pluripotent-like state
    Adult (tissue) stem cells              | Found within tissues; more limited in the cell types they typically generate

    These are definitional categories rather than clinical claims. Recording the precise type, along with the laboratory line identifier, is essential metadata, much like the controlled terms catalogued in the CASRAI Dictionary.

    Cell-line registries and persistent identifiers

    A stem-cell registry is a curated database that records standardised information about research cell lines, including a stable identifier, the line’s origin and the conditions under which it was derived. The idea, exemplified by resources such as the Human Pluripotent Stem Cell Registry (hPSCreg), is to give each line a persistent, citable identifier and a consistent metadata record.

    Persistent identifiers for cell lines play the same role they play across the research ecosystem: they disambiguate one line from another and link it to the studies that used it. This mirrors the wider identifier landscape described in our overview of persistent identifiers in 2026.
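
    As a rough illustration, a registry of this kind can be sketched as a small in-memory structure in Python. The class names, field names and the identifier EXAMPLE-LINE-001 are assumptions made for this example; they are not the schema or identifier format of hPSCreg or any other real registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellLineRecord:
    """Illustrative registry entry; all field names are assumptions."""
    line_id: str                # persistent, citable identifier (format invented here)
    cell_type: str              # e.g. "embryonic", "iPSC", "adult"
    origin: str                 # where the line came from
    derivation_conditions: str  # conditions under which it was derived


class Registry:
    """In-memory stand-in for a curated cell-line registry."""

    def __init__(self):
        self._records = {}   # line_id -> CellLineRecord
        self._studies = {}   # line_id -> list of study references

    def register(self, record):
        # One identifier must always mean exactly one line.
        if record.line_id in self._records:
            raise ValueError(f"identifier already registered: {record.line_id}")
        self._records[record.line_id] = record
        self._studies[record.line_id] = []

    def link_study(self, line_id, study_ref):
        # Refuse links to lines the registry does not know about.
        if line_id not in self._records:
            raise KeyError(f"unknown cell line: {line_id}")
        self._studies[line_id].append(study_ref)

    def studies_using(self, line_id):
        return list(self._studies.get(line_id, []))


registry = Registry()
registry.register(CellLineRecord(
    line_id="EXAMPLE-LINE-001",
    cell_type="iPSC",
    origin="hypothetical source laboratory",
    derivation_conditions="reprogrammed from adult cells",
))
registry.link_study("EXAMPLE-LINE-001", "study-A")
registry.link_study("EXAMPLE-LINE-001", "study-B")
```

    Rejecting duplicate identifiers is what lets the identifier disambiguate one line from another; the study links are what let a reader move from a line to the work that used it.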

    Provenance: tracking where a line came from

    Provenance is the documented history of a cell line — its derivation, the consent under which source material was obtained, and any ethical approvals associated with its creation and use. Robust provenance is a compliance requirement as much as a scientific one, ensuring that the line’s permitted uses are clear and auditable.

    Because consent and ethical-approval terms govern how a line may be used and shared, this provenance metadata must accompany the line through the research lifecycle. The same governance logic underpins responsible data exchange in our guide to genomic data-sharing standards.
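
    One way to picture provenance metadata travelling with a line is a record that is consulted before each proposed use. The sketch below is an assumption-laden illustration in Python: the field names, the use labels and the check itself are hypothetical, and real consent terms are far richer than a set of strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to a cell line."""
    derivation: str              # how the line was derived
    consent_reference: str       # pointer to the consent documentation
    ethics_approvals: tuple      # approval references, possibly empty
    permitted_uses: frozenset    # uses the recorded consent covers


def check_use(provenance, proposed_use):
    """Return (allowed, reason) for a proposed use of the line."""
    if not provenance.consent_reference:
        return False, "no consent record on file"
    if not provenance.ethics_approvals:
        return False, "no ethical approval recorded"
    if proposed_use not in provenance.permitted_uses:
        return False, f"use '{proposed_use}' is not covered by the recorded consent"
    return True, "permitted"


example = Provenance(
    derivation="derived in a research setting",
    consent_reference="consent-form-ref-1",
    ethics_approvals=("approval-ref-1",),
    permitted_uses=frozenset({"basic research", "sharing with academic labs"}),
)
```

    Returning a reason alongside the decision keeps the check auditable: the record shows not only that a use was refused but on what grounds.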

    Reporting and governance for reproducibility

    Transparent reporting of which cell line was used, with its registry identifier and provenance, lets independent researchers interpret and build on a study correctly. Misidentified or undocumented lines are a known source of irreproducibility, so registries and clear reporting requirements directly support the goals covered in our reproducibility news. For practical advice on documenting research resources, see our guidance for authors.

    Frequently asked questions

    What are the main types of stem cells used in research?

    At a definitional level, research commonly distinguishes embryonic stem cells, induced pluripotent stem cells reprogrammed from adult cells, and adult tissue stem cells. Each category carries different provenance and governance requirements.

    What is a stem-cell registry?

    A stem-cell registry is a curated database that gives each research cell line a persistent identifier and a standardised record of its origin, derivation conditions and consent, supporting transparent and citable reporting.

    Why does provenance matter for stem-cell lines?

    Provenance documents a line’s derivation, consent and ethical approvals, which together define how the line may be used and shared. Without it, permitted uses are unclear and studies are harder to reproduce or audit.

    Does this guide cover stem-cell therapies?

    No. This guide is scoped to research, registries, provenance and governance. It does not address therapies, treatments or clinical applications.

  • Electronic lab notebooks and structured record-keeping across the research lifecycle

    When we picture the scholarly record, we tend to think of its end products: the published paper, the deposited dataset, the citation. But each of those is the visible tip of a much larger body of work — the active, day-to-day conduct of research, where experiments are designed and run, samples processed, instruments operated and observations recorded. For generations this working phase was captured, if at all, in the paper laboratory notebook: a bound book on a bench, legible only to its author, locked in a drawer, and disconnected from everything else. An immense amount of crucial information about how research is actually done remained invisible to the wider record. The electronic lab notebook and the structured record-keeping practices around it are changing that. This article looks at how, drawing on the research-lifecycle domain of the CASRAI Dictionary.

    What an electronic lab notebook is

    An electronic lab notebook, or ELN, is software that replaces the paper notebook as the place where researchers record their day-to-day work: experiments, protocols, observations, results and the reasoning behind decisions. At its simplest, an ELN offers obvious practical advantages over paper: it is searchable, backed up, shareable, and resistant to the coffee stains and illegible handwriting that have long plagued laboratory science. But its deeper significance is that it makes the working record digital and therefore connectable. A paper notebook is an island; an electronic one can be linked to the protocols it follows, the instruments and samples it references, the data files it produces and the people who did the work. The ELN is the point at which the active phase of research enters the connected world that the rest of the record already inhabits.

    Capturing the active phase as connected metadata

    This is the central idea: the ELN lets the active phase of research be captured as connected metadata rather than disappearing into a drawer. When work is recorded electronically and linked properly, a rich web of relationships can be built around it — this experiment used that protocol; it was performed by these people on that instrument; it consumed these samples and produced these data files; it belongs to this project and contributes to that publication. The working phase stops being a black box between the start of a project and its outputs, and becomes a documented, navigable part of the record. This matters for reproducibility, because others can see exactly how a result was produced; for collaboration, because the record is shared rather than siloed; and for integrity, because the chain from question to result is visible rather than reconstructed after the fact.
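
    The web of relationships described above can be sketched as a list of subject, relation, object statements. This is a minimal Python illustration, not the data model of any particular ELN; every identifier and relation name in it is invented for the example.

```python
from collections import defaultdict

# Each statement links one ELN record to something else in the research record.
links = [
    ("experiment-42", "used_protocol", "protocol-7"),
    ("experiment-42", "performed_by", "researcher-a"),
    ("experiment-42", "performed_by", "researcher-b"),
    ("experiment-42", "ran_on", "instrument-3"),
    ("experiment-42", "consumed", "sample-19"),
    ("experiment-42", "produced", "readings.csv"),
    ("experiment-42", "part_of", "project-x"),
    ("experiment-43", "used_protocol", "protocol-7"),
]


def build_index(statements):
    """Index statements by subject and relation for forward lookups."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in statements:
        index[subject][relation].append(obj)
    return index


def related(index, subject, relation):
    """What does this record point to via the given relation?"""
    return list(index.get(subject, {}).get(relation, []))


def pointing_to(statements, relation, obj):
    """Which records point at this object via the given relation?"""
    return [s for s, r, o in statements if r == relation and o == obj]


index = build_index(links)
```

    With the links in place, the record can be read in both directions: from an experiment to the people and files around it, and from a protocol back to every experiment that followed it.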

    FAIR principles for the working record

    The same FAIR principles — Findable, Accessible, Interoperable, Reusable — that govern published data apply, with equal force, to the records created during the active phase. An ELN that captures structured, well-described records makes the working record findable and reusable in a way a paper notebook never could be. The principle is that good data management should not begin at the moment of deposit, when a project ends, but should run through the entire lifecycle, starting at the bench. If records are created in a structured, connected form from the outset, preparing data for deposit becomes a matter of harvesting and tidying what already exists, rather than reconstructing it. Good record-keeping during the active phase is, in this sense, the foundation of good data management overall.

    Provenance: the PROV standard

    A particular strength of structured electronic record-keeping is its capacity to capture provenance, the record of how something came to be: what data was used, what processes acted on it, what agents (people, software, instruments) were involved, and in what order. Provenance is the basis of trust in a result, because it lets others trace exactly how that result was produced and verify each step. The W3C PROV standard provides a formal, machine-readable model for expressing provenance, describing the entities, activities and agents in a process and the relationships between them, so that the chain of how a result was produced can be recorded consistently and understood across systems. An ELN that captures provenance in line with such a standard turns the working record into something far more powerful than a diary: a verifiable account of how knowledge was made.
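
    To make the entity, activity and agent model concrete, here is a small Python sketch that records three kinds of relation and walks back from a result to its sources. The relation names mirror PROV's core terms (used, wasGeneratedBy, wasAssociatedWith), but the data structures and all identifiers are assumptions for illustration; this is not a PROV serialisation and does not use any PROV library.

```python
# activity -> entities it used
used = {
    "analysis-run": ["raw-readings.csv"],
    "plotting": ["summary.csv"],
}

# entity -> activity that generated it
was_generated_by = {
    "summary.csv": "analysis-run",
    "figure-1.png": "plotting",
}

# activity -> agents (people, software) associated with it
was_associated_with = {
    "analysis-run": ["researcher-a", "script-v1"],
    "plotting": ["researcher-b"],
}


def trace(entity):
    """Walk back from an entity, listing each step that led to it.

    Each step is (entity, generating activity, inputs used, agents involved).
    Entities with no recorded generating activity are treated as sources.
    """
    steps = []
    frontier = [entity]
    seen = set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        activity = was_generated_by.get(current)
        if activity is None:
            continue  # a source entity: nothing recorded upstream of it
        inputs = used.get(activity, [])
        agents = was_associated_with.get(activity, [])
        steps.append((current, activity, tuple(inputs), tuple(agents)))
        frontier.extend(inputs)
    return steps


lineage = trace("figure-1.png")
```

    Here the lineage of figure-1.png has two steps: the plotting activity that produced it from summary.csv, and the analysis run that produced summary.csv from the raw readings, each with its associated agents.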

    Identifying the work itself: activity identifiers

    If the active phase is to be connected to the rest of the research landscape, the work itself needs to be identifiable. Persistent identifiers have transformed how we refer to outputs and people; the same logic is now being applied to research activities. RAiD (the Research Activity Identifier) is a persistent identifier for research projects and activities, providing a stable handle for the work itself — not just its eventual outputs. With an activity identifier, the records captured in an ELN, the data produced, the people involved and the resulting publications can all be tied to a single, persistent identity for the project. The whole arc of a piece of research — from the work as it happens to the products it yields — can then be traced as a connected whole rather than a set of disconnected fragments.
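
    The effect of a shared activity identifier can be shown with a short Python sketch that groups otherwise unrelated records under the project they belong to. The identifier strings below are placeholders invented for the example and do not follow the real RAiD format; the record layout is likewise an assumption.

```python
from collections import defaultdict

# Records from different systems, each tagged with a (placeholder) activity identifier.
records = [
    {"kind": "eln_entry", "id": "entry-001", "activity_id": "ACTIVITY-EXAMPLE-1"},
    {"kind": "eln_entry", "id": "entry-002", "activity_id": "ACTIVITY-EXAMPLE-1"},
    {"kind": "dataset", "id": "dataset-a", "activity_id": "ACTIVITY-EXAMPLE-1"},
    {"kind": "person", "id": "researcher-a", "activity_id": "ACTIVITY-EXAMPLE-1"},
    {"kind": "publication", "id": "paper-1", "activity_id": "ACTIVITY-EXAMPLE-1"},
    {"kind": "dataset", "id": "dataset-b", "activity_id": "ACTIVITY-EXAMPLE-2"},
]


def group_by_activity(items):
    """Collect record ids per activity, split by kind of record."""
    grouped = defaultdict(lambda: defaultdict(list))
    for item in items:
        grouped[item["activity_id"]][item["kind"]].append(item["id"])
    # Convert to plain dicts so the result is easy to inspect or serialise.
    return {activity: dict(kinds) for activity, kinds in grouped.items()}


by_activity = group_by_activity(records)
```

    Everything attached to ACTIVITY-EXAMPLE-1 (notebook entries, dataset, person, publication) can then be retrieved as one connected set rather than searched for system by system.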

    A consistent vocabulary across the lifecycle

    For records created at the bench to connect with everything downstream (data repositories, CRIS platforms, publications), the elements they contain must mean the same thing everywhere: a protocol, a sample, an instrument or an activity has to denote the same thing in every system. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the record captured in an electronic lab notebook is understood identically wherever it flows. And because the work recorded there (investigation, data curation, methodology) is genuine contribution, it can be described in the same framework used for every output, the CRediT taxonomy and its full set of contribution roles. The electronic lab notebook brings the most hands-on phase of research into the connected record; structured record-keeping, provenance and activity identifiers let that phase take its rightful place in the story of how knowledge is made.

  • AI agents and autonomous research: attribution and accountability

    For most of the history of science, the tools of research — however sophisticated — did the bidding of the people using them. A telescope or a statistical package extended human capability but did not decide what to investigate. That assumption is now being tested. AI agents capable of a degree of autonomy are beginning to appear in research: systems that can generate hypotheses, design experiments, and in some cases run them through automated laboratory equipment, iterating with limited human intervention. Autonomous experimentation of this kind raises a question scholarship was never built to answer: when an AI system materially contributes to a discovery, how should that contribution be attributed, and who is accountable for it? This article examines those questions, drawing on the AI and ML research-outputs domain of the CASRAI Dictionary.

    What autonomous research looks like

    The systems in question share a common feature: they make consequential choices in the research process rather than merely executing instructions. An AI agent might propose which compounds to test next, design the sequence of experiments, control the apparatus that performs them, and analyse the results to decide what to try next — a loop that can continue with the human supervisor stepping in only occasionally. The appeal is obvious: such systems can explore vast spaces of possibility far faster than people, accelerating discovery from materials science to drug development. But the autonomy that makes them powerful is what unsettles the established account of who does research and who answers for it. The agent is no longer just a tool; it is participating in the intellectual work. That shift forces the questions of attribution and accountability.

    Why an AI cannot be an author

    The clearest and most settled point in this debate is also the most important: an AI system cannot be an author of a research work. This is not technophobia or an arbitrary rule; it follows directly from what authorship means. Authorship carries accountability. An author is someone who can take responsibility for the integrity of the work, vouch for its honesty, defend it when questioned, and be answerable if it proves flawed or fraudulent. An AI system can do none of these things; it cannot be held responsible or called to account. The major editorial and integrity bodies, among them the Committee on Publication Ethics (COPE) and the International Committee of Medical Journal Editors (ICMJE), have converged firmly on this position: AI tools, however capable, cannot meet the criteria for authorship, because the defining quality of an author, answerability, is one a machine cannot possess. The principles of authorship rest on responsibility, and responsibility is irreducibly human.

    Accountability stays with people

    If the AI cannot be accountable, accountability does not vanish — it remains with the humans involved. The researchers who deploy an autonomous system, decide to use its outputs, design the study it operates within and interpret and publish the results are responsible for that work, including for the AI’s contributions to it. This has a sharp consequence: a researcher cannot disclaim responsibility for an error or fabrication by pointing to the machine. If an AI agent generates a flawed hypothesis and a researcher publishes it as sound, the failure is the researcher’s, because the duty to verify and stand behind the work was theirs. Far from diluting human responsibility, autonomous systems concentrate it: the more capable the tool, the more important the human judgement about whether and how to trust it. Autonomy in the tool does not mean autonomy from accountability for the people.

    Disclosure and the provenance of AI contributions

    If an AI agent cannot be credited as an author but did genuinely contribute, the honest course is to describe what it did transparently. This is a matter of disclosure and provenance rather than authorship. A research report should be clear about the role an autonomous system played — which hypotheses it generated, which experiments it designed, which analyses it performed — so readers can understand how the work was produced and judge it accordingly. Recording the provenance of AI contributions serves several ends at once:

    • Transparency. Readers and reviewers can see where machine judgement entered the work and weigh it appropriately.
    • Reproducibility. Knowing which system was used, and how, is part of being able to reproduce the result.
    • Accountability. Clear provenance makes plain which choices were the system’s and which the researchers’, keeping responsibility traceable.

    Disclosure does not credit the machine; it documents it — an entirely different and appropriate act.
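
    A disclosure of this kind can be kept as structured data and rendered into a statement for the methods section. The Python sketch below is one possible shape under stated assumptions: the field names, the example system name and the wording of the generated statement are all hypothetical, not a format required by any journal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIContribution:
    """Hypothetical record of one thing an autonomous system did."""
    system: str        # name of the system used
    version: str       # version or configuration, needed for reproducibility
    task: str          # e.g. "hypothesis generation", "experiment design"
    description: str   # what it actually did in this study
    verified_by: str   # the person who checked the output and answers for it


def disclosure_statement(contributions):
    """Render recorded AI contributions as plain-text disclosure lines."""
    lines = []
    for item in contributions:
        # Every machine contribution must trace to an accountable person.
        if not item.verified_by.strip():
            raise ValueError(f"no accountable person recorded for task '{item.task}'")
        lines.append(
            f"{item.system} ({item.version}) was used for {item.task}: "
            f"{item.description}. Verified by {item.verified_by}."
        )
    return "\n".join(lines)


statement = disclosure_statement([
    AIContribution(
        system="ExampleAgent",
        version="v0.1",
        task="experiment design",
        description="proposed the order in which candidate compounds were tested",
        verified_by="Researcher A",
    ),
])
```

    Refusing to render an entry without a named verifier reflects the point made above: the record documents the machine's role while keeping a person answerable for each use of it.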

    The limits of CRediT

    It is natural to ask whether a contribution taxonomy could simply add the AI as a contributor. Here it is worth being precise about what the CRediT taxonomy is for. CRediT describes the contributions of people to a research work; it is a vocabulary for human roles, anchored in the assumption that contributors are accountable agents. An autonomous system is not a contributor in that sense, because it cannot bear the responsibility contributorship implies. The right place for AI involvement is therefore not the contributor list but the methods and disclosure sections, where its use can be described as part of how the work was done. What CRediT continues to do well is capture the human contributions around the AI — the conceptualisation, methodology, investigation and interpretation that remain human even when a machine assists. The taxonomy’s limits here are not a defect; they reflect the correct distinction between a tool that is used and a person who is answerable.
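
    The distinction can be expressed as a simple check on a contributor list. In the Python sketch below, the fourteen role names are those of the CRediT taxonomy; the entry layout (name, is_person, roles) and the checking function are assumptions made for illustration rather than part of the taxonomy itself.

```python
# The fourteen contributor roles defined by the CRediT taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


def check_contributor_list(contributors):
    """Return a list of problems; an empty list means every entry passes.

    Each entry is a dict with hypothetical keys "name", "is_person" and "roles".
    """
    problems = []
    for entry in contributors:
        if not entry["is_person"]:
            # Tools are described in methods and disclosure, not credited.
            problems.append(
                f"{entry['name']}: not a person; describe in methods or disclosure instead"
            )
            continue
        for role in entry["roles"]:
            if role not in CREDIT_ROLES:
                problems.append(f"{entry['name']}: '{role}' is not a CRediT role")
    return problems


issues = check_contributor_list([
    {"name": "Researcher A", "is_person": True, "roles": ["Conceptualization", "Methodology"]},
    {"name": "Researcher B", "is_person": True, "roles": ["Investigation", "Prompting"]},
    {"name": "ExampleAgent", "is_person": False, "roles": ["Investigation"]},
])
```

    In this example the two human entries are checked against the taxonomy, while the autonomous system is flagged for relocation to the methods and disclosure sections regardless of the role it was given.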

    A consistent vocabulary for a changing landscape

    As autonomous systems become more common, describing their involvement consistently — what was used, for what, and where human responsibility sat — will matter increasingly across journals and institutions. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so a statement about how AI contributed to a piece of research, and who is accountable for it, is understood the same way wherever it is recorded. AI agents may transform the pace of discovery; the durable principles — that authorship means accountability, that responsibility stays with people, and that AI contributions are disclosed rather than credited — are what keep research trustworthy as the tools grow more powerful.