
Over the past two decades, the research community has built an impressive web of persistent identifiers (PIDs). Articles and datasets have DOIs, researchers have ORCID iDs, organisations have ROR identifiers, and grants and projects are increasingly identified too. Follow any one of these and you can traverse the others: this person wrote that paper, which used this dataset, funded by that grant. But there have long been two conspicuous gaps in this graph, both at the point where research meets the physical world: the instruments that generate measurements, and the physical samples from which data are drawn. Two community efforts, PIDINST for instruments and IGSN for samples, are now closing those gaps. This article explains both and where they fit, drawing on the persistent identifiers domain of the CASRAI Dictionary.

Why instruments and samples need identifiers

Consider a measurement. To interpret it properly — to reproduce it, to compare it with another, to assess its reliability — you need to know what produced it: which spectrometer, which sensor, which sequencer, in what configuration and with what calibration history. And to know what the measurement is of, you need to identify the physical sample: which rock core, which water sample, which tissue specimen, collected where and when. Traditionally this provenance was described in prose, in ways that were inconsistent between papers and impossible to resolve automatically. Two papers might use the same instrument or analyse splits of the same sample without any way to know it. Persistent identifiers for instruments and samples make that provenance explicit, resolvable and connectable to the rest of the PID graph.

PIDINST: persistent identifiers for instruments

PIDINST is a community framework, developed under the auspices of the Research Data Alliance, for assigning persistent identifiers to research instruments and describing them with a shared metadata schema. The idea is that a significant instrument — a telescope, a mass spectrometer, a research vessel’s sensor array — receives a persistent identifier and a structured description covering attributes such as its owner, manufacturer, model, and where it is located or operated. Once an instrument has a resolvable identifier, data it produces can cite it, the instrument can be linked to the people and institutions responsible for it, and its outputs can be aggregated across studies. PIDINST is deliberately infrastructure-agnostic: it defines the metadata and the principle of persistent identification rather than mandating a single issuing body, allowing existing identifier systems to carry instrument PIDs.
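To make the idea of a structured instrument description concrete, here is a minimal Python sketch of a record carrying the attributes mentioned above. The class name, field names and the helper `missing_fields` are illustrative assumptions, not the property names of the PIDINST schema itself, and the example values are invented.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InstrumentRecord:
    """Illustrative instrument description; field names are assumptions."""
    identifier: str     # persistent identifier for the instrument
    name: str
    owner: str
    manufacturer: str
    model: str
    location: str = ""  # optional: where the instrument is located or operated


def missing_fields(record: InstrumentRecord) -> list[str]:
    """Return the names of required fields that are empty or blank."""
    optional = {"location"}
    return [
        key for key, value in asdict(record).items()
        if key not in optional and not value.strip()
    ]


# Hypothetical example values; none refer to a real instrument.
spectrometer = InstrumentRecord(
    identifier="https://example.org/instrument/ms-001",
    name="Example mass spectrometer",
    owner="Example University",
    manufacturer="Example Instruments Ltd",
    model="MS-1000",
)
```

Calling `missing_fields(spectrometer)` returns an empty list for a complete record. A check of this kind could run before a record is registered, whichever identifier system ends up issuing the PID.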

IGSN: identifiers for physical samples

On the samples side, the IGSN provides persistent, resolvable identifiers for physical specimens. The acronym originally stood for International Geo Sample Number; it now stands for International Generic Sample Number, reflecting its use well beyond the geosciences. An IGSN identifies a particular sample (a sediment core, a mineral specimen, a biological sample), with metadata describing what it is, where and when it was collected, and how it relates to parent samples and sub-samples. This last point matters enormously in practice, because samples are routinely split, sub-sampled and distributed; IGSN can express the relationships between a parent sample and its derivatives, so that analyses performed on different splits can be traced back to a common origin. The IGSN system has been integrated with the DataCite infrastructure, aligning sample identifiers with the same resolution and metadata ecosystem used for datasets, which means a sample can be cited and linked just as a dataset can.
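The parent and sub-sample relationships described above can be sketched as a simple lookup from each sample to its parent. The Python below is a hedged illustration only: the sample identifiers are invented placeholders rather than real IGSNs, and `lineage` and `common_origin` are hypothetical helper names, not part of any IGSN or DataCite API.

```python
def lineage(sample_id, parent_of):
    """Return the chain from a sample up to its root, starting with the sample."""
    chain = [sample_id]
    seen = {sample_id}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            # Guard against malformed metadata that loops back on itself.
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


def common_origin(sample_a, sample_b, parent_of):
    """Return the nearest sample that both inputs derive from, or None."""
    ancestors_a = set(lineage(sample_a, parent_of))
    for candidate in lineage(sample_b, parent_of):
        if candidate in ancestors_a:
            return candidate
    return None


# Hypothetical identifiers: one core split into two halves, one half sub-sampled.
PARENT_OF = {
    "CORE-1-A": "CORE-1",
    "CORE-1-B": "CORE-1",
    "CORE-1-A-i": "CORE-1-A",
}
```

With this mapping, an analysis on `CORE-1-A-i` and another on `CORE-1-B` both trace back to `CORE-1`, which is the kind of connection that prose descriptions of samples tend to hide.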

A note on RRIDs

Related to the question of identifying research resources are Research Resource Identifiers (RRIDs), which identify key resources used mainly in biomedical research (antibodies, cell lines, model organisms, and software tools) so that the exact resource behind a result can be unambiguously named and found. RRIDs address a different layer from PIDINST and IGSN: not the instrument that measured or the unique physical specimen, but the catalogued, often commercially available resources whose precise identity is essential to reproducibility. Together, instrument PIDs, sample identifiers and resource identifiers fill in the parts of the provenance picture that dataset and article DOIs never reached.

Completing the provenance chain

The power of these identifiers is realised when they are connected. Picture a fully linked record: a dataset (DOI) was produced by an instrument (a PID described with PIDINST metadata) operated by a researcher (ORCID) at an institution (ROR), measuring a sample (IGSN) collected on a particular expedition, using a reagent identified by an RRID, all under a grant (grant ID). Each link is resolvable; the whole forms a provenance chain that a machine can traverse and a human can audit. That is a qualitatively better basis for reproducibility and reuse than a methods section written in prose, because every node can be verified against an authoritative record rather than taken on trust.
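As a rough illustration of what "a machine can traverse" means, the linked record above can be written as a list of subject, relation, target links and walked breadth-first. Everything in this Python sketch is an assumption made for the example: the identifiers are placeholders, the relation names are invented rather than taken from any metadata schema, and `traverse` is a hypothetical helper.

```python
from collections import deque

# Hypothetical links; identifiers and relation names are placeholders.
LINKS = [
    ("dataset:example-1", "producedBy", "instrument:example-ms"),
    ("dataset:example-1", "measures", "sample:example-core"),
    ("dataset:example-1", "fundedBy", "grant:example-42"),
    ("dataset:example-1", "usedResource", "resource:example-antibody"),
    ("instrument:example-ms", "operatedBy", "person:example-researcher"),
    ("person:example-researcher", "affiliatedWith", "org:example-university"),
]


def traverse(start, links):
    """Return every node reachable from start, in breadth-first order."""
    adjacency = {}
    for subject, _relation, target in links:
        adjacency.setdefault(subject, []).append(target)

    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Starting from `dataset:example-1`, `traverse` reaches the instrument, sample, grant and reagent directly, then the researcher and institution through the instrument. In a real system each step would be a resolution of the identifier against its registry rather than a dictionary lookup.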

Using them in practice

For researchers, adopting these identifiers is becoming more straightforward as repositories and data-collection workflows build in support. The practical advice is to assign and cite instrument and sample identifiers at the point of data creation rather than retrofitting them later, and to record the relationships — instrument to data, parent sample to sub-sample — while they are still known. Our guidance on persistent identifiers for authors covers how to incorporate these into the research record, and the consistent definitions that let an instrument PID or sample identifier mean the same thing across systems are maintained in the CASRAI Dictionary. As with people and outputs, recognising the contributions of those who build and steward instruments and sample collections is part of a complete record, and structured contribution through the CRediT taxonomy helps make that work visible too.


Identifying instruments and samples: PIDINST and IGSN