Tag: PIDINST

  • Identifiers for Things, Not Just Papers: IGSN and PIDINST

    When researchers think about persistent identifiers, they usually picture DOIs on papers and datasets or ORCID iDs on people. Yet a great deal of research turns on physical things: a sediment core drilled from a lake bed, a tissue specimen in a biobank, a water sample from a particular depth on a particular day, or the spectrometer that analysed it. These physical research objects have historically been referred to by inconsistent local labels, if they were referred to at all. Two complementary efforts, the IGSN for samples and the PIDINST work for instruments, set out to give them stable, global identifiers.

    Why physical objects need PIDs

    The case for identifying physical objects mirrors the case for identifying any research output. A persistent identifier lets a sample or instrument be referred to unambiguously across publications, datasets, and laboratories. It allows the measurements derived from a sample to be linked back to the sample itself, and onward to the instrument that produced them. Without such links, reuse and verification become difficult: a reader cannot easily tell whether two studies analysed the same specimen, or whether a calibration problem on a particular instrument might affect a body of results. Persistent identification turns scattered physical objects into nodes in a connected research graph, supporting the goals of FAIR data.

    IGSN: identifiers for samples

    The IGSN began in the geosciences as the International Geo Sample Number, a way to give individual physical samples a globally unique identifier so that specimens could be tracked and cited across the literature. As the approach proved useful beyond geology, the system evolved. Sample identifiers are now issued as IGSN IDs through DataCite, which brings sample identification into the same DOI-based infrastructure used for datasets and other outputs. This alignment means a sample can carry a resolvable identifier, a landing page, and structured metadata describing what the sample is, where and when it was collected, and how it relates to other objects.

    The practical effect is that a physical specimen becomes a citable entity. A paper can reference the exact sample it analysed; a dataset can link each measurement to the sample it came from; and a repository can expose the provenance of its holdings. For disciplines that depend on irreplaceable physical material, from earth science to the life sciences, this is a meaningful advance in traceability.

    PIDINST: identifiers for instruments

    Where IGSN addresses samples, the PIDINST working group, convened under the Research Data Alliance, addressed the instruments themselves. The group developed a metadata schema for persistent identification of measuring instruments, so that a microscope, sensor, telescope, or analytical device can be referenced by a persistent identifier and described in a consistent way. The schema captures the kind of information that makes an instrument identifiable and useful to cite: what it is, who owns or operates it, its model and configuration, and identifiers for related entities such as the institution that hosts it.

    Identifying instruments matters because the measuring apparatus is part of the methods. When the data from an experiment can be linked to the specific instrument that produced them, it becomes possible to assess instrument-related effects, to credit the facilities that maintain expensive equipment, and to trace a result from a published figure all the way back to the device on a laboratory bench.

    Connecting the chain of provenance

    The real power of these identifiers appears when they are used together. Imagine a measurement linked to the instrument that produced it via a PIDINST identifier, the sample it was taken from via an IGSN ID, the dataset it belongs to via a DataCite DOI, and the researchers responsible via their ORCID iDs. Each link is a small piece of metadata, but together they describe an unbroken chain of provenance from a published claim back to the physical objects and people behind it. That is precisely the kind of connected, machine-actionable record that modern research infrastructure aspires to.
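    As a rough illustration of how small each link is, the sketch below records the four identifiers named above for a single measurement and reports which ones are still missing. The class name, field names, and identifier values are hypothetical placeholders chosen for this example; they are not taken from PIDINST, IGSN, DataCite, or ORCID documentation.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MeasurementLinks:
    """Identifier links for one measurement (hypothetical field names)."""
    instrument_pid: Optional[str] = None    # PID of the instrument (PIDINST-described)
    sample_igsn: Optional[str] = None       # IGSN ID of the sample
    dataset_doi: Optional[str] = None       # DataCite DOI of the dataset
    researcher_orcid: Optional[str] = None  # ORCID iD of the responsible researcher


def missing_links(links: MeasurementLinks) -> List[str]:
    """Return the names of links that have not been recorded yet."""
    return [f.name for f in fields(links) if not getattr(links, f.name)]


# Placeholder values only; these are not real identifiers.
partial = MeasurementLinks(
    sample_igsn="10.0000/example-sample",
    dataset_doi="10.0000/example-dataset",
)
print(missing_links(partial))
```

    A check of this kind could run at the point of data creation, when the missing identifiers are still easy to supply.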

    Towards a fully identified research record

    Extending persistent identification to samples and instruments fills two of the larger gaps in the research record. Articles, data, organisations, and people increasingly carry stable identifiers; physical objects and the apparatus that measures them have lagged behind. By bringing samples into the DataCite ecosystem as IGSN IDs and by giving instruments a shared metadata schema through PIDINST, the community is steadily closing those gaps. The vocabularies and crosswalks that hold such a record together are the kind of standards work catalogued in the CASRAI data dictionary, and they complement contributor frameworks such as CRediT by anchoring the human contributions to the physical things they acted upon.

  • Identifying instruments and samples: PIDINST and IGSN

    Over the past two decades, research has built an impressive web of persistent identifiers. Articles have DOIs, datasets have DOIs, researchers have ORCID iDs, organisations have ROR identifiers, and grants and projects are increasingly identified too. Follow any one of these and you can traverse the others — this person wrote that paper, which used this dataset, funded by that grant. But there have long been two conspicuous gaps in this graph, both at the point where research meets the physical world: the instruments that generate measurements, and the physical samples from which data are drawn. Two community efforts — PIDINST for instruments and IGSN for samples — are now closing those gaps. This article explains both and where they fit, drawing on the persistent identifiers domain of the CASRAI Dictionary.

    Why instruments and samples need identifiers

    Consider a measurement. To interpret it properly — to reproduce it, to compare it with another, to assess its reliability — you need to know what produced it: which spectrometer, which sensor, which sequencer, in what configuration and with what calibration history. And to know what the measurement is of, you need to identify the physical sample: which rock core, which water sample, which tissue specimen, collected where and when. Traditionally this provenance was described in prose, in ways that were inconsistent between papers and impossible to resolve automatically. Two papers might use the same instrument or analyse splits of the same sample without any way to know it. Persistent identifiers for instruments and samples make that provenance explicit, resolvable and connectable to the rest of the PID graph.

    PIDINST: persistent identifiers for instruments

    PIDINST is a community framework, developed under the auspices of the Research Data Alliance, for assigning persistent identifiers to research instruments and describing them with a shared metadata schema. The idea is that a significant instrument — a telescope, a mass spectrometer, a research vessel’s sensor array — receives a persistent identifier and a structured description covering attributes such as its owner, manufacturer, model, and where it is located or operated. Once an instrument has a resolvable identifier, data it produces can cite it, the instrument can be linked to the people and institutions responsible for it, and its outputs can be aggregated across studies. PIDINST is deliberately infrastructure-agnostic: it defines the metadata and the principle of persistent identification rather than mandating a single issuing body, allowing existing identifier systems to carry instrument PIDs.
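    To make the idea of a structured description concrete, here is a minimal sketch of an instrument record covering the attributes mentioned above. The field names and values are illustrative assumptions, not the property names of the PIDINST schema itself, which should be consulted for the authoritative list.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InstrumentRecord:
    # Illustrative field names only; the PIDINST schema defines the real properties.
    identifier: str    # persistent identifier assigned to the instrument
    name: str
    owner: str
    manufacturer: str
    model: str
    location: str      # where the instrument is located or operated


def to_json(record: InstrumentRecord) -> str:
    """Serialise the record so it can be exchanged between systems."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values; none of these refer to a real instrument.
record = InstrumentRecord(
    identifier="10.0000/example-instrument",
    name="Example mass spectrometer",
    owner="Example University",
    manufacturer="Example Instruments Ltd",
    model="MS-1000",
    location="Example Laboratory",
)
print(to_json(record))
```

    Because the record is plain structured data, it could in principle be carried by whichever identifier system issues the instrument PID, which is consistent with the infrastructure-agnostic design described above.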

    IGSN: identifiers for physical samples

    On the samples side, the IGSN (originally the International Geo Sample Number, now stewarded as a global sample identifier) provides persistent, resolvable identifiers for physical specimens. An IGSN identifies one particular sample, such as a sediment core, a mineral specimen, or a biological sample, and carries metadata describing what it is, where and when it was collected, and how it relates to parent samples and sub-samples. This last point matters enormously in practice, because samples are routinely split, sub-sampled and distributed; IGSN can express the relationships between a parent sample and its derivatives, so that analyses performed on different splits can be traced back to a common origin. The IGSN system has been integrated with the DataCite infrastructure, aligning sample identifiers with the same resolution and metadata ecosystem used for datasets, which means a sample can be cited and linked just as a dataset can.
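    The parent and sub-sample relationships can be pictured as a simple lookup from each sample to its parent. The sketch below walks that lookup to find a sample's lineage and to test whether two splits share an origin. The sample labels and the `PARENT_OF` mapping are hypothetical; real IGSN metadata expresses these relationships in its own schema, not in this form.

```python
from typing import Dict, List, Optional

# Hypothetical parent links: sample label -> parent label (None for an original sample).
PARENT_OF: Dict[str, Optional[str]] = {
    "CORE-1": None,
    "CORE-1-A": "CORE-1",
    "CORE-1-B": "CORE-1",
    "CORE-1-A-i": "CORE-1-A",
}


def lineage(sample_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from a sample up to its original parent."""
    chain = [sample_id]
    seen = {sample_id}
    while parent_of.get(chain[-1]) is not None:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against malformed, circular records
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


def share_origin(a: str, b: str, parent_of: Dict[str, Optional[str]]) -> bool:
    """True if two samples trace back to the same original sample."""
    return lineage(a, parent_of)[-1] == lineage(b, parent_of)[-1]


print(lineage("CORE-1-A-i", PARENT_OF))
print(share_origin("CORE-1-A-i", "CORE-1-B", PARENT_OF))
```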

    A note on RRIDs

    Related to the question of identifying research resources are Research Resource Identifiers (RRIDs), which identify key research resources, such as antibodies, cell lines, model organisms, and software tools, so that the exact resource behind a result can be unambiguously named and found. RRIDs address a different layer from PIDINST and IGSN: not the instrument that took the measurement or the unique physical specimen, but the catalogued, often commercially available resources whose precise identity is essential to reproducibility. Together, instrument PIDs, sample identifiers and resource identifiers fill in the parts of the provenance picture that dataset and article DOIs never reached.

    Completing the provenance chain

    The power of these identifiers is realised when they are connected. Picture a fully linked record: a dataset (DOI) was produced by an instrument (PIDINST) operated by a researcher (ORCID) at an institution (ROR), measuring a sample (IGSN) collected on a particular expedition, using a reagent identified by an RRID, all under a grant (grant ID). Each link is resolvable; the whole forms a provenance chain that a machine can traverse and a human can audit. That is a qualitatively better basis for reproducibility and reuse than a methods section written in prose, because every node can be verified against an authoritative record rather than taken on trust.
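    One way to picture a chain that a machine can traverse is as a small graph of identifier-to-identifier links. The sketch below encodes the example record as triples and walks outward from the dataset with a breadth-first search. The identifiers are placeholders and the relation names are invented for illustration; they are not relation types from any real metadata schema.

```python
from collections import deque
from typing import Dict, List, Tuple

# (subject, relation, object) triples; all values are hypothetical placeholders.
EDGES: List[Tuple[str, str, str]] = [
    ("doi:dataset", "producedBy", "pid:instrument"),
    ("pid:instrument", "operatedBy", "orcid:researcher"),
    ("orcid:researcher", "affiliatedWith", "ror:institution"),
    ("doi:dataset", "measures", "igsn:sample"),
    ("doi:dataset", "usesReagent", "rrid:reagent"),
    ("doi:dataset", "fundedBy", "grant:award"),
]


def reachable(start: str, edges: List[Tuple[str, str, str]]) -> List[str]:
    """Breadth-first walk returning every node reachable from start."""
    adjacency: Dict[str, List[str]] = {}
    for subject, _relation, obj in edges:
        adjacency.setdefault(subject, []).append(obj)

    order: List[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(reachable("doi:dataset", EDGES))
```

    In a real system each node would be resolved against its authoritative registry; here the walk only shows that every part of the record can be reached from the dataset.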

    Using them in practice

    For researchers, adopting these identifiers is becoming more straightforward as repositories and data-collection workflows build in support. The practical advice is to assign and cite instrument and sample identifiers at the point of data creation rather than retrofitting them later, and to record the relationships — instrument to data, parent sample to sub-sample — while they are still known. Our guidance on persistent identifiers for authors covers how to incorporate these into the research record, and the consistent definitions that let an instrument PID or sample identifier mean the same thing across systems are maintained in the CASRAI Dictionary. As with people and outputs, recognising the contributions of those who build and steward instruments and sample collections is part of a complete record, and structured contribution through the CRediT taxonomy helps make that work visible too.