Editorial · CASRAI · Research data infrastructure

Identifiers for Things, Not Just Papers: IGSN and PIDINST

Persistent identifiers are familiar for articles, datasets, and people, but the physical objects of research, the rock cores, water samples, and the instruments that measure them, have long lacked stable references. The IGSN for samples and the PIDINST work for instruments extend persistent identification to the physical world, making physical research objects findable, citable, and connectable to the data they produce.

By CASRAI Editorial Board

Published 21 Jun 2026 · 4 minute read

When researchers think about persistent identifiers, they usually picture DOIs on papers and datasets or ORCID iDs on people. Yet a great deal of research turns on physical things: a sediment core drilled from a lake bed, a tissue specimen in a biobank, a water sample from a particular depth on a particular day, or the spectrometer that analysed it. These physical research objects have historically been referred to by inconsistent local labels, if they were referred to at all. Two complementary efforts, the IGSN for samples and the PIDINST work for instruments, set out to give them stable, global identifiers.

Why physical objects need PIDs

The case for identifying physical objects mirrors the case for identifying any research output. A persistent identifier lets a sample or instrument be referred to unambiguously across publications, datasets, and laboratories. It allows the measurements derived from a sample to be linked back to the sample itself, and onward to the instrument that produced them. Without such links, reuse and verification become difficult: a reader cannot easily tell whether two studies analysed the same specimen, or whether a calibration problem on a particular instrument might affect a body of results. Persistent identification turns scattered physical objects into nodes in a connected research graph, supporting the goals of FAIR data.

IGSN: identifiers for samples

The IGSN began in the geosciences as the International Geo Sample Number, a way to give individual physical samples a globally unique identifier so that specimens could be tracked and cited across the literature. As the approach proved useful beyond geology, the system evolved to serve other disciplines. Samples now receive an IGSN ID registered through DataCite, which brings sample identification into the same DOI-based infrastructure used for datasets and other outputs. This alignment means a sample can carry a resolvable identifier, a landing page, and structured metadata describing what the sample is, where and when it was collected, and how it relates to other objects.
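To make the idea of a resolvable sample identifier concrete, here is a minimal Python sketch. It assumes that an IGSN ID is written as a DOI name and resolved through the general DOI proxy at https://doi.org/. The class `SampleRecord`, its field names, and the example identifier are hypothetical placeholders chosen for illustration; they are not the DataCite metadata schema and do not point at a real sample.

```python
from dataclasses import dataclass
from urllib.parse import quote

# General DOI proxy; IGSN IDs are assumed here to resolve the same way as other DOIs.
DOI_RESOLVER = "https://doi.org/"


@dataclass(frozen=True)
class SampleRecord:
    """Illustrative description of a physical sample (field names are assumptions)."""

    igsn_id: str       # DOI-style identifier, e.g. a made-up "10.99999/EXAMPLE-CORE-01"
    material: str      # what the sample is
    collected_on: str  # when it was collected, as an ISO 8601 date
    collected_at: str  # where it was collected

    def resolver_url(self) -> str:
        # Percent-encode anything unsafe in a URL path, but keep the "/" that
        # separates the DOI prefix from the suffix.
        return DOI_RESOLVER + quote(self.igsn_id, safe="/")
```

A record built with the made-up identifier `10.99999/EXAMPLE-CORE-01` would yield a link on the doi.org proxy that, for a real IGSN ID, leads to the sample's landing page.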

The practical effect is that a physical specimen becomes a citable entity. A paper can reference the exact sample it analysed; a dataset can link each measurement to the sample it came from; and a repository can expose the provenance of its holdings. For disciplines that depend on irreplaceable physical material, from earth science to the life sciences, this is a meaningful advance in traceability.

PIDINST: identifiers for instruments

Where IGSN addresses samples, the PIDINST working group, convened under the Research Data Alliance, addressed the instruments themselves. The group developed a metadata schema for persistent identification of measuring instruments, so that a microscope, sensor, telescope, or analytical device can be referenced by a persistent identifier and described in a consistent way. The schema captures the kind of information that makes an instrument identifiable and useful to cite: what it is, who owns or operates it, its model and configuration, and identifiers for related entities such as the institution that hosts it.
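As a rough illustration of the kind of record such a schema describes, the Python sketch below models an instrument with the categories of information named above: what it is, who owns it, its model and configuration, and identifiers for related entities. The class and field names, the `hostedBy` relation label, and all example values are assumptions made for this sketch; they are not the official PIDINST property names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RelatedIdentifier:
    """Link from an instrument to another identified entity (illustrative)."""

    value: str     # identifier of the related entity
    relation: str  # hypothetical relation label, e.g. "hostedBy"


@dataclass
class InstrumentRecord:
    """Illustrative instrument description; not the official PIDINST schema."""

    identifier: str     # persistent identifier assigned to the instrument
    name: str           # what the instrument is
    owner: str          # who owns or operates it
    model: str          # manufacturer's model designation
    configuration: str  # free-text note on the current setup
    related: list[RelatedIdentifier] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses too, so related identifiers
        # are serialised as plain dictionaries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Serialising the record to JSON is one simple way to expose it on a landing page or exchange it between systems; a real deployment would follow the published schema and whatever format its identifier provider requires.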

Identifying instruments matters because the measuring apparatus is part of the methods. When the data from an experiment can be linked to the specific instrument that produced them, it becomes possible to assess instrument-related effects, to credit the facilities that maintain expensive equipment, and to trace a result from a published figure all the way back to the device on a laboratory bench.

Connecting the chain of provenance

The real power of these identifiers appears when they are used together. Imagine a measurement linked to the instrument that produced it via a persistent identifier described with the PIDINST schema, the sample it was taken from via an IGSN ID, the dataset it belongs to via a DataCite DOI, and the researchers responsible via their ORCID iDs. Each link is a small piece of metadata, but together they describe an unbroken chain of provenance from a published claim back to the physical objects and people behind it. That is precisely the kind of connected, machine-actionable record that modern research infrastructure aspires to.
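The chain described here can be sketched as a small data structure. The Python example below is only an illustration under stated assumptions: the `Measurement` class, the relation labels such as `measuredBy` and `derivedFrom`, and the helper `shares_sample` are hypothetical names invented for this sketch, and any identifiers passed in are expected to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measured value plus the identifiers that situate it (illustrative)."""

    value: float
    unit: str
    instrument_pid: str         # persistent identifier of the instrument
    sample_igsn: str            # IGSN ID of the sample
    dataset_doi: str            # DOI of the dataset holding the measurement
    orcid_ids: tuple[str, ...]  # ORCID iDs of the researchers responsible


def provenance_links(m: Measurement) -> list[tuple[str, str]]:
    """Return (relation, identifier) pairs; relation labels are hypothetical."""
    links = [
        ("measuredBy", m.instrument_pid),
        ("derivedFrom", m.sample_igsn),
        ("partOf", m.dataset_doi),
    ]
    # One attribution link per researcher.
    links.extend(("attributedTo", orcid) for orcid in m.orcid_ids)
    return links


def shares_sample(a: Measurement, b: Measurement) -> bool:
    """True when two measurements point at the same physical sample."""
    return a.sample_igsn == b.sample_igsn
```

Because every link is an identifier rather than a free-text label, a question such as whether two studies analysed the same specimen reduces to a string comparison, which is what makes the record machine-actionable.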

Towards a fully identified research record

Extending persistent identification to samples and instruments fills two of the larger gaps in the research record. Articles, data, organisations, and people increasingly carry stable identifiers; physical objects and the apparatus that measures them have lagged behind. By bringing samples into the DataCite ecosystem as IGSN IDs and by giving instruments a shared metadata schema through PIDINST, the community is steadily closing those gaps. The vocabularies and crosswalks that hold such a record together are the kind of standards work catalogued in the CASRAI data dictionary, and they complement contributor frameworks such as CRediT by anchoring the human contributions to the physical things they acted upon.
