v2026.1714 entries · CC-BY 4.0


The five-PID stack: ORCID, ROR, RAiD, DOI and IGSN working together

Persistent identifiers are most powerful not in isolation but as a connected stack. How ORCID, ROR, RAiD, DOI and IGSN each identify one kind of entity, and how they interlock into a research graph.

By CASRAI Editorial Board
Published 9 Jun 2026 · 5 minute read

Persistent identifiers are often introduced one at a time (here is ORCID for researchers, here is the DOI for publications), as if each solved its own isolated problem. That framing undersells them badly. The real power of persistent identifiers is not in any single one but in how they interlock. Each identifies one kind of entity; together they form a connected graph of the research enterprise. This article looks at five identifiers that, taken as a stack, cover the core entities of research (people, organisations, projects, outputs, and physical samples) and shows how they fit together. It draws on the persistent-identifiers domain of the CASRAI dictionary.

One identifier per kind of thing

The organising insight is that each major identifier answers a different question. Get the right identifier for each kind of entity, and the entities can be linked unambiguously.

  • ORCID iD identifies a person — an individual researcher — with a persistent identifier issued by ORCID. It answers “who?” and resolves the perennial problem of name ambiguity, where one researcher’s work is scattered across spelling variants and shared surnames.
  • ROR ID identifies an organisation — a research institution — through the Research Organization Registry. It answers “where?” and collapses the dozens of ways a single university’s name can be written into one canonical, resolvable identifier.
  • RAiD identifies a project — a research activity — under the ISO 23527:2022 standard. It answers “what undertaking?” and gives the connecting activity its own identity rather than leaving it implicit in the outputs.
  • DOI identifies an output — a publication, dataset, or piece of software. Issued through registration agencies such as Crossref (for publications) and DataCite (for data and software), it answers “what was produced?”
  • IGSN identifies a physical sample. Originally the International Geo Sample Number and since renamed the International Generic Sample Number, it is now governed within the DataCite ecosystem. It answers “which specimen?” and extends persistent identification from the digital world to the physical materials that research is done on.
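
One reason these identifiers can be linked unambiguously is that they are machine-checkable. As an illustration, an ORCID iD is 16 characters whose last one is a check character computed with ISO 7064 MOD 11-2. The sketch below validates that checksum; the function names are hypothetical, and the iD used in the usage note is ORCID's well-known fictional test researcher, not one taken from this article.

```python
import re

# Four hyphen-separated groups; the final character may be a digit or "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Accept a bare iD or its https://orcid.org/ URL form and verify the checksum."""
    bare = value.strip()
    if bare.startswith(ORCID_URL_PREFIX):
        bare = bare[len(ORCID_URL_PREFIX):]
    if not ORCID_PATTERN.match(bare):
        return False
    chars = bare.replace("-", "")
    return orcid_check_character(chars[:15]) == chars[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the final character to `8` makes it return `False`. A check like this catches typing errors only; it does not confirm that the iD is actually registered.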

Five identifiers, five kinds of entity: person, organisation, project, output, sample. Between them they cover the entities that nearly every piece of research involves.

How they interlock

The value appears when the identifiers reference one another. Consider a single field campaign. A team of researchers, each with an ORCID iD, based at institutions each with a ROR ID, conducts a project with a RAiD. In the field they collect rock samples, each registered with an IGSN. They analyse the samples and publish a paper and a dataset, each with a DOI. Now watch the connections: the paper’s DOI metadata lists the authors by ORCID and their affiliations by ROR; the dataset’s DOI references the IGSNs of the samples it describes; both outputs link to the project’s RAiD; and the RAiD record, in turn, aggregates the people, the institutions, the samples, and the outputs.
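
The field campaign above can be sketched as three linked records. This is a simplified illustration only: the field names are hypothetical and do not follow the real Crossref, DataCite or RAiD metadata schemas, and every identifier is a made-up placeholder rather than a real registration.

```python
# Placeholder identifiers (assumptions for illustration, not real registrations).
PERSON = "orcid:example-researcher"
ORG = "ror:example-university"
PROJECT = "raid:example-field-campaign"
SAMPLE = "igsn:example-rock-001"
PAPER = "doi:example-paper"
DATASET = "doi:example-dataset"

# The paper lists authors by ORCID, affiliations by ROR, and links to the project.
paper = {
    "id": PAPER,
    "authors": [{"orcid": PERSON, "affiliation_ror": ORG}],
    "project_raid": PROJECT,
}

# The dataset additionally references the samples it describes.
dataset = {
    "id": DATASET,
    "authors": [{"orcid": PERSON, "affiliation_ror": ORG}],
    "describes_samples": [SAMPLE],
    "project_raid": PROJECT,
}

# The project record aggregates people, institutions, samples and outputs.
project = {
    "id": PROJECT,
    "contributors": [PERSON],
    "organisations": [ORG],
    "samples": [SAMPLE],
    "outputs": [PAPER, DATASET],
}


def referenced_ids(record: dict) -> set:
    """Collect every identifier a record points to, excluding its own."""
    found = set()

    def walk(value):
        if isinstance(value, str):
            found.add(value)
        elif isinstance(value, dict):
            for inner in value.values():
                walk(inner)
        elif isinstance(value, list):
            for inner in value:
                walk(inner)

    walk(record)
    found.discard(record["id"])
    return found
```

Calling `referenced_ids(dataset)` yields the person, the organisation, the sample and the project, which is the set of cross-references the paragraph describes for the dataset's DOI metadata.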

The result is a graph in which you can start from any node and traverse to the others. From a sample’s IGSN you can reach the dataset that measured it, the paper that interpreted it, the project that collected it, the people who did the work, and the institutions they belong to — all by following identifier references, with no name-matching guesswork. This is the PID graph: the network of relationships formed by linking persistent identifiers, and the substrate on which automated systems can reason across the research enterprise.
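
The traversal can be sketched as a breadth-first search over identifier links. The code below is a minimal illustration under two assumptions: the identifiers are made-up placeholders, and every link is treated as traversable in both directions, which real PID metadata only supports when the reverse relationship is recorded or indexed.

```python
from collections import deque

# Each pair is one identifier referencing another (placeholder values).
LINKS = [
    ("doi:example-paper", "orcid:example-researcher"),
    ("doi:example-paper", "ror:example-university"),
    ("doi:example-paper", "raid:example-field-campaign"),
    ("doi:example-dataset", "igsn:example-rock-001"),
    ("doi:example-dataset", "raid:example-field-campaign"),
]


def build_graph(links):
    """Build an undirected adjacency map from (source, target) pairs."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    return graph


def reachable(graph, start):
    """Return every identifier reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


graph = build_graph(LINKS)
from_sample = reachable(graph, "igsn:example-rock-001")
```

Starting from the sample, `from_sample` contains the dataset, the project, the paper, the researcher and the institution. Remove the single dataset-to-sample link and the sample becomes an isolated node, which is the practical cost of a missing identifier reference.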

Why the stack beats any single identifier

Any one of these identifiers is useful on its own, but each has a ceiling that only the stack removes. A DOI tells you an output exists, but matching its authors to real people requires ORCID; matching its affiliations to real institutions requires ROR; placing it in the context of a project requires RAiD; and connecting it to the physical materials behind it requires IGSN. The DOI’s metadata is only as connected as the identifiers it can reference. The same is true of every identifier in the set: each becomes dramatically more powerful when the entities it points to are themselves identified.

This is why the maturation of the whole ecosystem, rather than any single scheme, has been the significant development of recent years. ROR reached near-universal adoption and gave organisations a clean identifier; RAiD became an ISO standard and filled the project-shaped hole in the middle of the graph; IGSN moved into the DataCite ecosystem and aligned physical-sample identification with digital-output identification. The pieces stopped being five separate good ideas and started being one connected fabric.

The supporting cast

The five-PID stack is the core, but it does not stand entirely alone, and it is worth knowing the adjacent identifiers it connects to. Software Heritage IDs (SWHIDs) pin exact source-code states, complementing the DataCite DOIs that make software citable. The Crossref Funder Registry and Crossref grant IDs identify funders and individual awards, so that the funding behind a project’s RAiD is itself identified. DMP IDs identify data-management plans. These extend the graph further into the lifecycle, but the five core identifiers are the ones that cover the entities every project has.

Where the dictionary fits

Most research administrators do not yet hold a clear mental model of how these identifiers fit together, and that is now the most common gap in the persistent-identifier ecosystem. The schemes are mature; the understanding is not. A dictionary that defines each identifier operationally, makes its relationships explicit, and shows how the stack interlocks is exactly the integrative reference the ecosystem is missing. Providing that map, and federating each definition back to its authoritative steward, from ORCID to DataCite to ARDC, is the role the CASRAI dictionary is built to play.

What to do now

For researchers: register for an ORCID iD, ensure your institution’s ROR ID is used on your outputs, mint DOIs for your datasets and software as well as your papers, and use IGSNs for physical samples where your discipline supports them. For institutions: drive identifier coverage across all five entity types, because the graph is only as connected as its sparsest identifier. For the ecosystem: keep federating, so that an identifier minted in one scheme can reference an identifier in another without friction.
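
For institutions, "only as connected as its sparsest identifier" suggests a simple audit: measure identifier coverage per entity type and look at the minimum. The sketch below assumes a hypothetical inventory in which each record has a `kind` and an optional `pid`; the record layout, the function names and the sample data are all invented for illustration.

```python
def coverage_by_entity(records):
    """Fraction of records of each entity kind that carry a persistent identifier."""
    totals = {}
    identified = {}
    for record in records:
        kind = record["kind"]
        totals[kind] = totals.get(kind, 0) + 1
        if record.get("pid"):
            identified[kind] = identified.get(kind, 0) + 1
    return {kind: identified.get(kind, 0) / totals[kind] for kind in totals}


def sparsest_entity(coverage):
    """Return the entity kind with the lowest identifier coverage."""
    return min(coverage, key=coverage.get)


# Invented inventory: placeholder identifiers, with None meaning no PID assigned.
inventory = [
    {"kind": "person", "pid": "orcid:example-a"},
    {"kind": "person", "pid": "orcid:example-b"},
    {"kind": "organisation", "pid": "ror:example-university"},
    {"kind": "project", "pid": "raid:example-field-campaign"},
    {"kind": "output", "pid": "doi:example-paper"},
    {"kind": "output", "pid": "doi:example-dataset"},
    {"kind": "sample", "pid": "igsn:example-rock-001"},
    {"kind": "sample", "pid": None},
    {"kind": "sample", "pid": None},
    {"kind": "sample", "pid": None},
]

coverage = coverage_by_entity(inventory)
weakest = sparsest_entity(coverage)
```

In this invented inventory every entity type is fully covered except samples, so `weakest` is `"sample"` and that is where the next registration effort would go.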
