A persistent identifier is useful on its own: an ORCID iD reliably points to a researcher, a DOI to an output, a ROR identifier to an organisation, a RAiD to a research project or activity. But the real power of persistent identifiers is not in any single one of them — it is in the connections between them. When the identifier for a person is linked to the identifiers for their outputs, and those to the identifier for the project that produced them, and that to the identifier for the funding organisation behind it, something new emerges: a navigable web of relationships, a PID graph, that lets the research landscape be traversed as a connected whole rather than queried as a heap of separate records. This article explains the PID graph and why it depends on open metadata, drawing on the persistent-identifiers domain of the CASRAI Dictionary.
From individual identifiers to a graph
Each persistent identifier solves a problem of ambiguity. ORCID disambiguates researchers, following them across name changes and institutions. ROR disambiguates organisations, giving each a single identifier despite its many name variants. DOIs give outputs — articles, datasets, software — stable, citable identity. RAiD identifies the project or activity itself, the container within which work happens over time. Individually each is valuable, but each identifier also carries metadata describing its relationships: an output’s metadata can record its authors’ ORCID iDs, their organisations’ ROR identifiers and the project’s RAiD. When these relationship statements are gathered across the whole system, the separate identifiers knit together into a graph — nodes for people, organisations, outputs and projects, joined by edges that say authored, affiliated with, part of, funded by. The graph is the system of identifiers seen as a connected structure rather than a set of isolated points.
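As a minimal sketch of how relationship statements become edges, the Python below turns a few output metadata records into (subject, relation, object) triples. The record layout, the field names (doi, authors, orcid, ror, raid), the relation labels and every identifier value are assumptions made for illustration; they are placeholders, not the schema of any real registry.

```python
# Hypothetical output metadata records. All identifiers are made-up placeholders.
RECORDS = [
    {
        "doi": "doi:10.5555/out.1",
        "authors": [
            {"orcid": "orcid:0000-0000-0000-0001", "ror": "ror:example-university"},
            {"orcid": "orcid:0000-0000-0000-0002"},  # no affiliation recorded
        ],
        "raid": "raid:project-a",
    },
    {
        "doi": "doi:10.5555/out.2",
        "authors": [
            {"orcid": "orcid:0000-0000-0000-0001", "ror": "ror:example-university"},
        ],
        # no project recorded for this output
    },
]


def build_edges(records):
    """Collect the relationship statements in output metadata as unique triples."""
    edges = set()
    for rec in records:
        for author in rec.get("authors", []):
            edges.add((author["orcid"], "authored", rec["doi"]))
            if "ror" in author:
                edges.add((author["orcid"], "affiliated_with", author["ror"]))
        if "raid" in rec:
            edges.add((rec["doi"], "part_of", rec["raid"]))
    return sorted(edges)


if __name__ == "__main__":
    for subject, relation, obj in build_edges(RECORDS):
        print(subject, relation, obj)
```

Storing the edges in a set means that a statement repeated across records, such as the same researcher's affiliation, contributes a single edge to the graph.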
What the FREYA project established
The concept of the PID graph was developed and matured substantially through the European FREYA project, in which DataCite and partners advanced the infrastructure and thinking behind connecting persistent identifiers. FREYA articulated the PID graph as an explicit idea: that the relationships recorded in PID metadata constitute a graph that can be built, queried and traversed, unlocking capabilities no single registry can offer alone. Its lasting contribution was to show that persistent identifiers are not merely labels but the nodes of a knowledge graph waiting to be connected, and that the metadata linking them is the substance from which the graph is made.
What a PID graph makes possible
A connected graph of identifiers enables things that disconnected registries cannot:
- Following the threads of a career or project. From a researcher’s ORCID iD one can reach their outputs, their affiliations and the projects they worked on, assembling automatically a picture that is as complete as the recorded links allow, rather than compiling it by hand.
- Answering “what came from this funding?” By following links from a funder to the projects it funded, and from those projects to their outputs, the graph answers a question funders routinely ask, without each grant holder reporting everything manually.
- Reducing reporting burden. Because relationships are recorded once in open metadata, systems can reuse them instead of asking researchers to re-enter the same information.
- Building new services. Discovery tools, analytics and national research-information systems can be built on top of the graph rather than reconstructing the same connections independently.
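The "what came from this funding?" question can be sketched as a graph traversal. The example below is an illustration under stated assumptions: the edge list, the relation names (funded_by, part_of, authored) and all identifiers are hypothetical placeholders, and a production PID graph would be queried through a registry's own interface rather than an in-memory list.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) edges. All identifiers are placeholders.
EDGES = [
    ("raid:project-a", "funded_by", "ror:funder-x"),
    ("raid:project-b", "funded_by", "ror:funder-x"),
    ("doi:10.5555/out.1", "part_of", "raid:project-a"),
    ("doi:10.5555/out.2", "part_of", "raid:project-a"),
    ("doi:10.5555/out.3", "part_of", "raid:project-b"),
    ("doi:10.5555/out.4", "part_of", "raid:project-c"),  # project with no recorded funder
    ("orcid:0000-0000-0000-0001", "authored", "doi:10.5555/out.1"),
]


def outputs_of_funder(edges, funder):
    """Walk funded_by and part_of edges backwards from a funder to its outputs."""
    # Index the edges by their object so they can be followed in reverse.
    incoming = defaultdict(list)
    for subject, relation, obj in edges:
        if relation in ("funded_by", "part_of"):
            incoming[obj].append(subject)

    # Breadth-first search from the funder node.
    seen = {funder}
    queue = deque([funder])
    while queue:
        node = queue.popleft()
        for neighbour in incoming[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    # Only DOI nodes count as outputs in this sketch.
    return sorted(n for n in seen if n.startswith("doi:"))


if __name__ == "__main__":
    print(outputs_of_funder(EDGES, "ror:funder-x"))
```

The output attached to raid:project-c is not returned, because no edge links that project to the funder. The traversal can only find what the metadata records, which is the practical reason the completeness of the graph depends on the completeness of its links.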
Why open metadata is the foundation
The single most important condition for a useful PID graph is that the connecting metadata be open. The graph is built from the relationship statements carried in identifier metadata, and those statements can only feed a shared, navigable graph if anyone can harvest, link and reuse them. This is why the open-metadata commitments of organisations such as Crossref and DataCite, the principal DOI registration agencies for scholarly outputs and research data respectively, matter so much. When the metadata describing outputs, their authors, affiliations and funding is open, anyone can assemble the relationships within it into a graph. When metadata is closed or paywalled, the connections are locked away and that part of the graph cannot be built. Open metadata is not a nicety here; it is the raw material of the graph, and the degree of openness directly determines how complete the graph can be.
Open infrastructure and POSI
A research knowledge graph that the whole community depends on must rest on infrastructure the community can trust to remain open and stable. This is the concern of the Principles of Open Scholarly Infrastructure (POSI), which set out commitments around governance, sustainability and insurance that help ensure shared infrastructure serves the community over the long term rather than being captured or quietly enclosed. Several of the key identifier organisations have engaged with POSI precisely because the value of the PID graph depends on its constituent registries remaining open, governed in the community interest and durable. A graph built on infrastructure that might close, paywall its metadata or disappear rests on a fragile foundation; POSI articulates what it takes for that foundation to be trustworthy. Open metadata and open infrastructure are two aspects of the same requirement.
A consistent vocabulary for the graph
For identifiers and their relationships to connect across the many systems that hold them, the relationship types and the entities they join must mean the same thing everywhere: what it means to be an author, an affiliation or a part of a project. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that the metadata forming the graph’s edges is understood identically wherever it originated. The contributions recorded in that metadata can be described with the CRediT taxonomy, which adds who did what to the connections between people and the outputs they produced. To see how the major identifiers relate, our comparison material sets them side by side. Persistent identifiers solve ambiguity one entity at a time; the PID graph turns those solved ambiguities into a connected map of research, and open metadata is what keeps that map whole.







