A single research project leaves traces in many places. Its articles appear in journals and repositories; its datasets land in data archives; its software ends up in code repositories; the grant that funded it lives in a funder’s database; and the researchers behind it have profiles and identifiers of their own. Each of these traces is recorded somewhere, but they are scattered across thousands of separate systems that do not naturally talk to one another. Reassembling the full picture of what a project produced, or what a funder’s money achieved, has traditionally meant painstaking manual work. OpenAIRE exists to do that reassembly automatically and openly, at the scale of a continent. This article explains what OpenAIRE is and why it matters, drawing on the research information systems domain of the CASRAI Dictionary.
What OpenAIRE is
OpenAIRE is a European open-science infrastructure that aggregates metadata about research from across the continent and beyond, linking it into a single connected resource. It grew out of the need to monitor and support open access to European-funded research, and it has expanded into a broad infrastructure for open science. Its central product is the OpenAIRE Graph (sometimes called the OpenAIRE Research Graph): an enormous, openly available knowledge graph that brings together publications, research data, software, other research products, the projects that produced them, the funding that supported them, and the organisations and people involved. The defining idea is connection. Rather than holding these as separate lists, the graph models the relationships between them — this article resulted from that project, that project was funded by this grant, this dataset supplements that article, these outputs came from that organisation — so that the research landscape can be navigated as a web of linked entities rather than a pile of disconnected records.
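To make the idea of linked entities concrete, here is a minimal sketch of a graph of typed entities and named relationships. It is an illustration only, not OpenAIRE's actual data model: the `Entity` and `Graph` classes, the identifiers such as `pub:1`, and the relation names `resultedFrom`, `supplements` and `fundedBy` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str     # hypothetical local identifier, not a real OpenAIRE ID
    kind: str   # "publication", "dataset", "project", "grant", ...
    label: str


@dataclass
class Graph:
    entities: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (source, relation, target)

    def add(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def link(self, source: str, relation: str, target: str) -> None:
        # Refuse links to entities the graph does not know about.
        for entity_id in (source, target):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.relations.add((source, relation, target))

    def targets(self, source: str, relation: str) -> list:
        # Everything that `source` points to via `relation`.
        return sorted(t for s, r, t in self.relations
                      if s == source and r == relation)

    def sources(self, relation: str, target: str) -> list:
        # Everything that points to `target` via `relation`.
        return sorted(s for s, r, t in self.relations
                      if r == relation and t == target)


graph = Graph()
for entity in (
    Entity("pub:1", "publication", "Example article"),
    Entity("data:1", "dataset", "Example dataset"),
    Entity("proj:1", "project", "Example project"),
    Entity("grant:1", "grant", "Example grant"),
):
    graph.add(entity)

graph.link("pub:1", "resultedFrom", "proj:1")   # article resulted from project
graph.link("data:1", "supplements", "pub:1")    # dataset supplements article
graph.link("proj:1", "fundedBy", "grant:1")     # project funded by grant

print(graph.sources("resultedFrom", "proj:1"))  # outputs of the project
print(graph.targets("proj:1", "fundedBy"))      # funding behind the project
```

Storing relationships as explicit triples, rather than as fields buried inside each record, is what lets the same link be followed in either direction: from an output to its project, or from a project to its outputs.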
How the graph is built
The graph is assembled by harvesting metadata from a vast range of sources and then doing the hard work of linking it together. Sources include institutional and disciplinary repositories, open-access journals, data archives, software repositories, funder databases and registries of persistent identifiers. OpenAIRE gathers metadata from these, then deduplicates, cleans and enriches it, and, crucially, infers and records the relationships between items. The most valuable of these links is often the connection between an output and the project or grant that funded it, because that link is what lets the system answer the question funders most want answered: what did this funding produce? Building the graph is therefore not merely collection but integration: turning many partial, overlapping, inconsistently described records into a coherent connected whole. The quality of that integration depends heavily on the quality and consistency of the metadata being harvested.
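The harvest, deduplicate and link steps can be sketched in a few lines. This is a toy illustration under stated assumptions, not OpenAIRE's pipeline: the record fields (`source`, `doi`, `title`, `grants`), the grant code `G-001` and the matching rule (prefer a DOI, fall back to a normalised title) are all invented for the example, and real deduplication is considerably more sophisticated.

```python
def dedup_key(record):
    """Prefer a persistent identifier; fall back to a normalised title."""
    doi = record.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    # Lowercase and collapse whitespace so trivial variants still match.
    return "title:" + " ".join(record["title"].lower().split())


def merge(records):
    """Collapse records that share a key, keeping every source and grant."""
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            dedup_key(rec),
            {"title": rec["title"], "sources": [], "grants": set()},
        )
        entry["sources"].append(rec["source"])
        entry["grants"].update(rec.get("grants", []))
    return merged


def outputs_by_grant(merged):
    """Answer the funder's question: what did this funding produce?"""
    by_grant = {}
    for key, entry in merged.items():
        for grant in entry["grants"]:
            by_grant.setdefault(grant, []).append(key)
    return {grant: sorted(keys) for grant, keys in by_grant.items()}


# Three hypothetical harvested records; the first two describe the same article.
harvested = [
    {"source": "repo-a", "doi": "10.1234/ABC",
     "title": "Soil Carbon Fluxes", "grants": ["G-001"]},
    {"source": "journal-b", "doi": "10.1234/abc",
     "title": "Soil carbon fluxes", "grants": []},
    {"source": "archive-c",
     "title": "Soil  carbon flux dataset", "grants": ["G-001"]},
]

merged = merge(harvested)
print(outputs_by_grant(merged))
```

Note how the journal's copy of the article carries no grant information, yet after merging the article is still attributed to the grant because the repository's copy supplied it. The sketch also shows the weakness of inconsistent metadata: a record of the same work that lacked a DOI would fall back to title matching and might not be merged at all.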
What the graph makes possible
A connected, open graph of European research enables a range of uses that scattered records cannot:
- Funder reporting and monitoring. Funders can see the outputs that resulted from the research they supported, and monitor compliance with open-access requirements, without each grant holder reporting everything by hand.
- Discovery. Researchers can find related outputs, datasets and software connected to a project or topic, following links the graph has already drawn.
- Open-science monitoring. Because the graph records what is openly available and how, it supports measuring and encouraging open-science practice across institutions and countries.
- Building on open data. Because the graph is openly available, other services, analyses and national systems can be built on top of it rather than reconstructing the same links independently.
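As a rough illustration of the monitoring use in the list above, the sketch below computes the share of openly available outputs per grant from already linked records. The field names `grant` and `access`, the values `open` and `closed`, and the grant codes are assumptions for the example and do not reflect OpenAIRE's actual schema.

```python
from collections import defaultdict


def open_access_share(outputs):
    """Return, for each grant, the fraction of its outputs that are open."""
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for out in outputs:
        totals[out["grant"]] += 1
        if out["access"] == "open":
            open_counts[out["grant"]] += 1
    return {grant: open_counts[grant] / totals[grant] for grant in totals}


# Hypothetical linked outputs, one row per output-to-grant link.
linked_outputs = [
    {"id": "pub:1", "grant": "G-001", "access": "open"},
    {"id": "pub:2", "grant": "G-001", "access": "closed"},
    {"id": "data:1", "grant": "G-002", "access": "open"},
]

shares = open_access_share(linked_outputs)
print(shares)
```

A calculation this simple is only possible because the output-to-grant links already exist; without them, each grant holder would have to report the same figures by hand.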
OpenAIRE and EOSC
OpenAIRE does not stand alone; it is a key part of the broader European open-science landscape, and in particular of the European Open Science Cloud (EOSC) — the initiative to create a shared environment for storing, sharing and reusing research data and other outputs across Europe. Within that landscape, OpenAIRE provides much of the connective tissue: the metadata aggregation and the graph that link outputs, data, software, projects and funding into a navigable whole. EOSC aims to give European research a common space for its outputs; OpenAIRE’s graph is a large part of what makes that space coherent rather than a collection of disconnected repositories. The two are complementary pieces of the same vision of an open, interconnected European research system.
Why shared metadata is the foundation
None of this works without shared, consistent metadata, and this is the deepest point about OpenAIRE. A graph that links outputs to projects to funding to people can only be built if those entities are described in compatible ways across all the systems being harvested. If one repository describes a funding link one way and another describes it differently, the graph cannot reliably connect them; if persistent identifiers for people, organisations and outputs are missing or inconsistent, many links cannot be drawn reliably, or at all. OpenAIRE’s power is therefore a direct dividend of metadata interoperability: the more consistently the world’s repositories and databases describe research, the richer and more accurate the graph becomes. This is precisely why infrastructures like OpenAIRE depend on, and reinforce, the federation of research information: the principle that systems should connect and exchange information rather than operating in isolation, explored in our work on federation.
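A small example shows why inconsistent identifiers break links. The same DOI or ORCID iD can be written in several surface forms, and two records will only match if those forms are reduced to one. The sketch below is a minimal normaliser under simple assumptions: the function names are hypothetical, the list of DOI prefixes is not exhaustive, and the ORCID check only looks at the shape of the identifier, not its checksum.

```python
import re

# An ORCID iD is four groups of four characters; the last may be "X".
_ORCID = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])", re.IGNORECASE)

_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_orcid(value):
    """Return the bare ORCID iD, or None if no iD-shaped string is found."""
    match = _ORCID.search(value.strip())
    return match.group(1).upper() if match else None


def normalise_doi(value):
    """Return a lowercase DOI without resolver prefix, or None if not a DOI."""
    doi = value.strip().lower()  # DOIs are case-insensitive
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi if doi.startswith("10.") else None


# Two repositories describing the same article and author differently.
record_a = {"doi": "https://doi.org/10.1234/ABC",
            "orcid": "https://orcid.org/0000-0002-1825-0097"}
record_b = {"doi": "doi:10.1234/abc",
            "orcid": "0000-0002-1825-0097"}

same_output = normalise_doi(record_a["doi"]) == normalise_doi(record_b["doi"])
same_person = normalise_orcid(record_a["orcid"]) == normalise_orcid(record_b["orcid"])
print(same_output, same_person)
```

Compared as raw strings, the two records share nothing; after normalisation they agree on both the output and the person. When an identifier is missing altogether, no amount of normalisation can recover the link, which is why consistent use of persistent identifiers at the source matters so much.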
A consistent vocabulary behind the graph
For metadata to be interoperable across thousands of sources, the elements it contains must mean the same thing everywhere — output types, relationship types, funding and project information, contributor roles. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the information OpenAIRE harvests and links is understood identically wherever it originated. And because the contributions behind every output in the graph are part of the research record, they can be described in the same shared framework — the CRediT taxonomy and its full set of contribution roles. OpenAIRE shows what becomes possible when research information is connected; a shared vocabulary is what makes the connecting possible in the first place.