For most of their history, data management plans have led a curiously isolated existence. A researcher writes one to satisfy a funder, submits it as a document, and there it usually stays: a static file, disconnected from the project it describes, the outputs it anticipates, and the systems that manage everything else. That isolation wastes the plan's potential. A plan describes the data a project will produce, the people responsible, the repositories that will hold it, and the funder behind it, and the wider research infrastructure already tracks every one of these entities with persistent identifiers. Connecting the plan to that infrastructure transforms it from an inert document into a living, linked object. This article explains how DMP IDs and machine-actionable plans do exactly that, drawing on the machine-actionable DMP domain of the CASRAI Dictionary.
Giving the plan an identity
The first step is to give the plan itself a persistent identifier. A DMP ID (a persistent identifier for a data management plan, issued through an infrastructure such as DataCite) makes the plan a first-class object in the scholarly record: something that can be referenced unambiguously, cited, and linked to other objects. This sounds modest, but it is the keystone of everything that follows. Once a plan has a stable identifier, other records can point to it, and it can point to them. A paper can cite the plan that governed its data; a dataset can link back to the plan that anticipated it; a funder can reference the plan associated with a grant. Without an identifier, the plan is just a file somewhere; with one, it becomes a node in the network of research objects, participating in the web of relationships that connects publications, data, people and grants.
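Because a DMP ID issued through DataCite is a DOI, "pointing at" a plan amounts to storing its DOI in another record. The sketch below is illustrative only: it reduces the usual written forms of a DOI to one comparable key (DOIs are case-insensitive) and then finds which objects reference the plan. The DOIs (prefix `10.1234`) and the names `ScholarlyObject`, `normalize_doi` and `pointed_from` are hypothetical and do not come from any real registry or library.

```python
from dataclasses import dataclass, field

DOI_RESOLVER = "https://doi.org/"
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Reduce common written forms of a DOI to a bare, lower-cased key.

    DOIs are case-insensitive, so lower-casing makes comparisons reliable.
    """
    v = value.strip()
    for prefix in _DOI_PREFIXES:
        if v.lower().startswith(prefix):
            v = v[len(prefix):]
            break
    if not v.startswith("10."):
        raise ValueError(f"not a DOI: {value!r}")
    return v.lower()


def resolver_url(doi: str) -> str:
    """Build the resolvable https form of a DOI."""
    return DOI_RESOLVER + normalize_doi(doi)


@dataclass
class ScholarlyObject:
    doi: str   # normalized DOI of this object (hypothetical values below)
    kind: str  # e.g. "dmp", "dataset", "article"
    references: set[str] = field(default_factory=set)

    def cite(self, other: "ScholarlyObject") -> None:
        """Point at another object by recording its identifier."""
        self.references.add(other.doi)


def pointed_from(target: ScholarlyObject, objects: list[ScholarlyObject]) -> list[str]:
    """Return the DOIs of objects whose references include the target."""
    return [o.doi for o in objects if target.doi in o.references]


plan = ScholarlyObject(normalize_doi("https://doi.org/10.1234/DMP-EXAMPLE"), "dmp")
dataset = ScholarlyObject(normalize_doi("doi:10.1234/data-example"), "dataset")
paper = ScholarlyObject(normalize_doi("10.1234/paper-example"), "article")
dataset.cite(plan)  # the dataset links back to the plan that anticipated it
paper.cite(plan)    # the paper cites the plan that governed its data

print(resolver_url(plan.doi))                      # https://doi.org/10.1234/dmp-example
print(pointed_from(plan, [plan, dataset, paper]))  # ['10.1234/data-example', '10.1234/paper-example']
```

Normalizing before comparison matters because the same DOI is routinely written as a bare string, a `doi:` form or a resolver URL; without a single key, two records that cite the same plan might not appear to match.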
Connecting to the wider PID ecosystem
The real power emerges when the DMP ID connects the plan to the other persistent identifiers that describe a project. The plan’s authors can be identified by their ORCID iDs; their institutions by ROR identifiers; the funding by a grant identifier and the funder’s own identifier; the anticipated outputs by DOIs once they exist. Threaded together, these links let the plan take its place in the connected research landscape:
- Plan to people. ORCID links the plan to the researchers responsible, so it can appear in their record of activity.
- Plan to institutions. ROR connects the plan to the organisations involved.
- Plan to funding. Grant and funder identifiers tie the plan to the money that required and supported it, helping funders see that planning commitments were made and, in time, met.
- Plan to outputs. Links to the datasets, software and publications that result let anyone trace from the plan to what was actually produced.
The plan stops being a one-off submission and becomes part of the same identifier graph that already connects the rest of the research enterprise.
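The "identifier graph" described above can be pictured as typed links between identifiers, traversable in both directions. The following is a minimal in-memory sketch under stated assumptions: the relation names (`has_contributor`, `has_organisation`, `funded_by`, `has_output`) are invented for illustration and are not DataCite relation types or any standard vocabulary; the DMP, grant and output DOIs and the ROR ID are hypothetical placeholders; the ORCID iD is ORCID's fictitious example researcher.

```python
from collections import defaultdict


class PIDGraph:
    """A tiny in-memory graph of typed links between persistent identifiers."""

    def __init__(self) -> None:
        # source identifier -> list of (relation, target identifier)
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def link(self, source: str, relation: str, target: str) -> None:
        self._edges[source].append((relation, target))

    def targets(self, source: str, relation: str) -> list[str]:
        """Follow links outward, e.g. from a plan to its outputs."""
        # .get avoids creating empty entries for unknown identifiers
        return [t for r, t in self._edges.get(source, []) if r == relation]

    def sources(self, target: str, relation: str) -> list[str]:
        """Follow links backward, e.g. from a researcher to the plans naming them."""
        return [s for s, edges in self._edges.items()
                for r, t in edges if r == relation and t == target]


plan = "https://doi.org/10.1234/dmp-example"          # hypothetical DMP ID
researcher = "https://orcid.org/0000-0002-1825-0097"  # ORCID's fictitious example iD

g = PIDGraph()
g.link(plan, "has_contributor", researcher)
g.link(plan, "has_organisation", "https://ror.org/00example")      # placeholder ROR ID
g.link(plan, "funded_by", "https://doi.org/10.1234/grant-example")  # hypothetical grant DOI
g.link(plan, "has_output", "https://doi.org/10.1234/data-example")
g.link(plan, "has_output", "https://doi.org/10.1234/software-example")

print(g.targets(plan, "has_output"))
# ['https://doi.org/10.1234/data-example', 'https://doi.org/10.1234/software-example']
print(g.sources(researcher, "has_contributor"))
# ['https://doi.org/10.1234/dmp-example']
```

Real infrastructure distributes these links across registries (DataCite metadata, ORCID records and so on) rather than holding them in one structure; the single graph here only shows what "tracing from the plan to what was produced" means as an operation.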
The RDA DMP Common Standard
Connecting plans to the PID ecosystem requires that the plans themselves be machine-readable in a consistent way, and this is where the RDA DMP Common Standard comes in. Developed through the Research Data Alliance, the Common Standard is an application profile that defines a shared, structured model for expressing the content of a data management plan — its datasets, contributors, hosts, costs, distributions and the rest — in a machine-actionable form. Its purpose is interoperability: a plan expressed according to the Common Standard means the same thing to any system that understands the standard, regardless of which tool created it. This is what allows a machine-actionable DMP (maDMP) to be more than a document trapped in one application. Where a narrative plan is prose that only a human can interpret, a maDMP expressed in the Common Standard is structured data that systems can read, validate, update and exchange.
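To make "structured data that systems can read and validate" concrete, here is an abbreviated maDMP written as a Python dictionary that mirrors the JSON serialization of the Common Standard, plus a check for a handful of fields. This is a sketch, not a conforming instance: the field names reflect our reading of the standard's JSON schema, the set of "required" fields checked is our assumption of a subset, and every value is hypothetical. A production system would validate against the published JSON Schema with a proper validator rather than a hand-written check like `missing_fields`.

```python
# Abbreviated maDMP; field names follow our reading of the RDA DMP Common Standard,
# all values are hypothetical placeholders.
madmp = {
    "dmp": {
        "title": "Example data management plan",
        "created": "2024-01-15T10:00:00Z",
        "modified": "2024-03-01T09:30:00Z",
        "language": "eng",
        "ethical_issues_exist": "no",
        "dmp_id": {"identifier": "https://doi.org/10.1234/dmp-example", "type": "doi"},
        "contact": {
            "name": "Josiah Carberry",
            "mbox": "contact@example.org",
            "contact_id": {"identifier": "https://orcid.org/0000-0002-1825-0097", "type": "orcid"},
        },
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {"identifier": "https://doi.org/10.1234/data-example", "type": "doi"},
                "personal_data": "yes",
                "sensitive_data": "no",
                "distribution": [
                    {
                        "title": "Anonymised CSV export",
                        "data_access": "open",
                        "host": {"title": "Example Repository", "url": "https://repository.example.org"},
                    }
                ],
            }
        ],
    }
}

# Assumed subsets of required fields; check the published schema for the authoritative list.
REQUIRED_DMP = {"title", "created", "modified", "language",
                "ethical_issues_exist", "dmp_id", "contact", "dataset"}
REQUIRED_DATASET = {"title", "dataset_id", "personal_data", "sensitive_data"}


def missing_fields(doc: dict) -> list[str]:
    """List dotted paths of assumed-required fields that are absent."""
    dmp = doc.get("dmp", {})
    problems = [f"dmp.{k}" for k in sorted(REQUIRED_DMP - set(dmp))]
    for i, ds in enumerate(dmp.get("dataset", [])):
        problems += [f"dmp.dataset[{i}].{k}" for k in sorted(REQUIRED_DATASET - set(ds))]
    return problems


print(missing_fields(madmp))  # []
```

The point of the example is the contrast with a narrative plan: a program can tell, without a human reading anything, whether a dataset entry lacks an identifier.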
Exchanging maDMPs between systems
The consequence of a shared standard is that plans can flow between systems rather than being re-keyed at every step. A maDMP created in a DMP tool can be passed to a repository when data is deposited, so the repository already knows what was planned; it can be exchanged with a current research information system (CRIS), so the institution's record of the project includes its data plan; and it can be shared with funders in a form their systems can ingest, rather than as a PDF a person must read. Information entered once can travel wherever it is needed and stay consistent across the project's many systems. This exchange is precisely the kind of federation of research information that reduces duplication and keeps records aligned. It applies the principle, explored in our work on federation, that systems should connect and share rather than each maintaining its own disconnected copy.
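The "enter once, use everywhere" idea can be sketched as one serialized plan that two different receiving systems each read for their own purposes. This is a hedged illustration: `export_madmp`, `repository_view` and `cris_view` are hypothetical functions, the plan is a trimmed Common Standard-style record with placeholder values, and the transport (APIs, authentication, message formats) that real DMP tools, repositories and CRIS platforms use is deliberately left out.

```python
import json


def export_madmp(plan: dict) -> str:
    """Serialise a plan for transfer to another system (here: plain JSON text)."""
    return json.dumps(plan, indent=2, sort_keys=True)


def repository_view(payload: str) -> list[tuple[str, str]]:
    """What a receiving repository might extract: planned dataset IDs and intended hosts."""
    dmp = json.loads(payload)["dmp"]
    rows = []
    for ds in dmp.get("dataset", []):
        ds_id = ds["dataset_id"]["identifier"]
        for dist in ds.get("distribution", []):
            rows.append((ds_id, dist.get("host", {}).get("title", "unspecified")))
    return rows


def cris_view(payload: str) -> dict:
    """What an institutional CRIS might record about the same plan."""
    dmp = json.loads(payload)["dmp"]
    return {"dmp_id": dmp["dmp_id"]["identifier"], "title": dmp["title"]}


# Trimmed, hypothetical plan record.
plan = {
    "dmp": {
        "title": "Example data management plan",
        "dmp_id": {"identifier": "https://doi.org/10.1234/dmp-example", "type": "doi"},
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {"identifier": "https://doi.org/10.1234/data-example", "type": "doi"},
                "distribution": [{"title": "Anonymised CSV", "host": {"title": "Example Repository"}}],
            }
        ],
    }
}

payload = export_madmp(plan)  # entered once in the DMP tool...
print(repository_view(payload))  # ...read by the repository
# [('https://doi.org/10.1234/data-example', 'Example Repository')]
print(cris_view(payload))        # ...and by the CRIS, from the same payload
# {'dmp_id': 'https://doi.org/10.1234/dmp-example', 'title': 'Example data management plan'}
```

Each receiver reads only the parts it needs, yet both read the same record, which is what keeps the copies from drifting apart.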
From a checkbox to a connected object
Taken together, these developments change what a data management plan is. The combination of a persistent DMP ID, links to ORCID, ROR and grant identifiers, and a shared machine-actionable standard turns the plan from a compliance checkbox into a connected, citable, living object — one that participates in the research lifecycle from the moment it is written, that updates as the project progresses, and that connects to the outputs it anticipated. The plan becomes useful to the researcher, the institution and the funder alike, rather than a box ticked at the start and forgotten.
A consistent vocabulary behind the links
None of this works unless the elements being linked and exchanged, such as a dataset entry, a contributor role, a host or a cost, mean the same thing in every system that handles the plan. The CASRAI Dictionary provides that consistency: a shared vocabulary, so that a machine-actionable plan is understood identically wherever it travels. And because the contributions a plan records, such as data curation and software development, are part of the research record, they can be described within the same framework using the CRediT taxonomy and its full set of 14 contributor roles. A DMP ID gives the plan an identity; the PID ecosystem gives it relationships; and a shared vocabulary lets those relationships mean what they should.
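A shared vocabulary does practical work when different tools label the same contribution differently. The sketch below maps hypothetical local labels onto CRediT roles and their NISO-hosted URIs, so that "data steward" in one tool and "data manager" in another resolve to the same term. Assumptions to note: only four of the 14 CRediT roles are included, the local labels and the mapping choices are invented for illustration, and the URI pattern should be checked against NISO's published CRediT list before use.

```python
CREDIT_BASE = "https://credit.niso.org/contributor-roles/"

# A subset of CRediT roles (role name -> URI slug); the full taxonomy has 14 roles.
CREDIT_ROLES = {
    "Data curation": "data-curation",
    "Formal analysis": "formal-analysis",
    "Methodology": "methodology",
    "Software": "software",
}

# Hypothetical local labels that different tools might use for the same contribution.
LOCAL_TO_CREDIT = {
    "data manager": "Data curation",
    "data steward": "Data curation",
    "software developer": "Software",
    "research software engineer": "Software",
    "statistician": "Formal analysis",
}


def to_credit(label: str) -> dict:
    """Translate a local role label into a CRediT role name and URI."""
    role = LOCAL_TO_CREDIT.get(label.strip().lower())
    if role is None:
        raise KeyError(f"no CRediT mapping for {label!r}")
    return {"role": role, "uri": CREDIT_BASE + CREDIT_ROLES[role] + "/"}


print(to_credit("Data Steward"))
# {'role': 'Data curation', 'uri': 'https://credit.niso.org/contributor-roles/data-curation/'}
```

Raising an error for unmapped labels, rather than passing them through, is a deliberate choice in this sketch: a silently unmatched label is exactly the kind of inconsistency a shared vocabulary is meant to prevent.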