Tag: living DMP

  • Data management plan examples: from narrative DMP to maDMP

    The data management plan has a reputation problem. For many researchers it is a compliance document written in the final week before a grant deadline, accepted by the funder, and never opened again. That is a waste of a genuinely useful artefact, and it is also a missed opportunity, because the same plan can be expressed in a form that systems can act on. This article walks from a conventional narrative DMP to a machine-actionable one, showing what the structure buys you. It builds on the machine-actionable DMPs domain and connects to the workflows described under research administration.

    Where most DMPs start: the narrative plan

    A data management plan (DMP) is a document describing how data will be handled during and after a project: what data will be produced, how they will be stored and backed up, how they will be documented and shared, who is responsible, and how long they will be kept. In its common form it is narrative prose, often written against a funder or institutional template, answering a set of standard headings.

    A narrative answer typically reads something like this:

    “The project will generate approximately 200 GB of microscopy image data and associated tabular measurements. Active data will be stored on the institutional research-data store with nightly backup. On publication, processed datasets will be deposited in a generalist repository under a CC-BY licence and assigned a DOI. Raw image data containing no personal information will be retained for ten years in line with institutional policy. The principal investigator is responsible for data management.”

    There is nothing wrong with this. It is clear, honest, and answers the questions. Its limitation is purely that it is prose: a human must read it to extract any single fact, and no system can check it, update it, or connect it to anything else. Each fact — the licence, the repository, the retention period, the responsible person — is locked inside a sentence.

    The next step: structure the same content

    A machine-actionable DMP (maDMP) contains the same information, but expressed as structured, identified data rather than free text. The reference model is the RDA DMP Common Standard, an application profile developed by the Research Data Alliance, with a JSON serialisation, to represent DMP content in a consistent, exchangeable form. Rather than a paragraph, each element becomes a typed field: a dataset has a title, a type, a personal-data flag, a planned size, a distribution with a named host and a licence, and a link to the responsible contributor, who is in turn identified by an ORCID iD.

    The narrative paragraph above, restructured against that model, becomes a set of explicit elements:

    • Dataset: “Microscopy image data” — type: image; personal data: no; estimated volume: 200 GB.
    • Distribution: host: named generalist repository; access: open; licence: CC-BY; identifier: DOI (assigned on deposit).
    • Retention: 10 years, per institutional policy.
    • Contributor: the principal investigator, identified by ORCID iD, with the role of data contact.
    • The whole plan itself carries a DMP ID — a persistent identifier, typically a DataCite DOI — so it can be cited and referenced across systems.

    The content is unchanged. What changes is that every fact is now addressable on its own.
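
    As a rough illustration, the elements listed above could be held as nested data. The sketch below is in Python. Its field names are modelled loosely on the JSON serialisation of the RDA DMP Common Standard and should be checked against the published schema. The `retention_years` field, the DOI, the ORCID iD, the repository name, and the choice of CC BY version 4.0 are all placeholders assumed for the example.

```python
# Illustrative only: field names loosely follow the RDA DMP Common Standard,
# and every identifier below is a placeholder, not a real record.
plan = {
    "dmp": {
        "title": "Microscopy project DMP",
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
        "contributor": [
            {
                "name": "Principal Investigator",
                "role": ["data contact"],
                "contributor_id": {
                    "identifier": "https://orcid.org/0000-0000-0000-0000",
                    "type": "orcid",
                },
            }
        ],
        "dataset": [
            {
                "title": "Microscopy image data",
                "type": "image",
                "personal_data": "no",
                "retention_years": 10,  # hypothetical field for the retention period
                "distribution": [
                    {
                        "byte_size": 200 * 10**9,  # approximately 200 GB
                        "data_access": "open",
                        "host": {"title": "Example generalist repository"},
                        # Licence version assumed; the narrative only says CC-BY.
                        "license": [
                            {"license_ref": "https://creativecommons.org/licenses/by/4.0/"}
                        ],
                    }
                ],
            }
        ],
    }
}

# Each fact can now be read directly, without parsing a sentence.
first_dataset = plan["dmp"]["dataset"][0]
licence = first_dataset["distribution"][0]["license"][0]["license_ref"]
print(licence)
```

    The point of the sketch is only that the licence, the host, and the responsible person each sit at a known path, so a program can read or change one without touching the others.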

    What the structure makes possible

    Structuring the plan is not bureaucracy for its own sake; it unlocks behaviours that a narrative simply cannot support.

    • Validation. A system can check that every planned dataset names a repository, that every distribution has a licence, and that the plan meets the funder’s required elements — before submission, automatically.
    • Exchange between systems. A plan authored in a DMP tool can be passed to the institution’s research-information system, to the repository at deposit time, and back to the funder, without anyone re-keying it. This is the maDMP exchange the standard was built for.
    • The living DMP. Because each element is addressable, the plan can be updated as the project unfolds — an anticipated dataset becomes a realised one when it is deposited, and the deposit’s DOI flows back into the plan. The DMP stops being frozen at award and becomes a current record of what actually happened to the data.
    • Connection to the wider record. Because the plan, its datasets, its contributors, and its host all carry identifiers, the DMP becomes a node in the identifier graph — linkable to the project (via a RAiD), to the people (via ORCID), to the institution (via ROR), and to the outputs (via their DOIs).
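
    The validation behaviour in the list above can be sketched in a few lines. This is a minimal Python example, not a real validator. The function name `validate_plan` and the two rules it checks are assumptions for illustration, and the field names follow the RDA DMP Common Standard only loosely.

```python
def validate_plan(plan):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for dataset in plan.get("dmp", {}).get("dataset", []):
        title = dataset.get("title", "<untitled>")
        distributions = dataset.get("distribution", [])
        if not distributions:
            problems.append(f"{title}: no distribution planned")
        for dist in distributions:
            # Rule 1: every distribution must name a repository (host).
            if not dist.get("host", {}).get("title"):
                problems.append(f"{title}: distribution names no repository")
            # Rule 2: every distribution must carry at least one licence.
            if not dist.get("license"):
                problems.append(f"{title}: distribution has no licence")
    return problems


complete = {"dmp": {"dataset": [{"title": "Images", "distribution": [
    {"host": {"title": "Example repository"}, "license": [{"license_ref": "CC-BY"}]}]}]}}
incomplete = {"dmp": {"dataset": [{"title": "Tables", "distribution": [{"host": {}}]}]}}

print(validate_plan(complete))
print(validate_plan(incomplete))
```

    A funder-specific rule set would replace the two hard-coded checks, but the shape stays the same: walk the structure and report what is missing before submission.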

    A realistic view of where this stands

    It is worth being candid: machine-actionable DMPs are an active and maturing area, not a universally deployed reality. The RDA Common Standard exists and is implemented in several DMP tools; DMP IDs are being minted; and funders are beginning to express interest in structured plans. But many researchers still write, and many funders still accept, narrative plans, and the end-to-end exchange between tools, repositories, and funders is still being built out. The practical takeaway is not that you must produce a maDMP tomorrow, but that writing your narrative plan with structure in mind — naming repositories, stating licences explicitly, identifying people by ORCID, treating the plan as living — positions the same content to become machine-actionable as the infrastructure matures.

    The cheapest move toward a machine-actionable plan is to stop writing the DMP as an essay and start writing it as a set of clear, specific commitments — named repository, explicit licence, identified people, stated retention. Structured thinking comes before structured data.

    Where shared vocabulary fits

    “Dataset”, “distribution”, “retention period”, “data contact”, and “living DMP” need to mean the same thing in a DMP tool, a repository, a CRIS, and a funder’s system for any of this exchange to work. A shared, federated vocabulary that defines these elements precisely — pointing back to the RDA DMP Common Standard for the schema — is what lets a plan authored in one system be acted on by another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the machine-actionable DMPs domain.

  • Machine-actionable data management plans: the maDMP comes of age

    The data management plan has a reputation problem. For most of its existence it has been a document written under deadline pressure to satisfy a funder requirement, deposited as a PDF, and then never opened again. It describes intentions that, by the end of a project, may bear little resemblance to what actually happened to the data. The machine-actionable DMP is the response to that failure mode, and after some years of standards work it has come of age. This article explains what it is and why it matters, drawing on the machine-actionable DMPs domain.

    From document to data object

    A data management plan (DMP) is a description of the data-management practices to be followed during and after a research project: what data will be produced, how they will be stored and documented, under what licence and access conditions they will be shared, and how long they will be kept. A machine-actionable DMP (maDMP) is the same content expressed as structured data that research systems can exchange, validate, ingest, and update automatically, rather than as prose only a human can read.

    The distinction is not cosmetic. A prose DMP states that data will be deposited in a trusted repository; a maDMP carries that as a structured assertion that a repository system can read, act on, and later check against what was actually deposited. The DMP stops being a one-time document and becomes a node in the research-information graph, connected to the project, the outputs, the funder, and the people.

    The standard that made it possible: the RDA Common Standard

    Structured exchange requires an agreed structure, and that is the contribution of the RDA DMP Common Standard — the application profile developed by the Research Data Alliance to represent maDMP content in a common, system-neutral form. It defines the entities a DMP describes and the relationships between them, so that a DMP created in one tool means the same thing when read by another.

    The structured form makes explicit a useful distinction that prose blurs: between an anticipated dataset, one the DMP says will be produced, and a realised dataset, one that has actually been produced and, typically, deposited. A maDMP can carry both, which is precisely what lets a system at closeout check whether the datasets the plan anticipated were in fact realised and deposited. Around these sit the structured fields that prose tends to leave vague: the retention period, the licence assertion, the access control policy, the storage location, and a data volume estimate for storage planning.
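
    A closeout check of this kind might look like the following Python sketch. It treats a dataset as realised once it carries a deposit identifier. That rule is an assumption of the example rather than something the standard prescribes, and the function name and DOI are hypothetical.

```python
def unrealised_datasets(plan):
    """Titles of datasets the plan anticipated that have no deposit identifier yet."""
    return [
        d["title"]
        for d in plan["dmp"]["dataset"]
        if not d.get("dataset_id", {}).get("identifier")
    ]


plan_at_closeout = {"dmp": {"dataset": [
    {"title": "Microscopy image data",
     "dataset_id": {"identifier": "https://doi.org/10.0000/example-images", "type": "doi"}},
    {"title": "Tabular measurements"},  # still only anticipated
]}}

print(unrealised_datasets(plan_at_closeout))
```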

    The DMP ID: giving the plan an identity

    For a DMP to be referenced across systems, it needs an identity, and that is the role of the DMP ID: a persistent identifier for a specific data management plan, typically a DOI registered with DataCite through tools such as the DMPTool, the DCC’s DMPonline, or ARGOS. With a DMP ID, the plan can be cited like any other research object: a funder can refer to it, a CRIS can link to it, an output can point back to the plan that anticipated it, and the connections become part of the persistent-identifier graph alongside ORCID, ROR, and the grant ID. The DMP ID is what turns the DMP from a loose attachment into a first-class, addressable entity in that graph.
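
    One way to picture the plan as an addressable entity is to list the links that radiate from its DMP ID. The Python sketch below does this under stated assumptions: the relation names (`hasContributor`, `anticipates`) are invented for the example, all identifiers are placeholders, and the field names only approximate the RDA DMP Common Standard.

```python
def identifier_edges(plan):
    """Return (subject, relation, object) triples linking the DMP ID to other identifiers."""
    dmp = plan["dmp"]
    dmp_id = dmp["dmp_id"]["identifier"]
    edges = []
    for person in dmp.get("contributor", []):
        edges.append((dmp_id, "hasContributor", person["contributor_id"]["identifier"]))
    for dataset in dmp.get("dataset", []):
        ds_id = dataset.get("dataset_id", {}).get("identifier")
        if ds_id:  # datasets without an identifier cannot be linked yet
            edges.append((dmp_id, "anticipates", ds_id))
    return edges


example = {"dmp": {
    "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
    "contributor": [{"contributor_id": {
        "identifier": "https://orcid.org/0000-0000-0000-0000", "type": "orcid"}}],
    "dataset": [
        {"title": "Images", "dataset_id": {
            "identifier": "https://doi.org/10.0000/example-images", "type": "doi"}},
        {"title": "Tables"},
    ],
}}

for edge in identifier_edges(example):
    print(edge)
```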

    The living DMP

    The deepest change the maDMP enables is conceptual: the move from the frozen DMP to the living DMP — a plan updated throughout the project lifecycle rather than fixed at award. A frozen DMP is a prediction made at the least-informed moment of a project, before any data exist. A living DMP is a record that tracks reality: as anticipated datasets become realised, as storage decisions change, as access conditions are settled, the plan is updated, and a DMP version captures each snapshot.
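
    The idea of a DMP version capturing each snapshot can be sketched as follows. This is a minimal Python illustration: the `LivingDMP` class and the `storage` field are hypothetical, and a real tool would persist versions rather than hold them in memory.

```python
import copy


class LivingDMP:
    """Holds the current plan plus an ordered list of frozen snapshots."""

    def __init__(self, plan):
        self.current = plan
        self.versions = []  # list of (label, frozen copy) pairs

    def snapshot(self, label):
        # Deep copy so that later edits to the current plan do not alter history.
        self.versions.append((label, copy.deepcopy(self.current)))


dmp = LivingDMP({"dmp": {"dataset": [{"title": "Images", "storage": "institutional store"}]}})
dmp.snapshot("award")

# A storage decision changes mid-project; the plan is updated, then snapshotted again.
dmp.current["dmp"]["dataset"][0]["storage"] = "project cloud bucket"
dmp.snapshot("mid-project review")

for label, frozen in dmp.versions:
    print(label, "->", frozen["dmp"]["dataset"][0]["storage"])
```

    The "award" snapshot is the frozen DMP; the current plan and its later versions are the living one.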

    The frozen DMP answers the question “what did the applicant promise at award?” The living maDMP answers a far more useful question: “what is actually happening to this project’s data, right now?” Only the second is worth the effort of maintaining.

    This is where maDMP exchange earns its keep. When the DMP is structured and identified, a change made in one system can propagate — from a DMP tool to a CRIS, from the CRIS to a repository — so that the plan stays current without re-keying. A scheduled DMP review event becomes a checkpoint against live data rather than a re-reading of a stale document, and a DMP completeness score can be computed automatically against the funder’s required elements.
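
    A completeness score of the kind mentioned above could be computed along these lines. The Python sketch assumes a hypothetical funder list of required top-level elements and simply reports the fraction that are present and non-empty; real funder requirements would be more detailed.

```python
REQUIRED_ELEMENTS = ["title", "dmp_id", "contributor", "dataset"]  # hypothetical funder list


def completeness_score(plan, required=REQUIRED_ELEMENTS):
    """Fraction of required top-level elements that are present and non-empty."""
    if not required:
        return 1.0  # nothing is required, so the plan is trivially complete
    dmp = plan.get("dmp", {})
    present = sum(1 for key in required if dmp.get(key))
    return present / len(required)


draft = {"dmp": {
    "title": "Microscopy project DMP",
    "dataset": [{"title": "Images"}],
    "contributor": [],  # declared but empty, so it does not count
}}

print(completeness_score(draft))  # 0.5
```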

    Why funders and institutions want this

    The maDMP is not an end in itself; it is wanted because it makes obligations checkable. A funder that requires data to be deposited in a trusted repository under an open licence can, with structured maDMPs, verify that the realised datasets meet the commitment, rather than trusting a final-report paragraph. An institution can monitor data-management compliance across its whole portfolio as a query over structured plans. And the researcher, crucially, benefits too: a living maDMP linked to the project’s outputs means the closeout data-management report is largely assembled already, not reconstructed from memory. This is the same dividend that structured grant and disclosure data pay throughout research administration.
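
    Monitoring a portfolio "as a query over structured plans" might, in the simplest case, look like the Python sketch below. The function name, the `dmp-1` and `dmp-2` identifiers, and the rule that every dataset needs at least one distribution with `data_access` set to `open` are assumptions chosen for illustration.

```python
def plans_missing_open_deposit(portfolio):
    """DMP IDs of plans with at least one dataset lacking an openly accessible distribution."""
    flagged = []
    for plan in portfolio:
        dmp = plan["dmp"]
        for dataset in dmp.get("dataset", []):
            accesses = [d.get("data_access") for d in dataset.get("distribution", [])]
            if "open" not in accesses:
                flagged.append(dmp["dmp_id"]["identifier"])
                break  # one failing dataset is enough to flag the plan
    return flagged


portfolio = [
    {"dmp": {"dmp_id": {"identifier": "dmp-1"},
             "dataset": [{"distribution": [{"data_access": "open"}]}]}},
    {"dmp": {"dmp_id": {"identifier": "dmp-2"},
             "dataset": [{"distribution": [{"data_access": "closed"}]},
                         {"distribution": []}]}},
]

print(plans_missing_open_deposit(portfolio))
```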

    Where shared vocabulary fits

    The RDA Common Standard supplies the structure — the shape of a maDMP. It does not, on its own, fix the controlled values that populate it: the list of access categories, the licence vocabulary, the dataset-status terms. Two systems can both emit valid Common Standard maDMPs and still disagree on what “restricted access” or “realised” means. That definitional gap, below the structural model, is exactly what a shared, federated vocabulary fills, pointing back to the RDA for the standard and to DataCite for the DMP ID infrastructure. Supplying it is the role the CASRAI dictionary is built for.
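
    The gap can be made concrete with a small mapping exercise. In the Python sketch below, two hypothetical tools (`tool_a` and `tool_b`) both use the label "restricted access" but mean different things by it. The shared terms `open`, `shared`, and `closed` are assumed here as the common vocabulary, and the mappings are invented for the example.

```python
SHARED_ACCESS_TERMS = {"open", "shared", "closed"}  # assumed shared vocabulary

# Hypothetical local vocabularies: the same label maps to different shared terms.
LOCAL_TO_SHARED = {
    "tool_a": {"public": "open", "restricted access": "shared", "private": "closed"},
    "tool_b": {"open": "open", "restricted access": "closed", "embargoed": "closed"},
}


def normalise_access(system, local_value):
    """Translate a system's local access label into the shared vocabulary."""
    shared = LOCAL_TO_SHARED[system][local_value.lower()]
    if shared not in SHARED_ACCESS_TERMS:
        raise ValueError(f"{shared!r} is not a shared access term")
    return shared


print(normalise_access("tool_a", "Restricted access"))
print(normalise_access("tool_b", "Restricted access"))
```

    Both plans would be structurally valid, yet only a shared definition reveals that one dataset is available on request and the other is not available at all.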

    What to do now

    For researchers and data stewards: treat the DMP as a living, structured object with a DMP ID, updated as anticipated datasets become realised. For funders: ask for maDMPs against the RDA Common Standard and verify realised against anticipated at closeout. For standards work: pair the structural standard with shared value vocabularies so that maDMPs from different tools genuinely interoperate.

  • Living DMPs: dynamic data management plans across the lifecycle

    The data management plan has a familiar life story, and for much of its history an unhappy one. A researcher writes a plan to satisfy a funder’s requirement at proposal stage, describing data that does not yet exist. The plan is submitted, the grant awarded, and the document filed away — never opened again. By the time real data begins to flow, the plan is already a work of fiction: the formats changed, the volumes grew, the consent arrangements were refined. A plan written once and never revisited describes the project that was imagined, not the one that happened. The living DMP — a plan that updates dynamically across the lifecycle of a project — is the response, and it belongs to the machine-actionable DMP domain of the CASRAI Dictionary.

    The trouble with the static plan

    The deepest flaw in the traditional DMP is one of timing. It is required precisely when the least is known — at the proposal stage, before the work has begun — then frozen at the moment it is most speculative. Research is not like that. Data management decisions are made and revised continuously: the instruments produce more data than expected, an ethics review changes how participant data must be handled, a new repository becomes the obvious home. A static plan cannot reflect any of this. It becomes, at best, a historical curiosity and, at worst, a misleading record nobody trusts. The energy spent writing it is largely wasted, because the document never connects to the reality it was meant to govern. The static plan fails not because planning is pointless but because a plan that cannot change cannot stay true.

    What makes a DMP “living”

    A living DMP treats the plan as a dynamic document that evolves with the project rather than a fixed deliverable. It is created at the start, as before, but expected to change — updated as decisions are made, as data is produced, and as circumstances shift. The aim is for the plan to remain an accurate description of how the project’s data is actually being managed, useful to the people doing the work rather than written only for an external reader. A living plan can answer real questions: where is this dataset stored, under what licence will it be shared, who is responsible. Because it stays current, it can guide practice, support handovers, and provide an honest record at the end. The shift is from plan-as-document to plan-as-living-record — from something written to be filed, to something maintained to be used.

    Why machine-actionability is the key

    Keeping a plan current by hand is a burden few will sustain, which is why the living DMP depends on machine-actionability. A traditional plan is prose: to update it, a human must edit text. A machine-actionable DMP (a maDMP) expresses its content as structured, machine-readable information, and this changes what is possible. Updates need not be manual: when a dataset is deposited, when an identifier is minted, when a project record changes, the plan can be updated automatically to reflect what has actually happened. The structure also lets the plan be checked — systems can verify whether stated commitments have been met — and lets it exchange information with other systems. Machine-actionability is what makes “living” sustainable: the plan keeps pace with the project without depending on someone remembering to rewrite a document nobody wants to maintain.
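
    An automatic update on deposit could be as simple as the following Python sketch. The event shape, the function name `apply_deposit_event`, and matching on dataset title are all simplifying assumptions. A production integration would match on a stable identifier and would receive events from the repository's own notification mechanism.

```python
def apply_deposit_event(plan, event):
    """Record a deposit against the matching dataset; return True if a dataset was updated."""
    for dataset in plan["dmp"]["dataset"]:
        if dataset["title"] == event["dataset_title"]:
            dataset["dataset_id"] = {"identifier": event["doi"], "type": "doi"}
            dataset["issued"] = event["deposited_on"]
            return True
    return False  # no dataset in the plan matched the event


living_plan = {"dmp": {"dataset": [{"title": "Microscopy image data"}]}}

deposit = {  # placeholder values, not a real deposit
    "dataset_title": "Microscopy image data",
    "doi": "https://doi.org/10.0000/example-images",
    "deposited_on": "2025-01-01",
}

updated = apply_deposit_event(living_plan, deposit)
print(updated, living_plan["dmp"]["dataset"][0]["dataset_id"]["identifier"])
```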

    The RDA DMP Common Standard

    For plans to update automatically by exchanging information with repositories, funder systems and institutional databases, those systems must agree on how a plan’s contents are represented. This is the contribution of the RDA DMP Common Standard, developed through the Research Data Alliance: a common, machine-actionable model for the information a data management plan contains. By defining a shared structure for the elements of a plan — the datasets, their characteristics, storage and preservation arrangements, licensing, costs and contributors — the standard lets a plan be created in one tool, understood by another, and updated by information arriving from a third. Without it, every system would represent a plan differently and automatic exchange would be impossible; with it, a living, dynamic DMP can flow between the systems that read and update it.
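
    At its simplest, exchange means one system serialises the plan and another parses it without losing anything. The Python sketch below shows that round trip using the standard `json` module. The function names and the single structural check on a top-level `dmp` key are assumptions for illustration, not a substitute for validating against the published schema.

```python
import json


def export_plan(plan):
    """Serialise a plan to JSON text for another system to read."""
    return json.dumps(plan, sort_keys=True, indent=2)


def import_plan(text):
    """Parse JSON text from another system, rejecting anything without a 'dmp' root."""
    data = json.loads(text)
    if not isinstance(data, dict) or "dmp" not in data:
        raise ValueError("not a DMP document: missing top-level 'dmp' key")
    return data


outgoing = {"dmp": {"title": "Microscopy project DMP",
                    "dataset": [{"title": "Microscopy image data"}]}}

wire_text = export_plan(outgoing)   # what the DMP tool sends
incoming = import_plan(wire_text)   # what the receiving system reconstructs

print(incoming == outgoing)
```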

    Integration with repositories and CRIS

    The living DMP only realises its value when connected to the systems where the real activity happens. Two integrations matter most:

    • Repositories. When data is deposited, the repository holds authoritative information — identifiers, formats, access conditions. A connected DMP can be updated from this directly, so the plan reflects what has actually been deposited rather than what was once intended.
    • Current research information systems (CRIS). A CRIS holds the institutional picture of projects, people, grants and outputs. Linking the DMP to the CRIS lets the plan draw on and contribute to that picture, keeping data management visible alongside the rest of a project’s record — a concern of research administration more broadly.

    Through these connections the plan stops being an isolated document and becomes a node in the research-information landscape — reading from and writing to the systems that record what a project is doing. This is what turns the machine-actionable plan from a clever idea into an operational reality.

    A consistent vocabulary for plans that travel

    For a living DMP to exchange information with repositories, funder systems and a CRIS, the elements it contains must mean the same thing in every system it touches. A licence, a retention period or an access condition recorded in the plan must be understood identically by the repository that updates it and the CRIS that reads it. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so the contents of a machine-actionable plan are understood the same way wherever they flow. And because the people who steward a project’s data make a real contribution, that work can be described in the same shared framework — the CRediT taxonomy and its Data curation role. The static DMP described the project that was imagined; the living DMP describes the project as it really happens — and stays useful from proposal to completion.