Tag: data management plan

  • Data management plan examples: from narrative DMP to maDMP

    The data management plan has a reputation problem. For many researchers it is a compliance document written in the final week before a grant deadline, accepted by the funder, and never opened again. That is a waste of a genuinely useful artefact, and it is also a missed opportunity, because the same plan can be expressed in a form that systems can act on. This article walks from a conventional narrative DMP to a machine-actionable one, showing what the structure buys you. It builds on the machine-actionable DMPs domain and connects to the workflows described under research administration.

    Where most DMPs start: the narrative plan

    A data management plan (DMP) is a document describing how data will be handled during and after a project: what data will be produced, how they will be stored and backed up, how they will be documented and shared, who is responsible, and how long they will be kept. In its common form it is narrative prose, often written against a funder or institutional template, answering a set of standard headings.

    A narrative answer typically reads something like this:

    “The project will generate approximately 200 GB of microscopy image data and associated tabular measurements. Active data will be stored on the institutional research-data store with nightly backup. On publication, processed datasets will be deposited in a generalist repository under a CC-BY licence and assigned a DOI. Raw image data containing no personal information will be retained for ten years in line with institutional policy. The principal investigator is responsible for data management.”

    There is nothing wrong with this. It is clear, honest, and answers the questions. Its limitation is purely that it is prose: a human must read it to extract any single fact, and no system can check it, update it, or connect it to anything else. Each fact — the licence, the repository, the retention period, the responsible person — is locked inside a sentence.

    The next step: structure the same content

    A machine-actionable DMP (maDMP) contains the same information, but expressed as structured, identified data rather than free text. The reference model is the RDA DMP Common Standard, an application profile developed by the Research Data Alliance, with a JSON serialisation, to represent DMP content in a consistent, exchangeable form. Rather than a paragraph, each element becomes a typed field: a dataset has a title, a type, a personal-data flag, a planned size, a distribution with a named host and a licence, and a link to the responsible contributor, who is in turn identified by an ORCID iD.

    The narrative paragraph above, restructured against that model, becomes a set of explicit elements:

    • Dataset: “Microscopy image data” — type: image; personal data: no; estimated volume: 200 GB.
    • Distribution: host: named generalist repository; access: open; licence: CC-BY; identifier: DOI (assigned on deposit).
    • Retention: 10 years, per institutional policy.
    • Contributor: the principal investigator, identified by ORCID iD, with the role of data contact.
    • The whole plan itself carries a DMP ID — a persistent identifier, typically a DataCite DOI — so it can be cited and referenced across systems.

    The content is unchanged. What changes is that every fact is now addressable on its own.
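
    As a rough illustration, the restructured elements could be held as nested data along the following lines. This is a minimal sketch, not the standard itself: the field names are modelled on the RDA DMP Common Standard's JSON form but written from memory, `retention_years` and the `fact` helper are hypothetical, and the DOI and ORCID iD are placeholders.

```python
import json

# Illustrative maDMP for the narrative paragraph. Field names are modelled on
# the RDA DMP Common Standard but should be checked against the published schema.
madmp = {
    "dmp": {
        "title": "Microscopy project DMP",  # hypothetical title
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},  # placeholder
        "contributor": [
            {
                "name": "Principal Investigator",
                "contributor_id": {
                    "identifier": "https://orcid.org/0000-0000-0000-0000",  # placeholder
                    "type": "orcid",
                },
                "role": ["data contact"],
            }
        ],
        "dataset": [
            {
                "title": "Microscopy image data",
                "type": "image",
                "personal_data": "no",
                "retention_years": 10,  # hypothetical field, not taken from the standard
                "distribution": [
                    {
                        "byte_size": 200 * 10**9,  # roughly 200 GB
                        "data_access": "open",
                        "host": {"title": "Generalist repository"},
                        # The narrative names only "CC-BY"; a real plan would give a licence URL.
                        "license": [{"license_ref": "CC-BY"}],
                    }
                ],
            }
        ],
    }
}


def fact(plan, *path):
    """Walk a path of keys and list indexes to reach one individual fact."""
    node = plan
    for step in path:
        node = node[step]
    return node


licence = fact(madmp, "dmp", "dataset", 0, "distribution", 0, "license", 0, "license_ref")
host = fact(madmp, "dmp", "dataset", 0, "distribution", 0, "host", "title")
serialised = json.dumps(madmp, indent=2)  # the form another system would receive
print(licence, "/", host)
```

    The point of the sketch is the `fact` lookup: the licence and the host are each reachable by a path, without anyone reading a sentence.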

    What the structure makes possible

    Structuring the plan is not bureaucracy for its own sake; it unlocks behaviours that a narrative simply cannot support.

    • Validation. A system can check that every planned dataset names a repository, that every distribution has a licence, and that the plan meets the funder’s required elements — before submission, automatically.
    • Exchange between systems. A plan authored in a DMP tool can be passed to the institution’s research-information system, to the repository at deposit time, and back to the funder, without anyone re-keying it. This is the maDMP exchange the standard was built for.
    • The living DMP. Because each element is addressable, the plan can be updated as the project unfolds — an anticipated dataset becomes a realised one when it is deposited, and the deposit’s DOI flows back into the plan. The DMP stops being frozen at award and becomes a current record of what actually happened to the data.
    • Connection to the wider record. Because the plan, its datasets, its contributors, and its host all carry identifiers, the DMP becomes a node in the identifier graph — linkable to the project (via a RAiD), to the people (via ORCID), to the institution (via ROR), and to the outputs (via their DOIs).
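
    The validation behaviour in the first bullet can be sketched in a few lines. The example below assumes a simplified plan shape (a `dataset` list whose entries carry `distribution` items with `host` and `license`); the function name `validate_plan` and the sample data are hypothetical.

```python
def validate_plan(plan):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for ds in plan.get("dataset", []):
        title = ds.get("title", "<untitled dataset>")
        distributions = ds.get("distribution", [])
        if not distributions:
            problems.append(f"{title}: no distribution, so no repository is named")
        for dist in distributions:
            if not dist.get("host", {}).get("title"):
                problems.append(f"{title}: distribution has no named host")
            if not dist.get("license"):
                problems.append(f"{title}: distribution has no licence")
    return problems


# Hypothetical plan: the second dataset names a host but no licence.
plan = {
    "dataset": [
        {
            "title": "Microscopy image data",
            "distribution": [
                {"host": {"title": "Generalist repository"}, "license": [{"license_ref": "CC-BY"}]}
            ],
        },
        {
            "title": "Tabular measurements",
            "distribution": [{"host": {"title": "Generalist repository"}}],
        },
    ]
}

problems = validate_plan(plan)
for problem in problems:
    print(problem)
```

    A check like this could run before submission, which is exactly what a prose plan cannot offer.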

    A realistic view of where this stands

    It is worth being candid: machine-actionable DMPs are an active and maturing area, not a universally deployed reality. The RDA Common Standard exists and is implemented in several DMP tools; DMP IDs are being minted; and funders are beginning to express interest in structured plans. But many researchers still write, and many funders still accept, narrative plans, and the end-to-end exchange between tools, repositories, and funders is still being built out. The practical takeaway is not that you must produce a maDMP tomorrow, but that writing your narrative plan with structure in mind — naming repositories, stating licences explicitly, identifying people by ORCID, treating the plan as living — positions the same content to become machine-actionable as the infrastructure matures.

    The cheapest move toward a machine-actionable plan is to stop writing the DMP as an essay and start writing it as a set of clear, specific commitments — named repository, explicit licence, identified people, stated retention. Structured thinking comes before structured data.

    Where shared vocabulary fits

    “Dataset”, “distribution”, “retention period”, “data contact”, and “living DMP” need to mean the same thing in a DMP tool, a repository, a CRIS, and a funder’s system for any of this exchange to work. A shared, federated vocabulary that defines these elements precisely — pointing back to the RDA DMP Common Standard for the schema — is what lets a plan authored in one system be acted on by another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the machine-actionable DMPs domain.

  • Software management plans: extending the DMP to research code

    The data management plan has become a fixture of modern research. Funders ask for one, repositories expect data to be deposited, and researchers increasingly treat planning for data as a normal part of designing a project. Yet a quieter gap has opened alongside this success. The data a study produces is now planned for and curated, but the software that generates, cleans and analyses that data — the scripts, pipelines, models and bespoke tools that do the actual computational work — is frequently left undocumented, unversioned and unpreserved. A dataset deposited without the code that produced it is often only half of a reproducible result. The software management plan exists to close that gap by extending the discipline of the DMP to research code, a development that sits naturally within the machine-actionable DMP domain of the CASRAI Dictionary.

    Why data planning is not enough

    A data management plan answers questions about what data will be collected, how it will be documented, where it will be stored and how it can be reused. Those questions matter, but they assume the data can stand on its own. In computational research it frequently cannot. The meaning of a dataset often depends on exactly how it was processed: which version of which analysis script, with which parameters, in which software environment. If that processing is captured only as an informal collection of files on a researcher’s laptop, then the data, however well curated, cannot be fully understood or reproduced once the project ends and the person who wrote the code moves on. Planning for the data while ignoring the code that gives it meaning leaves a reproducibility gap that no amount of careful data deposit can fill.

    What a software management plan covers

    A software management plan asks of code the kinds of questions a DMP asks of data, adapted to the nature of software. Typical concerns include:

    • Identification and versioning. How will the software be version-controlled, and how will specific versions be identified so that a paper can cite the exact version that produced its results?
    • Documentation. What documentation will let someone else install, run and understand the software — dependencies, environment, usage, and the assumptions built into it?
    • Licensing. Under what licence will the software be released, so that others know what they are permitted to do with it?
    • Preservation and availability. Where will the software be deposited for the long term, and how will it be given a persistent identifier so it remains findable and citable after the project closes?
    • Sustainability. Who maintains the software during the project, and what is the realistic plan for it afterwards — archived as a snapshot, handed on, or actively maintained?

    The aim is not to turn every analysis script into a maintained software product. It is to make a deliberate, honest decision about how each piece of research software will be handled, proportionate to its importance.

    The FAIR4RS principles

    Underpinning this is the recognition that software is itself a research output deserving the same care as data. The FAIR principles — that outputs should be Findable, Accessible, Interoperable and Reusable — were written with data in mind, and applying them to software required adaptation, because code differs from data in important ways: it executes, it has dependencies, it changes through versions, and its reusability depends on documentation and environment as much as on access. The FAIR4RS principles (FAIR for Research Software) provide that adaptation, articulating what findability, accessibility, interoperability and reusability mean for software specifically. A software management plan is, in effect, the practical instrument for delivering FAIR4RS within a project: it is where the abstract principles become concrete commitments about identifiers, documentation, licensing and preservation.

    Making plans machine-actionable

    The most significant shift in management planning as a whole is the move from prose documents to machine-actionable formats. A traditional plan is a narrative that a human reads once and files away; nothing checks whether its commitments were kept. A machine-actionable plan (a maDMP, and by extension a machine-actionable software management plan) expresses its commitments as structured, machine-readable statements that systems can act on. This matters for software because so many of its commitments are checkable. Whether code was deposited in a repository, given a persistent identifier, assigned a licence and linked to the paper that uses it are all facts a system can verify rather than merely record. A machine-actionable plan can therefore connect the promise to the evidence: it can be updated automatically as a repository is created and an identifier minted, and it can flag where a stated commitment has not yet been met. The plan becomes a living part of the research workflow rather than a document written to satisfy a requirement and then forgotten.
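
    To make "checkable" concrete, here is a minimal sketch of flagging unmet software commitments. The `SoftwareRecord` model, its field names and the example URL are assumptions made for illustration; they are not drawn from any particular standard or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftwareRecord:
    """What a system currently knows about one piece of research software (hypothetical model)."""
    name: str
    repository_url: Optional[str] = None
    identifier: Optional[str] = None
    licence: Optional[str] = None
    cited_by: Optional[str] = None  # identifier of the paper that uses the software


# Each commitment named in the text, mapped to the field that evidences it.
COMMITMENTS = {
    "deposited in a repository": "repository_url",
    "given a persistent identifier": "identifier",
    "assigned a licence": "licence",
    "linked to the paper that uses it": "cited_by",
}


def unmet_commitments(record):
    """List the commitments for which the record holds no evidence yet."""
    return [label for label, attr in COMMITMENTS.items() if not getattr(record, attr)]


pipeline = SoftwareRecord(
    name="image-segmentation pipeline",
    repository_url="https://example.org/repo",  # placeholder URL
    licence="MIT",
)
print(unmet_commitments(pipeline))
```

    As the identifier is minted and the paper link recorded, the same call returns a shorter list, which is the promise-to-evidence connection in miniature.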

    Software as a first-class output

    All of this rests on treating software as a genuine research output, not an invisible by-product. The broader movement to recognise the full range of scholarly outputs — data, software, protocols and more — is the subject of our research outputs domain. When research software is identified, versioned, licensed, preserved and citable, it can be found and reused by others, and its development can be recognised as the substantial intellectual work it often is. A software management plan is the planning instrument that makes those properties likely rather than accidental.

    Crediting and standardising software work

    Because building research software is real contribution, it belongs in the structured account of who did what. The CRediT taxonomy — whose full set of contribution types is described in our overview of the CRediT roles — captures this through its Software role, which recognises programming, implementation and the development and testing of code, alongside Data curation for the data that code produces. For a software management plan to be machine-actionable across the systems that consume it — repositories, CRIS platforms, funder reporting — the elements it contains must mean the same thing everywhere. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the licences, identifiers, output types and roles flowing through a software management plan are understood identically wherever they travel. Data planning made data a managed asset; software management planning extends the same care to the code that makes the data mean something.

  • Machine-actionable data management plans: the maDMP comes of age

    The data management plan has a reputation problem. For most of its existence it has been a document written under deadline pressure to satisfy a funder requirement, deposited as a PDF, and then never opened again. It describes intentions that, by the end of a project, may bear little resemblance to what actually happened to the data. The machine-actionable DMP is the response to that failure mode, and after some years of standards work it has come of age. This article explains what it is and why it matters, drawing on the machine-actionable DMPs domain.

    From document to data object

    A data management plan (DMP) is a description of the data-management practices to be followed during and after a research project: what data will be produced, how they will be stored and documented, under what licence and access conditions they will be shared, and how long they will be kept. A machine-actionable DMP (maDMP) is the same content expressed as structured data that research systems can exchange, validate, ingest, and update automatically, rather than as prose only a human can read.

    The distinction is not cosmetic. A prose DMP states that data will be deposited in a trusted repository; a maDMP carries that as a structured assertion that a repository system can read, act on, and later check against what was actually deposited. The DMP stops being a one-time document and becomes a node in the research-information graph, connected to the project, the outputs, the funder, and the people.

    The standard that made it possible: the RDA Common Standard

    Structured exchange requires an agreed structure, and that is the contribution of the RDA DMP Common Standard — the application profile developed by the Research Data Alliance to represent maDMP content in a common, system-neutral form. It defines the entities a DMP describes and the relationships between them, so that a DMP created in one tool means the same thing when read by another.

    The standard’s design encodes a useful distinction the prose form blurs: between an anticipated dataset — a dataset the DMP says will be produced — and a realised dataset, one that has actually been produced and, typically, deposited. A maDMP can carry both, which is precisely what lets a system at closeout check whether the datasets the plan anticipated were in fact realised and deposited. Around these sit the structured fields that prose tends to leave vague: the retention period, the licence assertion, the access control policy, the storage location, and a data volume estimate for storage planning.
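
    A closeout check of anticipated against realised datasets might look like the sketch below. The `status` field and its two values are hypothetical stand-ins for however a given tool records dataset state, and the identifier is a placeholder.

```python
def closeout_report(datasets):
    """Split a plan's datasets into realised-and-deposited and still outstanding."""
    realised, outstanding = [], []
    for ds in datasets:
        # A dataset counts as realised only if it also carries a deposit identifier.
        if ds.get("status") == "realised" and ds.get("dataset_id"):
            realised.append(ds["title"])
        else:
            outstanding.append(ds["title"])
    return {"realised": realised, "outstanding": outstanding}


datasets = [
    {"title": "Microscopy image data", "status": "realised", "dataset_id": "10.0000/example-dataset"},
    {"title": "Tabular measurements", "status": "anticipated", "dataset_id": None},
]

report = closeout_report(datasets)
print(report)
```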

    The DMP ID: giving the plan an identity

    For a DMP to be referenced across systems, it needs an identity, and that is the role of the DMP ID — a persistent identifier for a specific data management plan, typically a DOI minted by DataCite through tools such as the DMPTool, the DCC’s DMPonline, or ARGOS. With a DMP ID, the plan can be cited like any other research object: a funder can refer to it, a CRIS can link to it, an output can point back to the plan that anticipated it, and the connections become part of the persistent-identifier graph alongside ORCID, ROR, and the grant ID. The DMP ID is what turns the DMP from a loose attachment into a first-class, addressable entity in the persistent-identifier ecosystem.

    The living DMP

    The deepest change the maDMP enables is conceptual: the move from the frozen DMP to the living DMP — a plan updated throughout the project lifecycle rather than fixed at award. A frozen DMP is a prediction made at the least-informed moment of a project, before any data exist. A living DMP is a record that tracks reality: as anticipated datasets become realised, as storage decisions change, as access conditions are settled, the plan is updated, and a DMP version captures each snapshot.
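
    One way to picture a living DMP with versions is as an append-only list of dated snapshots. The class below is a minimal sketch under that assumption; `LivingDMP`, its methods, the dates and the DOI are all hypothetical.

```python
import copy
from datetime import date


class LivingDMP:
    """A plan whose every update is kept as a dated snapshot (hypothetical model)."""

    def __init__(self, content, created):
        self.versions = [(created, copy.deepcopy(content))]

    @property
    def current(self):
        return self.versions[-1][1]

    def update(self, when, change):
        """Apply `change` to a copy of the current content and record it as a new version."""
        new_content = copy.deepcopy(self.current)
        change(new_content)
        self.versions.append((when, new_content))
        return len(self.versions)


dmp = LivingDMP(
    {"datasets": {"Microscopy image data": {"status": "anticipated", "doi": None}}},
    created=date(2024, 1, 15),
)


def mark_deposited(content):
    ds = content["datasets"]["Microscopy image data"]
    ds["status"] = "realised"
    ds["doi"] = "10.0000/example-dataset"  # placeholder DOI


dmp.update(date(2025, 6, 1), mark_deposited)
print(len(dmp.versions), dmp.current["datasets"]["Microscopy image data"]["status"])
```

    Copying before each change keeps earlier snapshots intact, so the plan as promised at award remains available alongside the plan as it stands now.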

    The frozen DMP answers the question “what did the applicant promise at award?” The living maDMP answers a far more useful question: “what is actually happening to this project’s data, right now?” Only the second is worth the effort of maintaining.

    This is where maDMP exchange earns its keep. When the DMP is structured and identified, a change made in one system can propagate — from a DMP tool to a CRIS, from the CRIS to a repository — so that the plan stays current without re-keying. A scheduled DMP review event becomes a checkpoint against live data rather than a re-reading of a stale document, and a DMP completeness score can be computed automatically against the funder’s required elements.
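
    A completeness score of the kind mentioned here could be as simple as the fraction of required elements that are filled in. The sketch below assumes a flat plan and a hypothetical list of funder-required elements; real funders will define their own.

```python
# Hypothetical list of elements a funder requires.
REQUIRED_ELEMENTS = ["repository", "licence", "retention_period", "data_contact"]


def completeness_score(plan, required=REQUIRED_ELEMENTS):
    """Return (fraction of required elements present, list of missing elements)."""
    missing = [element for element in required if plan.get(element) in (None, "", [])]
    score = (len(required) - len(missing)) / len(required)
    return score, missing


plan = {
    "repository": "Generalist repository",
    "licence": "CC-BY",
    "retention_period": "10 years",
    "data_contact": "",  # not yet filled in
}

score, missing = completeness_score(plan)
print(score, missing)
```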

    Why funders and institutions want this

    The maDMP is not an end in itself; it is wanted because it makes obligations checkable. A funder that requires data to be deposited in a trusted repository under an open licence can, with structured maDMPs, verify that the realised datasets meet the commitment, rather than trusting a final-report paragraph. An institution can monitor data-management compliance across its whole portfolio as a query over structured plans. And the researcher, crucially, benefits too: a living maDMP linked to the project’s outputs means the closeout data-management report is largely assembled already, not reconstructed from memory. This is the same dividend that structured grant and disclosure data pay throughout research administration.

    Where shared vocabulary fits

    The RDA Common Standard supplies the structure — the shape of a maDMP. It does not, on its own, fix the controlled values that populate it: the list of access categories, the licence vocabulary, the dataset-status terms. Two systems can both emit valid Common Standard maDMPs and still disagree on what “restricted access” or “realised” means. That definitional gap, below the structural model, is exactly what a shared, federated vocabulary fills, pointing back to the RDA for the standard and to DataCite for the DMP ID infrastructure. Supplying it is the role the CASRAI dictionary is built for.
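
    The definitional gap can be shown with a small mapping exercise. In this sketch two imaginary tools both use the label "restricted access" but mean different things by it; the shared terms, the tool names and the mappings are all invented for illustration.

```python
# Hypothetical shared vocabulary for access categories.
SHARED_ACCESS_TERMS = {"open", "restricted", "closed"}

# Hypothetical local vocabularies of two tools, mapped onto the shared terms.
LOCAL_TO_SHARED = {
    "tool_a": {"public": "open", "restricted access": "restricted", "private": "closed"},
    "tool_b": {"open": "open", "restricted access": "closed", "embargoed": "restricted"},
}


def to_shared_term(tool, local_value):
    """Translate a tool's local access label into the shared vocabulary."""
    try:
        shared = LOCAL_TO_SHARED[tool][local_value.strip().lower()]
    except KeyError:
        raise ValueError(
            f"{tool!r} value {local_value!r} has no mapping to the shared vocabulary"
        ) from None
    assert shared in SHARED_ACCESS_TERMS
    return shared


print(to_shared_term("tool_a", "Restricted access"))
print(to_shared_term("tool_b", "Restricted access"))
```

    Both plans would be structurally valid, yet the same label resolves to different shared terms, which is the disagreement a common vocabulary is meant to surface and settle.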

    What to do now

    For researchers and data stewards: treat the DMP as a living, structured object with a DMP ID, updated as anticipated datasets become realised. For funders: ask for maDMPs against the RDA Common Standard and verify realised against anticipated at closeout. For standards work: pair the structural standard with shared value vocabularies so that maDMPs from different tools genuinely interoperate.

  • Evaluating data management plans: how funders and institutions review DMPs

    Data management plans have become a near-universal requirement. Funders ask for them at the proposal stage, institutions increasingly expect them, and researchers have largely accepted that planning for data is part of designing a project. But requiring a plan and getting a good plan are two very different things. A DMP written hastily to satisfy a requirement, glanced at once and never looked at again, achieves almost nothing — it is a box ticked, not a commitment made. The harder, less-discussed half of the DMP story is evaluation: how plans are actually reviewed, against what criteria, by whom, and with what consequences. As DMPs mature, attention is rightly shifting from whether they exist to whether they are any good. This article examines DMP evaluation, drawing on the machine-actionable DMP domain of the CASRAI Dictionary.

    Why evaluation matters

    The case for taking DMP review seriously is straightforward. If a plan is never assessed, there is little incentive to write a good one, and the requirement degenerates into a formality that consumes effort without improving practice. Evaluation is what gives a DMP teeth: it signals that the plan is expected to be substantive, it provides researchers with feedback they can act on, and it lets funders identify proposals where the data-handling arrangements are inadequate or unrealistic. A reviewed DMP is a commitment someone has engaged with; an unreviewed DMP is a wish.

    Rubrics and review criteria

    To review plans fairly and consistently, reviewers need criteria, and this has driven the development of DMP rubrics — structured frameworks that lay out what a good plan should address and how to judge it. A rubric breaks the assessment down into components and gives reviewers a consistent basis for judging each one, so that plans are evaluated against the same expectations rather than according to each reviewer’s personal sense of what matters. Typical dimensions a rubric covers include:

    • Data description. Is it clear what data will be produced or used, in what formats and volumes?
    • Documentation and metadata. Will the data be documented well enough to be understood and reused?
    • Storage and security. Are arrangements for storing and protecting the data, including any sensitive data, adequate?
    • Preservation and sharing. Where will the data be deposited, under what access conditions and licence, and for how long?
    • Ethical and legal compliance. Are consent, privacy and legal obligations properly addressed?
    • Roles and resources. Is it clear who is responsible, and are the resources to do this realistic?

    One prominent example is the rubric developed by the DART Project (Data management plans as A Research Tool), designed to help reviewers assess DMPs systematically and consistently. Tools and rubrics of this kind matter because they turn the vague and subjective question “is this a good plan?” into a structured assessment that different reviewers can apply in comparable ways.
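
    A rubric of this shape is easy to represent as data. The sketch below uses the six dimensions listed above with a hypothetical three-level scale; it is not the scoring scheme of any published rubric, and the sample ratings are invented.

```python
RUBRIC = [
    "Data description",
    "Documentation and metadata",
    "Storage and security",
    "Preservation and sharing",
    "Ethical and legal compliance",
    "Roles and resources",
]

# Hypothetical three-level scale.
LEVELS = {"not addressed": 0, "partially addressed": 1, "fully addressed": 2}


def score_plan(ratings):
    """Total a reviewer's ratings and name the weakest dimensions."""
    unrated = [dim for dim in RUBRIC if dim not in ratings]
    if unrated:
        raise ValueError(f"unrated dimensions: {unrated}")
    per_dimension = {dim: LEVELS[ratings[dim]] for dim in RUBRIC}
    lowest = min(per_dimension.values())
    return {
        "total": sum(per_dimension.values()),
        "maximum": max(LEVELS.values()) * len(RUBRIC),
        "weakest": [dim for dim, value in per_dimension.items() if value == lowest],
    }


ratings = {
    "Data description": "fully addressed",
    "Documentation and metadata": "partially addressed",
    "Storage and security": "fully addressed",
    "Preservation and sharing": "partially addressed",
    "Ethical and legal compliance": "fully addressed",
    "Roles and resources": "not addressed",
}

result = score_plan(ratings)
print(result)
```

    Naming the weakest dimensions, rather than reporting only a total, gives the researcher something specific to act on.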

    Funder assessment in practice

    Funders approach DMP assessment in different ways and at different points. Some review the plan as part of the proposal, treating the quality of data-handling arrangements as one factor in deciding what to fund. Others emphasise the DMP as a project deliverable, expecting it to be developed and updated as the project proceeds. In either case, the trend is towards taking the plan seriously as something to be engaged with, not merely collected. There is a balance to strike: assessment should be rigorous enough to improve practice but proportionate enough not to impose a heavy burden. A purely bureaucratic review risks producing better-written plans but no better-managed data; the aim is to improve what actually happens to the data, not just the prose describing it.

    Feedback loops

    Perhaps the most valuable, and most often neglected, aspect of DMP evaluation is the feedback loop. Assessment is most useful when it is not merely a gate — pass or fail — but a source of guidance that helps researchers improve their plans and their practice. Feedback can flow in several directions:

    • To the researcher, pointing out weaknesses and suggesting improvements, ideally early enough to act on.
    • Into the project, where a plan reviewed at the start can be revisited and updated as the work develops and the data takes shape.
    • Back to support services, where patterns across many plans reveal where researchers commonly struggle, so that training and support can be targeted.

    Feedback is what turns evaluation from a judgement into a constructive tool. A plan that comes back with specific, actionable comments helps the researcher do better; a plan that simply passes or fails teaches nothing.

    Machine-actionable checks

    The move towards machine-actionable DMPs (maDMPs) opens a powerful possibility for evaluation: automating the parts of review that can be automated. When a plan is expressed as structured, machine-readable data rather than free prose, certain checks no longer require a human. A system can verify whether a repository has been specified, whether a licence has been chosen, whether an identifier has been minted, or whether commitments are consistent with funder policy. This does not replace expert human judgement — assessing whether the chosen approach suits the research still requires understanding — but it can handle the routine, checkable elements automatically, freeing reviewers to focus on the judgements that genuinely need them. Machine-actionable checks can also run continuously, so that a living plan is monitored against its commitments throughout a project rather than assessed only once.
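
    The division of labour described here, with routine checks automated and judgement left to people, can be sketched as follows. The funder policy values, the flat plan fields and the function name `review_plan` are hypothetical.

```python
# Hypothetical funder policy.
FUNDER_POLICY = {"allowed_licences": {"CC-BY", "CC0"}, "min_retention_years": 10}


def review_plan(plan, policy=FUNDER_POLICY):
    """Run the automatable checks; return (failed checks, items left for a human reviewer)."""
    checks = {
        "repository specified": bool(plan.get("repository")),
        "licence permitted by funder": plan.get("licence") in policy["allowed_licences"],
        "identifier minted": bool(plan.get("identifier")),
        "retention meets policy": plan.get("retention_years", 0) >= policy["min_retention_years"],
    }
    failed = [name for name, passed in checks.items() if not passed]
    # Suitability cannot be decided mechanically, so it is always routed to a person.
    needs_human = ["suitability of the chosen approach for the research"]
    return failed, needs_human


plan = {
    "repository": "Generalist repository",
    "licence": "CC-BY-NC",
    "identifier": None,
    "retention_years": 10,
}

failed, needs_human = review_plan(plan)
print(failed)
print(needs_human)
```

    Because the function is cheap to run, it could be called on every plan update rather than once at submission, which is what continuous monitoring amounts to in practice.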

    A shared vocabulary for review

    For DMP evaluation to work consistently across funders, institutions and the tools that support planning, the elements being reviewed and the criteria applied must mean the same thing everywhere. A plan written against one set of expectations and reviewed against another, or described in terms a reviewing system cannot interpret, defeats the purpose. That consistency is what the CASRAI Dictionary supports: a shared vocabulary so that the components of a data management plan are understood identically by those who write them and those who review them, supporting sound research administration. And because reviewing and supporting data management is genuine contribution, the work can be described in the same framework used for every other contribution: the CRediT taxonomy and its full set of contribution roles. A DMP is only as valuable as the seriousness with which it is reviewed; good evaluation is what turns the plan from a promise into a practice.