Category: Guides & Explainers

Practical how-to guides, templates, checklists, and career pathways for research administrators, authors, and institutional teams.

  • NSPM-33 Disclosure and Compliance: A Roadmap for US Research Security and Administrative Teams

    Introduction to NSPM-33 and Research Security

    National Security Presidential Memorandum 33 (NSPM-33) directs key United States federal funding agencies to establish unified, robust requirements for research security. It mandates standardized disclosure requirements for researchers seeking federal funding and, for academic institutions receiving more than $50 million in annual federal research and development funding, formal Research Security Programs. For research administrators, compliance is a high-stakes endeavor: non-compliance risks the loss of millions in federal grants and severe reputational damage.

    This roadmap provides US research security teams with a step-by-step compliance guide, analyzing disclosure mandates, programmatic requirements, and technological solutions.

    Core Pillars of NSPM-33 Compliance

    The White House Office of Science and Technology Policy (OSTP) has organized NSPM-33 execution into two fundamental pillars: Disclosures and Institutional Programs.

    Pillar | Key Requirements | Primary Administrative Impact
    Unified Disclosures | Standardized formats (SciENcv) for Biosketches, Current and Pending Support, and digital persistent identifiers (PIDs). | Eliminates agency-specific variation; mandates complete disclosure of foreign affiliations and funding.
    Institutional Programs | Formal research security programs addressing cybersecurity, foreign travel, export control, and insider threat training. | Requires a designated Research Security Officer (RSO) and mandatory annual training for research staff.

    Standardizing Disclosures: The Role of SciENcv and PIDs

    To reduce administrative complexity, NSPM-33 guidelines call for standardized digital formats across major federal agencies, including the NSF, NIH, DOE, and DoD:

    • SciENcv Integration: Investigators generate Biosketches and Current and Pending Support documents with the Science Experts Network Curriculum Vitae (SciENcv) tool. Where an agency has mandated SciENcv, as NSF has, hand-written or custom-formatted PDFs are no longer accepted; adoption timelines differ between agencies, so confirm the current requirement in each solicitation.
    • Digital Persistent Identifiers (PIDs): The implementation guidelines encourage, and in some cases mandate, the use of authenticated persistent identifiers, such as ORCID iDs, to link researchers to their affiliations, grants, and publications. This linked record allows automated compliance checking and reduces the risk of omitted disclosures.
    • Failing to Disclose: Omissions, whether accidental or intentional, regarding foreign talent recruitment programs, international laboratory space, or non-monetary support (e.g., equipment, postdocs funded by external governments) can lead to administrative, civil, or criminal consequences, depending on severity and intent.
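
    As one small illustration of the automated checking that persistent identifiers make possible, the sketch below validates the check character of an ORCID iD using the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. The function names are illustrative, and the check only confirms that an iD is well formed; it says nothing about whether the iD is authenticated or belongs to the person who supplied it.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid_id: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid_id.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_well_formed_orcid("0000-0002-1825-0097"))
    print(is_well_formed_orcid("0000-0002-1825-0098"))
```

    A check like this is useful at the point of data entry, where a mistyped digit would otherwise break the link between a disclosure and the researcher's record.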

    Implementing a Certified Research Security Program

    For universities exceeding the $50M federal funding threshold, research administrators must implement and document a comprehensive Research Security Program covering four core areas:

    1. Cybersecurity Safeguards

    The institution must provide a secure IT network compliant with NIST SP 800-171 or CMMC standards. This includes multi-factor authentication, end-to-end data encryption for research data, and regular vulnerability scanning.

    2. Foreign Travel Security

    Establish travel registry policies requiring researchers to register international travel funded by federal grants. Provide mandatory pre-travel briefings, security training, and clean loaner devices (laptops/phones) for travel to high-risk nations.

    3. Export Control & Disclosure Oversight

    Implement rigorous export control protocols (covering ITAR and EAR) to track dual-use technologies, sensitive biological agents, and advanced aerospace designs. Conduct annual audits of international collaborative agreements.

    4. Insider Threat and Research Integrity Training

    Deploy mandatory training modules for all faculty, postdocs, and graduate students working on federal grants. The curriculum must cover intellectual property theft, ethical collaboration boundaries, and disclosure reporting mechanisms.

    Conclusion: Building a Culture of Trustworthy Science

    NSPM-33 compliance should not be viewed simply as a bureaucratic burden. When implemented correctly, a robust Research Security Program protects researchers’ intellectual property, safeguards taxpayer-funded discoveries, and helps preserve academic freedom. By leveraging modern digital tools like SciENcv, ORCID, and robust encryption protocols, US institutions can secure their research pipelines while maintaining their position as global leaders in scientific collaboration.

  • Implementing the CRediT Taxonomy: Practical Guide for Journals, Libraries, and Research Administrators

    Introduction to the Contributor Roles Taxonomy (CRediT)

    The traditional model of academic authorship—which ranks researchers in a linear sequence (first author, co-author, corresponding author)—fails to reflect the multi-faceted reality of modern scientific collaboration. Large-scale research requires specialized roles, including software development, data curation, project administration, and hardware calibration. To provide granular, machine-readable attribution, CASRAI pioneered and NISO standardized the CRediT Taxonomy (Contributor Roles Taxonomy), consisting of 14 distinct roles.

    This practical guide outlines how journals, research libraries, and university administrators can implement the CRediT Taxonomy to build transparent, equitable, and modern evaluation systems.

    The 14 CRediT Roles and Definitions

    To ensure high data quality, all stakeholders must understand and apply the 14 standardized roles consistently:

    CRediT Contributor Role | Official Definition and Scope
    Conceptualization | Ideas; formulation or evolution of overarching research goals and aims.
    Data Curation | Management activities to annotate, scrub data and maintain research data.
    Formal Analysis | Application of statistical, mathematical, computational, or other formal techniques to analyze study data.
    Funding Acquisition | Acquisition of the financial support for the project leading to this publication.
    Investigation | Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection.
    Methodology | Development or design of methodology; creation of models.
    Project Administration | Management and coordination responsibility for the research activity planning and execution.
    Resources | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools.
    Software | Programming, software development; designing computer programs; implementing the computer code and supporting algorithms.
    Supervision | Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team.
    Validation | Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments.
    Visualization | Preparation, creation and/or presentation of the published work, specifically data visualization/presentation.
    Writing – Original Draft | Preparation, creation and/or presentation of the published work, specifically writing the initial draft.
    Writing – Review & Editing | Preparation, creation and/or presentation of the published work by those from the original research group, specifically critical review, commentary or revision.

    Implementation Roadmap for Journals and Publishers

    For scholarly journals, capturing contributor roles during submission requires minor changes to editorial management software (e.g., Editorial Manager, ScholarOne, OJS):

    • Mandate at Submission: Require the corresponding author to assign one or more of the 14 CRediT roles to every listed author during the metadata entry phase. Authors can have multiple roles, and multiple authors can share the same role.
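    • Integrate XML Metadata: Export the selected roles in JATS XML by adding a <role> element for each assigned role inside the author's <contrib> entry, using the vocabulary attributes (vocab, vocab-identifier, vocab-term, vocab-term-identifier) to point at the CRediT term. This ensures indexers like PubMed, Crossref, and Scopus can harvest and display the contributor data programmatically.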
    • Visible Authorship Statements: Render a clear, dedicated ‘Author Contributions’ section at the end of every PDF and HTML article layout, translating the XML metadata into human-readable text.
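
    To make the XML step concrete, here is a minimal sketch of how a submission system might serialise one author's roles. It assumes the pattern of one <role> element per assigned role inside each <contrib>, with vocabulary attributes pointing at the CRediT term. The helper names are hypothetical, and the term URL pattern is an assumption that should be checked against the identifiers published at credit.niso.org before use.

```python
import xml.etree.ElementTree as ET

# Assumed base URL for the CRediT vocabulary; verify against the NISO site.
CREDIT_BASE = "https://credit.niso.org/"


def credit_role_element(term: str) -> ET.Element:
    """Build one <role> element for a CRediT term such as 'Data curation'."""
    # Derive a URL slug, e.g. 'Writing – review & editing' -> 'writing-review-editing'.
    slug = term.lower().replace(" – ", "-").replace(" & ", "-").replace(" ", "-")
    role = ET.Element("role", {
        "vocab": "credit",
        "vocab-identifier": CREDIT_BASE,
        "vocab-term": term,
        "vocab-term-identifier": f"{CREDIT_BASE}contributor-roles/{slug}/",
    })
    role.text = term
    return role


def contrib_element(given: str, surname: str, roles: list) -> ET.Element:
    """Build a <contrib> entry carrying the author's name and CRediT roles."""
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given
    for term in roles:
        contrib.append(credit_role_element(term))
    return contrib


if __name__ == "__main__":
    author = contrib_element("Priya", "Patel",
                             ["Data curation", "Writing – original draft"])
    print(ET.tostring(author, encoding="unicode"))
```

    The same structured record can then drive the human-readable 'Author Contributions' section, so the XML and the visible statement never drift apart.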

    The Role of Libraries and Administrators

    University libraries and research administrators can leverage CRediT metadata to drive fairer evaluation and protect research security:

    Improving Evaluation and Hiring

    By mapping CRediT data into the university's current research information system (CRIS), promotion committees can look beyond traditional citation counts. For example, hiring committees can identify highly skilled research programmers or biostatisticians whose names appear in the middle of authorship lists but who led the ‘Software’ and ‘Formal Analysis’ work.

    Strengthening Research Security

    With frameworks like NSPM-33 demanding complete transparency, CRediT profiles provide structured, author-declared documentation of who secured funding for, designed, and executed specific portions of international research collaborations, reducing risk and simplifying institutional audits.

    Conclusion: Modernizing Scholarly Collaboration

    The global adoption of the CRediT Taxonomy represents a vital step toward open, equitable, and transparent scholarship. By providing clear pathways for attribution, publishers and institutions can celebrate the diverse contributions of every research team member, incentivize reproducible science, and build a more robust, searchable historical record of scientific discovery.

  • Data availability statements: what to write and where to deposit

    Most journals now ask for a data availability statement, and most authors now write one. Far fewer write one that does what it is meant to do. The phrase “data are available from the authors on reasonable request” has become the default, yet study after study has found that requests against such statements frequently go unanswered — which means the statement records an intention rather than a reality. This guide covers what to write, where to put the data, and how to make a statement that is true. It builds on the foundations in the data-infrastructure domain and connects to the practices described in the reproducibility domain.

    What a data availability statement is for

    A data availability statement (sometimes a data accessibility statement) tells a reader where the data underlying a publication can be found, under what conditions, and — where access is restricted — why. Its purpose is to make the evidential basis of the work locatable and, where ethically possible, reusable. It is the public-facing expression of the principle that a published claim should be checkable against the data behind it. A good statement is specific: it names a repository, gives an identifier, and states the access conditions plainly.

    Make the data FAIR first, then describe it

    The statement is downstream of a deposit decision, so the deposit is where the real work happens. The widely adopted reference point is the FAIR principles — that data should be Findable, Accessible, Interoperable, and Reusable. FAIR is frequently misread as “open”, and the distinction matters: FAIR does not require data to be public. It requires that data be findable (with a persistent identifier and rich metadata), accessible (retrievable by a clear, possibly authenticated, protocol), interoperable (using shared formats and vocabularies), and reusable (with a clear licence and provenance). Sensitive data can be FAIR while remaining access-controlled — the metadata is open and findable even where the data themselves are not.

    Practically, making data FAIR before you write the statement means:

    • Deposit in a repository that mints a persistent identifier — typically a DataCite DOI — so the data are citable and resolvable independently of the article.
    • Describe the data with structured metadata, not just a filename, so they can be found and understood by someone who did not produce them.
    • Attach an explicit licence (for example a Creative Commons licence for open data) so reuse conditions are unambiguous.
    • Use community formats and vocabularies where they exist, so the data interoperate with other datasets in the field.
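
    The four points above can be turned into a simple pre-deposit check. The sketch below is only an illustration: the field names (identifier, description, licence, format) are hypothetical and do not correspond to any particular repository's metadata schema.

```python
# Hypothetical checklist fields mirroring the four points above.
REQUIRED_FIELDS = {
    "identifier": "persistent identifier (for example a DOI)",
    "description": "structured description of the data",
    "licence": "explicit reuse licence",
    "format": "community or open file format",
}


def missing_fair_basics(record: dict) -> list:
    """Return the names of checklist fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


if __name__ == "__main__":
    draft = {"identifier": "https://doi.org/10.1234/example", "licence": " "}
    for name in missing_fair_basics(draft):
        print(f"Missing: {REQUIRED_FIELDS[name]}")
```

    Passing this check does not make a dataset FAIR; it only confirms that the basics are in place before the statement is written.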

    Choosing where to deposit: domain first, generalist as fallback

    Where to put the data is the decision that most shapes their long-term value. The general rule is to prefer a domain repository where a recognised one exists for your data type, and to use a generalist repository otherwise.

    Domain repositories

    A domain (or discipline-specific) repository is built around a particular kind of data and enforces the community’s metadata standards — GenBank for nucleotide sequences, the PDB for protein structures, and many others. Depositing here means your data sit alongside comparable datasets, are described to a standard your field already reads, and are discoverable by the people most likely to reuse them. Where your field expects deposit in a specific repository, that expectation is effectively mandatory and should be your first choice.

    Generalist repositories

    Where no suitable domain repository exists, a generalist repository — Zenodo, Figshare, Dryad and others — accepts data of any type, mints a DOI, and supports structured metadata and licensing. Generalists are the right home for the long tail of data that no specialised archive covers.

    A note on trust

    Whichever route you take, prefer a trusted digital repository — one assessed against a recognised standard such as CoreTrustSeal — over ad-hoc hosting. A repository’s job is long-term preservation and stable resolution; a personal website or a generic file-sharing link offers neither, and a link that has rotted makes a data availability statement worse than useless. Institutional and supplementary-file hosting can be acceptable, but the persistence commitment is what matters.

    Writing the statement

    A strong statement names the repository, gives the identifier, and states the conditions. Some patterns:

    • Open deposit: “The data supporting this study are openly available in [repository] at [DOI], under a [licence].”
    • Controlled access: “The data are available from [repository / controlled-access archive] subject to [conditions, e.g. a data access committee], because they contain [reason, e.g. identifiable personal data]. Metadata are openly available at [DOI].”
    • Genuinely no new data: “No new data were generated; the study analysed [named existing datasets] available at [identifiers].”

    Avoid the bare “available on request” formulation wherever the data could instead be deposited. Where access genuinely must be restricted — for participant confidentiality, commercial sensitivity, or Indigenous data governance — say so, give the reason, name who controls access, and still publish open metadata so the dataset is findable. An honest restricted-access statement is far stronger than a vague promise of availability.
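
    For teams that generate statements from deposit records, the open and controlled-access patterns above can be sketched as a small template function. The function name, parameters and example values are illustrative assumptions (the DOI is a placeholder), and the wording should be adapted to the target journal's requirements.

```python
def availability_statement(repository, identifier, licence=None,
                           conditions=None, reason=None):
    """Build a statement; restricted access needs both conditions and a reason."""
    if not repository or not identifier:
        raise ValueError("name a repository and give an identifier")
    if conditions or reason:
        if not (conditions and reason):
            raise ValueError("restricted access needs conditions and a reason")
        return (
            f"The data are available from {repository} subject to {conditions}, "
            f"because they contain {reason}. "
            f"Metadata are openly available at {identifier}."
        )
    if not licence:
        raise ValueError("open deposits need an explicit licence")
    return (
        f"The data supporting this study are openly available in {repository} "
        f"at {identifier}, under a {licence} licence."
    )


if __name__ == "__main__":
    print(availability_statement("Zenodo", "https://doi.org/10.1234/example",
                                 licence="CC BY 4.0"))
```

    The function refuses to produce a statement without a repository and an identifier, which is the point: a statement that cannot name both is usually an "available on request" statement in disguise.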

    Where shared vocabulary fits

    Terms like “available on request”, “restricted access”, “trusted repository”, and even “FAIR” are used inconsistently across journals and funders, which weakens the policies that depend on them. A shared, federated vocabulary that defines these precisely — pointing back to the FAIR principles and to certification schemes such as CoreTrustSeal — is what lets a statement written for one venue be understood by another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the data-infrastructure domain.

  • CRediT degree-of-contribution qualifiers: using lead, equal and supporting correctly

    Most researchers who have encountered the CRediT taxonomy know it as a list of fourteen contribution roles — Conceptualization, Methodology, Software, Investigation, Writing – original draft, and so on — that allows a paper to say who did what rather than relying on a bare author list. But there is a second dimension to CRediT that is less widely understood and frequently underused: the ability to attach a degree of contribution to each role. CRediT is not only about which roles a person played; it can also convey how much they contributed to each, through the qualifiers lead, equal and supporting. Used correctly, these transform a flat list of roles into a far more informative account of a collaboration. This article explains them, drawing on the CRediT extensions domain of the CASRAI Dictionary.

    The standard behind CRediT

    It is worth recalling that CRediT is not merely an informal convention but a recognised standard. It was formalised as ANSI/NISO Z39.104-2022, the standard maintained through the National Information Standards Organization (NISO). That formalisation matters: it gives CRediT a stable, authoritative definition that publishers, systems and institutions can implement consistently, rather than each interpreting the taxonomy in its own way. Part of what the standard defines is precisely the degree-of-contribution dimension — the provision that each role may carry a qualifier indicating the extent of a person’s involvement. The qualifiers are therefore an official part of CRediT, not an add-on, and understanding them is part of using the taxonomy as it was designed.

    What lead, equal and supporting mean

    The three degrees are straightforward in concept, and their value lies in applying them honestly. A contributor designated lead for a role had primary responsibility for that aspect of the work — they drove it, took the leading part, bore the main responsibility for it. A contributor marked equal shared responsibility for the role roughly evenly with one or more others; no single person led it, the work was genuinely joint. A contributor marked supporting made a real and acknowledged contribution to the role but in a secondary capacity, assisting rather than leading. The point of the qualifiers is to capture the texture of collaboration that a yes/no assignment misses. Two people might both be credited with Investigation, but if one designed and ran the experiments while the other assisted, “lead” and “supporting” convey that asymmetry truthfully, where listing both would imply a parity that did not exist.

    Why the qualifiers matter

    The degree qualifiers add value in several ways. They improve accuracy: a contribution statement that distinguishes who led from who assisted is simply a more truthful account of the work. They aid recognition: a researcher who led the methodology and another who supported it both deserve credit, but distinguishing the two does justice to each, and helps those reading the record — hiring panels, promotion committees, collaborators — understand the actual shape of someone’s contribution. And they support fairness in difficult cases. Where contributions are genuinely shared, the equal qualifier provides a recognised way to say so, which is particularly valuable for marking shared leadership of a role without forcing an artificial hierarchy. In each case, the qualifier carries information that the plain list of roles cannot, and that information is exactly what makes a contribution statement useful rather than merely present.

    How to apply them in a contributor statement

    Applying the qualifiers well is a matter of judgement exercised honestly. Some practical principles help:

    • Assign degrees role by role. A person’s degree can differ across roles — lead on Writing – original draft, supporting on Investigation. Consider each role on its own terms rather than assigning one overall level.
    • Reserve “lead” for genuine primary responsibility. If several people are all marked lead on the same role, the designation loses its meaning. Lead should identify who actually drove that aspect of the work.
    • Use “equal” when it is true, not as a courtesy. The equal qualifier is valuable precisely because it is accurate; applying it to smooth over differences that really exist undermines the honesty the system depends on.
    • Do not inflate “supporting” into more than it was, nor dismiss it as trivial. A supporting contribution is a real contribution, properly acknowledged; the qualifier honours it for what it was.
    • Agree the assignments among contributors. Degrees, like roles, should be discussed and agreed by the people involved, ideally early, to avoid disputes and to ensure the statement reflects a shared understanding.
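
    Some of these principles can be checked mechanically before a statement is circulated. The sketch below flags two patterns: several contributors marked lead on the same role, and a lone equal with nobody to share the role with. These rules are heuristics drawn from the principles above, not requirements of the standard, and a warning is a prompt for discussion among the contributors, not proof of an error.

```python
from collections import defaultdict

DEGREES = {"lead", "equal", "supporting"}


def review_degrees(assignments):
    """Return warnings for a list of (person, role, degree) tuples."""
    warnings = []
    by_role = defaultdict(list)
    for person, role, degree in assignments:
        if degree not in DEGREES:
            warnings.append(f"{person}: unknown degree '{degree}' for {role}")
            continue
        by_role[role].append((person, degree))
    for role, entries in by_role.items():
        leads = [p for p, d in entries if d == "lead"]
        equals = [p for p, d in entries if d == "equal"]
        if len(leads) > 1:
            warnings.append(f"{role}: {len(leads)} contributors marked lead")
        if len(equals) == 1:
            warnings.append(f"{role}: 'equal' needs at least two contributors")
    return warnings


if __name__ == "__main__":
    draft = [
        ("Author A", "Software", "lead"),
        ("Author B", "Software", "lead"),
        ("Author C", "Methodology", "equal"),
    ]
    for line in review_degrees(draft):
        print(line)
```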

    The limits and the discipline

    The qualifiers are powerful only if they are used with discipline. Their entire value rests on being applied truthfully; a contribution statement in which everyone is “lead” on everything conveys nothing, and one in which degrees are assigned to flatter rather than to describe is worse than none, because it dresses up inaccuracy as precision. The degrees are an invitation to be honest about the real distribution of work, not a set of titles to be distributed for diplomatic convenience. Used with that discipline, they let a contribution statement do justice to the genuine complexity of collaborative research; used carelessly, they merely add noise.

    A consistent vocabulary for contribution

    For degree qualifiers to mean the same thing across journals, institutions and reporting systems, the taxonomy and its qualifiers must be applied consistently — which is precisely what formalisation as ANSI/NISO Z39.104-2022 enables, and what a shared vocabulary sustains in practice. That consistency is what the CASRAI Dictionary supports: a shared vocabulary so that a role marked lead, equal or supporting is understood the same way wherever it is recorded, whether it travels through a publisher’s system, a repository or an institutional CRIS. CRediT’s roles tell the reader which parts of the work a contributor touched; the degree qualifiers tell them how much. Together, and used honestly, they turn the author list from a question into an answer — a clear, structured account, grounded in good authorship practice, of exactly who did what, and to what extent.

  • How to write a CRediT author statement: a step-by-step guide

    The CRediT author statement has moved from novelty to routine: a large majority of major publishers now ask for one at submission, and many will not advance a manuscript without it. Yet it is still frequently drafted in the last hour before submission, by one author, from memory. That is a missed opportunity, because a statement assembled carelessly does exactly what a contribution taxonomy is meant to prevent. This guide sets out a step-by-step method for producing one well. The authoritative how-to lives at how to write a CRediT author statement, and this article walks the same ground in practice.

    First, know what CRediT is — and is not

    Before assigning a single role, fix two facts in mind. CRediT is a controlled vocabulary of fourteen contributor roles, each with a canonical definition and a stable identifier, maintained as a NISO standard. It records what people did. It is not a definition of authorship, and it is not a scoring system. The decision about who qualifies as an author is made separately, under the ICMJE criteria in biomedical fields and equivalent norms elsewhere; CRediT supplements that decision and does not replace it. Conflating the two is the most common error in this area, and it is worth reading the full account of authorship and accountability alongside this guide.

    The fourteen roles fall into four loose functional groups that make a useful checklist: planning and design (Conceptualization, Methodology, Software); research and analysis (Validation, Formal analysis, Investigation, Resources, Data curation); communication (Writing – original draft, Writing – review & editing, Visualization); and management (Supervision, Project administration, Funding acquisition). The canonical definitions are set out in the CRediT roles reference, and you should assign against those definitions rather than against your intuition about what a role name implies.

    Step 1: list every contributor, then settle the author line

    Start with people, not roles. Write down everyone who contributed to the work in any way, including those who may end up acknowledged rather than authored. Then apply your field’s authorship test to decide who belongs on the author line. In biomedical research that is the ICMJE four criteria: substantial contribution to conception or design or to acquisition, analysis or interpretation of data; drafting or critically revising the work; final approval of the version to be published; and accountability for the work. Most publishers apply CRediT only to named authors, so settle the author line first.

    Step 2: assign roles to each named author against the canonical definitions

    Take each author in turn and ask, for each of the fourteen roles, whether they genuinely performed that contribution as the definition describes it. Some pointers that prevent common mistakes:

    • Investigation is performing the experiments or collecting the data — not the same as Data curation, which is annotating, cleaning, and maintaining the data for reuse.
    • Methodology is designing or developing the method; Software is writing the code that implements it. In some fields these overlap, but assign both only where both genuinely happened.
    • Writing – original draft is preparing the initial draft; Writing – review & editing is critical revision by members of the original research group. An author who only commented on a near-final draft did the latter, not the former.
    • Funding acquisition, Resources, and Supervision are legitimate roles, but on their own they may or may not meet the authorship bar in your field — record the contribution honestly and let the authorship test, not the role, decide.

    Be aware that even careful researchers given the same description sometimes disagree on which roles apply; the boundaries between adjacent roles are genuinely fuzzy. Treat the statement as an honest broad signal, not a precise measurement.
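
    Judging whether a role was genuinely performed is a human task, but checking that the names used are actual CRediT roles is not. The sketch below normalises claimed role names against the fourteen roles and reports anything it does not recognise. The function name and the normalisation rules (case-insensitive matching, a plain hyphen accepted in place of the en dash) are illustrative choices, not part of the standard.

```python
CREDIT_ROLES = (
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
)

# Lookup from a case-folded name to its canonical spelling.
_CANONICAL = {role.casefold(): role for role in CREDIT_ROLES}


def normalise_roles(claimed):
    """Split claimed role names into canonical names and unrecognised entries."""
    recognised, unknown = [], []
    for name in claimed:
        # Accept a plain hyphen for the en dash and collapse extra spaces.
        key = " ".join(name.replace("-", "–").split()).casefold()
        canonical = _CANONICAL.get(key)
        if canonical:
            recognised.append(canonical)
        else:
            unknown.append(name)
    return recognised, unknown


if __name__ == "__main__":
    print(normalise_roles(["software", "Writing - original draft", "Coding"]))
```

    An unrecognised entry such as "Coding" is a cue to go back to the canonical definitions and pick the role that actually describes the work.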

    Step 3: add the degree-of-contribution qualifier where it helps

    The standard supports an optional qualifier on each assignment — lead, equal, or supporting. It is not a percentage and it does not rank roles against one another; it distinguishes “I led this” from “I contributed to this.” Most published statements omit it because few publishers require it, but it is genuinely useful where several authors share a role: marking one author as lead on Writing – original draft and two as supporting conveys real information at almost no cost.

    Step 4: confirm with every author

    A contribution statement is a claim made on behalf of named people, so each named person should see and confirm their own roles before submission. This is not bureaucratic box-ticking. It is the step that catches the case where one author has, in good faith, attributed to themselves work that another person actually did — the failure mode that an honest taxonomy exists to surface. Circulate the draft statement; let each author correct their own line.

    Step 5: format it for the journal

    The conventional written form lists each author by name followed by their roles:

    Zhang San: Conceptualization, Methodology, Software. Priya Patel: Data curation, Writing – original draft. Erin Wright: Visualization, Investigation. Adam Lloyd: Supervision, Software, Validation. Maria García-López: Writing – review & editing.

    Many submission systems collect the same information through a structured form instead, which is better: a statement captured as structured metadata can propagate to Crossref and ORCID and be read by downstream systems, whereas a closing paragraph of prose cannot. Where the journal offers the structured route, use it. Where it only collects a narrative paragraph, write the paragraph above — but know that its value as machine-readable data is limited until the publisher’s plumbing catches up.
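
    One way to keep the two forms in step is to hold the assignments as structured data and generate the paragraph from them. The sketch below renders the conventional written form and, as a rough convenience, parses it back. Both function names are hypothetical, and the parser assumes that names and roles contain no full stops or colons, so it would need extending for names written with initials.

```python
import re


def format_statement(contributions):
    """Render {author: [roles]} as 'Name: Role, Role.' sentences in given order."""
    return " ".join(
        f"{author}: {', '.join(roles)}."
        for author, roles in contributions.items()
        if roles
    )


def parse_statement(text):
    """Recover {author: [roles]} from the narrative form described above."""
    return {
        match.group(1).strip(): [role.strip() for role in match.group(2).split(",")]
        for match in re.finditer(r"([^:.]+):([^.]+)\.", text)
    }


if __name__ == "__main__":
    contributions = {
        "Zhang San": ["Conceptualization", "Methodology", "Software"],
        "Priya Patel": ["Data curation", "Writing – original draft"],
    }
    print(format_statement(contributions))
```

    Generating the paragraph from the structured record, and not the other way round, means the structured version is ready to submit wherever a journal does offer the form-based route.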

    Common pitfalls to avoid

    • Treating CRediT as the authorship test. It records contribution; it does not decide who qualifies as an author.
    • Claiming roles for acknowledged work. If a medical writer drafted the text or a technician ran the experiments, do not absorb their roles into an author’s line.
    • Over-assigning. Listing all fourteen roles for the senior author signals nothing. Assign only what was genuinely done.
    • Leaving author order to CRediT. CRediT does not encode author order; that is a separate decision your field’s conventions govern.

    Where shared vocabulary fits

    CRediT itself is settled; the live problem is that its real-world implementation is uneven, with many venues collecting only narrative paragraphs rather than structured metadata. A shared, federated vocabulary that defines the roles consistently and points back to NISO for the standard is what lets a statement written for one system mean the same thing when read by another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the contributor-roles vocabulary sits in the CRediT extensions domain.

  • The DMP tools landscape: comparing DMPTool, DMPonline and Argos

    A standard for machine-actionable data management plans is only useful if researchers have tools that put it into practice. Over the past decade three platforms have come to dominate the data-management-planning landscape, each developed and maintained by a significant open-science organisation, and each now working towards interoperability through the same common standard. For an institution choosing how to support its researchers, or a researcher trying to understand the options, it helps to see how DMPTool, DMPonline and Argos compare — what they share, where they differ, and what unites them. This article surveys that landscape through the machine-actionable DMP domain of the CASRAI Dictionary.

    Why dedicated tools exist

    It is reasonable to ask why data management planning needs special software at all, when a plan could be written in a word processor. The answer lies in everything a good tool does beyond capturing text. A dedicated DMP platform guides researchers through funder and institutional templates so they answer the right questions; it supplies guidance at the point of need; it allows plans to be shared, reviewed and collaboratively edited; and, increasingly, it exports plans in structured, machine-readable formats so the commitments they contain can be acted on by other systems rather than read once and filed. This last capability — producing a machine-actionable plan rather than a static document — is what distinguishes a modern DMP tool from a template in a folder.

    DMPTool

    The first of the three, DMPTool, is developed and operated by the California Digital Library. It emerged to help researchers, particularly in the United States, meet the data-management-planning requirements of funders, and it provides funder and institutional templates, tailored guidance and a collaborative environment for producing plans. DMPTool has been a leading voice in the move towards machine-actionable planning, contributing to the development of the standards and infrastructure that allow plans to become connected, living objects rather than text deliverables. Its institutional adoption across many universities has made it a familiar part of the research-support landscape, and its development sits within the broader work of the California Digital Library on open scholarship and research infrastructure.

    DMPonline and DMP Roadmap

    The second platform, DMPonline, is developed by the Digital Curation Centre, a long-standing centre of expertise in research-data curation. Like DMPTool, it offers funder and institutional templates, embedded guidance and collaborative editing, and it is widely used across the United Kingdom, Europe and beyond. DMPonline and DMPTool are closely related at a deeper level: they share a common open-source codebase known as DMP Roadmap, jointly developed by the two organisations. This shared foundation means the two services have a great deal in common under the surface even as each is tailored to its own community of funders and institutions. The collaboration behind DMP Roadmap is itself a notable feature of the landscape: rather than building competing systems from scratch, two major infrastructures pooled effort into a common platform, which has helped align their approach to machine-actionable planning.

    Argos

    The third platform, Argos, comes from the European open-science ecosystem and is developed in association with OpenAIRE and EUDAT. Argos was designed from the outset with machine-actionability and openness in mind, and with close integration into the wider European research-infrastructure landscape. It supports the creation of plans against templates and, in keeping with its origins, emphasises producing plans as structured, openly available outputs that connect into the broader graph of European research information. Its provenance in OpenAIRE and EUDAT positions it naturally within an ecosystem oriented towards linking outputs, projects and funding, and it reflects a vision in which the DMP is not an isolated document but a connected node in the research record.

    What unites them: the RDA DMP Common Standard

    For all their differences in origin and community, the three platforms are converging on a shared foundation for interoperability: the RDA DMP Common Standard, developed through the Research Data Alliance. The common standard defines a shared model and structure for expressing the information a DMP contains, so that a machine-actionable DMP (maDMP) can be exported from one system and understood by another. This matters because plans do not live in isolation: a plan created in one tool may need to be read by a funder’s system, harvested into a repository, or connected to the persistent identifiers for the people, projects and outputs it describes. Without a common structure, every such exchange would require bespoke translation. With it, a maDMP exported from DMPTool, DMPonline or Argos can in principle flow into the wider ecosystem and be acted upon. The standard is what turns three separate tools into parts of a connected planning landscape.
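
    To make the idea of a shared structure concrete, the sketch below builds a minimal plan shaped after the JSON serialisation of the RDA DMP Common Standard and checks it for a handful of top-level fields. The field names follow a reading of the published standard and should be verified against it; every value is a hypothetical placeholder, and REQUIRED_TOP_LEVEL is an illustrative subset, not the standard's full list of mandatory fields.

```python
import json

# Minimal plan shaped after the RDA DMP Common Standard JSON serialisation.
# All titles, identifiers and addresses below are hypothetical placeholders.
madmp = {
    "dmp": {
        "title": "Example project DMP",
        "created": "2024-01-15T09:00:00Z",
        "modified": "2024-02-01T12:30:00Z",
        "language": "eng",
        "ethical_issues_exist": "unknown",
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
        "contact": {
            "name": "A. Researcher",
            "mbox": "a.researcher@example.org",
            "contact_id": {
                "identifier": "https://orcid.org/0000-0002-1825-0097",
                "type": "orcid",
            },
        },
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {
                    "identifier": "https://doi.org/10.0000/example-data",
                    "type": "doi",
                },
                "personal_data": "no",
                "sensitive_data": "no",
            }
        ],
    }
}

# Assumption: an illustrative subset of fields a consuming system might require.
REQUIRED_TOP_LEVEL = ("title", "created", "modified", "dmp_id", "contact", "dataset")


def missing_fields(plan: dict) -> list[str]:
    """Return the required top-level fields that the plan does not supply."""
    body = plan.get("dmp", {})
    return [name for name in REQUIRED_TOP_LEVEL if name not in body]


def round_trip(plan: dict) -> dict:
    """Serialise to JSON and parse back, as an export and import would."""
    return json.loads(json.dumps(plan))


exported = round_trip(madmp)
```

    A receiving system that applies the same check to the exported plan gets the same answer as the system that produced it, which is the practical meaning of interoperability here.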

    Choosing between them

    For an institution or researcher, the choice often comes down to context rather than a verdict on which platform is best. Existing institutional adoption, the funders one works with, the surrounding national infrastructure and integration with other systems all weigh on the decision. Because all three are moving towards the same common standard, the choice is less consequential than it once was: the goal is interoperable, machine-actionable planning, and each platform is a credible route to it. The decision is one of fit, not of compatibility.

    A consistent vocabulary across tools

    For plans to move between these platforms and the systems that consume them, the elements they contain (the data types, the licences, the identifiers, the contributor roles) must mean the same thing everywhere. That consistency is what the CASRAI Dictionary provides, complementing the structural interoperability of the RDA standard with shared meaning for the terms that flow through it. And because data management planning is part of the wider research record, the contributions it documents can be described in the same shared framework: the CRediT taxonomy and its full set of contribution roles. To weigh the platforms side by side in more detail, see our comparison resources, which set out their features against one another. The tools differ in origin and emphasis, but they share a destination: planning that machines as well as people can act upon.

  • Co-first authorship and equal contribution: marking shared credit correctly

    Two researchers do roughly equal amounts of the central work on a paper, but only one name can physically come first on the author line. This is now an everyday situation in team science, and the conventional response is to declare the two authors equal contributors. Yet that declaration is recorded in many different ways, some of which barely survive indexing, and the result is that genuinely shared credit is frequently lost when it matters most: when a hiring or promotion committee reads the author line. This article sets out how to mark shared credit correctly, building on the conventions described under author order and the role definitions under the CRediT roles.

    What “equal contribution” is claiming

    In most experimental and biomedical fields, position on the author line is information, not decoration. By widespread convention the first author did the bulk of the hands-on work and led the writing; the last author is the senior supervising figure. A co-first or equal-contribution designation is a deliberate intervention against that convention: it asserts that two (occasionally more) people share the leading-author role even though the linear author line can only print them one after another. The claim is specifically about leadership of the work, and it should be reserved for cases where it is genuinely true — not used as a courtesy to soften the awkwardness of ordering.

    It is worth being clear that equal contribution is field-specific. In mathematics, economics, and much of the humanities, authors are listed alphabetically and order carries no contribution signal at all, so an equal-contribution note is redundant. The designation does real work only where order is otherwise read as a ranking.

    The three places shared credit gets recorded

    Shared first authorship can be expressed through three distinct mechanisms, and the strongest practice uses them together rather than relying on any one.

    1. The author-line note

    The equal-contribution symbol is a superscript character placed against two or more names on the author line — most commonly a dagger (†) or an asterisk (*) — resolving to a footnote that reads “These authors contributed equally to this work.” This is the human-readable signal a reader sees on the page. Its weakness is that it is presentational: the symbol and its note are not reliably captured as structured metadata, so a system harvesting the author list may record the two authors in their printed order and silently drop the equality. That is precisely how co-first status disappears downstream.

    2. The contribution statement, using the degree qualifier

    This is where a contribution taxonomy earns its place. The CRediT taxonomy supports an optional degree-of-contribution qualifier on every role assignment: lead, equal, or supporting. It is not a percentage and it does not weigh one role against another; it simply distinguishes who led a role from who shared or supported it. To record co-first authorship honestly, mark the relevant leading roles — typically Conceptualization, Investigation, Formal analysis, and Writing – original draft — as equal for both authors:

    Author A: Conceptualization (equal), Investigation (equal), Writing – original draft (equal).
    Author B: Conceptualization (equal), Investigation (equal), Writing – original draft (equal).

    This carries far more information than a footnote. It says which parts of the work were shared, and it does so in a form that can travel into structured systems. The qualifier is widely available in publisher submission systems, though rarely required, so you usually have to choose to use it.
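
    A contribution statement with degree qualifiers can be held as structured data and rendered to prose when needed. The following is a minimal sketch under that assumption: the RoleAssignment class and the contribution_statement function are hypothetical names for illustration, not part of any publisher's system, and only the three degree values named above are accepted.

```python
from dataclasses import dataclass
from typing import Optional

# The three degree-of-contribution values described in the text.
DEGREES = ("lead", "equal", "supporting")


@dataclass(frozen=True)
class RoleAssignment:
    role: str                     # CRediT role name, e.g. "Investigation"
    degree: Optional[str] = None  # optional qualifier; None means unqualified

    def __post_init__(self):
        if self.degree is not None and self.degree not in DEGREES:
            raise ValueError(f"unknown degree: {self.degree!r}")

    def label(self) -> str:
        """Render as 'Role (degree)', or just 'Role' when unqualified."""
        return f"{self.role} ({self.degree})" if self.degree else self.role


def contribution_statement(assignments: dict[str, list[RoleAssignment]]) -> str:
    """Render one sentence per author, in the order the authors were given."""
    sentences = []
    for author, roles in assignments.items():
        sentences.append(f"{author}: " + ", ".join(r.label() for r in roles) + ".")
    return " ".join(sentences)


shared = ["Conceptualization", "Investigation", "Writing – original draft"]
assignments = {
    "Author A": [RoleAssignment(role, "equal") for role in shared],
    "Author B": [RoleAssignment(role, "equal") for role in shared],
}
statement = contribution_statement(assignments)
```

    Keeping the qualifier as a separate field, and not as part of the role name, means the same data can feed both the printed statement and any structured export.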

    3. Order-neutral display where the venue allows

    A growing number of venues let authors indicate that the printed order of co-first authors may be swapped on individual CVs — the “authors may list their name first” convention. Where offered, this is a sensible complement to the two mechanisms above, because it acknowledges directly that the linear order does not encode a ranking between the equal contributors.

    A method for marking it correctly

    1. Confirm the claim is true. Equal contribution means the leading work was genuinely shared. If one person clearly led, say so with lead and supporting rather than reaching for equal.
    2. Decide the printed order on a transparent basis. Something has to come first. Agree the basis openly — alphabetical, coin-toss, or rotation across the group’s papers — and record that the order is not a ranking.
    3. Add the author-line note so a human reader sees the equality at a glance.
    4. Encode it in the CRediT statement with the equal qualifier on the shared roles, so the claim survives as structured data rather than as a presentational footnote.
    5. Have every named author confirm their own line before submission. Shared-credit claims are exactly where unconfirmed assumptions cause later disputes.
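
    One check that can be automated before authors confirm their lines is whether every "equal" claim is actually shared: a role marked equal for one author and for nobody else is a contradiction worth catching early. The sketch below assumes a simple mapping of author to role to degree; the function name and the data shape are hypothetical.

```python
def unmatched_equal_claims(assignments: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (author, role) pairs marked 'equal' that no other author shares."""
    holders: dict[str, set[str]] = {}
    for author, roles in assignments.items():
        for role, degree in roles.items():
            if degree == "equal":
                holders.setdefault(role, set()).add(author)
    # An 'equal' degree only makes sense when at least two authors hold it.
    return sorted(
        (author, role)
        for role, authors in holders.items()
        if len(authors) < 2
        for author in authors
    )


draft = {
    "Author A": {"Investigation": "equal", "Formal analysis": "equal"},
    "Author B": {"Investigation": "equal", "Formal analysis": "supporting"},
}
problems = unmatched_equal_claims(draft)
```

    Here Investigation passes because both authors hold it as equal, while Author A's Formal analysis claim is flagged: it should be either lead for Author A or equal for both.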

    Common mistakes

    • Relying on the footnote alone. A dagger and a note are fragile. Without the structured qualifier, the equality often does not survive into the systems that later read the author list.
    • Using “equal” to avoid an honest conversation. Declaring everyone equal because ordering is uncomfortable devalues the designation and misrepresents the work.
    • Confusing equal contribution with author order generally. CRediT records what each person did; it does not set author order, which remains a separate decision governed by your field’s conventions.
    • Forgetting the corresponding-author role. Corresponding authorship is a distinct responsibility and can sit with any author, including one of the co-first authors; settle it explicitly.

    Where shared vocabulary fits

    “Co-first”, “joint first”, “equal contribution”, and “shared senior author” are used loosely and recorded inconsistently across venues, which is exactly why the credit so often fails to travel. A shared, federated vocabulary that defines these designations precisely — and points back to NISO for the CRediT standard and its degree qualifier — is what lets an equal-contribution claim mean the same thing wherever it is read. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the CRediT extensions domain.


  • CRediT in JATS XML: a technical primer for production teams

    A contributor-roles statement is only as useful as it is machine-readable. A typesetter can render ‘A.B. wrote the original draft; C.D. supervised’ as a tidy paragraph at the foot of an article, but if that information lives only in prose then no downstream system — a research information system, an indexer, a funder’s reporting tool — can act on it. The point of CRediT, the Contributor Roles Taxonomy, is to make contributions structured, and in scholarly publishing ‘structured’ means encoded in JATS XML. This primer is for the production teams who actually do that encoding: the people for whom ‘add CRediT’ on a project plan turns into concrete decisions about elements, attributes and controlled vocabularies. The authoritative tag-level guidance is set out in the CRediT in JATS reference and the broader JATS implementation notes.

    Where contributor roles live in JATS

    JATS (the Journal Article Tag Suite, the NISO Z39.96 standard) models people in the <contrib-group> element. Each named individual is a <contrib>, carrying their name, affiliations and identifiers. The element that carries a contributor’s function is <role>, nested inside the relevant <contrib>. A single contributor may hold several roles, so multiple <role> elements per <contrib> are expected and entirely valid — one person might legitimately be tagged for Conceptualization, Methodology and Writing – review & editing.

    The job of a production team is to make those <role> elements unambiguous. Free-text role labels are not enough, because ‘wrote the paper’ and ‘drafting’ and ‘Writing – original draft’ are the same role expressed three ways. CRediT solves this by giving each of its roles a stable definition and a canonical identifier, and JATS provides the attributes to point at them.

    The JATS4R recommendation for encoding CRediT

    JATS4R (JATS for Reuse) is the community group that publishes interoperability recommendations for ambiguous corners of the standard, and it has a specific recommendation for CRediT. The core of it is that a <role> element used for a CRediT contribution should declare the vocabulary it draws from and the specific term within it. In practice this means four attributes work together:

    • vocab — identifies the controlled vocabulary as CRediT;
    • vocab-identifier — gives the URI of the taxonomy itself, so a consuming system can resolve what vocabulary is being used;
    • vocab-term and vocab-term-identifier — give the exact term and its canonical URI, so the role resolves to one and only one CRediT definition.

    The human-readable label remains the text content of the <role> element — that is what a reader sees — while the attributes carry the machine meaning. The recommendation is deliberate that the visible text and the term identifier must agree: do not tag a <role> as Data curation in its attributes while the visible text reads ‘Formal analysis’. JATS4R also advises using the official CRediT term strings verbatim rather than house variants, because verbatim strings are what validators and aggregators expect to match.
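
    The attribute pattern described above can be sketched by generating a <contrib> with Python's standard library. The lowercase vocab value and the URI pattern reflect one reading of the JATS4R guidance and should be checked against the recommendation itself; the contributor name and the credit_role helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: base URI of the CRediT taxonomy as cited in JATS4R guidance.
CREDIT_URI = "https://credit.niso.org/"


def credit_role(term: str, slug: str) -> ET.Element:
    """Build a <role> whose visible text and vocab-term are the same string."""
    role = ET.Element("role", {
        "vocab": "credit",
        "vocab-identifier": CREDIT_URI,
        "vocab-term": term,
        "vocab-term-identifier": f"{CREDIT_URI}contributor-roles/{slug}/",
    })
    role.text = term  # human-readable label, kept verbatim
    return role


contrib = ET.Element("contrib", {"contrib-type": "author"})
name = ET.SubElement(contrib, "name")
ET.SubElement(name, "surname").text = "Doe"         # hypothetical contributor
ET.SubElement(name, "given-names").text = "Jane"
contrib.append(credit_role("Conceptualization", "conceptualization"))
contrib.append(credit_role("Writing – original draft", "writing-original-draft"))

xml_text = ET.tostring(contrib, encoding="unicode")
```

    Building the label and the vocab-term from a single argument makes it impossible for the visible text and the attribute to disagree, which is the mismatch the recommendation warns against.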

    Degrees of contribution

    CRediT permits, but does not require, a statement of the degree of a contribution, for example marking one contributor as having led a given role. JATS expresses this through an additional attribute on the <role> element (degree-contribution in recent versions of the tag suite) rather than by changing the term identifier. Production teams should treat degree as optional metadata that is encoded only when the manuscript actually supplies it; inventing a lead/equal distinction where the authors stated none is a data-quality error, not an enhancement. When degree information is present, keep it consistent across the article so that a reader and a parser draw the same conclusion.

    Common production pitfalls

    Several mistakes recur often enough to be worth naming. The first is putting CRediT roles in the wrong place — bundling them into an unstructured author-contributions paragraph in the article body instead of, or in addition to, the structured <role> elements. The structured encoding is the one machines read; a prose paragraph is a courtesy to humans, not a substitute. The second is omitting vocab-identifier and vocab-term-identifier, which leaves the role as plain text that cannot be reliably disambiguated. The third is term drift: lightly edited labels such as ‘Writing (review and editing)’ that no longer match the canonical CRediT string and therefore fail automated checks.

    A subtler issue is association: every <role> must sit inside the correct <contrib>. In articles with long author lists it is easy for a role to be attached to the wrong person during conversion, especially when contributions are supplied as a separate table that a typesetter merges by hand. Validating that each role resolves to the intended contributor is as important as validating that the term identifiers are correct.
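
    Several of these pitfalls can be caught mechanically. The sketch below is a minimal QA check, not a substitute for schema validation: it flags <role> elements that lack the vocabulary attributes, terms that have drifted from the canonical strings, and visible text that disagrees with the vocab-term. CANONICAL_TERMS is a deliberately short illustrative subset of the taxonomy, and the sample contributor is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: illustrative subset only; a real check would list every CRediT term.
CANONICAL_TERMS = {
    "Conceptualization",
    "Formal analysis",
    "Writing – original draft",
    "Writing – review & editing",
}
REQUIRED_ATTRS = ("vocab", "vocab-identifier", "vocab-term", "vocab-term-identifier")


def check_roles(contrib_group_xml: str) -> list[str]:
    """Return human-readable problems found in the <role> elements."""
    problems = []
    root = ET.fromstring(contrib_group_xml)
    for contrib in root.iter("contrib"):
        surname = contrib.findtext("name/surname", default="(unnamed)")
        for role in contrib.findall("role"):
            text = (role.text or "").strip()
            missing = [a for a in REQUIRED_ATTRS if a not in role.attrib]
            if missing:
                problems.append(f"{surname}: role '{text}' lacks {', '.join(missing)}")
            term = role.get("vocab-term")
            if term is not None and term not in CANONICAL_TERMS:
                problems.append(f"{surname}: vocab-term '{term}' is not a canonical CRediT term")
            if term is not None and term != text:
                problems.append(f"{surname}: visible text '{text}' disagrees with vocab-term '{term}'")
    return problems


sample = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Doe</surname><given-names>Jane</given-names></name>
    <role vocab="credit" vocab-identifier="https://credit.niso.org/"
          vocab-term="Conceptualization"
          vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
    <role vocab="credit" vocab-term="Writing (review and editing)">Writing (review and editing)</role>
  </contrib>
</contrib-group>
"""
problems = check_roles(sample)
```

    The check cannot tell whether a role is attached to the right person; that still requires comparing the encoded roles against the contribution matrix the authors supplied.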

    Building it into the workflow

    The practical recommendation is to capture CRediT as structured data as early as possible — ideally at submission, where many manuscript systems now collect a contribution matrix — and to carry that structure through conversion rather than reconstructing it from prose at the typesetting stage. Round-trip validation against the JATS4R recommendation should be part of the production QA step, alongside the schema validation a publisher already runs. Treating contributor roles as first-class structured metadata, governed by the definitions in the research information systems domain of the CASRAI Dictionary, is what allows contribution data to survive intact all the way to the version of record and beyond.

  • Keeping your ORCID record current: a maintenance guide for researchers

    Registering for an ORCID iD takes about two minutes. Keeping the record behind it accurate is where most researchers fall down, and an out-of-date ORCID record quietly undermines the very thing the identifier is meant to do. The good news is that, with a few settings configured once, ORCID will keep much of your record current for you — the work is far more about permissions than about manual data entry. This guide explains how. For the background on what the identifier is and why it matters, the persistent-identifiers guidance for authors and the explainer on what an ORCID iD is are the place to start; this article assumes you already have one and want to keep it healthy.

    The two kinds of data on an ORCID record

    The single most useful thing to understand about ORCID is that not all the information on a record is equal. ORCID distinguishes between data that you have typed in yourself and data that a trusted organisation has asserted about you.

    • Self-asserted data is anything you add by hand — an affiliation you typed, a paper you entered manually. It is useful, but a reader cannot tell whether it is verified.
    • Validated assertions are added by a trusted organisation through ORCID’s API — your university confirming an employment, a publisher confirming you authored a paper, a funder confirming you hold a grant. These carry the source of the assertion, so anyone reading the record can see that the affiliation came from the institution itself, not just from your own claim.

    A record full of validated assertions is dramatically more trustworthy — and more useful to funders and hiring committees — than one you have populated entirely by hand. The goal of good ORCID maintenance is therefore to let trusted organisations do as much of the asserting as possible.

    Turn on auto-update

    The highest-value setting is auto-update. When you connect your ORCID iD to Crossref and DataCite — a one-time authorisation — new works that are deposited with your ORCID iD attached are added to your record automatically. In practice this means that when you publish a paper and the publisher includes your ORCID iD in the Crossref deposit, the paper appears on your ORCID record without you doing anything, and it appears as a validated assertion sourced from the registration agency.

    The condition is simple but easy to miss: the publisher has to actually collect and deposit your ORCID iD. That is why you should always supply your ORCID iD during submission, and ideally sign in with it rather than typing it, so that the iD is authenticated. An authenticated iD attached at submission is what makes the whole auto-update chain work. Connect once, supply your iD every time, and your publication list largely maintains itself.
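
    Typed iDs are error-prone partly because the final character of an ORCID iD is a checksum. The sketch below validates the format and recomputes that character using the ISO 7064 MOD 11-2 scheme that ORCID documents; the function names are illustrative. A passing checksum shows only that the iD is well formed, not that it belongs to the person who typed it, which is why signing in remains the stronger option.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-0000 layout and the trailing check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]
```

    A single mistyped digit changes the expected check character, so a form that runs this test can reject most typing errors before the iD ever reaches a deposit.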

    Manage your trusted organisations and trusted individuals

    Auto-update is one instance of a broader mechanism: trusted parties. ORCID lets you grant two kinds of trust:

    • Trusted organisations — institutions, funders, publishers, and systems you authorise to read from or write to your record through the API. Your university’s research-information system, for example, can be a trusted organisation that adds your validated employment affiliation and pushes your institutional outputs onto your record.
    • Trusted individuals — a person, such as a research administrator or an assistant, whom you authorise to manage your record on your behalf. This is useful for senior researchers who would rather delegate the upkeep.

    Both are managed under the Trusted parties section of your account settings, and both are fully revocable. Granting access does not hand over your password; it grants a scoped, auditable permission that you can withdraw at any time. Reviewing this list once or twice a year — confirming the organisations you expect are there, and revoking any you no longer deal with — is the core maintenance habit.

    Set your visibility deliberately

    Every item on an ORCID record has a visibility setting: everyone, trusted parties only, or only me. The default for new items can be configured in your account. For the record to be useful to the systems that consume it — funders checking your outputs, journals verifying your identity, your CRIS pulling your profile — the key items generally need to be public. A common and self-defeating mistake is to register an iD, set everything to private, and then wonder why the identifier seems to do nothing. As a rule, make your name, affiliations, and outputs public, and reserve restricted visibility for things you genuinely want kept back.

    A short maintenance routine

    1. Connect to Crossref and DataCite auto-update once. This is the single highest-leverage action; it keeps your works current automatically.
    2. Always supply your authenticated ORCID iD at submission — for papers, datasets, software, and grant applications — so that each output and award can be asserted onto your record.
    3. Authorise your institution as a trusted organisation so that your employment and institutional outputs arrive as validated assertions.
    4. Review your trusted parties annually and revoke any you no longer use.
    5. Add the things no one else will assert — education, professional memberships, peer-review and editorial service, older works that predate ORCID — by hand, since these often have no organisation to assert them for you.
    6. Check your visibility settings so that the items you want discoverable are actually public.
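
    The routine above can be expressed as a simple self-audit. The sketch below works on a hand-filled snapshot of a record's settings and does not call the ORCID API; the RecordSnapshot fields, the one-year review interval and the list of key items are assumptions chosen to mirror the steps, not ORCID's own data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RecordSnapshot:
    # Hypothetical, manually filled summary of one researcher's settings.
    auto_update_sources: set[str] = field(default_factory=set)
    institution_trusted: bool = False
    last_trusted_party_review: Optional[date] = None
    visibility: dict[str, str] = field(default_factory=dict)


def maintenance_todo(snapshot: RecordSnapshot, today: date) -> list[str]:
    """Return the maintenance steps that still need attention."""
    todo = []
    for agency in ("Crossref", "DataCite"):
        if agency not in snapshot.auto_update_sources:
            todo.append(f"Connect {agency} auto-update")
    if not snapshot.institution_trusted:
        todo.append("Authorise your institution as a trusted organisation")
    last = snapshot.last_trusted_party_review
    if last is None or (today - last).days > 365:
        todo.append("Review trusted parties")
    for item in ("name", "affiliations", "works"):
        if snapshot.visibility.get(item) != "everyone":
            todo.append(f"Make {item} public")
    return todo


snapshot = RecordSnapshot(
    auto_update_sources={"Crossref"},
    institution_trusted=True,
    last_trusted_party_review=date(2023, 1, 10),
    visibility={"name": "everyone", "affiliations": "only me", "works": "everyone"},
)
todo = maintenance_todo(snapshot, today=date(2024, 6, 1))
```

    Passing the date in explicitly keeps the check repeatable; an empty result means the one-time settings are in place and only the manual additions in step 5 remain.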

    Why a current record pays off

    Beyond convenience, an accurate ORCID record increasingly does real work on your behalf. Funders draw on it for applications and reporting; narrative-CV and biosketch tools pull from it; institutional systems reconcile your outputs against it. A record rich in validated assertions lets you make precise, checkable claims about your contribution history — including, where publishers deposit them, your CRediT roles per paper, so that “I led the analysis on these studies” becomes a verifiable statement rather than an assertion on a CV. The effort is front-loaded into a handful of one-time settings; the payoff compounds across every later application and assessment.

    Where shared vocabulary fits

    “Auto-update”, “trusted party”, “validated assertion”, “source”, and “self-asserted” are ORCID-specific terms that are easy to muddle, and confusion about them is exactly why so many records go stale. A shared, federated vocabulary that defines these terms precisely is what lets guidance from one institution be understood at another. Supplying that definitional layer is part of the role the CASRAI dictionary is designed to play.


  • Disclosing generative AI use in research: what to declare and where

    Two or three years ago, declaring the use of a generative AI tool in a manuscript was an unusual courtesy. Today it is a baseline expectation, written into the author instructions of most major publishers and the recommendations of the bodies that set publishing norms. Yet the question authors most often ask is disarmingly practical: what exactly do I have to declare, and where does the declaration go? This article sets out a clear answer, drawing on the vocabulary being developed in the generative AI use and disclosure domain.

    The two settled principles

    Underneath the variation between publishers, two principles have hardened into near-consensus, and they are the right place to start.

    The first is that a generative AI system cannot be an author. The ICMJE recommendations, and parallel statements from COPE, Nature, Science, and the major university presses, are explicit on this point: authorship entails accountability for the work, and a tool cannot be accountable. AI use is therefore disclosed as a method or a tool, never as a contributor on the author line. This connects directly to the broader account of authorship as a matter of responsibility, not merely of having touched the text.

    The second is that the human authors remain fully responsible for everything the manuscript asserts, including anything an AI system produced. A fabricated citation, a misstated statistic, or a plausible-but-wrong sentence is the authors’ error regardless of which tool generated it. Disclosure does not transfer responsibility; it makes the workflow transparent so that responsibility can be located.

    What counts as disclosable use

    The harder question is the threshold. Not every interaction with a computational tool is a disclosable use of generative AI, and policies generally exempt the trivial. The useful distinction is whether the tool produced novel content that materially shaped the published work.

    • AI-assisted writing — where a generative system drafted, restructured, summarised, or substantively edited text whose output shaped the published wording — is disclosable. A generative AI tool is, in the working definition, a system that produces novel text, code, image, or other media from a prompt, typically using a large neural network.
    • AI-assisted analysis — using a model to perform or shape a data-analysis step, including exploratory analysis or hypothesis generation — is disclosable as part of the methods.
    • AI-generated code that forms part of the research, and AI-generated images in a manuscript, are disclosable, the latter often under stricter rules because of the integrity risks around figures.

    By contrast, most policies define an AI use exempt category for tools that do not produce novel content: a spell-checker, a grammar corrector, a reference manager, or basic translation of the author’s own words. Author-written text whose grammar was tidied by an AI checker is not, in this sense, AI-assisted writing. The line is not always crisp — substantive rewriting shades into drafting — and when in doubt the safe practice is to disclose.

    Where the declaration belongs

    Knowing what to declare is half the problem; the other half is placement, and here practice has converged on a small set of locations.

    The dominant convention is a dedicated AI use disclosure statement in the manuscript: a short declaration that names the system, says where in the workflow it was used, and indicates the extent of that use. “Which tool, where, and how much” is the durable shape of a good statement. Many journals place this in the methods section when the use was analytical, and in a distinct acknowledgements-adjacent statement when the use was in writing.

    A useful test for a disclosure statement: a reader should be able to tell, from the statement alone, which parts of the work involved a generative system and what the authors did to verify its output. A generic line that an AI tool was “used to improve readability” fails this test; it names neither the tool nor the boundary of its use.

    Two adjacent practices strengthen the statement. The first is recording a model selection rationale and, where relevant, the prompt engineering that produced reliable outputs — material that belongs in supplementary methods for analytical uses, because it bears on reproducibility. The second is naming the AI tool provider at the organisational level, so that the disclosure points at an identifiable system rather than a generic category.

    Why structured disclosure, not just prose

    A free-text paragraph at the end of a manuscript is where most disclosures live today, and it is better than nothing. But prose disclosure has the same weakness that prose contribution statements have: it does not travel as data. A structured representation — naming the tool, the workflow stage, the extent, and the verification step as discrete, machine-readable fields — lets downstream systems index, audit, and aggregate AI use across the literature. That is the difference between a sentence a human must read and a record a system can act on, and it is the gap a controlled vocabulary is meant to close. The parallel with structured contribution metadata in CRediT is exact: a settled human-readable form, waiting on consistent machine-readable plumbing.
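
    What discrete, machine-readable fields might look like can be sketched with a small record type. The field names below are assumptions drawn from the elements this article lists (tool, provider, workflow stage, extent, verification); they do not follow any published schema, and the tool and provider names are invented placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AIUseDisclosure:
    tool: str            # named system and version
    provider: str        # organisation behind the tool
    workflow_stage: str  # e.g. "writing" or "analysis"
    extent: str          # what the tool actually produced
    verification: str    # what the authors did to check the output

    def gaps(self) -> list[str]:
        """Return the names of fields left empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value.strip()]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


complete = AIUseDisclosure(
    tool="ExampleLM 1.0",          # hypothetical tool
    provider="Example AI Ltd",     # hypothetical provider
    workflow_stage="writing",
    extent="Drafted the first version of the abstract",
    verification="All authors checked every sentence against the results",
)

# The generic 'improve readability' line names neither tool nor verification.
vague = AIUseDisclosure(
    tool="",
    provider="",
    workflow_stage="writing",
    extent="used to improve readability",
    verification="",
)
```

    Calling gaps() on the vague record lists exactly what a prose sentence lets authors omit, and to_json() produces the form a downstream system could index or aggregate.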

    The role for shared vocabulary

    Publishers’ AI policies differ in wording, in threshold, and in placement, which means a disclosure written for one journal does not necessarily mean the same thing when read by another system. What is missing is not more policy — the principles are settled — but a shared definitional layer: agreed terms for AI-assisted writing, AI-assisted analysis, exempt category, and the rest, so that a disclosure carries the same meaning wherever it is read. Supplying that layer, federating to ICMJE and COPE for the normative content rather than inventing it, is the convening role the CASRAI dictionary is built for. The practical guidance for authors lives at AI disclosure for authors.

    What to do now

    For authors: disclose any use that produced novel content shaping the work, name the tool and the workflow stage, and state that you verified the output. For editors: specify where the statement goes and ask for structured fields, not just a paragraph. For standards work: prioritise shared definitions of the disclosable categories and the exempt threshold, so disclosures mean the same thing across venues.
