  • Sustainable laboratory operations: LEAF, My Green Lab, and the carbon footprint of research

    The carbon footprint of research is unusually large per worker. A typical wet lab consumes 3-10 times the energy of an equivalent office space; a single ultra-low-temperature freezer running 24/7 uses as much electricity as an average household; single-use plastics in life-science labs alone are estimated globally at 5.5 million tonnes per year. A research-intensive university’s Scope 1 and 2 emissions sit primarily in lab buildings; its Scope 3 sits primarily in travel and procurement. The sustainability conversation in 2026 has moved from awareness to operational programmes, with two frameworks dominating: LEAF (the Laboratory Efficiency Assessment Framework developed at UCL) and My Green Lab certification. This post is a practical tour.

    Why labs are different

    Office sustainability programmes (LED lighting, paper recycling, energy-efficient computing) translate poorly to laboratories because the energy intensity is in equipment that cannot simply be switched off. A freezer holding biological samples cannot be turned off at night; a fume hood cannot be reduced to standby; a sequencer cannot run at half power. Lab sustainability is therefore primarily a question of which equipment runs and how it is operated, not whether it is on.

    The corollary is that lab sustainability programmes need a different vocabulary and a different evidence base from office sustainability programmes. Both LEAF and My Green Lab were designed in response to this; both have been validated empirically over the 2018-2024 period.

    LEAF

    The Laboratory Efficiency Assessment Framework, developed at University College London by Martin Farley and colleagues, is a self-assessment-and-certification framework structured at three levels (Bronze, Silver, Gold). LEAF’s strength is that it is operationally specific: each level lists discrete actions a lab can take, and the actions are tied to estimated energy and waste impacts based on UK lab benchmarks.

    The LEAF Bronze level covers basics: freezer temperature optimisation, sash management on fume hoods, equipment shutdown protocols, recycling, lighting, water conservation. Silver adds: research-life-cycle assessment, supplier engagement, training, advocacy. Gold adds: integrated sustainability planning, leadership in the institution’s sustainability programme, mentorship of other labs.

    By 2026 LEAF is in use at over 1,000 institutions globally, with strong concentration in the UK (where it was developed and has the strongest institutional backing). The LEAF self-assessment is free; the certification process involves institutional review. The framework is open and has been adapted to several national contexts.

    The LEAF impact data

    The 2023 UCL study of LEAF Bronze-certified labs found average energy reductions of 5-15% versus baseline, primarily from freezer optimisation and fume-hood sash management. The 2024 follow-up at LEAF Silver labs found additional reductions of 5-10%, along with significant reductions in single-use plastic consumption through supplier engagement. The data underwrite the framework: its gains are documented, not merely aspirational.

    My Green Lab

    My Green Lab is a US-based non-profit operating a complementary certification programme that has dominated the North American market and increasingly the international one. My Green Lab Certification (MGLC) is built around a survey assessment in 14 categories (energy, water, waste, green chemistry, purchasing, training, etc.) with a numerical score and an annual recertification cycle.

    My Green Lab also operates the ACT Label (Accountability, Consistency, Transparency) for lab products: a vendor-supplied environmental-impact rating for an individual product (pipette tip, plate, reagent) covering its energy, water, packaging, and chemical inputs. The ACT Label is in widespread use across major lab suppliers (Eppendorf, Thermo Fisher, Bio-Rad, NEB) and has become a discriminator at the procurement stage for sustainability-minded labs.

    By 2026 My Green Lab claims over 4,000 certified labs across more than 70 countries; the certified-lab cohort is concentrated in pharmaceutical and biotechnology industrial research, with growing academic uptake.

    Choosing between LEAF and My Green Lab

    Most institutions can choose one and stick with it. LEAF is more prescriptive (the levels list specific actions); My Green Lab is more diagnostic (the survey identifies areas for improvement and tracks scores over time). LEAF has stronger UK and European institutional backing; My Green Lab has stronger US and pharmaceutical-industry backing. An institution coordinating with a major US pharmaceutical partner is likely better off with My Green Lab; one coordinating with UKRI or EU funders is likely better off with LEAF.

    Some institutions run both. The duplicate-overhead cost is moderate (the underlying lab practices are largely the same; the documentation is different) and the dual recognition can be useful.

    Freezer management: the biggest single lever

    The single most impactful intervention in most labs is freezer management. A typical -80°C ultra-low-temperature freezer consumes 16-22 kWh/day. Switching to a -70°C set point (acceptable for most stored samples per the 2018 Gail Phébré et al. validation) cuts consumption by 30-40%. Combining this with a sample-inventory audit frees space and avoids new freezer purchases: in most labs, 10-30% of freezer contents are unused, unlabelled, or duplicated and can be discarded.
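
    The arithmetic behind the set-point change is easy to check. The sketch below is illustrative only: it takes the consumption and reduction ranges quoted above at face value, and the function name and the 365-day year are assumptions made for the example.

```python
def freezer_savings_kwh_per_year(daily_kwh: float, reduction: float, days: int = 365) -> float:
    """Annual electricity saved by one freezer, given its daily use and a fractional reduction."""
    return daily_kwh * reduction * days


# Bounds quoted in the text: 16-22 kWh/day at -80°C, 30-40% lower at -70°C.
low = freezer_savings_kwh_per_year(16, 0.30)
high = freezer_savings_kwh_per_year(22, 0.40)
print(f"Estimated saving per freezer: {low:.0f}-{high:.0f} kWh/year")
# prints "Estimated saving per freezer: 1752-3212 kWh/year"
```

    On those figures a single freezer saves roughly 1.8-3.2 MWh per year, which is why the intervention scales so well across a building with dozens of units.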

    The UCL Race to Zero trial in 2022-2023 had 53 labs switch -80°C freezers to -70°C; the energy savings were as predicted and no sample integrity issues were reported across 12+ months of follow-up. This is now standard guidance in both LEAF and My Green Lab.

    Single-use plastics

    The wet-lab single-use-plastic flow is enormous and largely necessary (sterility, contamination control, reproducibility). The mitigation in 2026 has two prongs. First, vendor switching: products with ACT Labels in the better tiers (post-consumer recycled content packaging, take-back programmes, reduced primary packaging) materially reduce flows. Second, recycling streams: rigid PE/PP lab plastics (tip boxes, conical tube racks) are recyclable in dedicated lab-plastic streams operated by several vendors. The recycling capture rate has grown substantially since 2022.

    The remaining hard problem is contaminated plastics (anything that has touched biological or chemical materials and cannot be cleanly recycled). The mitigation is procurement-stage: smaller tip volumes, single-use serological pipette redesigns with less plastic per unit, reusable glassware where appropriate.

    Sustainable HPC

    High-performance computing is the fastest-growing emissions source in many research universities. A modern HPC cluster running 24/7 at high utilisation has a substantial Scope 2 footprint; AI-training workloads in particular have caused HPC electricity consumption to grow sharply since 2022.

    The mitigations in 2026 include: power-aware job scheduling (running flexible jobs when grid carbon intensity is low, e.g., when wind generation is high); efficiency-first allocation (prioritising jobs that have demonstrated CPU/GPU efficiency); ML-model-efficiency policies (preferring smaller, more efficient models where they suffice); and per-project emissions reporting, in the same way that compute hours are already reported.
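
    Power-aware scheduling can be sketched in a few lines. The example below assumes an hourly carbon-intensity forecast is available as a plain list; the forecast values and the function name are hypothetical, and a production scheduler would also have to respect queue priorities and deadlines.

```python
from typing import Sequence


def best_start_hour(forecast_g_per_kwh: Sequence[float], job_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest total carbon intensity."""
    if job_hours <= 0 or job_hours > len(forecast_g_per_kwh):
        raise ValueError("job_hours must be between 1 and the forecast length")
    candidate_starts = range(len(forecast_g_per_kwh) - job_hours + 1)
    return min(candidate_starts, key=lambda s: sum(forecast_g_per_kwh[s:s + job_hours]))


# Hypothetical forecast (gCO2/kWh) for the next eight hours; the dip mimics a windy period.
forecast = [300, 280, 150, 120, 130, 260, 310, 290]
start = best_start_hour(forecast, job_hours=3)
print(f"Start the 3-hour job at hour offset {start}")  # prints offset 2
```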

    The Green Algorithms tool and the CodeCarbon Python library let researchers estimate emissions per analysis. UKRI and the EU’s HORIZON programme now ask researchers to report estimated emissions in proposals for compute-heavy projects.
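
    A per-analysis estimate of this kind reduces to a short calculation: energy is runtime multiplied by the power drawn by cores and memory, scaled by the data centre's power usage effectiveness (PUE), and emissions are energy multiplied by grid carbon intensity. The sketch below follows that general shape, as we understand the Green Algorithms approach. It is not the tool's code, and every default and input value (watts per core, watts per GB of memory, PUE, carbon intensity) is a placeholder assumption to be replaced with figures for your own hardware and grid.

```python
def estimate_energy_kwh(
    runtime_hours: float,
    n_cores: int,
    watts_per_core: float,
    core_usage: float,
    memory_gb: float,
    watts_per_gb: float = 0.3725,  # assumed memory power draw; check against the tool
    pue: float = 1.67,             # assumed data-centre overhead factor; site-specific
) -> float:
    """Energy in kWh for one job: runtime x (core power + memory power) x PUE."""
    power_watts = n_cores * watts_per_core * core_usage + memory_gb * watts_per_gb
    return runtime_hours * power_watts * pue / 1000.0


def estimate_emissions_g(energy_kwh: float, carbon_intensity_g_per_kwh: float) -> float:
    """Emissions in grams CO2e for a given energy use and grid carbon intensity."""
    return energy_kwh * carbon_intensity_g_per_kwh


# Hypothetical job: 10 hours on 16 fully used cores at 12 W each, with 64 GB of memory.
energy = estimate_energy_kwh(10, n_cores=16, watts_per_core=12.0, core_usage=1.0, memory_gb=64)
emissions = estimate_emissions_g(energy, carbon_intensity_g_per_kwh=200.0)
print(f"{energy:.2f} kWh, {emissions:.0f} gCO2e")
```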

    Conference travel: the Scope 3 elephant

    Academic conference travel is, for most research-intensive universities, the single largest Scope 3 emissions category. A round-trip transatlantic flight emits roughly 1-2 tonnes of CO2 per passenger; an academic with a typical conference cadence can easily account for 5-10 tonnes/year of travel emissions, which dwarfs everything else they personally consume.

    The pandemic forced a partial shift to virtual conferencing over 2020-2024, but the post-pandemic settlement has not held. By 2026 conference travel is largely back to pre-pandemic levels, though with somewhat more hybrid options. The frameworks that have emerged include: institutional travel-budget caps with carbon-equivalent accounting; conference-clustering (combining several events into one trip rather than making separate trips); flight-free regional conferences (the UK Reproducibility Network’s flight-free Easter conference, the European Geosciences Union’s hybrid format); and proportional-attendance models in which junior researchers attend in person while seniors attend virtually.
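
    Carbon-equivalent accounting against a travel cap is simple bookkeeping. The sketch below is a minimal illustration: the cap, the trip list, and the per-trip estimates are all hypothetical, with the transatlantic figures chosen from within the 1-2 tonne range quoted above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Trip:
    label: str
    tonnes_co2e: float  # estimated round-trip emissions per traveller


def remaining_budget(cap_tonnes: float, trips: Iterable[Trip]) -> float:
    """Tonnes of CO2e left under the cap; a negative value means the cap is exceeded."""
    return cap_tonnes - sum(t.tonnes_co2e for t in trips)


# Hypothetical year for one researcher under a hypothetical 4-tonne cap.
planned = [
    Trip("Transatlantic conference A", 1.5),
    Trip("Transatlantic conference B", 1.5),
    Trip("Regional meeting by rail", 0.05),
]
left = remaining_budget(4.0, planned)
print(f"Remaining travel budget: {left:.2f} tCO2e")
```

    Making the remaining budget visible at the point of booking supports the count-and-declare approach without imposing a quota.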

    The conference travel emissions conversation is genuinely difficult because there are real career and equity costs to reducing in-person attendance. The current best practice is to count, declare, and make trade-offs visible, rather than to impose a top-down quota.

    Scope 1, 2, 3 in research-org context

    For a research institution: Scope 1 is direct (campus heating fuel, owned vehicles); Scope 2 is purchased energy (electricity, district heating); Scope 3 is everything else (travel, procurement, commuting, waste, investments). For a typical research-intensive university, Scope 3 is 70-90% of total emissions, with travel and procurement dominating. The implication is that a serious sustainability programme must address Scope 3 procurement (sustainable lab purchasing) and Scope 3 travel (conferences and fieldwork), not just on-campus operations.
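
    The scope split can be computed directly from an emissions inventory. The sketch below maps the categories named above to their scopes and reports each scope's share; the inventory figures are invented for illustration and chosen to fall within the 70-90% Scope 3 range.

```python
from collections import defaultdict
from typing import Dict

# Category-to-scope mapping, following the definitions in the paragraph above.
SCOPE_OF = {
    "heating_fuel": 1, "owned_vehicles": 1,
    "electricity": 2, "district_heating": 2,
    "travel": 3, "procurement": 3, "commuting": 3, "waste": 3, "investments": 3,
}


def scope_shares(inventory_tonnes: Dict[str, float]) -> Dict[int, float]:
    """Return each scope's fraction of total emissions."""
    totals = defaultdict(float)
    for category, tonnes in inventory_tonnes.items():
        totals[SCOPE_OF[category]] += tonnes
    grand_total = sum(totals.values())
    return {scope: tonnes / grand_total for scope, tonnes in sorted(totals.items())}


# Hypothetical annual inventory in tonnes CO2e.
inventory = {
    "heating_fuel": 4000, "electricity": 6000,
    "travel": 15000, "procurement": 20000, "commuting": 5000,
}
shares = scope_shares(inventory)
print({scope: f"{share:.0%}" for scope, share in shares.items()})
```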

    The sustainable-research domain at CASRAI tracks framework adoption and institutional case studies; the research carbon footprint entry walks through the standard accounting methodology adapted for research organisations.

    References

    • Farley et al., LEAF: a tool for laboratory sustainability assessment (UCL technical report, 2019, updated 2023).
    • My Green Lab, 2024 Certification Standard (current version).
    • Urbina et al., Labs should cut plastic waste too (Nature, 2015, the foundational plastics paper).
    • Lannelongue et al., Green Algorithms: Quantifying the carbon footprint of computation (Advanced Science, 2021).
    • UCL Race to Zero, Freezer temperature transition report (2023).

  • Why the next CRediT version should include ‘AI assistance’ as a role

    The 14 roles of CRediT were designed in 2013-2014 with a model of contribution that did not include large language models or generative AI systems. A decade on, the taxonomy is robust and widely adopted, but the AI question is hard to ignore. This post makes the case — tentatively, and with attention to the counter-arguments — that the next CRediT revision should add a 15th role explicitly covering AI assistance. We are publishing it here to invite community pushback before any formal proposal goes to the CRediT stewardship group.

    Why this question is not solved by disclosure alone

    The current consensus around generative AI in scholarly authorship rests on two pillars: AI cannot be a co-author (the ICMJE 2023 position), and AI use must be disclosed in a structured declaration. CASRAI agrees with both. They do not, however, resolve the question of how AI assistance shows up in CRediT.

    A worked example. Suppose a paper has four authors. Author A wrote the first draft with substantial assistance from a large language model, which she prompted, edited, fact-checked, and revised. Author B ran the formal analysis using an AI-assisted statistical-discovery tool that proposed model specifications. Author C generated several of the figures using a GenAI visualisation tool. Author D supervised. Each used AI; each used it differently; each took human responsibility for the output. How does the CRediT statement represent this?

    Under current CRediT, AI use is invisible. Author A gets Writing – original draft (lead). Author B gets Formal analysis (lead). Author C gets Visualization (lead). Author D gets Supervision. The AI assistance shows up only in the publisher-mandated AI disclosure, which is a free-text field in the methods or acknowledgements. The structured contributorship record has no place for the granular fact that AI was a tool in discharging each of those roles.

    The proposed 15th role

    The draft scope we are testing is this:

    AI assistance. The use of artificial-intelligence systems, including generative AI, machine-learning models, and automated analytical tools, in the production of the work. Includes prompt engineering, model selection, validation of AI output, and human verification of AI-generated content. Does not include use of AI as a routine tool (e.g., grammar checkers, citation-formatting tools) below a disclosure threshold defined by the publisher.

    The role would carry the standard degree-of-contribution qualifier. A human author whose primary contribution was prompting and verifying an AI system would be marked Lead for AI assistance; a co-author who occasionally checked AI outputs would be Supporting. The role would not be a substitute for the existing roles — the human who used AI for the first draft still gets Writing – original draft — but it would add the structured fact that AI was involved.
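
    To make the proposal concrete, here is how the four-author example might look as structured data with the proposed role added. This is a sketch under stated assumptions: "AI assistance" is the hypothetical 15th role and not part of CRediT today, the record layout is our own, and the degrees assigned to each author for the new role (and to Author D for Supervision) are invented for illustration.

```python
from typing import List, NamedTuple


class RoleAssignment(NamedTuple):
    contributor: str
    role: str
    degree: str  # "Lead", "Equal", or "Supporting"


AI_ROLE = "AI assistance"  # hypothetical 15th role; not in the current taxonomy

statement = [
    RoleAssignment("Author A", "Writing – original draft", "Lead"),
    RoleAssignment("Author A", AI_ROLE, "Lead"),
    RoleAssignment("Author B", "Formal analysis", "Lead"),
    RoleAssignment("Author B", AI_ROLE, "Supporting"),
    RoleAssignment("Author C", "Visualization", "Lead"),
    RoleAssignment("Author C", AI_ROLE, "Supporting"),
    RoleAssignment("Author D", "Supervision", "Lead"),
]


def contributors_with_role(assignments: List[RoleAssignment], role: str) -> List[str]:
    """Names of everyone holding the given role, sorted for stable output."""
    return sorted({a.contributor for a in assignments if a.role == role})


def roles_of(assignments: List[RoleAssignment], contributor: str) -> List[str]:
    """All roles recorded for one contributor, in statement order."""
    return [a.role for a in assignments if a.contributor == contributor]


ai_users = contributors_with_role(statement, AI_ROLE)
print(ai_users)  # prints ['Author A', 'Author B', 'Author C']
```

    The existing roles are untouched; the new rows only add the fact that AI was involved, and that fact can be queried per contributor.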

    The arguments for

    First, structured disclosure is more useful than prose disclosure. A free-text AI declaration cannot be queried, cross-referenced, or aggregated. A CRediT-style structured role can. Integrity offices investigating a fabrication can query for papers with AI assistance roles; funders tracking AI use in grant outputs can roll up the data; bibliometric studies can analyse patterns. None of this is possible with the current free-text disclosure.

    Second, granularity matters for accountability. Knowing that a paper used AI is less useful than knowing which contributor used AI for which task. The CRediT role assignment makes the accountability specific. If a fabricated reference appears in the introduction, the question of who is responsible for verifying it has a structured answer.

    Third, the boundary is becoming a fiction. Modern statistical workflows include AI components (autoML, AI-assisted exploratory analysis); modern writing workflows include AI components (Copilot for prose, Claude for editing); modern visualisation workflows include AI components. The pretence that these are separable from the role they support is increasingly hard to maintain. If AI is being used to discharge a role, the role assignment should say so.

    The arguments against

    Three serious counter-arguments deserve engagement.

    First, the scope-creep concern. CRediT has held to 14 roles deliberately. Each addition raises the cognitive load on authors filling out the statement, increases the integration burden on publishers, and risks the taxonomy becoming unusable through over-specification. The argument from Liz Allen and the original CRediT designers has been that the taxonomy gains its value from being small enough to use.

    Second, the boundary problem. What counts as AI assistance? A grammar checker is plausibly AI; a citation formatter increasingly is; a search engine ranking results by relevance certainly is. If every modern research tool counts as AI, the role becomes meaningless. A workable scope requires a non-trivial threshold (the draft language above gestures at “below a disclosure threshold defined by the publisher”), and that threshold is hard to define without ending up with either everything or nothing.

    Third, the disclosure-versus-contribution distinction. CRediT is a contributorship taxonomy. AI is not a contributor — that is the settled position. Adding an AI role to CRediT risks blurring this. The alternative is to keep AI in a separate disclosure form, structurally similar to a competing-interests declaration or a funding statement, rather than in the contributorship statement.

    A possible middle path

    The middle path is to keep CRediT at 14 roles and to define a parallel AI assistance declaration with comparable structure: a controlled vocabulary of AI-use types, a per-contributor breakdown linked to ORCID iDs, a model-and-version field, and a verification statement. This would sit alongside CRediT in publisher submission systems and JATS XML, rather than inside it.
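
    A first cut of such a declaration could look like the sketch below. It is an illustration only: the three use types are a hypothetical stand-in for a controlled vocabulary that does not yet exist, the field names are ours, the model name and version are placeholders, and the ORCID iD is the well-known fictitious test identifier.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List


class AIUseType(Enum):
    """Hypothetical controlled vocabulary of AI-use types."""
    DRAFTING = "drafting"
    ANALYSIS = "analysis"
    VISUALIZATION = "visualization"


@dataclass(frozen=True)
class AIUseEntry:
    orcid: str          # contributor who used and verified the tool
    use_type: AIUseType
    model: str
    model_version: str
    verification: str   # human verification statement


def to_json(entries: List[AIUseEntry]) -> str:
    """Serialise the declaration, writing enum members as their vocabulary terms."""
    records = []
    for entry in entries:
        record = asdict(entry)
        record["use_type"] = entry.use_type.value
        records.append(record)
    return json.dumps(records, indent=2)


declaration = [
    AIUseEntry(
        orcid="0000-0002-1825-0097",  # fictitious example iD
        use_type=AIUseType.DRAFTING,
        model="example-llm",          # placeholder model name
        model_version="2026-01",      # placeholder version
        verification="All generated text was fact-checked and revised by the contributor.",
    ),
]
print(to_json(declaration))
```

    Because the record sits outside the CRediT statement, the contributorship roles continue to describe only what humans did.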

    This is closer to where the current publisher disclosure forms are heading, and it preserves the conceptual clarity that CRediT roles describe what humans did, while a separate declaration describes what AI tools were used. We are increasingly inclined to recommend this path, with the caveat that the disclosure must be structured to the same standard as CRediT — not free-text, with controlled vocabularies, deposited to Crossref, and surfaced on ORCID.

    What the CRediT stewardship group should do next

    Three concrete steps. First, run a structured community consultation through 2026 on whether to add AI assistance as a 15th CRediT role, with the alternative being a parallel structured declaration. The CRediT governance page outlines the consultation process. Second, in parallel, draft the data model for a parallel AI assistance declaration so that the comparison is concrete and not abstract. Third, coordinate with NISO on whether either option requires a revision to Z39.104.

    The decision is not urgent in the sense that the integrity system is failing today; the existing disclosure forms work, badly. It is urgent in the sense that every year of delay produces another year of unstructured AI-use data that cannot be aggregated or analysed, which makes the eventual transition harder.

  • NSPM-33 disclosure: what US researchers must report in 2026

    National Security Presidential Memorandum 33 (NSPM-33), signed in January 2021, directed US federal research funding agencies to strengthen and harmonise disclosure requirements for federally funded researchers. Five years later the implementation has stabilised across NIH, NSF, DOE, DOD, NASA, USDA, and the other major science agencies, with the CHIPS and Science Act of 2022 having added enforcement teeth and the 2024 Research Security Programs Standard Requirement having added institutional-level obligations. This post is the practical 2026 compliance map for US-funded researchers.

    The shape of NSPM-33 in 2026

    NSPM-33’s core mandate is straightforward: a federally-funded researcher must disclose all support they receive (financial, in-kind, or in the form of positions, appointments, or affiliations) so that the funding agency can identify potential conflicts of commitment, undisclosed foreign components, or scientific overlap. The disclosure is made at proposal stage and updated throughout the project’s life.

    The five years of implementation have produced two important refinements. First, the common disclosure forms: NIH’s Other Support format, NSF’s Current and Pending (Other) Support, and parallel formats at other agencies have been substantially harmonised under the NSPM-33 implementation guidance. By 2026 a researcher can largely produce one structured disclosure record (typically in SciENcv format) and have it serve all federal agencies. Second, the structured-data submission: the agencies now require disclosure forms in machine-readable format with ORCID linkage, not as free-form PDFs.

    What must be disclosed

    The 2026 disclosure scope at the major agencies covers, at a minimum:

    • All ongoing and pending research support (federal, non-federal, and foreign).
    • All in-kind support of significance (laboratory space, equipment access, personnel time).
    • All positions and appointments (professorships, visiting positions, advisory roles, board memberships) regardless of whether they are paid.
    • All consulting arrangements above a defined threshold (typically a few thousand dollars per year, but agency-specific).
    • Foreign government talent recruitment programme participation (see below).
    • Patents and patent applications related to the funded research.
    • Sponsored or paid travel above defined thresholds.
    • For NIH specifically, all support for research effort regardless of how titled.

    The Current and Pending Support form (NSF terminology) and the Other Support form (NIH terminology) are the canonical artefacts. They are populated by the researcher at the proposal stage and re-verified at the just-in-time (JIT) request stage if the proposal is funded.

    The foreign-component question

    The single most consequential 2021-2024 enforcement focus was undisclosed foreign components. A foreign component is any significant scientific element of a project performed outside the United States, regardless of the source of funding, including foreign collaborator efforts even if not separately funded.

    NIH’s foreign-component disclosure rule existed before NSPM-33 but was inconsistently enforced. Post-NSPM-33 the enforcement has been substantial: dozens of researchers had grants terminated or returned, and several criminal cases proceeded for fabricated disclosures. The 2023-2024 cohort of cases clarified the threshold: an undisclosed foreign-funded position, a foreign-government talent-recruitment-programme membership, and a substantial unreported collaboration with a foreign laboratory are all material non-disclosures with grant-termination and criminal consequences.

    In 2026 the practical rule is conservative: if you have any affiliation, position, support, or significant collaboration outside the US that overlaps in time with your federally funded project, disclose it. The cost of over-disclosure is filling in more forms; the cost of under-disclosure has become very high.
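
    The overlap test in that rule of thumb is a standard interval check. The sketch below illustrates the rule as stated here and is not agency policy: the record layout, country codes, and example affiliations are hypothetical, and US-based positions (which have their own disclosure requirements) are left out of scope.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Affiliation:
    name: str
    country: str                # two-letter code; "US" for domestic
    start: date
    end: Optional[date] = None  # None means the affiliation is ongoing


def overlaps_project(aff: Affiliation, project_start: date, project_end: date) -> bool:
    """True if the affiliation period intersects the project period (inclusive)."""
    aff_end = aff.end or date.max
    return aff.start <= project_end and project_start <= aff_end


def foreign_overlaps(affs: List[Affiliation], project_start: date, project_end: date) -> List[str]:
    """Names of non-US affiliations that overlap the project in time."""
    return [a.name for a in affs
            if a.country != "US" and overlaps_project(a, project_start, project_end)]


# Hypothetical record for a project running 2024-2026.
affiliations = [
    Affiliation("Visiting professorship", "DE", date(2023, 6, 1), date(2024, 3, 31)),
    Affiliation("Honorary post", "JP", date(2020, 1, 1), date(2022, 12, 31)),
    Affiliation("Advisory board", "GB", date(2025, 5, 1)),
    Affiliation("Adjunct appointment", "US", date(2024, 1, 1)),
]
flagged = foreign_overlaps(affiliations, date(2024, 1, 1), date(2026, 12, 31))
print(flagged)  # prints ['Visiting professorship', 'Advisory board']
```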

    Foreign Talent Recruitment Programmes

    The Foreign Talent Recruitment Programme (FTRP) category was sharpened by Section 10632 of the CHIPS and Science Act of 2022, which required agencies to prohibit federally-funded researchers from participating in malign FTRPs. The 2024 implementation guidance defined a malign FTRP as one that involves transfer of intellectual property, transfer of laboratory resources, or compensation contingent on outcomes that benefit a foreign government’s national interests, among several other criteria.

    The category is narrower than the original 2018-2021 “China Initiative” framing might have suggested. Participation in a non-malign FTRP (a competitive postdoctoral programme, an academic exchange visit, an honorary professorship) is not prohibited but must be disclosed. Participation in a malign FTRP is prohibited for federally-funded researchers and must be terminated as a condition of receiving federal funding.

    The institutional-side burden under the 2024 Research Security Programs Standard Requirement is substantial: institutions over a defined funding threshold must implement a research security programme with training, conflict-of-interest screening, foreign-collaboration approval, and ongoing monitoring. The standard requirement specifies the elements; institutions implement them with their own policies.

    The reporting workflow in practice

    The 2026 workflow for a federally-funded researcher at a US institution typically looks like:

    1. SciENcv profile. Maintain a current SciENcv profile with all positions, appointments, and support. SciENcv (Science Experts Network Curriculum Vitae) is the federal-government-supported tool and produces the structured-data formats accepted by NIH, NSF, and other agencies.
    2. Proposal-stage disclosure. Export the relevant disclosure form from SciENcv at proposal preparation. Verify with the institution’s sponsored-research office before submission.
    3. JIT update. For NIH, re-verify Other Support at JIT request. Any changes since proposal submission must be reported.
    4. Award updates. Any new support, position, or appointment acquired during the award must be reported to the agency. NIH’s threshold is “significant changes”; in practice, disclose anything that would have been on the original form.
    5. Annual progress reports. The RPPR and other annual reports capture updated Other Support and Current and Pending Support. Treat this as a real update, not a copy-paste.
    6. Final reports and closeout. Disclosure obligations continue through closeout.
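
    Steps 3 to 5 all come down to comparing the current record against what was last submitted. A minimal sketch of that comparison follows; plain Python sets stand in for exported disclosure records, no SciENcv interface is used or implied, and the entries are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]  # (kind, description), e.g. ("position", "Visiting professor, Univ. X")


def changes_since(submitted: Set[Entry], current: Set[Entry]) -> Dict[str, List[Entry]]:
    """Entries added or removed since the last submitted disclosure."""
    return {
        "added": sorted(current - submitted),
        "removed": sorted(submitted - current),
    }


# Hypothetical snapshots at proposal submission and at the JIT request.
at_proposal = {
    ("support", "Agency grant 1"),
    ("position", "Advisory board, Institute Y"),
}
at_jit = {
    ("support", "Agency grant 1"),
    ("support", "Foundation award 2"),
}
diff = changes_since(at_proposal, at_jit)
print(diff["added"])    # prints [('support', 'Foundation award 2')]
print(diff["removed"])  # prints [('position', 'Advisory board, Institute Y')]
```

    Anything in either list is a candidate for reporting; an empty result is the only case where a copy of the earlier form is adequate.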

    Institutional research security programmes

    The 2024 Research Security Programs Standard Requirement obligates institutions over the $50M annual federal-research threshold to operate a research security programme covering: cybersecurity training, foreign-collaboration approval workflow, conflict-of-interest and conflict-of-commitment training, export-control compliance, and ongoing monitoring of researchers’ disclosures against external data sources.

    The institutional layer matters because most disclosure failures are not fraud; they are inadvertent omission by researchers who did not realise an affiliation was disclosable. A well-functioning research security programme acts as a backstop, with regular reminders, training, and a pre-submission review that catches omissions before they become non-disclosures of consequence.

    The CASRAI funder-mandate guide covers the agency-specific disclosure requirements with current links; the research-security domain tracks the cross-agency policy harmonisation.

    What’s still uncertain

    Three areas remain in active interpretation in 2026. First, the treatment of dual-affiliated researchers: a researcher with a tenured position at a US institution and a part-time appointment at a non-US institution must disclose both, but the threshold for the non-US appointment counting as a foreign component is fuzzy in practice. Second, the scope of the conflict-of-commitment definition: an unpaid advisory role at a foreign institution may not count as support but does count as commitment; the agencies vary in how they treat this. Third, the retroactive application: disclosure failures discovered years after the funded work was completed have been treated with substantial inconsistency, with some cases pursued criminally and others handled administratively.

    For researchers, the safe path is conservative disclosure, current SciENcv maintenance, and proactive consultation with the institution’s sponsored-research office whenever an affiliation or support is ambiguous. The compliance cost of asking is low; the cost of under-disclosure that surfaces later is potentially career-ending.

    References

    • NSTC Joint Committee on the Research Environment, Guidance for Implementing National Security Presidential Memorandum 33 (January 2022).
    • NIH Office of Extramural Research, Notice of Information: Updates to Other Support (NOT-OD-22-150 and subsequent updates).
    • CHIPS and Science Act of 2022, Section 10632 (Foreign Talent Recruitment Programs).
    • OSTP, Research Security Programs Standard Requirement (July 2024).
    • NSF, Proposal and Award Policies and Procedures Guide (current version).

  • Indigenous data governance: CARE Principles in practice

    The CARE Principles for Indigenous Data Governance were published in 2019 by the Global Indigenous Data Alliance (GIDA), expressing four principles – Collective benefit, Authority to control, Responsibility, Ethics – designed to sit alongside the FAIR principles when research data involves Indigenous Peoples, communities, lands, or knowledge. This post offers an introductory map of the CARE landscape in 2026, the relationships among the regional Indigenous data sovereignty movements that informed it, and the operational artefacts that researchers and institutions are using to apply CARE in practice. We write as outsiders to these traditions and rely on the published statements of Indigenous-led organisations; what follows is descriptive, not prescriptive, and any institution implementing CARE should engage directly with the communities whose data is in question.

    The CARE Principles

    The CARE Principles, drafted by Stephanie Russo Carroll, Maui Hudson, Tahu Kukutai, and colleagues through the GIDA, articulate that data governance is not only a question of technical FAIR-ness but of who has authority over data, who benefits, and what ethical commitments are owed. The four pillars are:

    • Collective benefit. Data ecosystems should be designed and function in ways that enable Indigenous Peoples to derive benefit from the data. Inclusive development and innovation; improved governance and citizen engagement; equitable outcomes.
    • Authority to control. Indigenous Peoples’ rights and interests in Indigenous data must be recognised and their authority to control such data must be empowered. Recognising rights and interests; data for governance; governance of data.
    • Responsibility. Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous Peoples’ self-determination and collective benefit. Capability for Indigenous communities; positive relationships; appropriate care for data.
    • Ethics. Indigenous Peoples’ rights and wellbeing should be the primary concern at all stages of the data life cycle. Minimising harm and maximising benefit; justice; future use.

    The principles are deliberately at the level of governance commitments, not the level of technical implementation. Their operationalisation depends on engagement with specific communities and their own governance institutions.

    The regional movements that preceded CARE

    CARE did not emerge in isolation. It is the international synthesis of regional Indigenous data-sovereignty movements that had been building governance frameworks for years.

    OCAP® Principles (Canada)

    The OCAP® Principles – Ownership, Control, Access, Possession – were first articulated in 1998 by the national steering committee of what is now the First Nations Regional Health Survey, and are stewarded today by the First Nations Information Governance Centre (FNIGC); they have governed First Nations data in Canada since. OCAP is a registered trademark of FNIGC; the principles assert that First Nations have collective ownership of their information, control over how it is collected and used, access to it, and physical possession of it. FNIGC operates training programmes that researchers working with First Nations data are expected to complete; multiple Canadian Tri-Agency Indigenous research policies reference OCAP explicitly.

    Te Mana Raraunga (Aotearoa New Zealand)

    Te Mana Raraunga is the Māori Data Sovereignty Network, established 2015. Te Mana Raraunga articulates Māori data sovereignty rooted in tino rangatiratanga (self-determination) under Te Tiriti o Waitangi. The Network’s foundational statements include the 2018 Principles of Māori Data Sovereignty, which were among the documents informing CARE. The relationship between Te Mana Raraunga’s Māori-specific frame and the international CARE frame is one of mutual recognition; Te Mana Raraunga operates with the authority of Māori governance, not as an instance of an international standard.

    Maiam nayri Wingara (Australia)

    Maiam nayri Wingara, the Aboriginal and Torres Strait Islander Data Sovereignty Collective, was established in 2017 and articulated principles of Indigenous data sovereignty for Australia in 2018. The collective’s work emphasises the rights of Aboriginal and Torres Strait Islander peoples to control data about their people, communities, lands, and waters. The Australian Indigenous Health-Welfare Data Working Group and several federal agencies’ Indigenous data policies reference Maiam nayri Wingara’s frame.

    Other regional movements

    Indigenous data sovereignty movements with their own governance frameworks operate in many other contexts, including Sámi Council work in Sápmi, Native American data sovereignty organising in the United States (the United South and Eastern Tribes Tribal Health Program, among others), and Indigenous Latin American collectives. The CARE Principles refer to and respect this plurality; they are not a substitute for any of these regional frameworks but a complement at international scale.

    How CARE relates to FAIR

    CARE and FAIR are designed to coexist. FAIR addresses technical interoperability and data reusability; CARE addresses governance authority and ethical commitments. A dataset can be both FAIR and CARE-compliant; a dataset can also be FAIR while failing CARE (technically open data that violates community authority); a dataset can be CARE-compliant while not openly FAIR (community-controlled data with restricted access in line with community decision).

    The GIDA’s published positioning is that CARE precedes FAIR when Indigenous data is involved: the questions of authority, benefit, responsibility, and ethics must be settled before the questions of findability, accessibility, interoperability, and reusability are operationalised. A FAIR-without-CARE approach to Indigenous data has historically reproduced harm; CARE asks researchers and institutions to do the governance work first.

    Free, Prior, and Informed Consent

    Free, Prior, and Informed Consent (FPIC) is the international human-rights principle, articulated in the UN Declaration on the Rights of Indigenous Peoples (UNDRIP, 2007) and widely adopted, that Indigenous Peoples must be consulted, and their consent obtained, before any project affecting them, their lands, or their resources proceeds. FPIC applies to research projects involving Indigenous communities, knowledge, or data. The four elements – free (without coercion), prior (sufficiently in advance), informed (with adequate information), consent (reached through the community’s own decision-making process) – are all substantive.

    FPIC operationalisation depends on the community in question. Some communities have formal protocols and Indigenous Research Ethics committees; others negotiate consent through community-leader engagement; others may decline participation. In all cases the timing of the consent process matters: FPIC sought after a project has been designed is generally not FPIC; FPIC must precede project design or at minimum precede any irreversible step.

    Traditional Knowledge Labels and Local Contexts

    Traditional Knowledge (TK) Labels and Biocultural (BC) Labels, developed by the Local Contexts initiative led by Jane Anderson and Kim Christen, are metadata labels that can be attached to datasets, archival records, or collection items to communicate community-defined permissions, attribution requirements, and cultural protocols. TK Labels include labels for attribution, non-commercial use, outreach, family or clan use, ceremonial use, and others; BC Labels cover biocultural specimens and data with similar granularity.

    The labels are not legal instruments by themselves; they are governance signals issued by communities that researchers and institutions are expected to respect. Several repositories (notably the Mukurtu CMS platform, also developed by Christen and colleagues) integrate TK and BC Labels natively. By 2026 several major museums, archives, and a small but growing number of institutional research repositories support TK Labels at the record level.
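    As a rough illustration of what record-level label support can mean in a repository, the sketch below attaches community-issued labels to a record and renders them wherever the record is displayed. The class and field names are hypothetical and are not the Local Contexts or Mukurtu data model; which labels apply, and how they are worded, remains the issuing community’s decision.

```python
from dataclasses import dataclass, field

# Hypothetical record shape for illustration only; not the Local Contexts
# or Mukurtu data model.

@dataclass(frozen=True)
class CommunityLabel:
    label_type: str   # e.g. "TK Attribution", as chosen by the community
    issued_by: str    # the community or governance body applying the label
    text: str         # the community's own wording of the protocol

@dataclass
class RepositoryRecord:
    title: str
    labels: list[CommunityLabel] = field(default_factory=list)

def display_lines(record):
    """Render a record so that every label travels with it when shown."""
    lines = [record.title]
    for label in record.labels:
        lines.append(
            f"[{label.label_type}] {label.text} (applied by {label.issued_by})"
        )
    return lines

# Placeholder content; real label text is written by the community.
example = RepositoryRecord(
    title="Placeholder archival record",
    labels=[
        CommunityLabel("TK Attribution", "Placeholder community",
                       "Placeholder attribution protocol text."),
    ],
)
```

    The design point is that labels are stored with the record rather than in a separate policy document, so any view or export of the record can carry them along.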

    Practical implementation for institutions

    An institution beginning to operationalise CARE alongside its FAIR practice would, in the broadest terms, attend to:

    1. Recognising the priority of community authority over data concerning Indigenous peoples, lands, and knowledge, and reflecting this in institutional research-data policy.
    2. Engaging with communities through their own governance institutions early, with FPIC understood as a substantive process not a checkbox.
    3. Adopting the relevant regional principles where applicable (OCAP in Canada, Te Mana Raraunga principles in Aotearoa, Maiam nayri Wingara in Australia, etc.) rather than treating CARE as a substitute.
    4. Supporting researchers in their institution with training, ethics-board capacity, and community-engagement resources; not pushing the burden onto Indigenous researchers within the institution.
    5. Implementing technical support for community-defined permissions (TK Labels, access-control models that respect community decision) in institutional repositories.
    6. Reporting transparently to communities about how data is used, with channels for community-initiated change to data status.

    Several institutional CRIS and repository vendors have begun adding CARE-aware functionality (TK Label support, community-attribution fields, access-control models that respect community-defined permissions). The CASRAI Indigenous data CARE domain tracks adoption.

    The integrity question

    The honest position for non-Indigenous researchers and institutions is that operationalising CARE well requires deferring to Indigenous-led governance, not designing one’s own “CARE-compliant” system. The literature is consistent on this point: the CARE Principles were developed by Indigenous-led organisations and their authoritative interpretation rests with those organisations and the communities they serve. The CARE Principles are not a checklist that an external institution can mark itself against and self-certify on.

    The implication for institutions and researchers is that the CARE work is relational and ongoing rather than one-time and administrative. The investment is in long-term partnerships with communities, capacity-building within Indigenous research leadership, and a willingness to share authority over how data flows into and out of institutional systems. The technical artefacts (TK Labels, FPIC processes, Mukurtu integrations) support the relational work; they do not substitute for it.

    Where to learn more

    For non-Indigenous researchers and institutions beginning this work, the starting point is the GIDA’s published statement of the CARE Principles, read alongside the regional movements’ own documents (FNIGC on OCAP, Te Mana Raraunga on Māori data sovereignty, Maiam nayri Wingara on Aboriginal and Torres Strait Islander data sovereignty). The Carroll, Hudson, Kukutai, et al. 2020 paper in Data Science Journal is the core scholarly reference for CARE. The Local Contexts initiative’s documentation is the primary reference for TK and BC Labels, and the Mukurtu CMS documentation is the main technical reference for community-controlled repository implementation.

    References

    Carroll, Hudson, Kukutai, et al., The CARE Principles for Indigenous Data Governance (Data Science Journal, 2020).
    GIDA, CARE Principles for Indigenous Data Governance (founding statement, 2019).
    First Nations Information Governance Centre, The First Nations Principles of OCAP® (foundational and ongoing publications).
    Te Mana Raraunga, Principles of Māori Data Sovereignty (2018).
    Maiam nayri Wingara, Indigenous Data Sovereignty Communique (2018).
    UN Declaration on the Rights of Indigenous Peoples (2007).
    Anderson and Christen, work on Traditional Knowledge Labels and the Local Contexts initiative (ongoing).

  • ORCID 4.0: the IDR roadmap and what it means for CASRAI integrations

    ORCID’s Integration and Data Roadmap (IDR) work, which culminated in late 2025 with the 4.0 release of the public and member APIs, is the most consequential PID infrastructure change of the year for anyone who cares about the contributor-affiliation-funding crosswalk. The headline is technical: a new contributions resource that supersedes the old works and employment pairing for representing what a researcher did, where, on whose money, and with whom. The implications reach into nearly every persistent-identifier integration CASRAI tracks.

    What 4.0 actually changes

    The pre-4.0 ORCID record was a federation of resource types: works (with DOIs), employment (with ROR organisation IDs), education, funding (with grant IDs and Funder Registry entries), peer reviews, and the like. Each was useful in isolation. None of them carried the relations between them in a structured form. If a researcher’s ORCID record listed a paper, an employment at the institution that hosted the work, and a grant that funded the work, those three facts sat in separate resources with no machine-readable link.

    4.0 introduces a top-level contribution entity that binds these. A contribution carries: a primary artefact (DOI, software identifier, dataset identifier, or RAiD), a set of CRediT roles with the degree-of-contribution qualifier, an affiliation in force at the time of the contribution (with ROR), funding in force at the time (with Funder Registry or ROR for the funder, plus the grant identifier and ideally a RAiD), and a temporal span. The relationships are explicit and queryable. A consuming system can ask: what did this researcher contribute, at this affiliation, under this grant, on this date? — and get an answer without inference.
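    To make the shape of that entity concrete, here is a minimal sketch of a contribution and of the "what, where, under which grant, when" question asked against it. Every class, field, and identifier below is an assumption for illustration, modelled on the prose description above rather than on the published ORCID schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical names throughout; this mirrors the prose description of a
# contribution, not the actual ORCID 4.0 schema.

@dataclass(frozen=True)
class RoleAssertion:
    credit_role: str               # e.g. "Conceptualization"
    degree: Optional[str] = None   # "lead", "equal", "supporting", or unset

@dataclass(frozen=True)
class Funding:
    funder_id: str                 # Funder Registry ID or ROR ID of the funder
    grant_id: str
    raid: Optional[str] = None     # project-level identifier, if one exists

@dataclass
class Contribution:
    orcid: str
    artefact_id: str               # DOI, software ID, dataset ID, or RAiD
    roles: list[RoleAssertion]
    affiliation_ror: str
    funding: list[Funding]
    start: date
    end: date

def contributions_matching(records, orcid, ror, grant_id, on):
    """What did this researcher contribute, at this affiliation,
    under this grant, on this date?"""
    return [
        c for c in records
        if c.orcid == orcid
        and c.affiliation_ror == ror
        and any(f.grant_id == grant_id for f in c.funding)
        and c.start <= on <= c.end
    ]

# Placeholder identifiers, not real PIDs.
records = [
    Contribution(
        orcid="orcid:researcher-x",
        artefact_id="doi:placeholder-paper",
        roles=[RoleAssertion("Conceptualization", "lead")],
        affiliation_ror="ror:placeholder-org",
        funding=[Funding("funder:placeholder", "GRANT-001")],
        start=date(2024, 1, 1),
        end=date(2024, 12, 31),
    )
]
```

    Because affiliation, funding, and time span sit on the same object as the roles, the query is a plain filter; nothing has to be inferred by matching dates across separate resources.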

    The CRediT-at-record-level integration matures

    The 2024 work to allow CRediT roles to live on an ORCID record (not just in publisher JATS) was the precursor to 4.0. The integration shipped, was widely adopted, and exposed two limitations that 4.0 closes. First, role assignments lived inside the work resource, making it awkward to express a Conceptualization role spanning several papers and datasets. Second, the qualifier was carried only at per-work granularity. 4.0 lets a CRediT role attach to a contribution that groups multiple artefacts, with the qualifier travelling with the contribution.

    Practical example: a researcher who is Lead for Conceptualization across a clinical trial’s primary paper, protocol paper, registered data, and statistical analysis plan should be representable that way. Pre-4.0, the assertion had to be repeated four times; post-4.0, it lives on the contribution entity. See the ORCID implementation guide for the API patterns.
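    The difference between the two representations can be sketched with the clinical-trial example. The tuple and dictionary shapes below are hypothetical stand-ins, not ORCID payloads, and in practice the depositor decides which artefacts belong to one contribution; grouping by identical role and degree is only the simplest possible rule.

```python
from collections import defaultdict

# Hypothetical pre-4.0 shape: one (artefact, role, degree) assertion per work.
per_work_assertions = [
    ("primary-paper", "Conceptualization", "lead"),
    ("protocol-paper", "Conceptualization", "lead"),
    ("registered-data", "Conceptualization", "lead"),
    ("analysis-plan", "Conceptualization", "lead"),
]

def group_into_contributions(assertions):
    """Group assertions sharing a role and degree so each pair is stated once."""
    grouped = defaultdict(list)
    for artefact, role, degree in assertions:
        grouped[(role, degree)].append(artefact)
    return [
        {"credit_role": role, "degree": degree, "artefacts": artefacts}
        for (role, degree), artefacts in grouped.items()
    ]

contributions = group_into_contributions(per_work_assertions)
```

    Four repeated assertions collapse into a single entry that lists four artefacts, which is the shape the contribution entity is described as supporting.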

    RAiD becomes a first-class citizen

    One of the unsung wins in 4.0 is the elevation of RAiD to a first-class identifier alongside DOI. Pre-4.0, RAiD could be carried in an ORCID funding resource as an external identifier, but the schema treated it as a second-tier metadata field. 4.0 adds RAiD to the primary identifier set for both contributions and funding, with the same validation and resolution support as DOI.

    This matters because RAiD is increasingly the canonical project-level identifier, and ORCID is increasingly the canonical person-level record. The interlock — researcher X contributed to project RAiD Y, which produced papers A, B, C — is now a structured query rather than a string-match exercise.

    Affiliation history with PIDs at both ends

    The 4.0 employment and affiliation model has been quietly tightened. Every affiliation now requires a ROR organisational ID at registration; legacy string-only affiliations are preserved but flagged. The optional department field accepts a ROR sub-organisation ID where one exists (the ROR hierarchy work has caught up to make this practical), or a free-text department name as a fallback. The result is that affiliation history on an ORCID record is now reliably machine-readable at the ROR ID level.

    For institutions running a CRIS, this closes a longstanding crosswalk gap. CRIS-to-ORCID deposit can now write structured affiliations that ORCID-to-CRIS retrieval can read back without ambiguity. The CASRAI CRIS integration guide has been updated with the 4.0 deposit patterns.

    What CASRAI integrations need to do

    Three things, in priority order.

    1. Update CRediT JATS round-trips. Publishers depositing structured CRediT to ORCID via the member API should switch to the contribution resource for new deposits. Legacy works-with-roles deposits will continue to be accepted through 2026 but will be migrated server-side in 2027. The CASRAI CRediT JATS integration patterns now include both the legacy and the 4.0 deposit forms; new integrators should implement only the 4.0 form.
    2. Validate ROR IDs at affiliation deposit. A CRIS or publisher pushing affiliation data to ORCID should resolve and validate the ROR ID before deposit. The 4.0 API will reject malformed ROR IDs at the schema layer but will accept ROR IDs that resolve to deprecated or merged records. A pre-deposit validation pass against the ROR public API catches the common error cases.
    3. Test the funding-to-contribution link. If your integration writes funding entries, link them explicitly to the contributions they funded via the new funded_by relation on the contribution resource. This is the integration point that was missing pre-4.0 and that downstream consumers (funder dashboards, institutional reporting) most want to query.
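    The pre-deposit validation pass in item 2 might look something like the sketch below. It assumes the caller has already fetched the organisation record as JSON (for instance from the ROR public API) and that the record exposes a status field and successor relationships; the function name and return shape are hypothetical, and the pattern checks the identifier's form only, not its checksum.

```python
import re

# Form of a ROR ID: a leading 0, six base32 characters, two checksum digits.
# The optional prefix lets callers pass either the bare ID or the full URL.
ROR_ID_PATTERN = re.compile(
    r"^(?:https://ror\.org/)?0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$"
)

def check_ror_record(ror_id, record):
    """Classify a ROR ID before deposit.

    `record` is the JSON body already fetched for this ID, or None if the
    lookup returned nothing. Returns an (ok, reason) tuple.
    """
    if not ROR_ID_PATTERN.match(ror_id.strip().lower()):
        return False, "malformed ROR ID"
    if record is None:
        return False, "ROR ID does not resolve"
    status = str(record.get("status", "")).lower()
    if status != "active":
        # A merged organisation is expected to point at its successor.
        successors = [
            rel.get("id")
            for rel in record.get("relationships", [])
            if str(rel.get("type", "")).lower() == "successor"
        ]
        if successors:
            return False, f"record is {status}; successor: {successors[0]}"
        return False, f"record is {status or 'of unknown status'}"
    return True, "active"
```

    Keeping the network fetch outside the function makes the check easy to run in batch over a CRIS export and easy to test; an integration would typically hold back any affiliation for which the first element of the result is False and surface the reason to a curator.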

    Backwards compatibility and the migration window

    ORCID’s commitment is that the 3.x APIs remain available through the end of 2027, with the 4.0 API as the recommended target from now on. The data model migration is largely automatic for existing records: pre-existing works with associated employment and funding are projected into the contribution model server-side. Consumers reading via the 4.0 API will see contribution entities even for data that was deposited in the 3.x form.

    The one wrinkle is CRediT role assignments that were deposited in 3.x without explicit qualifiers. These project into the contribution model with no qualifier set, which is a valid state but less informative than it could be. During 2026, publishers should re-deposit historical CRediT data with qualifiers where they have them.

    What this enables downstream

    The most interesting consequence of 4.0 is the ability to ask compound questions across the PID graph. Which researchers, affiliated with which institutions, contributed in which CRediT roles, to outputs funded by which funders, on which projects? — that query reduces to a structured traversal across ORCID, ROR, Crossref/DataCite, and the Funder Registry, with RAiD optionally tying the project layer together. The OpenAIRE Graph already operationalises a version of this; 4.0 makes it cleaner.
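    As a toy version of that traversal, the sketch below joins contribution rows to funding links and summarises roles per researcher and institution. The in-memory lists stand in for the registries named above, all identifiers are placeholders, and the function names are hypothetical; a real implementation would resolve each hop against the relevant service.

```python
from collections import defaultdict

# Toy stand-ins for the registries; every identifier is a placeholder.
contribution_rows = [
    {"orcid": "orcid:researcher-x", "ror": "ror:institution-a",
     "role": "Conceptualization", "doi": "doi:paper-a"},
    {"orcid": "orcid:researcher-x", "ror": "ror:institution-a",
     "role": "Formal analysis", "doi": "doi:paper-b"},
    {"orcid": "orcid:researcher-z", "ror": "ror:institution-b",
     "role": "Data curation", "doi": "doi:paper-c"},
]

# Output -> (funder, project) links; the project identifier may be absent.
funding_by_doi = {
    "doi:paper-a": ("funder:one", "raid:project-y"),
    "doi:paper-b": ("funder:one", "raid:project-y"),
    "doi:paper-c": ("funder:two", None),
}

def traverse(rows, funding, funder=None, raid=None):
    """Join contributions to funding and keep rows matching the filters."""
    result = []
    for row in rows:
        link = funding.get(row["doi"])
        if link is None:
            continue  # no recorded funding for this output
        row_funder, row_raid = link
        if funder is not None and row_funder != funder:
            continue
        if raid is not None and row_raid != raid:
            continue
        result.append({**row, "funder": row_funder, "raid": row_raid})
    return result

def roles_by_researcher(joined):
    """Summarise the CRediT roles held by each (researcher, institution) pair."""
    summary = defaultdict(set)
    for row in joined:
        summary[(row["orcid"], row["ror"])].add(row["role"])
    return dict(summary)

project_y = roles_by_researcher(
    traverse(contribution_rows, funding_by_doi, raid="raid:project-y")
)
```

    Each hop is an exact match on an identifier, which is what separates this kind of traversal from string-matching names and grant numbers.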

    For institutions, the practical implication is that reporting against funder mandates becomes substantially less manual. For publishers, the JATS-to-ORCID deposit becomes more valuable because it now persists in a queryable graph. For funders, the funder-PID-to-output traceability that ORCID has long promised starts to deliver at scale.

    What’s still missing

    4.0 does not solve everything. The contributor-affiliation-funding triple is now structured; the contributor-contributor relationship (collaboration graphs, mentorship) is not. A relationships resource is in development but not in 4.0. CARE-aligned identifiers for Indigenous researchers are also still in design.

    CASRAI’s integration tracking will follow 4.0 through 2026. The persistent-identifiers domain is being updated to reflect the contribution model; the ORCID federation page tracks member implementation.
