Category: Analysis

Explanatory deep-dives on standards, frameworks, and the open-research landscape.

  • Indigenous data governance: CARE Principles in practice

    The CARE Principles for Indigenous Data Governance were published in 2019 by the Global Indigenous Data Alliance (GIDA), expressing four principles – Collective benefit, Authority to control, Responsibility, Ethics – designed to sit alongside the FAIR principles when research data involves Indigenous Peoples, communities, lands, or knowledge. This post offers an introductory map of the CARE landscape in 2026, the relationships among the regional Indigenous data sovereignty movements that informed it, and the operational artefacts that researchers and institutions are using to apply CARE in practice. We write as outsiders to these traditions and rely on the published statements of Indigenous-led organisations; what follows is descriptive, not prescriptive, and any institution implementing CARE should engage directly with the communities whose data is in question.

    The CARE Principles

    The CARE Principles, drafted by Stephanie Russo Carroll, Maui Hudson, Tahu Kukutai, and colleagues through GIDA, hold that data governance is not only a question of technical FAIR-ness but also of who has authority over data, who benefits, and what ethical commitments are owed. The four principles are:

    • Collective benefit. Data ecosystems should be designed and function in ways that enable Indigenous Peoples to derive benefit from the data. Sub-principles: inclusive development and innovation; improved governance and citizen engagement; equitable outcomes.
    • Authority to control. Indigenous Peoples’ rights and interests in Indigenous data must be recognised and their authority to control such data must be empowered. Sub-principles: recognising rights and interests; data for governance; governance of data.
    • Responsibility. Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous Peoples’ self-determination and collective benefit. Sub-principles: positive relationships; expanding capability and capacity; Indigenous languages and worldviews.
    • Ethics. Indigenous Peoples’ rights and wellbeing should be the primary concern at all stages of the data life cycle and across the data ecosystem. Sub-principles: minimising harm and maximising benefit; justice; future use.

    The principles are deliberately pitched at the level of governance commitments, not technical implementation. Operationalising them depends on engagement with specific communities and their own governance institutions.

    The regional movements that preceded CARE

    CARE did not emerge in isolation. It is the international synthesis of regional Indigenous data-sovereignty movements that had been building governance frameworks for years.

    OCAP® Principles (Canada)

    The OCAP® Principles – Ownership, Control, Access, Possession – were first articulated in 1998 by the National Steering Committee of the First Nations and Inuit Regional Longitudinal Health Survey, and are now stewarded by the First Nations Information Governance Centre (FNIGC). They have governed First Nations data in Canada since. OCAP® is a registered trademark of FNIGC. The principles assert that First Nations have collective ownership of their information, control over how it is collected and used, access to it, and physical possession of it. FNIGC operates training programmes that researchers working with First Nations data are expected to complete; multiple Canadian Tri-Agency Indigenous research policies reference OCAP explicitly.

    Te Mana Raraunga (Aotearoa New Zealand)

    Te Mana Raraunga is the Māori Data Sovereignty Network, established in 2015. It articulates Māori data sovereignty rooted in tino rangatiratanga (self-determination) under Te Tiriti o Waitangi. The Network’s foundational statements include the 2018 Principles of Māori Data Sovereignty, which were among the documents informing CARE. The relationship between Te Mana Raraunga’s Māori-specific frame and the international CARE frame is one of mutual recognition; Te Mana Raraunga operates with the authority of Māori governance, not as an instance of an international standard.

    Maiam nayri Wingara (Australia)

    Maiam nayri Wingara, the Aboriginal and Torres Strait Islander Data Sovereignty Collective, was established in 2017 and articulated principles of Indigenous data sovereignty for Australia in 2018. The collective’s work emphasises the rights of Aboriginal and Torres Strait Islander peoples to control data about their people, communities, lands, and waters. The Australian Indigenous Health-Welfare Data Working Group and several federal agencies’ Indigenous data policies reference Maiam nayri Wingara’s frame.

    Other regional movements

    Indigenous data sovereignty movements with their own governance frameworks operate in many other contexts, including Sámi Council work in Sápmi, Native American data sovereignty organising in the United States (the United South and Eastern Tribes Tribal Health Program, among others), and Indigenous collectives in Latin America. The CARE Principles refer to and respect this plurality; they are not a substitute for any of these regional frameworks but a complement at international scale.

    How CARE relates to FAIR

    CARE and FAIR are designed to coexist. FAIR addresses technical interoperability and data reusability; CARE addresses governance authority and ethical commitments. A dataset can be both FAIR and CARE-compliant. It can also be FAIR while failing CARE (technically open data that violates community authority), or CARE-compliant while not openly FAIR (community-controlled data with access restricted in line with a community decision).

    GIDA’s published positioning is that CARE precedes FAIR when Indigenous data is involved: the questions of authority, benefit, responsibility, and ethics must be settled before the questions of findability, accessibility, interoperability, and reusability are operationalised. A FAIR-without-CARE approach to Indigenous data has historically reproduced harm; CARE asks researchers and institutions to do the governance work first.

    Free, Prior, and Informed Consent

    Free, Prior, and Informed Consent (FPIC) is the international human-rights principle, articulated in the UN Declaration on the Rights of Indigenous Peoples (UNDRIP, 2007) and widely adopted since, that Indigenous Peoples must be consulted, and their consent obtained, before any project affecting them, their lands, or their resources proceeds. FPIC applies to research projects involving Indigenous communities, knowledge, or data. The four elements – free (without coercion), prior (sufficiently in advance), informed (with adequate information), consent (with a community decision-making process) – are all substantive.

    FPIC operationalisation depends on the community in question. Some communities have formal protocols and Indigenous Research Ethics committees; others negotiate consent through community-leader engagement; others may decline participation. In all cases the timing of the consent process matters: FPIC sought after a project has been designed is generally not FPIC; FPIC must precede project design or at minimum precede any irreversible step.

    Traditional Knowledge Labels and Local Contexts

    Traditional Knowledge (TK) Labels and Biocultural (BC) Labels, developed by the Local Contexts initiative led by Jane Anderson and Kim Christen, are metadata labels that can be attached to datasets, archival records, or collection items to communicate community-defined permissions, attribution requirements, and cultural protocols. TK Labels include labels for attribution, non-commercial use, outreach, family or clan use, ceremonial use, and others; BC Labels cover biocultural specimens and data with similar granularity.

    The labels are not legal instruments by themselves; they are governance signals issued by communities that researchers and institutions are expected to respect. Several repositories (notably the Mukurtu CMS platform, also developed by Christen and colleagues) integrate TK and BC Labels natively. By 2026 several major museums, archives, and a small but growing number of institutional research repositories support TK Labels at the record level.
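
    As a rough illustration of what carrying labels "at the record level" can mean, the sketch below attaches community-applied labels to a catalogue record and builds the notice displayed with it. The class names, fields, and the example community are hypothetical and are not the Local Contexts or Mukurtu data model; the wording of a real label comes from the community that applies it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AppliedLabel:
    """A label a community has applied to a record (hypothetical shape)."""
    name: str        # label name as issued, e.g. "TK Attribution"
    community: str   # the community that applied the label
    text: str        # the community's own wording for this label


@dataclass
class CatalogueRecord:
    identifier: str
    title: str
    labels: list[AppliedLabel] = field(default_factory=list)


def label_notice(record: CatalogueRecord) -> str:
    """Build the notice shown with a record wherever it is displayed."""
    if not record.labels:
        # An unlabelled record is not thereby cleared for any use.
        return "No community labels recorded; this does not imply unrestricted use."
    return "\n".join(
        f"{lab.name} ({lab.community}): {lab.text}" for lab in record.labels
    )


record = CatalogueRecord(
    identifier="rec-1",
    title="Example recording",
    labels=[AppliedLabel("TK Attribution", "Example Community",
                         "Please credit the community.")],
)
print(label_notice(record))
```

    The design choice worth noting is that the label text travels with the record and is displayed verbatim; the repository does not paraphrase or interpret it.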

    Practical implementation for institutions

    An institution beginning to operationalise CARE alongside its FAIR practice would, in the broadest terms, attend to:

    1. Recognising the priority of community authority over data concerning Indigenous peoples, lands, and knowledge, and reflecting this in institutional research-data policy.
    2. Engaging with communities through their own governance institutions early, with FPIC understood as a substantive process not a checkbox.
    3. Adopting the relevant regional principles where applicable (OCAP in Canada, Te Mana Raraunga principles in Aotearoa, Maiam nayri Wingara in Australia, etc.) rather than treating CARE as a substitute.
    4. Supporting researchers in their institution with training, ethics-board capacity, and community-engagement resources; not pushing the burden onto Indigenous researchers within the institution.
    5. Implementing technical support for community-defined permissions (TK Labels, access-control models that respect community decision) in institutional repositories.
    6. Reporting transparently to communities about how data is used, with channels for community-initiated change to data status.

    Several institutional CRIS and repository vendors have begun adding CARE-aware functionality (TK Label support, community-attribution fields, access-control models that respect community-defined permissions). The CASRAI Indigenous data CARE domain tracks adoption.
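
    One way to picture an access-control model that respects community-defined permissions is a decision function that never resolves an open question on the repository's own authority. The sketch below is a minimal illustration under assumed names: the status values, the `decide_access` function, and the membership flag are all hypothetical, and who counts as a community member is itself a matter for the community's governance, not for the repository.

```python
from enum import Enum


class AccessStatus(Enum):
    """Access status set by the community for a record (hypothetical values)."""
    OPEN = "open"
    COMMUNITY_ONLY = "community_only"
    BY_REQUEST = "by_request"
    UNDECIDED = "undecided"


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REFER_TO_COMMUNITY = "refer_to_community"


def decide_access(status: AccessStatus, requester_is_community_member: bool) -> Decision:
    """Apply the community's recorded decision; never invent one."""
    if status is AccessStatus.OPEN:
        return Decision.ALLOW
    if status is AccessStatus.COMMUNITY_ONLY:
        return Decision.ALLOW if requester_is_community_member else Decision.DENY
    # BY_REQUEST and UNDECIDED are routed to the community's own process
    # rather than being settled by the repository.
    return Decision.REFER_TO_COMMUNITY
```

    The point of the third outcome is that "no decision recorded" is treated as a reason to ask, not as a default of open or closed.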

    The integrity question

    The honest position for non-Indigenous researchers and institutions is that operationalising CARE well requires deferring to Indigenous-led governance, not designing one’s own “CARE-compliant” system. The literature is consistent on this point: the CARE Principles were developed by Indigenous-led organisations, and their authoritative interpretation rests with those organisations and the communities they serve. The CARE Principles are not a checklist that an external institution can score itself against and then self-certify.

    The implication for institutions and researchers is that the CARE work is relational and ongoing rather than one-time and administrative. The investment is in long-term partnerships with communities, capacity-building within Indigenous research leadership, and a willingness to share authority over how data flows into and out of institutional systems. The technical artefacts (TK Labels, FPIC processes, Mukurtu integrations) support the relational work; they do not substitute for it.

    Where to learn more

    For non-Indigenous researchers and institutions beginning this work, the starting point is GIDA’s published statement of the CARE Principles, read alongside the regional movements’ own founding documents (FNIGC on OCAP, Te Mana Raraunga on Māori data sovereignty, Maiam nayri Wingara on Aboriginal and Torres Strait Islander data sovereignty). The Carroll et al. 2020 paper in Data Science Journal is the primary scholarly reference for CARE. The Local Contexts initiative’s documentation is the reference for TK and BC Labels, and the Mukurtu CMS documentation is the main technical reference for community-controlled repository implementation.

    References

    • Carroll et al., The CARE Principles for Indigenous Data Governance (Data Science Journal, 2020).
    • GIDA, CARE Principles for Indigenous Data Governance (founding statement, 2019).
    • First Nations Information Governance Centre, The First Nations Principles of OCAP® (foundational and ongoing publications).
    • Te Mana Raraunga, Principles of Māori Data Sovereignty (2018).
    • Maiam nayri Wingara, Indigenous Data Sovereignty Communique (2018).
    • UN Declaration on the Rights of Indigenous Peoples (2007).
    • Anderson and Christen, work on Traditional Knowledge Labels and the Local Contexts initiative (ongoing).

  • ORCID 4.0: the IDR roadmap and what it means for CASRAI integrations

    ORCID’s Integration and Data Roadmap (IDR) work, which culminated in late 2025 with the 4.0 release of the public and member APIs, is the most consequential PID infrastructure change of the year for anyone who cares about the contributor-affiliation-funding crosswalk. The headline is technical: a new contribution resource that supersedes the old pairing of works and employment for representing what a researcher did, where, on whose money, and with whom. The implications reach into nearly every persistent-identifier integration CASRAI tracks.

    What 4.0 actually changes

    The pre-4.0 ORCID record was a federation of resource types: works (with DOIs), employment (with ROR organisation IDs), education, funding (with grant IDs and Funder Registry entries), peer reviews, and the like. Each was useful in isolation. None of them carried the relations between them in a structured form. If a researcher’s ORCID record listed a paper, an employment at the institution that hosted the work, and a grant that funded the work, those three facts sat in separate resources with no machine-readable link.

    4.0 introduces a top-level contribution entity that binds these. A contribution carries: a primary artefact (DOI, software identifier, dataset identifier, or RAiD), a set of CRediT roles with the degree-of-contribution qualifier, an affiliation in force at the time of the contribution (with ROR), funding in force at the time (with Funder Registry or ROR for the funder, plus the grant identifier and ideally a RAiD), and a temporal span. The relationships are explicit and queryable. A consuming system can ask what this researcher contributed, at this affiliation, under this grant, on this date, and get an answer without inference.
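
    To make the shape of that entity concrete, here is a minimal in-memory sketch. It is not the ORCID schema: every class and field name is an assumption chosen for illustration (only the `funded_by` relation name appears in the 4.0 description), and the DOI, grant, and funder values are placeholders. The ORCID iD used is ORCID's documented sample iD.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CreditRole:
    role: str                     # CRediT role name, e.g. "Conceptualization"
    degree: Optional[str] = None  # "lead", "equal", "supporting", or unset


@dataclass(frozen=True)
class Funding:
    funder_id: str                # Funder Registry or ROR identifier of the funder
    grant_id: str
    raid: Optional[str] = None    # project identifier, where one exists


@dataclass(frozen=True)
class Contribution:
    orcid: str
    artefact_id: str              # DOI, software ID, dataset ID, or RAiD
    roles: tuple[CreditRole, ...]
    affiliation_ror: str          # affiliation in force at the time
    funded_by: tuple[Funding, ...]
    start: date
    end: date


def contributions_matching(items, *, orcid, affiliation_ror, grant_id, on):
    """Answer: what did this researcher contribute, at this affiliation,
    under this grant, on this date?"""
    return [
        c for c in items
        if c.orcid == orcid
        and c.affiliation_ror == affiliation_ror
        and any(f.grant_id == grant_id for f in c.funded_by)
        and c.start <= on <= c.end
    ]


example = Contribution(
    orcid="0000-0002-1825-0097",
    artefact_id="10.1234/example-paper",          # placeholder DOI
    roles=(CreditRole("Conceptualization", "lead"),),
    affiliation_ror="https://ror.org/03yrm5c26",
    funded_by=(Funding("funder:example", "GRANT-001"),),  # placeholders
    start=date(2025, 1, 1),
    end=date(2025, 12, 31),
)
```

    Because the affiliation, funding, and time span sit on the same object as the artefact and roles, the query is a plain filter with no cross-resource matching.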

    The CRediT-at-record-level integration matures

    The 2024 work to allow CRediT roles to live on an ORCID record (not just in publisher JATS) was the precursor to 4.0. The integration shipped, was widely adopted, and exposed two limitations that 4.0 closes. First, role assignments lived inside the work resource, making it awkward to express a Conceptualization role spanning several papers and datasets. Second, the qualifier was carried only at per-work granularity. 4.0 lets a CRediT role attach to a contribution that groups multiple artefacts, with the qualifier travelling with the contribution.

    Practical example: a researcher who is Lead for Conceptualization across a clinical trial’s primary paper, protocol paper, registered data, and statistical analysis plan should be representable that way. Pre-4.0, the assertion had to be repeated four times; post-4.0, it lives on the contribution entity. See the ORCID implementation guide for the API patterns.
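
    The difference can be sketched as a grouping step: four per-work assertions of the same role collapse into one contribution that lists four artefacts. The tuple layout, the `group_role_assertions` helper, and the identifiers below are hypothetical and are not taken from the ORCID implementation guide.

```python
from collections import defaultdict


def group_role_assertions(assertions):
    """Collapse per-work (project, artefact, role, degree) assertions into
    one entry per (project, role, degree), listing the artefacts it spans."""
    grouped = defaultdict(list)
    for project, artefact, role, degree in assertions:
        artefacts = grouped[(project, role, degree)]
        if artefact not in artefacts:  # tolerate duplicate assertions
            artefacts.append(artefact)
    return [
        {"project": p, "role": r, "degree": d, "artefacts": a}
        for (p, r, d), a in grouped.items()
    ]


# Pre-4.0 style: the same assertion repeated once per artefact.
per_work = [
    ("trial-001", "primary-paper", "Conceptualization", "lead"),
    ("trial-001", "protocol-paper", "Conceptualization", "lead"),
    ("trial-001", "registered-data", "Conceptualization", "lead"),
    ("trial-001", "analysis-plan", "Conceptualization", "lead"),
]

# Post-4.0 style: one contribution carrying the role and its qualifier.
contributions = group_role_assertions(per_work)
```

    Grouping on the project as well as the role keeps unrelated work by the same researcher from being merged into a single contribution.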

    RAiD becomes a first-class citizen

    One of the unsung wins in 4.0 is the elevation of RAiD to a first-class identifier alongside DOI. Pre-4.0, RAiD could be carried in an ORCID funding resource as an external identifier, but the schema treated it as a second-tier metadata field. 4.0 adds RAiD to the primary identifier set for both contributions and funding, with the same validation and resolution support as DOI.

    This matters because RAiD is increasingly the canonical project-level identifier, and ORCID is increasingly the canonical person-level record. The interlock — researcher X contributed to project RAiD Y, which produced papers A, B, C — is now a structured query rather than a string-match exercise.

    Affiliation history with PIDs at both ends

    The 4.0 employment and affiliation model has been quietly tightened. Every affiliation now requires a ROR organisational ID at registration; legacy string-only affiliations are preserved but flagged. The optional department field accepts a ROR sub-organisation ID where one exists (the ROR hierarchy work has caught up to make this practical), or a free-text department name as a fallback. The result is that affiliation history on an ORCID record is now reliably machine-readable at the ROR ID level.

    For institutions running a CRIS, this closes a longstanding crosswalk gap. CRIS-to-ORCID deposit can now write structured affiliations that ORCID-to-CRIS retrieval can read back without ambiguity. The CASRAI CRIS integration guide has been updated with the 4.0 deposit patterns.

    What CASRAI integrations need to do

    Three things, in priority order.

    1. Update CRediT JATS round-trips. Publishers depositing structured CRediT to ORCID via the member API should switch to the contribution resource for new deposits. Legacy works-with-roles deposits will continue to be accepted through 2026 but will be migrated server-side in 2027. The CASRAI CRediT JATS integration patterns now include both the legacy and the 4.0 deposit forms; new integrators should implement only the 4.0 form.
    2. Validate ROR IDs at affiliation deposit. A CRIS or publisher pushing affiliation data to ORCID should resolve and validate the ROR ID before deposit. The 4.0 API will reject obviously bad ROR IDs at the schema layer but will accept ROR IDs that resolve to deprecated or merged records. A pre-deposit validation pass against the ROR public API catches the common error cases.
    3. Test the funding-to-contribution link. If your integration writes funding entries, link them explicitly to the contributions they funded via the new funded_by relation on the contribution resource. This is the integration point that was missing pre-4.0 and that downstream consumers (funder dashboards, institutional reporting) most want to query.
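
    For the second item, a pre-deposit validation pass might look like the sketch below. It assumes the organisation record has already been fetched as JSON from the ROR public API and passed in as a dict; the `status` and `relationships` field names reflect our reading of the ROR record format and should be checked against the current ROR documentation. The pattern check is syntactic only and does not verify the checksum digits. Both function names are our own.

```python
import re

# Bare ROR ID: a leading 0, six base32 characters (no i, l, o, u), two digits.
ROR_PATTERN = re.compile(
    r"^(?:https://ror\.org/)?(0[a-hj-km-np-tv-z0-9]{6}[0-9]{2})$"
)


def normalise_ror(value: str):
    """Return the canonical https://ror.org/... form, or None if malformed."""
    match = ROR_PATTERN.match(value.strip().lower())
    return f"https://ror.org/{match.group(1)}" if match else None


def check_ror_record(record: dict):
    """Return (ok, successors) for an already-fetched ROR record.

    ok is True only for active records. For inactive or withdrawn records,
    successors lists the IDs the record points to, so the depositor can
    decide whether to deposit against a successor instead.
    """
    if record.get("status") == "active":
        return True, []
    successors = [
        rel["id"]
        for rel in record.get("relationships", [])
        if rel.get("type", "").lower() == "successor"
    ]
    return False, successors
```

    A deposit pipeline would call `normalise_ror` first, fetch the record only for well-formed IDs, and hold back any affiliation for which `check_ror_record` returns `False`.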

    Backwards compatibility and the migration window

    ORCID’s commitment is that the 3.x APIs remain available through end-of-2027, with the 4.0 API the recommended target from now. The data model migration is largely automatic for existing records: pre-existing works with associated employment and funding are projected into the contribution model server-side. Consumers reading via the 4.0 API will see contribution entities even for data that was deposited in the 3.x form.

    The one wrinkle is CRediT role assignments that were deposited in 3.x without explicit qualifiers. These project into the contribution model with no qualifier set — a valid state, but less informative than it could be. Publishers should re-deposit historical CRediT data with qualifiers where they have them during 2026.
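
    A publisher planning that re-deposit first needs a list of the role assignments that came through without a qualifier. The sketch below assumes contributions have already been read into plain dicts with hypothetical `id`, `roles`, `role`, and `degree` keys; the actual 4.0 response format may differ.

```python
def roles_missing_degree(contributions):
    """Return (contribution id, role name) pairs where no qualifier is set."""
    missing = []
    for contribution in contributions:
        for role in contribution.get("roles", []):
            if not role.get("degree"):  # absent, None, or empty string
                missing.append((contribution["id"], role["role"]))
    return missing


# Illustrative data: one role projected from 3.x without a qualifier.
projected = [
    {"id": "c-1", "roles": [{"role": "Conceptualization", "degree": "lead"}]},
    {"id": "c-2", "roles": [{"role": "Software", "degree": None},
                            {"role": "Validation", "degree": "supporting"}]},
]
to_redeposit = roles_missing_degree(projected)
```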

    What this enables downstream

    The most interesting consequence of 4.0 is the ability to ask compound questions across the PID graph. The question "which researchers, affiliated with which institutions, contributed in which CRediT roles, to outputs funded by which funders, on which projects?" reduces to a structured traversal across ORCID, ROR, Crossref/DataCite, and the Funder Registry, with RAiD optionally tying the project layer together. The OpenAIRE Graph already operationalises a version of this; 4.0 makes it cleaner.
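
    As a toy version of that traversal, the sketch below runs one compound question over contributions already loaded into memory. The dict keys (`orcid`, `affiliation_ror`, `roles`, `funded_by`, `raid`) and all identifier values other than the sample ORCID iD are hypothetical; a real consumer would resolve each identifier against its own registry.

```python
def who_contributed(contributions, *, role, funder_id):
    """List (orcid, affiliation ROR, RAiD) for contributions in the given
    CRediT role that were funded by the given funder, in input order."""
    seen = set()
    results = []
    for c in contributions:
        has_role = any(r["role"] == role for r in c["roles"])
        funded = any(f["funder_id"] == funder_id for f in c["funded_by"])
        key = (c["orcid"], c["affiliation_ror"], c.get("raid"))
        if has_role and funded and key not in seen:
            seen.add(key)
            results.append(key)
    return results


graph = [
    {"orcid": "0000-0002-1825-0097",
     "affiliation_ror": "https://ror.org/03yrm5c26",
     "roles": [{"role": "Conceptualization", "degree": "lead"}],
     "funded_by": [{"funder_id": "funder:example", "grant_id": "GRANT-001"}],
     "raid": "raid:example-project"},
    {"orcid": "0000-0002-1825-0097",
     "affiliation_ror": "https://ror.org/03yrm5c26",
     "roles": [{"role": "Software", "degree": "supporting"}],
     "funded_by": [{"funder_id": "funder:other", "grant_id": "GRANT-002"}],
     "raid": None},
]
leads = who_contributed(graph, role="Conceptualization", funder_id="funder:example")
```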

    For institutions, the practical implication is that reporting against funder mandates becomes substantially less manual. For publishers, the JATS-to-ORCID deposit becomes more valuable because it now persists in a queryable graph. For funders, the funder-PID-to-output traceability that ORCID has long promised starts to deliver at scale.

    What’s still missing

    4.0 does not solve everything. The contributor-affiliation-funding triple is now structured; the contributor-contributor relationship (collaboration graphs, mentorship) is not. A relationships resource is in development but not in 4.0. CARE-aligned identifiers for Indigenous researchers are also still in design.

    CASRAI’s integration tracking will follow 4.0 through 2026. The persistent-identifiers domain is being updated to reflect the contribution model; the ORCID federation page tracks member implementation.
