  • FAIR Dataset Mandates Risk Becoming a Checkbox

    A FAIR dataset is one that meets the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016 — but a funder mandate requiring deposit and a data management plan does not, on its own, guarantee this. Genuine FAIR compliance demands rich metadata, persistent identifiers and community-standard formats that most minimally compliant deposits skip entirely, because current incentive structures reward the act of depositing, not the work of curating.

    A FAIR dataset is a digital research object — data or its metadata — that satisfies the Findable, Accessible, Interoperable and Reusable principles first formalised by the FORCE11 community and published in Scientific Data in March 2016. The principles were designed to be applied in degrees, not as a pass/fail gate, which is precisely where funder policy and researcher practice have diverged.

    What does a FAIR dataset actually require?

    The FAIR principles set out four categories of requirement, each broken into specific sub-criteria. They are deliberately conceptual rather than prescriptive, which is a strength for cross-disciplinary adoption and a weakness for enforcement.

    • Findable — data and metadata carry a globally unique, persistent identifier and are indexed in a searchable resource.
    • Accessible — retrieval uses a standardised, open protocol, with metadata remaining accessible even when the underlying data cannot be.
    • Interoperable — data and metadata use a shared, formal language and vocabularies that follow FAIR principles themselves.
    • Reusable — data carry a clear licence, detailed provenance, and conform to domain-relevant community standards.
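
    As a rough illustration of how the four categories translate into checks on a metadata record, the sketch below tests one piece of evidence per letter. The field names (`identifier`, `access_url`, `vocabulary_terms`, `licence`) and the `check_record` helper are hypothetical, and a real assessment covers far more than one check per letter.

```python
from typing import Dict


def check_record(record: Dict[str, object]) -> Dict[str, bool]:
    """Return one coarse pass/fail flag per FAIR letter for a metadata record."""
    identifier = str(record.get("identifier", ""))
    access_url = str(record.get("access_url", ""))
    return {
        # Findable: a persistent identifier is present (here, a DOI URL).
        "Findable": identifier.startswith("https://doi.org/"),
        # Accessible: retrieval uses a standardised, open protocol.
        "Accessible": access_url.startswith("https://"),
        # Interoperable: at least one controlled-vocabulary term is attached.
        "Interoperable": bool(record.get("vocabulary_terms")),
        # Reusable: a licence is stated.
        "Reusable": bool(record.get("licence")),
    }


# A deposit that was registered and published but never curated (made-up values).
deposit = {
    "identifier": "https://doi.org/10.1234/example",
    "access_url": "https://repository.example.org/datasets/42",
    "vocabulary_terms": [],
    "licence": "",
}
flags = check_record(deposit)
print(flags)
```

    A record like this one passes the Findable and Accessible checks and fails the other two, which is the shape of a deposit made to satisfy a mandate rather than to support reuse.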

    The Research Data Alliance’s FAIR Data Maturity Model, published in 2020, decomposes these four principles into 41 discrete indicators covering both data and metadata. That granularity matters: a dataset can satisfy some indicators and fail most others while still being described, loosely, as “FAIR.” A funder checking only for repository deposit is verifying perhaps one or two of the 41.

    Why do funder mandates default to minimal compliance?

    Funder FAIR requirements typically operationalise as two things: a submitted data management plan and a deposit in a recognised repository at the end of the project. Neither step audits metadata richness, vocabulary use, or licensing clarity. The result is a policy that a project can satisfy without producing a dataset anyone outside the original team could actually reuse.

    Three structural gaps explain why:

    • Resourcing. Science Europe’s funders’ briefing on data management planning recommends that compliant curation cost roughly 5% of total research budget — a figure rarely built into grant awards, leaving curation as unfunded overhead.
    • Recognition. Data curation is not weighted in hiring, promotion or tenure decisions in most institutions, so time spent enriching metadata competes directly with time spent on publications that do count.
    • Standards gaps. Many disciplines still lack the domain-relevant community vocabularies that Interoperability and Reusability depend on, so even willing depositors have nothing FAIR-compliant to conform to.

    Horizon Europe requires that all data produced under the programme be FAIR “by default,” which is among the strongest funder-level statements of intent currently in force. Yet the European Commission’s own guidance materials acknowledge that FAIRness is a spectrum, not a binary condition, an admission that sits uneasily alongside a compliance model built around a single deposit checkpoint.

    The maturity gap: from “FAIR start” to genuine reusability

    The European Commission’s Joint Research Centre published FAIR Data Guidelines in 2025 that organise the RDA’s 41 indicators into five progressive maturity levels. The framework is useful precisely because it makes visible how far “minimally compliant” sits from “genuinely reusable.”

    Maturity level | What it requires
    FAIR start | Published in a catalogue with mandatory metadata; data itself is not structured for machine reuse.
    FAIR play | Links added between datasets and related resources, with enriched provenance and cross-referencing.
    FAIR go | Data structured to community standards, with defined terminologies (not necessarily machine-readable).
    FAIR share | Machine-readable data models (JSON Schema, XML Schema, SHACL) with richly documented provenance.
    FAIRest of them all | Machine-readable model endorsed by the domain community; terms exposed via shared FAIR vocabularies.

    Most mandate-driven deposits land at “FAIR start” — indexed, licensed, discoverable, but not structured for genuine machine or cross-team reuse. The JRC guidelines are explicit that not every dataset needs the top tier, but they are equally explicit that FAIRness can degrade over time if metadata and platforms are not actively maintained. A one-off deposit satisfying a funder’s closeout requirement is not maintenance; it is a snapshot.
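
    To make the idea of progressive levels concrete, here is a minimal sketch that treats the five levels as cumulative and reports the highest one a dataset reaches. The level names come from the table; the evidence flags and the `highest_level` helper are hypothetical simplifications, not part of the JRC guidelines.

```python
from enum import IntEnum
from typing import Set


class MaturityLevel(IntEnum):
    """The five maturity levels, lowest to highest."""
    FAIR_START = 1
    FAIR_PLAY = 2
    FAIR_GO = 3
    FAIR_SHARE = 4
    FAIREST_OF_THEM_ALL = 5


# Hypothetical evidence flag required at each level (one per table row).
REQUIRED_EVIDENCE = {
    MaturityLevel.FAIR_START: "catalogued_with_mandatory_metadata",
    MaturityLevel.FAIR_PLAY: "linked_to_related_resources",
    MaturityLevel.FAIR_GO: "structured_to_community_standard",
    MaturityLevel.FAIR_SHARE: "machine_readable_data_model",
    MaturityLevel.FAIREST_OF_THEM_ALL: "community_endorsed_model",
}


def highest_level(evidence: Set[str]) -> int:
    """Return the highest level reached, or 0 if the first level is not met.

    Levels are assumed to be cumulative: a gap at one level blocks all higher ones.
    """
    reached = 0
    for level in MaturityLevel:  # iterates in definition order
        if REQUIRED_EVIDENCE[level] not in evidence:
            break
        reached = int(level)
    return reached


typical_deposit = {"catalogued_with_mandatory_metadata"}
print(highest_level(typical_deposit))
```

    Under these assumptions a deposit with only a catalogue entry stops at level 1, and evidence for a later level does not count until the levels beneath it are met.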

    Rebuilding incentives for genuine data stewardship

    Treating FAIR as a compliance checkbox is a governance failure, not a researcher failure. Three changes would shift the incentive structure toward genuine stewardship rather than deposit-and-forget behaviour.

    1. Credit the labour. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. “Data curation” is one of its fourteen roles, offering institutions an existing, citable mechanism to formally recognise stewardship work in author contribution statements — a mechanism that remains inconsistently applied in promotion and tenure review.
    2. Fund it explicitly. Grant budgets should ring-fence curation costs at the level Science Europe’s own guidance recommends, rather than treating data management plans as an unfunded compliance document.
    3. Audit maturity, not deposit. Funders and institutions should reference maturity models such as the RDA’s 41 indicators or the JRC’s five-level scale in closeout review, rather than accepting repository deposit as sufficient evidence of FAIR compliance.

    FAIR is also not a complete governance answer on its own. The CARE Principles for Indigenous Data Governance, released by the Global Indigenous Data Alliance in 2019, extend the framework to cover collective benefit, authority to control, responsibility and ethics — dimensions that a pure findability-and-format checklist does not touch. Institutions building data policy around FAIR alone are optimising for machine reuse while leaving governance and consent questions unaddressed.

    Frequently asked questions

    What is a FAIR dataset?

    A FAIR dataset satisfies the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016. It carries a persistent identifier, standardised access, shared vocabularies, and clear licensing and provenance — not merely a repository listing.

    What does FAIR stand for with data?

    FAIR stands for Findable, Accessible, Interoperable and Reusable. The acronym describes a framework for data stewardship, not a certification; the Research Data Alliance breaks it into 41 measurable indicators rather than a single pass condition.

    What does FAIR stand for in data management?

    In data management, FAIR describes the target state a data management plan should work toward: identifiers, rich metadata, open protocols and community-standard formats. It guides curation decisions throughout a project, not just the final deposit.

    Why does FAIR data matter?

    FAIR data matters because it lets both humans and machines discover, verify and reuse research outputs without contacting the original authors. Poorly curated “FAIR” deposits undermine reproducibility and waste the public investment funders intended the mandate to protect.

    Implications and outlook

    Funder FAIR mandates have succeeded in one respect: deposit rates have risen sharply since 2016. They have not, on current evidence, produced datasets that are reliably machine-actionable or cross-team reusable at scale. That gap will not close through stricter wording in policy documents; it requires funders to resource curation at realistic cost, institutions to credit it in career progression via mechanisms such as CRediT’s Data curation role, and disciplines to finish building the community standards that Interoperability depends on. Until those three conditions are met, “FAIR by default” will remain a policy aspiration rather than a description of the average deposited dataset.

  • FAIR Principles Data Maturity: Score Against RDA

    FAIR data maturity is scored by testing each dataset against the 41 indicators of the Research Data Alliance’s FAIR Data Maturity Model, grading Findability, Accessibility, Interoperability and Reusability separately, then weighting results by each indicator’s priority tier: essential, important, or useful. Once an institution runs this test consistently across its repository, data management under the FAIR principles moves from an abstract commitment to a measurable score.

    FAIR data is data that meets the Findable, Accessible, Interoperable and Reusable criteria first published by Wilkinson et al. in Scientific Data in 2016 — a paper now cited more than 22,000 times. This guide is a practical scoring walkthrough, not another explainer of what the four letters mean: it shows research offices how to actually audit existing datasets and repositories against the RDA model and turn the result into a remediation plan.

    What is the RDA FAIR Data Maturity Model?

    The RDA FAIR Data Maturity Model is a specification published by a Research Data Alliance working group in 2020 to standardise how organisations test FAIRness. Before it existed, dozens of institutions had built incompatible local checklists, making it impossible to compare a “FAIR score” from one repository against another.

    The model does not ship as software. It is a reference document that defines:

    • 41 indicators — testable statements mapped to the fifteen core GO FAIR sub-principles (F1–F4, A1–A2, I1–I3, R1–R1.3)
    • Three priority tiers — essential, important and useful — so institutions can triage effort rather than treat every indicator as equally urgent
    • Evaluation guidance — worked examples for testing each indicator against real metadata and data objects, rather than self-reported compliance

    Because the indicators trace directly to the GO FAIR principles, a dataset that scores well against the RDA model is, by construction, meeting the same criteria described in the original 2016 Scientific Data paper — just with a repeatable measurement attached.

    How does FAIR maturity scoring actually work?

    Scoring is done indicator by indicator, not principle by principle. Most institutions that implement the RDA model score each of the 41 indicators on a simple 0–4 scale — 0 (not implemented) through 4 (fully implemented) — then multiply by a priority weight before aggregating to a per-dataset and per-repository total.
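
    The weighting step can be sketched as a weighted mean, which keeps the result on the same 0–4 scale as the raw scores. The numeric weights below (3, 2, 1) are an assumption chosen for illustration; an institution would set its own values for the three tiers.

```python
from typing import List, Tuple

# Hypothetical weights for the three priority tiers.
PRIORITY_WEIGHT = {"essential": 3, "important": 2, "useful": 1}


def weighted_score(indicators: List[Tuple[str, int]]) -> float:
    """Weighted mean of 0-4 indicator scores, still on the 0-4 scale.

    Each item is a (priority tier, score) pair.
    """
    total = 0
    weight_sum = 0
    for tier, score in indicators:
        if not 0 <= score <= 4:
            raise ValueError(f"score out of range: {score}")
        weight = PRIORITY_WEIGHT[tier]
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no indicators supplied")
    return total / weight_sum


# Four made-up indicator results for one dataset.
sample = [("essential", 4), ("essential", 2), ("important", 3), ("useful", 0)]
print(round(weighted_score(sample), 2))
```

    Here the weighted sum is 24 over a total weight of 9, so the dataset scores about 2.67. The unscored useful-tier indicator pulls the result down less than an essential-tier gap would.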

    FAIR letter | Sub-principles tested | Typical essential-tier evidence
    Findable | F1–F4 | Persistent identifier (DOI via DataCite), indexed metadata record, machine-readable catalogue entry
    Accessible | A1–A2 | Retrieval via an open protocol (HTTPS), metadata that resolves even if the data itself is restricted
    Interoperable | I1–I3 | Structured, non-proprietary format; controlled vocabularies; qualified links to related records
    Reusable | R1–R1.3 | Machine-readable licence, documented provenance, alignment with a domain metadata standard

    A dataset that carries a DOI and open licence but lacks controlled vocabulary terms will score high on Findable and Reusable, and low on Interoperable — the point of indicator-level scoring is precisely to surface that kind of uneven profile, which a single pass/fail “is it FAIR?” verdict would hide.

    Manual vs automated assessment: which tool fits?

    Two complementary assessment routes exist. Automated tools are fast but only test what a machine can verify; manual review is slower but catches the indicators that require human judgement, such as whether a licence is genuinely clear or a vocabulary is genuinely domain-appropriate.

    Tool / method | Coverage of the 41 indicators | Output | Best suited to
    F-UJI (FAIRsFAIR project) | Machine-testable subset only: roughly 17 metrics derived from the RDA indicators | Automated percentage score per FAIR letter, run against a DOI | Bulk baseline scans across a whole repository
    FAIR-Aware (DANS) | Self-assessment questionnaire, not indicator-scored | Qualitative readiness report and recommendations | Researchers preparing a dataset before deposit
    Manual RDA specification review | All 41 indicators, including human-judgement ones | Full indicator-by-indicator score with evidence notes | Institutional audits and remediation planning

    A hybrid approach is the most defensible for an institution-wide programme: run an automated scan across every repository record for a fast baseline, then reserve manual review for the essential-tier indicators no tool can verify — licence clarity, provenance completeness and domain-standard alignment.
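
    One way to organise the hybrid route is to sort indicators into an automated queue, a manual queue and a deferred list before the audit starts. The sketch below assumes the institution has already decided which indicators its tooling can test; the indicator IDs are placeholders, not real RDA identifiers.

```python
from typing import Dict, List, NamedTuple


class Indicator(NamedTuple):
    indicator_id: str       # placeholder ID, not a real RDA identifier
    tier: str               # "essential", "important" or "useful"
    machine_testable: bool  # judgement made by the auditing institution


def plan_review(indicators: List[Indicator]) -> Dict[str, List[str]]:
    """Split indicators into automated, manual and deferred queues."""
    plan: Dict[str, List[str]] = {"automated": [], "manual": [], "deferred": []}
    for ind in indicators:
        if ind.machine_testable:
            plan["automated"].append(ind.indicator_id)
        elif ind.tier == "essential":
            # Essential indicators no tool can verify go to human reviewers.
            plan["manual"].append(ind.indicator_id)
        else:
            plan["deferred"].append(ind.indicator_id)
    return plan


indicators = [
    Indicator("IND-01", "essential", True),
    Indicator("IND-02", "essential", False),
    Indicator("IND-03", "useful", False),
]
plan = plan_review(indicators)
print(plan)
```

    Deferring the non-essential manual checks is a design choice made here to keep reviewer time focused; an institution with more capacity could widen the manual queue.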

    A step-by-step scoring walkthrough

    The following sequence turns the RDA model from a reference document into a repeatable institutional process.

    1. Select a representative sample. Pull datasets across disciplines, repository platforms, and funder mandates — a sample skewed toward one department will misstate institutional maturity.
    2. Run the automated baseline. Record each dataset’s DOI or other identifier, then run an F-UJI scan for the machine-testable indicators before any manual work begins.
    3. Score the rest by hand. Assess the remaining essential-tier indicators manually, checking licence text, metadata schema, and vocabulary choice against the evidence guidance in the RDA specification.
    4. Weight and aggregate. Multiply each indicator score by its priority weight, sum within each FAIR letter, then average across the sample to produce a repository-level maturity profile.
    5. Report by weakest letter, not overall average. An institution scoring 3.6/4 on Findable but 1.2/4 on Interoperable needs a vocabulary-adoption project, not a generic “improve FAIR compliance” action item.

    Worked example — three datasets from the same institutional repository, scored on the 0–4 scale before weighting:

    Dataset                                    | Findable | Accessible | Interoperable | Reusable
    Clinical trial dataset (restricted access) | 4        | 3          | 2             | 3
    Environmental sensor archive               | 3        | 4          | 3             | 2
    Survey microdata (open)                    | 2        | 4          | 1             | 4

    This profile, strongest on Accessible (mean 3.67 out of 4) and weakest on Interoperable (mean 2.0, with no dataset scoring above 3), is a genuinely institution-specific finding that a generic FAIR explainer cannot give you. Only a scored audit surfaces it, and it points to a single fix (adopting a shared controlled vocabulary at ingest) rather than four separate ones.
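
    The repository-level profile for the three datasets above can be reproduced with a few lines of Python. This is a minimal sketch using the unweighted scores from the worked example; the variable names are illustrative only.

```python
from statistics import mean

LETTERS = ("Findable", "Accessible", "Interoperable", "Reusable")

# Unweighted 0-4 scores from the worked example, in LETTERS order.
scores = {
    "Clinical trial dataset (restricted access)": (4, 3, 2, 3),
    "Environmental sensor archive": (3, 4, 3, 2),
    "Survey microdata (open)": (2, 4, 1, 4),
}

# Average each letter across the sample (a column-wise mean).
profile = {
    letter: mean(row[i] for row in scores.values())
    for i, letter in enumerate(LETTERS)
}

# Report the weakest letter rather than an overall average.
weakest = min(profile, key=profile.get)

for letter in LETTERS:
    print(f"{letter}: {profile[letter]:.2f}")
print("Weakest letter:", weakest)
```

    The means come out at 3.00 for Findable, 3.67 for Accessible, 2.00 for Interoperable and 3.00 for Reusable, so Interoperable is the letter to report and remediate first.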

    Common questions about FAIR data scoring

    What are FAIR principles for data?

    FAIR principles are four criteria — Findable, Accessible, Interoperable and Reusable — first published in a 2016 Scientific Data paper by Wilkinson et al. They require datasets to carry a persistent identifier, standardised retrieval protocols, shared vocabularies and machine-readable licensing, so both humans and software can locate and reuse research data reliably.

    What are the four pillars of the FAIR data principles?

    The four pillars are Findable (unique persistent identifiers and rich metadata), Accessible (standardised, open retrieval protocols), Interoperable (shared vocabularies and qualified references) and Reusable (clear licensing, provenance and community standards). The RDA FAIR Data Maturity Model breaks these four pillars into 41 individually testable indicators.

    What are the FAIR data principles of UKRI?

    UKRI does not publish a separate FAIR standard. Its research councils, including NERC’s Environmental Data Service, require grant-funded datasets to follow the same GO FAIR-published Findable, Accessible, Interoperable and Reusable principles, citing benefits including increased citation, stronger research integrity, and compliance with data management plan commitments.

    What are the FAIR principles of GDPR?

    FAIR and GDPR address different concerns and are not in conflict. FAIR governs how discoverable and reusable data and its metadata are, while GDPR governs lawful processing of personal data. A dataset containing personal information can be fully FAIR, richly described and findable, while access to the underlying records stays restricted under GDPR-compliant authorisation.

    What this means for research data offices

    A scored FAIR audit gives research offices something a qualitative checklist cannot: a repository-level baseline that can be re-measured after each remediation cycle. Institutions preparing data management plan compliance evidence for UKRI, Horizon Europe, or cOAlition S-aligned funders can cite the same indicator scores as their supporting evidence, rather than producing a fresh narrative justification each time.

    Scoring also clarifies where FAIR and openness diverge. Following the “as open as possible, as closed as necessary” principle, a dataset can score highly on all four FAIR letters while remaining access-controlled — the metadata is open and richly described even when the underlying records are not. Institutions handling Indigenous or community-originated data should additionally weigh the CARE Principles — Collective Benefit, Authority to Control, Responsibility and Ethics — published by the Global Indigenous Data Alliance, which govern who controls reuse decisions rather than how discoverable the data is.

    The practical next step after a first scoring pass is not a single “get to 100%” target — no dataset needs every useful-tier indicator satisfied — but a prioritised backlog built from essential-tier gaps, feeding directly into repository ingest workflows and metadata templates so the next deposit scores higher without a second audit.
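
    A prioritised backlog of this kind could be derived directly from the audit results. The sketch below keeps only essential-tier indicators that fall short of full implementation and orders them lowest score first; the `Gap` record, the indicator IDs and the sample scores are all hypothetical.

```python
from typing import List, NamedTuple


class Gap(NamedTuple):
    indicator_id: str  # placeholder ID, not a real RDA identifier
    tier: str          # "essential", "important" or "useful"
    score: int         # 0-4 audit score


def essential_backlog(results: List[Gap], full_score: int = 4) -> List[Gap]:
    """Return essential-tier shortfalls, lowest score first."""
    open_gaps = [
        g for g in results
        if g.tier == "essential" and g.score < full_score
    ]
    # Tie-break on the ID so the ordering is deterministic.
    return sorted(open_gaps, key=lambda g: (g.score, g.indicator_id))


results = [
    Gap("IND-07", "essential", 1),
    Gap("IND-02", "essential", 3),
    Gap("IND-11", "useful", 0),
    Gap("IND-05", "essential", 4),
    Gap("IND-09", "essential", 1),
]
backlog = essential_backlog(results)
print([g.indicator_id for g in backlog])
```

    The useful-tier gap and the fully implemented indicator drop out, leaving three items ordered IND-07, IND-09, IND-02.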