A FAIR dataset is one that meets the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016 — but a funder mandate requiring deposit and a data management plan does not, on its own, guarantee this. Genuine FAIR compliance demands rich metadata, persistent identifiers and community-standard formats that most minimally compliant deposits skip entirely, because current incentive structures reward the act of depositing, not the work of curating.
A FAIR dataset is a digital research object (data or its metadata) that satisfies the Findable, Accessible, Interoperable and Reusable principles first formalised by the FORCE11 community and published in Scientific Data in March 2016. The principles were designed to be applied in degrees, not as a pass/fail gate, yet funder policy typically checks them at a single deposit checkpoint. That mismatch is where policy and researcher practice have diverged.
- What does a FAIR dataset actually require?
- Why do funder mandates default to minimal compliance?
- The maturity gap: from “FAIR start” to genuine reusability
- Rebuilding incentives for genuine data stewardship
- Frequently asked questions
- Implications and outlook
What does a FAIR dataset actually require?
The FAIR principles set out four categories of requirement, each broken into specific sub-criteria. They are deliberately conceptual rather than prescriptive, which is a strength for cross-disciplinary adoption and a weakness for enforcement.
- Findable — data and metadata carry a globally unique, persistent identifier and are indexed in a searchable resource.
- Accessible — retrieval uses a standardised, open protocol, with metadata remaining accessible even when the underlying data cannot be.
- Interoperable — data and metadata use a shared, formal language and vocabularies that follow FAIR principles themselves.
- Reusable — data carry a clear licence, detailed provenance, and conform to domain-relevant community standards.
The Research Data Alliance’s FAIR Data Maturity Model, published in 2020, decomposes these four principles into 41 discrete indicators covering both data and metadata. That granularity matters: a dataset can satisfy some indicators and fail most others while still being described, loosely, as “FAIR.” A funder checking only for repository deposit is verifying perhaps one or two of the 41.
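The gap between "deposited" and "FAIR" can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the six checks and the placeholder DOI are hypothetical simplifications of the four categories listed above, not the RDA's 41 indicators.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""                                   # e.g. a DOI URL
    indexed_in: list[str] = field(default_factory=list)    # catalogues listing the record
    access_protocol: str = ""                              # e.g. "https"
    vocabularies: list[str] = field(default_factory=list)  # shared vocabularies used
    licence: str = ""                                      # e.g. "CC-BY-4.0"
    provenance: str = ""                                   # how the data were produced


# One simplified check per requirement; the real maturity model is far finer-grained.
CHECKS = {
    "F: persistent identifier": lambda r: r.identifier.startswith("https://doi.org/"),
    "F: indexed in a searchable resource": lambda r: bool(r.indexed_in),
    "A: standardised open protocol": lambda r: r.access_protocol in {"http", "https"},
    "I: shared vocabularies": lambda r: bool(r.vocabularies),
    "R: clear licence": lambda r: bool(r.licence),
    "R: documented provenance": lambda r: bool(r.provenance),
}


def assess(record: DatasetRecord) -> dict[str, bool]:
    """Run every illustrative check and return a pass/fail map."""
    return {name: check(record) for name, check in CHECKS.items()}


# A deposit that satisfies a closeout requirement: identifier, index, download link.
deposit = DatasetRecord(
    identifier="https://doi.org/10.1234/example",  # placeholder, not a real DOI
    indexed_in=["example-repository"],
    access_protocol="https",
)

results = assess(deposit)
passed = sum(results.values())
print(f"{passed}/{len(results)} illustrative checks passed")
for name, ok in results.items():
    print("PASS" if ok else "FAIL", name)
```

In this sketch the deposit passes three of the six checks, all of them on the Findable and Accessible side. The Interoperable and Reusable checks fail because nothing in a deposit-only workflow asks for vocabularies, a licence or provenance.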
Why do funder mandates default to minimal compliance?
Funder FAIR requirements typically operationalise as two things: a submitted data management plan and a deposit in a recognised repository at the end of the project. Neither step audits metadata richness, vocabulary use, or licensing clarity. The result is a policy that is easy to satisfy without producing a dataset anyone outside the original team could actually reuse.
Three structural gaps explain why:
- Resourcing. Science Europe’s funders’ briefing on data management planning recommends budgeting roughly 5% of the total research budget for compliant curation, a figure rarely built into grant awards, which leaves curation as unfunded overhead.
- Recognition. Data curation is not weighted in hiring, promotion or tenure decisions in most institutions, so time spent enriching metadata competes directly with time spent on publications that do count.
- Standards gaps. Many disciplines still lack the domain-relevant community vocabularies that Interoperability and Reusability depend on, so even willing depositors have nothing FAIR-compliant to conform to.
Horizon Europe requires that all data produced under the programme be FAIR “by default,” which is the strongest funder-level statement of intent currently in force. Yet the European Commission’s own guidance materials acknowledge that FAIRness is a spectrum, not a binary condition — an admission that sits uneasily alongside a compliance model built around a single deposit checkpoint.
The maturity gap: from “FAIR start” to genuine reusability
The European Commission’s Joint Research Centre published FAIR Data Guidelines in 2025 that organise the RDA’s 41 indicators into five progressive maturity levels. The framework is useful precisely because it makes visible how far “minimally compliant” sits from “genuinely reusable.”
| Maturity level | What it requires |
|---|---|
| FAIR start | Published in a catalogue with mandatory metadata; data itself is not structured for machine reuse. |
| FAIR play | Links added between datasets and related resources, with enriched provenance and cross-referencing. |
| FAIR go | Data structured to community standards, with defined terminologies (not necessarily machine-readable). |
| FAIR share | Machine-readable data models (JSON Schema, XML Schema, SHACL) with richly documented provenance. |
| FAIRest of them all | Machine-readable model endorsed by the domain community; terms exposed via shared FAIR vocabularies. |
Most mandate-driven deposits land at “FAIR start” — indexed, licensed, discoverable, but not structured for genuine machine or cross-team reuse. The JRC guidelines are explicit that not every dataset needs the top tier, but they are equally explicit that FAIRness can degrade over time if metadata and platforms are not actively maintained. A one-off deposit satisfying a funder’s closeout requirement is not maintenance; it is a snapshot.
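A rough sketch of how a closeout review might place a dataset on this scale follows. The level names come from the table above. The boolean criteria are hypothetical one-line simplifications of each row, and treating the levels as strictly cumulative is an assumption made here, not something taken from the JRC guidelines.

```python
# Level names follow the table; each criterion key is a hypothetical
# simplification of that row, not one of the JRC's own indicators.
LEVELS = [
    ("FAIR start", "catalogued_with_mandatory_metadata"),
    ("FAIR play", "linked_to_related_resources"),
    ("FAIR go", "structured_to_community_standards"),
    ("FAIR share", "machine_readable_data_model"),
    ("FAIRest of them all", "community_endorsed_model"),
]


def maturity_level(dataset: dict) -> str:
    """Return the highest level reached, assuming levels are cumulative.

    The walk stops at the first unmet criterion, so a dataset cannot
    skip a level: a machine-readable model without cross-links still
    counts as "FAIR start" under this assumption.
    """
    reached = "Not yet FAIR start"
    for name, criterion in LEVELS:
        if not dataset.get(criterion, False):
            break
        reached = name
    return reached


# A typical mandate-driven deposit: catalogued, nothing more.
mandate_driven_deposit = {"catalogued_with_mandatory_metadata": True}

# A curated dataset that meets the first four criteria.
curated_dataset = {
    "catalogued_with_mandatory_metadata": True,
    "linked_to_related_resources": True,
    "structured_to_community_standards": True,
    "machine_readable_data_model": True,
}

print(maturity_level(mandate_driven_deposit))  # FAIR start
print(maturity_level(curated_dataset))         # FAIR share
```

Because the scale is about maintenance as well as deposit, a review built this way would need to be rerun over time. A dataset whose links or platform have decayed could drop a level without the record itself changing.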
Rebuilding incentives for genuine data stewardship
Treating FAIR as a compliance checkbox is a governance failure, not a researcher failure. Three changes would shift the incentive structure toward genuine stewardship rather than deposit-and-forget behaviour.
- Credit the labour. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. “Data curation” is one of its fourteen roles, offering institutions an existing, citable mechanism to formally recognise stewardship work in author contribution statements — a mechanism that remains inconsistently applied in promotion and tenure review.
- Fund it explicitly. Grant budgets should ring-fence curation costs at the roughly 5% level that Science Europe’s guidance recommends, rather than treating the data management plan as an unfunded compliance document.
- Audit maturity, not deposit. Funders and institutions should reference maturity models such as the RDA’s 41 indicators or the JRC’s five-level scale in closeout review, rather than accepting repository deposit as sufficient evidence of FAIR compliance.
FAIR is also not a complete governance answer on its own. The CARE Principles for Indigenous Data Governance, released by the Global Indigenous Data Alliance in 2019, extend the framework to cover collective benefit, authority to control, responsibility and ethics — dimensions that a pure findability-and-format checklist does not touch. Institutions building data policy around FAIR alone are optimising for machine reuse while leaving governance and consent questions unaddressed.
Frequently asked questions
What is a FAIR dataset?
A FAIR dataset satisfies the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016. It carries a persistent identifier, standardised access, shared vocabularies, and clear licensing and provenance — not merely a repository listing.
What does FAIR stand for with data?
FAIR stands for Findable, Accessible, Interoperable and Reusable. The acronym describes a framework for data stewardship, not a certification; the Research Data Alliance breaks it into 41 measurable indicators rather than a single pass condition.
What does FAIR stand for in data management?
In data management, FAIR describes the target state a data management plan should work toward: identifiers, rich metadata, open protocols and community-standard formats. It guides curation decisions throughout a project, not just the final deposit.
Why does FAIR data matter?
FAIR data matters because it lets both humans and machines discover, verify and reuse research outputs without contacting the original authors. Poorly curated “FAIR” deposits undermine reproducibility and waste the public investment funders intended the mandate to protect.
Implications and outlook
Funder FAIR mandates have succeeded in one respect: deposit rates have risen sharply since 2016. They have not, on current evidence, produced datasets that are reliably machine-actionable or cross-team reusable at scale. That gap will not close through stricter wording in policy documents; it requires funders to resource curation at realistic cost, institutions to credit it in career progression via mechanisms such as CRediT’s Data curation role, and disciplines to finish building the community standards that Interoperability depends on. Until those three conditions are met, “FAIR by default” will remain a policy aspiration rather than a description of the average deposited dataset.