FAIR Data Maturity: Scoring Datasets Against the RDA Model

FAIR data maturity is scored by testing each dataset against the 41 indicators of the Research Data Alliance’s FAIR Data Maturity Model, grading Findability, Accessibility, Interoperability and Reusability separately, then weighting results by each indicator’s priority tier: essential, important, or useful. FAIR data management moves from an abstract commitment to a measurable score once an institution runs this test consistently across its repository.

FAIR data is data that meets the Findable, Accessible, Interoperable and Reusable criteria first published by Wilkinson et al. in Scientific Data in 2016 — a paper now cited more than 22,000 times. This guide is a practical scoring walkthrough, not another explainer of what the four letters mean: it shows research offices how to actually audit existing datasets and repositories against the RDA model and turn the result into a remediation plan.

What is the RDA FAIR Data Maturity Model?
How does FAIR maturity scoring actually work?
Manual vs automated assessment: which tool fits?
A step-by-step scoring walkthrough
Common questions about FAIR data scoring
What this means for research data offices

What is the RDA FAIR Data Maturity Model?

The RDA FAIR Data Maturity Model is a specification published by a Research Data Alliance working group in 2020 to standardise how organisations test FAIRness. Before it existed, dozens of institutions had built incompatible local checklists, making it impossible to compare a “FAIR score” from one repository against another.

The model does not ship as software. It is a reference document that defines:

41 indicators — testable statements mapped to the fifteen core GO FAIR sub-principles (F1–F4, A1–A2, I1–I3, R1–R1.3)
Three priority tiers — essential, important and useful — so institutions can triage effort rather than treat every indicator as equally urgent
Evaluation guidance — worked examples for testing each indicator against real metadata and data objects, rather than self-reported compliance

Because the indicators trace directly to the GO FAIR principles, a dataset that scores well against the RDA model is, by construction, meeting the same criteria described in the original 2016 Scientific Data paper — just with a repeatable measurement attached.
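As a minimal sketch of how the model's structure could be held in code, the following represents an indicator with its sub-principle and priority tier, then groups a catalogue by FAIR letter. The class names, the `ex-` identifiers and the three sample entries are hypothetical illustrations, not the official RDA indicator list; a real catalogue would be transcribed from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    ESSENTIAL = "essential"
    IMPORTANT = "important"
    USEFUL = "useful"


@dataclass(frozen=True)
class Indicator:
    indicator_id: str   # hypothetical local ID, not an official RDA identifier
    sub_principle: str  # e.g. "F1", "A1.1", "R1.3"
    priority: Priority
    description: str

    @property
    def letter(self) -> str:
        """FAIR letter (F, A, I or R) taken from the sub-principle code."""
        return self.sub_principle[0]


# Illustrative entries only; the real 41 indicators and their tiers
# must be taken from the RDA specification.
CATALOGUE = [
    Indicator("ex-01", "F1", Priority.ESSENTIAL, "Data has a persistent identifier"),
    Indicator("ex-02", "I2", Priority.IMPORTANT, "Metadata uses a controlled vocabulary"),
    Indicator("ex-03", "R1.1", Priority.ESSENTIAL, "Metadata states the reuse licence"),
]


def by_letter(catalogue):
    """Group indicators under their FAIR letter."""
    groups = {"F": [], "A": [], "I": [], "R": []}
    for indicator in catalogue:
        groups[indicator.letter].append(indicator)
    return groups


groups = by_letter(CATALOGUE)
```

Keeping the priority tier on the indicator itself, rather than in a separate lookup, means later weighting and triage steps can work from a single record per indicator.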

How does FAIR maturity scoring actually work?

Scoring is done indicator by indicator, not principle by principle. Most institutions that implement the RDA model score each of the 41 indicators on a simple 0–4 scale — 0 (not implemented) through 4 (fully implemented) — then multiply by a priority weight before aggregating to a per-dataset and per-repository total.

FAIR letter	Sub-principles tested	Typical essential-tier evidence
Findable	F1–F4	Persistent identifier (DOI via DataCite), indexed metadata record, machine-readable catalogue entry
Accessible	A1–A2	Retrieval via an open protocol (HTTPS), metadata that resolves even if the data itself is restricted
Interoperable	I1–I3	Structured, non-proprietary format; controlled vocabularies; qualified links to related records
Reusable	R1–R1.3	Machine-readable licence, documented provenance, alignment with a domain metadata standard

A dataset that carries a DOI and an open licence but lacks controlled vocabulary terms will score high on Findable and Reusable, and low on Interoperable. Indicator-level scoring exists precisely to surface that kind of uneven profile, which a single pass/fail “is it FAIR?” verdict would hide.
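The weighting described above can be sketched as a weighted mean per FAIR letter. The weights of 3, 2 and 1 for essential, important and useful are an assumption made for illustration, since no numeric weights are fixed here, and the example scores are hypothetical. Dividing by the sum of the weights keeps each letter on the same 0–4 scale as the raw scores.

```python
# Assumed priority weights; these values are illustrative, not prescribed.
WEIGHTS = {"essential": 3, "important": 2, "useful": 1}


def letter_scores(results):
    """Weighted mean per FAIR letter, kept on the 0-4 scale.

    results: list of (letter, priority, score) tuples with score in 0..4.
    """
    totals = {}
    for letter, priority, score in results:
        if not 0 <= score <= 4:
            raise ValueError(f"score out of range: {score}")
        weight = WEIGHTS[priority]
        weighted_sum, weight_sum = totals.get(letter, (0, 0))
        totals[letter] = (weighted_sum + weight * score, weight_sum + weight)
    return {letter: ws / w for letter, (ws, w) in totals.items()}


# Hypothetical dataset: DOI and open licence present, no controlled vocabulary.
example = [
    ("F", "essential", 4), ("F", "important", 4),
    ("A", "essential", 4),
    ("I", "essential", 1), ("I", "important", 0),
    ("R", "essential", 4), ("R", "useful", 2),
]
profile = letter_scores(example)
```

With these inputs the profile comes out at 4.0 for Findable and Accessible, 3.5 for Reusable and 0.6 for Interoperable, which is the uneven shape a single verdict would flatten.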

Manual vs automated assessment: which tool fits?

Two complementary assessment routes exist. Automated tools are fast but only test what a machine can verify; manual review is slower but catches the indicators that require human judgement, such as whether a licence is genuinely clear or a vocabulary is genuinely domain-appropriate.

Tool / method	Coverage of the 41 indicators	Output	Best suited to
F-UJI (FAIRsFAIR project)	Machine-testable subset only — roughly 17 metrics derived from the RDA indicators	Automated percentage score per FAIR letter, run against a DOI	Bulk baseline scans across a whole repository
FAIR-Aware (DANS)	Self-assessment questionnaire, not indicator-scored	Qualitative readiness report and recommendations	Researchers preparing a dataset before deposit
Manual RDA specification review	All 41 indicators, including human-judgement ones	Full indicator-by-indicator score with evidence notes	Institutional audits and remediation planning

A hybrid approach is the most defensible for an institution-wide programme: run an automated scan across every repository record for a fast baseline, then reserve manual review for the essential-tier indicators no tool can verify — licence clarity, provenance completeness and domain-standard alignment.
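One way to express the hybrid split is a small triage function that sends machine-testable indicators to the automated scan and queues the essential-tier remainder for manual review. The record layout, the `machine_testable` flag and the `ex-` identifiers are assumptions for this sketch; which indicators a given tool can actually verify has to be established from that tool's documentation.

```python
def plan_review(indicators):
    """Split indicators into an automated batch and a manual review queue.

    indicators: list of dicts with keys "id", "priority", "machine_testable".
    Non-essential indicators that no tool can check are deferred.
    """
    automated = [i["id"] for i in indicators if i["machine_testable"]]
    manual = [
        i["id"] for i in indicators
        if not i["machine_testable"] and i["priority"] == "essential"
    ]
    deferred = [
        i["id"] for i in indicators
        if not i["machine_testable"] and i["priority"] != "essential"
    ]
    return {"automated": automated, "manual": manual, "deferred": deferred}


# Hypothetical indicator records.
indicators = [
    {"id": "ex-pid", "priority": "essential", "machine_testable": True},
    {"id": "ex-licence-clarity", "priority": "essential", "machine_testable": False},
    {"id": "ex-vocab-fit", "priority": "useful", "machine_testable": False},
]
plan = plan_review(indicators)
```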

A step-by-step scoring walkthrough

The following sequence turns the RDA model from a reference document into a repeatable institutional process.

Select a representative sample. Pull datasets across disciplines, repository platforms, and funder mandates — a sample skewed toward one department will misstate institutional maturity.
Run the automated baseline. Record each dataset’s DOI or other persistent identifier, then run an F-UJI scan for the machine-testable indicators before any manual work begins.
Score the rest manually. Assess the remaining essential-tier indicators by checking licence text, metadata schema, and vocabulary choice against the evidence guidance in the RDA specification.
Weight and aggregate. Multiply each indicator score by its priority weight, sum within each FAIR letter and divide by the sum of the weights so the result stays on the 0–4 scale, then average across the sample to produce a repository-level maturity profile.
Report by weakest letter, not overall average. An institution scoring 3.6/4 on Findable but 1.2/4 on Interoperable needs a vocabulary-adoption project, not a generic “improve FAIR compliance” action item.
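For the first step in the sequence above, a stratified draw is one simple way to keep a single department from dominating the sample. The sketch below assumes each dataset record carries a `discipline` field and uses a fixed seed so the same audit sample can be reproduced; the field name, the record layout and the per-stratum size are all illustrative choices.

```python
import random
from collections import defaultdict


def stratified_sample(datasets, key, per_stratum, seed=0):
    """Pick up to per_stratum datasets from each group defined by key."""
    rng = random.Random(seed)  # fixed seed keeps the audit sample reproducible
    strata = defaultdict(list)
    for dataset in datasets:
        strata[dataset[key]].append(dataset)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical repository records.
datasets = [
    {"id": "d1", "discipline": "clinical"},
    {"id": "d2", "discipline": "clinical"},
    {"id": "d3", "discipline": "clinical"},
    {"id": "d4", "discipline": "environment"},
    {"id": "d5", "discipline": "social"},
    {"id": "d6", "discipline": "social"},
]
sample = stratified_sample(datasets, key="discipline", per_stratum=2)
```

The same function can be run a second time with `key` set to a repository platform or funder field if those dimensions also need balancing.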

Worked example — three datasets from the same institutional repository, scored on the 0–4 scale before weighting:

Dataset	Findable	Accessible	Interoperable	Reusable
Clinical trial dataset (restricted access)	4	3	2	3
Environmental sensor archive	3	4	3	2
Survey microdata (open)	2	4	1	4

Averaged across the three datasets, this profile is strongest on Accessible (mean 3.7) and weakest on Interoperable (mean 2.0, against 3.0 for both Findable and Reusable). Only a scored audit surfaces a pattern like this, and it points to one priority fix, adopting a shared controlled vocabulary at ingest, rather than four separate ones.
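The repository-level profile for the worked example can be reproduced with a few lines. This sketch uses the unweighted scores exactly as they appear in the table and takes a plain mean per letter; the function and variable names are illustrative.

```python
from statistics import mean

LETTERS = ("Findable", "Accessible", "Interoperable", "Reusable")

# Unweighted 0-4 scores copied from the worked example table.
SAMPLE = {
    "Clinical trial dataset (restricted access)": (4, 3, 2, 3),
    "Environmental sensor archive": (3, 4, 3, 2),
    "Survey microdata (open)": (2, 4, 1, 4),
}


def repository_profile(sample):
    """Mean score per FAIR letter across all datasets in the sample."""
    columns = zip(*sample.values())
    return {letter: mean(column) for letter, column in zip(LETTERS, columns)}


profile = repository_profile(SAMPLE)
weakest = min(profile, key=profile.get)  # letter with the lowest mean
```

Here `weakest` is Interoperable, with a mean of 2 against 3 for Findable and Reusable and roughly 3.67 for Accessible.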

Common questions about FAIR data scoring

What are FAIR principles for data?

FAIR principles are four criteria — Findable, Accessible, Interoperable and Reusable — first published in a 2016 Scientific Data paper by Wilkinson et al. They require datasets to carry a persistent identifier, standardised retrieval protocols, shared vocabularies and machine-readable licensing, so both humans and software can locate and reuse research data reliably.

What are the four pillars of the FAIR data principles?

The four pillars are Findable (unique persistent identifiers and rich metadata), Accessible (standardised, open retrieval protocols), Interoperable (shared vocabularies and qualified references) and Reusable (clear licensing, provenance and community standards). The RDA FAIR Data Maturity Model breaks these four pillars into 41 individually testable indicators.

What are the FAIR data principles of UKRI?

UKRI does not publish a separate FAIR standard. Its research councils, including NERC’s Environmental Data Service, require grant-funded datasets to follow the same GO FAIR-published Findable, Accessible, Interoperable and Reusable principles, citing benefits including increased citation, stronger research integrity, and compliance with data management plan commitments.

What are the FAIR principles of GDPR?

FAIR and GDPR address different concerns and are not in conflict. FAIR governs the discoverability and reuse of data and its metadata, while GDPR governs lawful processing of personal data. A dataset containing personal information can be fully FAIR (richly described and findable) while access to the underlying records stays restricted under GDPR-compliant authorisation.

What this means for research data offices

A scored FAIR audit gives research offices something a qualitative checklist cannot: a repository-level baseline that can be re-measured after each remediation cycle. Institutions preparing data management plan compliance evidence for UKRI, Horizon Europe, or cOAlition S-aligned funders can cite the same indicator scores as their supporting evidence, rather than producing a fresh narrative justification each time.
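Re-measuring against a baseline reduces to a per-letter difference between two audit profiles. The sketch below assumes both profiles use the same letter keys and the same 0–4 scale; the before and after figures are hypothetical.

```python
def profile_delta(baseline, current):
    """Per-letter change between two audits; positive means improvement."""
    return {
        letter: round(current[letter] - baseline[letter], 2)
        for letter in baseline
    }


# Hypothetical profiles from two audit cycles.
baseline = {"F": 3.0, "A": 3.7, "I": 2.0, "R": 3.0}
current = {"F": 3.2, "A": 3.7, "I": 2.9, "R": 3.0}
delta = profile_delta(baseline, current)
```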

Scoring also clarifies where FAIR and openness diverge. Following the “as open as possible, as closed as necessary” principle, a dataset can score highly on all four FAIR letters while remaining access-controlled — the metadata is open and richly described even when the underlying records are not. Institutions handling Indigenous or community-originated data should additionally weigh the CARE Principles — Collective Benefit, Authority to Control, Responsibility and Ethics — published by the Global Indigenous Data Alliance, which govern who controls reuse decisions rather than how discoverable the data is.

The practical next step after a first scoring pass is not a single “get to 100%” target — no dataset needs every useful-tier indicator satisfied — but a prioritised backlog built from essential-tier gaps, feeding directly into repository ingest workflows and metadata templates so the next deposit scores higher without a second audit.
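A prioritised backlog of this kind can be sketched as a filter and sort over audit findings. The threshold of 3 for what counts as a gap, the record layout and the example findings are assumptions for illustration; an institution would set its own cut-off.

```python
PRIORITY_ORDER = {"essential": 0, "important": 1, "useful": 2}


def build_backlog(findings, threshold=3, tiers=("essential",)):
    """Return gaps in the chosen tiers, lowest scores first.

    findings: list of dicts with "dataset", "indicator", "priority", "score".
    A finding counts as a gap when its score is below threshold.
    """
    gaps = [
        f for f in findings
        if f["priority"] in tiers and f["score"] < threshold
    ]
    return sorted(gaps, key=lambda f: (PRIORITY_ORDER[f["priority"]], f["score"]))


# Hypothetical audit findings.
findings = [
    {"dataset": "survey-microdata", "indicator": "ex-vocab", "priority": "essential", "score": 1},
    {"dataset": "sensor-archive", "indicator": "ex-provenance", "priority": "essential", "score": 2},
    {"dataset": "sensor-archive", "indicator": "ex-citation", "priority": "useful", "score": 0},
    {"dataset": "clinical-trial", "indicator": "ex-pid", "priority": "essential", "score": 4},
]
backlog = build_backlog(findings)
```

The useful-tier gap is left out by default, matching the point that not every useful-tier indicator needs to be satisfied; passing a wider `tiers` tuple brings it back in when capacity allows.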