  F-UJI FAIR Evaluator: What It Actually Scores

    The F-UJI FAIR evaluator is an automated web service that checks whether a dataset’s metadata — not its actual data quality — satisfies a fixed set of machine-readable tests built from the FAIRsFAIR Data Object Assessment Metrics. A high F-UJI percentage means a dataset’s landing page, identifiers and schema exposed enough structured signals for a script to find and parse; it does not certify that a human researcher can actually understand, trust or reuse the data inside.

    F-UJI is one of several tools now used to operationalise the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), alongside FAIRshake, FAIR-Checker, FAIR Aware and the FAIR Data Point specification promoted by the GO FAIR Initiative. This article explains what each type of tool actually scores, where automated scoring diverges from manual FAIR maturity review, and why institutions and research data repositories should treat a high machine score as a floor, not a finish line.

    What is the F-UJI FAIR evaluator?

    F-UJI (FAIRsFAIR Research Data Object Assessment Service) is a web service and REST API that assesses a research data object against 16 core FAIR metrics. A user submits a persistent identifier, typically a DOI. F-UJI then queries external services such as the DataCite API and the re3data registry, and parses the metadata exposed on the dataset's landing page (embedded schema.org JSON-LD, DCAT or Dublin Core fields) to determine whether each metric passes.
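
    For readers who want to script a submission, the sketch below builds the request for a self-hosted F-UJI instance using only Python's standard library. The base URL and port, the endpoint path (/fuji/api/v1/evaluate) and the JSON field names are our reading of F-UJI's published OpenAPI definition and should be treated as assumptions to verify against the instance you use; the credentials and DOI are placeholders.

```python
import base64
import json
import urllib.request

# ASSUMPTIONS: base URL/port, endpoint path and payload field names reflect our
# reading of F-UJI's OpenAPI definition; verify them against your deployment.
FUJI_BASE_URL = "http://localhost:1071"  # hypothetical self-hosted instance
EVALUATE_PATH = "/fuji/api/v1/evaluate"


def build_fuji_request(identifier: str, username: str, password: str) -> urllib.request.Request:
    """Build, but do not send, a POST request asking F-UJI to assess one identifier."""
    payload = {
        "object_identifier": identifier,  # a DOI or other persistent identifier
        "test_debug": True,               # request per-test debug messages in the report
        "use_datacite": True,             # allow F-UJI to consult DataCite metadata
    }
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        FUJI_BASE_URL + EVALUATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Usage (needs a running server; placeholder DOI and credentials):
# req = build_fuji_request("10.1234/example", "user", "secret")
# with urllib.request.urlopen(req) as resp:
#     report = json.load(resp)
```

    Building the request separately from sending it keeps the payload easy to inspect and test before any network call is made.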

    The metrics were developed under the EU Horizon 2020 FAIRsFAIR project (2019–2022) and are now maintained and versioned by its successor, the FAIR-IMPACT project, with the metric set published as a citable release (DOI 10.5281/zenodo.15045911). F-UJI’s source code is maintained on GitHub by the PANGAEA data publisher, and the tool is offered as a free public assessment service and API.

    How F-UJI’s automated scoring actually works

    F-UJI does not read the dataset’s content. It inspects the metadata surrounding the dataset — the landing page markup, the identifier’s resolution behaviour, declared licences, and machine-readable provenance fields — and scores each of the 16 metrics as pass, partial or fail. The overall percentage is a weighted sum across the Findable, Accessible, Interoperable and Reusable metric groups.
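
    The aggregation step can be pictured with a short sketch. It assumes each metric awards some number of points out of a metric-specific maximum, with "partial" meaning some but not all points were earned, so metrics with larger maxima carry more weight in the percentage. The metric identifiers and point values are hypothetical, not F-UJI's actual allocation.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    metric_id: str  # hypothetical identifier, e.g. "F-1"
    group: str      # FAIR group: "F", "A", "I" or "R"
    earned: int     # points earned on this metric
    total: int      # maximum points this metric can award

    @property
    def status(self) -> str:
        """Pass if all points were earned, partial if some, fail if none."""
        if self.earned == self.total:
            return "pass"
        return "partial" if self.earned > 0 else "fail"


def summarise(results: list[MetricResult]) -> dict[str, float]:
    """Percentage per FAIR group and overall, computed as points earned / points available."""
    earned: dict[str, int] = {}
    total: dict[str, int] = {}
    for r in results:
        earned[r.group] = earned.get(r.group, 0) + r.earned
        total[r.group] = total.get(r.group, 0) + r.total
    summary = {group: round(100 * earned[group] / total[group], 1) for group in total}
    summary["overall"] = round(100 * sum(earned.values()) / sum(total.values()), 1)
    return summary
```

    With two Findable metrics at 2/2 and 1/2, one Accessible metric at 1/1, one Interoperable metric at 0/2 and one Reusable metric at 2/3, the sketch reports F 75.0, A 100.0, I 0.0, R 66.7 and 60.0 overall.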

    • Findable metrics check for a persistent identifier, whether the metadata is indexable by search engines, and whether the identifier resolves to rich metadata.
    • Accessible metrics check that metadata remains retrievable even if the data itself becomes unavailable, and that access protocols are standard.
    • Interoperable metrics check for structured vocabularies declared in a JSON-LD @context (schema.org, DCAT, PROV-O) and for qualified references to related resources.
    • Reusable metrics check for a machine-readable licence, provenance statements, and a community-recognised file format for the data’s actual distribution.
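
    The kind of signal these checks look for can be approximated in a few lines. The sketch below uses only Python's standard library to pull schema.org JSON-LD out of a landing page and test for four properties that loosely map onto the groups above. It is a simplified stand-in under our own assumptions, not F-UJI's actual test logic, which handles many more metadata sources and formats.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script text may arrive in several chunks


def _as_list(value):
    """Treat a missing value as [] and a single object as [object]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def metadata_signals(html: str) -> dict[str, bool]:
    """Report which simplified, machine-detectable metadata signals a landing page exposes."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    documents = [json.loads(block) for block in parser.blocks if block.strip()]
    # Use the first JSON-LD object found; real pages may carry several.
    doc = next((d for d in documents if isinstance(d, dict)), {})
    return {
        "jsonld_context": "@context" in doc,        # Interoperable: declared vocabulary
        "identifier": bool(doc.get("identifier")),  # Findable: identifier in metadata
        "license": bool(doc.get("license")),        # Reusable: machine-readable licence
        "distribution_format": any(                 # Reusable: declared file format
            isinstance(d, dict) and bool(d.get("encodingFormat"))
            for d in _as_list(doc.get("distribution"))
        ),
    }
```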

    A documented example from the FAIR Data Innovations Hub illustrates how mechanical this scoring is in practice: a dataset scored 67% on its first F-UJI run, with the Findable, Interoperable and Reusable metrics flagged for missing JSON-LD context, missing PROV-O provenance fields and an undeclared distribution format. After the maintainers added a single enriched schema.org/PROV-O JSON-LD block to the landing page — without changing the underlying data at all — the same dataset scored 100% on re-assessment. The data did not become more reusable in that interval; its metadata simply became more machine-legible.
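
    The Hub's exact block is not reproduced here, but the sketch below shows the general shape of such a fix: a schema.org Dataset description that also declares the PROV-O namespace, a licence and a distribution format, serialised into a script tag for the landing page. Every value passed in is hypothetical, and whether F-UJI recognises this particular @context layout is an assumption worth confirming by re-running the assessment.

```python
import json


def build_landing_page_jsonld(doi_url: str, title: str, license_url: str,
                              media_type: str, source_url: str) -> str:
    """Return a <script> element carrying schema.org + PROV-O metadata for one dataset."""
    doc = {
        "@context": {
            "@vocab": "https://schema.org/",       # unprefixed terms are schema.org
            "prov": "http://www.w3.org/ns/prov#",  # PROV-O namespace for provenance
        },
        "@type": "Dataset",
        "@id": doi_url,
        "identifier": doi_url,
        "name": title,
        "license": license_url,
        "distribution": [{"@type": "DataDownload", "encodingFormat": media_type}],
        "prov:wasDerivedFrom": {"@id": source_url},
    }
    # Escape "</" so a value can never close the surrounding <script> element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

    Nothing in this block touches the data file itself, which is exactly why the score can jump without the data becoming any more reusable.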

    F-UJI vs FAIRshake vs manual maturity frameworks

    F-UJI is not the only FAIR assessment approach in circulation, and the three main categories differ in what they actually test and who defines “FAIR” for the purpose of the test.

    Dimension | F-UJI | FAIRshake | Manual maturity review
    --- | --- | --- | ---
    Method | Fully automated, no human input | Hybrid: automated tests plus human-scored rubrics | Fully manual, questionnaire- or checklist-based
    Basis of criteria | Fixed FAIRsFAIR/FAIR-IMPACT metric set | Community-defined rubrics per research domain | Institution- or project-specific checklist
    Input required | A persistent identifier (e.g. a DOI) | A URL, via web interface or browser extension | The dataset, its documentation and reviewer time
    Output | Percentage score per metric and overall | Colour-coded “FAIR insignia” grid, one square per rubric metric | Narrative report with recommendations
    Scalability | High (suited to bulk repository audits) | Moderate | Low (resource-intensive)
    Contextual nuance | Low (rigid, rule-based) | Moderate (rubrics can be domain-tailored) | High (accounts for discipline-specific reuse)

    FAIRshake was originally developed by the Ma’ayan Laboratory at the Icahn School of Medicine at Mount Sinai under the US National Institutes of Health’s Big Data to Knowledge (BD2K) programme. Rather than one fixed metric set, it lets research communities author their own rubrics and score resources — manually, automatically, or both — against them, then renders the result as a colour-coded insignia rather than a single number.
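
    The difference from a single percentage is easiest to see in code. The sketch below is not FAIRshake's implementation: the rubric, the 0 to 1 answer scale and the colour thresholds are all hypothetical. It averages automated and human assessments metric by metric and keeps one colour per metric, the way an insignia does, instead of collapsing everything into one number.

```python
from statistics import mean
from typing import Optional

# Hypothetical community rubric; real FAIRshake rubrics are authored per domain.
RUBRIC = [
    "persistent identifier",
    "machine-readable licence",
    "domain vocabulary",
    "usage documentation",
]


def metric_averages(assessments: list[dict[str, float]],
                    rubric: list[str]) -> dict[str, Optional[float]]:
    """Average each rubric metric across assessments; None if nobody scored it."""
    averages: dict[str, Optional[float]] = {}
    for metric in rubric:
        scores = [a[metric] for a in assessments if metric in a]
        averages[metric] = mean(scores) if scores else None
    return averages


def colour(score: Optional[float]) -> str:
    """Map an average score on a hypothetical 0-1 scale to an insignia-style colour."""
    if score is None:
        return "grey"
    if score >= 0.75:
        return "green"
    if score >= 0.25:
        return "yellow"
    return "red"
```

    For example, an automated pass scoring the identifier and licence metrics 1.0, plus one reviewer scoring licence 0.5, vocabulary 0.5 and documentation 0.0, yields averages of 1.0, 0.75, 0.5 and 0.0, shown as green, green, yellow and red squares.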

    The GO FAIR Initiative takes a different, upstream approach: instead of scoring existing datasets after the fact, it promotes the FAIR Data Point (FDP) specification — a layered REST API (FAIR Data Point → Catalog → Dataset → Distribution) that a research data repository implements so that FAIRness is built into how metadata is served, rather than retrofitted and then measured.
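
    The layering can be illustrated with a small traversal. In the specification each layer is an RDF resource served over HTTP and linked to the next through DCAT-style predicates (a catalogue lists datasets via dcat:dataset, a dataset lists distributions via dcat:distribution). The sketch below replaces all of that with a hypothetical in-memory dictionary and example.org URLs, so it shows only the shape of the hierarchy, not a working FDP client.

```python
from typing import Iterator

# The four layers, top to bottom, as named in the text.
LAYERS = ["FAIRDataPoint", "Catalog", "Dataset", "Distribution"]


def walk(resources: dict[str, dict], url: str,
         depth: int = 0) -> Iterator[tuple[int, str, str]]:
    """Depth-first walk yielding (depth, layer, url), checking each resource sits at the right layer."""
    if depth >= len(LAYERS):
        raise ValueError(f"{url}: nested below the Distribution layer")
    node = resources[url]
    if node["type"] != LAYERS[depth]:
        raise ValueError(f"{url}: expected {LAYERS[depth]}, found {node['type']}")
    yield depth, node["type"], url
    for child in node.get("children", []):
        yield from walk(resources, child, depth + 1)


# Hypothetical repository: one catalogue holding one dataset with one CSV distribution.
EXAMPLE_FDP = {
    "https://fdp.example.org/": {"type": "FAIRDataPoint", "children": ["https://fdp.example.org/catalog/1"]},
    "https://fdp.example.org/catalog/1": {"type": "Catalog", "children": ["https://fdp.example.org/dataset/a"]},
    "https://fdp.example.org/dataset/a": {"type": "Dataset", "children": ["https://fdp.example.org/distribution/a-csv"]},
    "https://fdp.example.org/distribution/a-csv": {"type": "Distribution"},
}

for depth, layer, url in walk(EXAMPLE_FDP, "https://fdp.example.org/"):
    print("  " * depth + f"{layer}: {url}")
```

    Because every resource must sit at its expected layer, a client can discover every distribution from the single top-level URL, which is the property that lets FAIRness be built into how metadata is served.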

    What a high FAIR score does not prove

    A 100% F-UJI score is a statement about metadata exposure, not about data quality, ethical provenance, statistical validity, or whether another researcher can actually rerun the analysis. This distinction matters because automated tools are increasingly cited in funder and repository policy discussions as if they were a proxy for genuine reusability.

    • A perfectly scored dataset can still contain undocumented preprocessing steps, missing sample metadata, or errors that no metadata check can catch.
    • F-UJI cannot verify that a licence field is legally accurate — only that a machine-readable licence field exists.
    • None of F-UJI, FAIRshake or FAIR Aware assess whether the underlying research methodology or data collection itself was sound; that remains a peer-review and domain-expert function.
    • Scores are not comparable across tools: a dataset scoring 67% on F-UJI is not equivalent to 67% “FAIR” on any absolute scale, since each tool tests a different set of criteria and weights them differently.
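
    A toy calculation shows why weighting alone is enough to break comparability. The five checks and both weighting schemes below are hypothetical; the point is only that identical pass/fail outcomes produce different headline percentages once the weights change.

```python
def weighted_percentage(outcomes: dict[str, bool], weights: dict[str, int]) -> float:
    """Share of available weight earned by the checks that passed, as a percentage."""
    total = sum(weights[check] for check in outcomes)
    earned = sum(weights[check] for check, passed in outcomes.items() if passed)
    return 100 * earned / total


# The same hypothetical outcomes for one dataset...
outcomes = {"identifier": True, "licence": True, "provenance": False, "vocabulary": False, "format": True}

# ...scored by two hypothetical tools that weight the checks differently.
equal_weights = {"identifier": 1, "licence": 1, "provenance": 1, "vocabulary": 1, "format": 1}
provenance_heavy = {"identifier": 1, "licence": 1, "provenance": 3, "vocabulary": 2, "format": 1}

print(weighted_percentage(outcomes, equal_weights))     # 60.0
print(weighted_percentage(outcomes, provenance_heavy))  # 37.5
```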

    A 2021 study by Devaraju et al. (available on ScienceDirect and cited more than 90 times) frames this precisely: it describes F-UJI-based measurement as “centred on core metrics” that apply until domain- or community-specific FAIR criteria are agreed, an explicit acknowledgement that the automated baseline is deliberately generic, not a final word on reusability.

    Common questions about automated FAIR scoring

    What does F-UJI actually measure?

    F-UJI measures whether a dataset’s metadata — its identifier, landing-page markup, licence declaration and provenance fields — meets 16 machine-testable criteria drawn from the FAIRsFAIR/FAIR-IMPACT metric set. It does not inspect or validate the dataset’s actual content, methodology or scientific accuracy.

    Is a high F-UJI score the same as genuinely FAIR data?

    No. A high score confirms that metadata is machine-readable and complete according to a fixed rule set. Genuine reusability additionally depends on documentation quality, data integrity and domain-specific context that automated tools are structurally unable to evaluate.

    How does FAIRshake differ from F-UJI?

    FAIRshake combines automated tests with human-scored, community-defined rubrics, whereas F-UJI applies one fixed metric set with no human input. FAIRshake reports results as a visual “FAIR insignia” rather than F-UJI’s single percentage score.

    Do funders formally require automated FAIR scores?

    No major funder currently mandates a specific F-UJI or FAIRshake score as a compliance threshold. Funder and institutional policies (for example under Horizon Europe and UKRI) reference the FAIR Data Principles as a qualitative expectation, with automated tools used voluntarily to self-check progress.

    Implications for repositories and funders

    For research data repositories, the practical use of F-UJI is diagnostic, not evaluative: it flags specific, fixable metadata gaps — a missing JSON-LD block, an undeclared licence field, an absent provenance statement — far faster than a manual audit could. Repositories improving their F-UJI scores should treat each metric failure as a discrete engineering task, not as a proxy for a broader data-quality programme.
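
    Turning a report into such a task list can itself be automated. The sketch assumes the F-UJI JSON report contains a results list in which each entry carries a metric_identifier, a metric_name and a score object with earned and total points; those field names reflect our reading of F-UJI's output and should be checked against a real report. The sample entries, names and point values are hypothetical.

```python
def metadata_tasks(report: dict) -> list[str]:
    """List metrics that did not earn full points, largest point gap first."""
    gaps = []
    for result in report.get("results", []):
        score = result.get("score", {})
        earned, total = score.get("earned", 0), score.get("total", 0)
        if earned < total:  # anything short of full marks is a fixable metadata gap
            gaps.append((total - earned, result["metric_identifier"],
                         result.get("metric_name", ""), earned, total))
    gaps.sort(key=lambda gap: (-gap[0], gap[1]))  # biggest gap first, then by metric id
    return [f"{metric_id}: {name} ({earned}/{total} points)"
            for _, metric_id, name, earned, total in gaps]


# Hypothetical report excerpt: identifiers follow the FAIRsFAIR naming pattern,
# but the names and point values are invented for illustration.
SAMPLE_REPORT = {
    "results": [
        {"metric_identifier": "FsF-F1-02D", "metric_name": "Persistent identifier",
         "score": {"earned": 1, "total": 1}},
        {"metric_identifier": "FsF-R1.2-01M", "metric_name": "Provenance",
         "score": {"earned": 1, "total": 2}},
        {"metric_identifier": "FsF-R1.1-01M", "metric_name": "Licence",
         "score": {"earned": 0, "total": 2}},
    ]
}

for task in metadata_tasks(SAMPLE_REPORT):
    print(task)
```

    Ordering by missing points is one reasonable triage choice; a repository could equally sort by estimated engineering effort.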

    For institutions and funders assessing compliance, the more defensible approach combines automated metadata scoring as a first-pass filter with a manual or community-rubric review for anything reused in decision-relevant research. Relying on one automated percentage to certify “FAIR” data risks the same error as equating a spellchecker’s clean pass with a well-argued essay: necessary, not sufficient.

    As the GO FAIR Initiative’s FAIR Data Point specification gains adoption, the balance may shift from retrospective scoring toward FAIRness built into repository infrastructure from the point of deposit — making after-the-fact tools like F-UJI a verification step rather than the primary mechanism for achieving reusable research data.