
FAIR Data Point: Making Data Machine-Actionable

A FAIR Data Point is a lightweight metadata server that exposes structured, standardised descriptions of a dataset (its identifier, creator, licence and access route) through a REST API, so that software, not just people, can discover and assess it automatically. Within the GO FAIR implementation network, FAIR Data Points are the working infrastructure that turns the FAIR principles from a policy statement into a queryable service.

In formal terms, a FAIR Data Point (FDP) is a metadata repository that follows the DCAT2 vocabulary and organises records in a fixed hierarchy — repository, catalogue, dataset, distribution — using Linked Data Platform containers, as set out in the peer-reviewed FDP specification (da Silva Santos et al., 2023, Data Intelligence, MIT Press).

What Is a FAIR Data Point?
How Does the GO FAIR Initiative Use FAIR Data Points?
FAIR Data Point vs Machine-Actionable DMP: What Is the Difference?
How Is FAIRness Measured? The F-UJI Evaluator
Where Does DDI Fit Into the FAIR Data Point Stack?
Frequently Asked Questions
What This Means for Data Stewards and Developers

What Is a FAIR Data Point?

A FAIR Data Point separates metadata from data. It does not host the dataset itself; it hosts a machine-readable description of the dataset, reachable at a stable HTTP endpoint. This separation is what makes the metadata queryable independently of wherever the underlying files are actually stored.

The reference specification defines four nested layers, each exposed as its own resource:

Repository — the top-level FDP instance, describing the organisation or project running it
Catalogue — a themed grouping of related datasets
Dataset — the described research object, with identifier, creator, licence and rights statement
Distribution — the concrete access point (a download URL, an API, a query service)

Every layer is exposed via a REST API and encoded as RDF using the DCAT2 vocabulary, which is why an FDP can be crawled and indexed by external harvesters without bespoke integration work per institution.
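As a rough illustration of why that hierarchy is crawlable, the sketch below walks the four layers from the top down. It is a minimal sketch under stated assumptions: the records live in an in-memory dictionary rather than behind HTTP, every URL is invented, and the `ex:Repository` type and `ex:catalog` link are hypothetical placeholders rather than terms taken from the FDP specification. Only the `dcat:` and `dct:` names are real DCAT and Dublin Core terms.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical in-memory stand-in for an FDP: URL -> JSON-LD-style record.
# A real FDP serves RDF over HTTP; these URLs and the ex: terms are invented.
RECORDS: Dict[str, dict] = {
    "https://fdp.example.org/": {
        "@type": "ex:Repository",
        "dct:title": "Example FAIR Data Point",
        "ex:catalog": ["https://fdp.example.org/catalog/1"],
    },
    "https://fdp.example.org/catalog/1": {
        "@type": "dcat:Catalog",
        "dct:title": "Example catalogue",
        "dcat:dataset": ["https://fdp.example.org/dataset/1"],
    },
    "https://fdp.example.org/dataset/1": {
        "@type": "dcat:Dataset",
        "dct:title": "Example dataset",
        "dct:license": "https://creativecommons.org/licenses/by/4.0/",
        "dcat:distribution": ["https://fdp.example.org/distribution/1"],
    },
    "https://fdp.example.org/distribution/1": {
        "@type": "dcat:Distribution",
        "dct:title": "CSV download",
        "dcat:downloadURL": "https://data.example.org/files/example.csv",
    },
}

# Which property on each layer points at the layer below it.
CHILD_PROPERTY = {
    "ex:Repository": "ex:catalog",  # hypothetical link, not from the spec
    "dcat:Catalog": "dcat:dataset",
    "dcat:Dataset": "dcat:distribution",
}


def walk(
    url: str, records: Dict[str, dict] = RECORDS, depth: int = 0
) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, type, title) for a record and everything beneath it."""
    record = records[url]
    yield depth, record["@type"], record["dct:title"]
    child_property = CHILD_PROPERTY.get(record["@type"])
    if child_property is None:
        return  # distributions are the leaves of the hierarchy
    for child_url in record.get(child_property, []):
        yield from walk(child_url, records, depth + 1)


if __name__ == "__main__":
    for depth, rdf_type, title in walk("https://fdp.example.org/"):
        print("  " * depth + f"{rdf_type}: {title}")
```

A harvester written this way needs only the entry URL and the shared vocabulary, which is the point of the fixed hierarchy: nothing in `walk` is specific to one institution.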

How Does the GO FAIR Initiative Use FAIR Data Points?

GO FAIR is a grassroots, community-run implementation network for the FAIR principles, not a standards body with formal ownership of a single specification. It organises its work around three self-described pillars — GO CHANGE (policy and culture), GO TRAIN (skills) and GO BUILD (technical infrastructure) — coordinated through the GO FAIR Foundation.

FAIR Data Points sit inside the GO BUILD pillar. GO FAIR pairs FDPs with FAIR Implementation Profiles (FIPs): a documented set of choices a specific research community makes about identifiers, vocabularies, access protocols and licensing terms. The FIP tells an FDP deployment which controlled vocabularies to use at the dataset and distribution layers, so that metadata from two unrelated institutions in the same domain remains interoperable rather than merely similar.

The combined goal is what GO FAIR calls the “Internet of FAIR Data & Services”: a distributed network of FDPs that automated agents can traverse to locate relevant data without a central index. A working example already in production is the European Joint Programme on Rare Diseases (EJP RD) Virtual Platform, whose index runs on a federated network of FDPs contributed by member registries across Europe, funded through the EU’s Horizon 2020 research programme.

FAIR Data Point vs Machine-Actionable DMP: What Is the Difference?

The two are frequently conflated because both are described as “machine-actionable,” but they describe different objects at different points in the research lifecycle. A machine-actionable Data Management Plan (maDMP) — built on the Research Data Alliance’s DMP Common Standard and served by tools such as DMPTool or DMPonline — describes intentions: what data a project will produce, where it will deposit it and under what licence. An FDP describes an already-deployed dataset that a machine can query right now.

Aspect	FAIR Data Point	Machine-Actionable DMP
Lifecycle stage	Post-deposit, dataset already exists	Pre-project, data not yet produced
Governing spec	GO FAIR / FDP specification (DCAT2, LDP)	RDA DMP Common Standard
Query interface	REST API over a live metadata service	JSON export or plan-management tool API
Granularity	Per dataset / per distribution	Per project or funding award
Typical operator	Data repository or institutional archive	Institution, funder, or research office

Confusing the two leads institutions to procure the wrong tool: an maDMP platform will not make a finished dataset crawlable, and an FDP deployment will not help a project plan its future data management obligations.

How Is FAIRness Measured? The F-UJI Evaluator

F-UJI is an automated FAIR assessment tool developed under the Horizon 2020 FAIRsFAIR project. It scores a dataset’s exposed metadata — including metadata served by an FDP — against a fixed set of maturity indicators grouped under the four FAIR facets, returning a numeric FAIRness score rather than a binary pass/fail.

F-UJI can only evaluate what is machine-visible: it checks whether a licence, persistent identifier or access protocol is declared in the metadata, not whether the underlying data file is actually reusable in practice. This is precisely why the metadata layer an FDP provides matters — a well-structured FDP deployment is what allows a tool like F-UJI to detect FAIRness signals automatically, while a plain data-download page with no structured metadata will score poorly regardless of how well-organised the actual dataset is.
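The idea of scoring only what is machine-visible can be sketched in a few lines. This is a toy illustration, not F-UJI's actual metrics or API: the four checks, the field names (`identifier`, `access_url`, `conforms_to`, `license`) and the equal weighting are all assumptions chosen to mirror the paragraph above.

```python
from typing import Callable, Dict, Tuple


def _text(metadata: dict, key: str) -> str:
    """Return a metadata value as a string, treating missing values as empty."""
    return str(metadata.get(key) or "")


# One deliberately simple, hypothetical check per FAIR facet.
CHECKS: Dict[str, Callable[[dict], bool]] = {
    "findable": lambda m: _text(m, "identifier").startswith("https://doi.org/"),
    "accessible": lambda m: _text(m, "access_url").startswith(("http://", "https://")),
    "interoperable": lambda m: bool(_text(m, "conforms_to")),
    "reusable": lambda m: bool(_text(m, "license")),
}


def assess(metadata: dict) -> Tuple[Dict[str, bool], float]:
    """Run every check and return per-facet results plus a 0..1 score."""
    results = {facet: check(metadata) for facet, check in CHECKS.items()}
    return results, sum(results.values()) / len(results)


# A record with structured metadata, as an FDP-style service might expose it.
structured_record = {
    "identifier": "https://doi.org/10.1234/example",
    "access_url": "https://data.example.org/files/example.csv",
    "conforms_to": "http://www.w3.org/ns/dcat#",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# A plain download page: the file is reachable, but nothing else is declared.
bare_download_page = {"access_url": "https://data.example.org/files/example.csv"}

if __name__ == "__main__":
    for name, record in [("structured", structured_record), ("bare", bare_download_page)]:
        results, score = assess(record)
        print(name, results, score)
```

The bare download page scores poorly here for the same reason it would under a real evaluator: the checks can see declarations, not the quality of the file behind them.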

Where Does DDI Fit Into the FAIR Data Point Stack?

The Data Documentation Initiative (DDI) is an XML/RDF metadata standard maintained by the DDI Alliance for describing social, behavioural and economic science data at the variable level — survey questions, coding frames, sampling design. DCAT2, the vocabulary an FDP uses by default, describes a dataset at the catalogue-entry level; it was never designed to capture variable-level detail.

A research community whose FAIR Implementation Profile specifies DDI alongside DCAT2 gets both: FDP-level crawlability for discovery, and DDI-level granularity for reuse. Social-science archives affiliated with the Consortium of European Social Science Data Archives (CESSDA) and the UK Data Service already publish DDI metadata; wiring that metadata into an FDP endpoint is a genuine interoperability gain rather than duplicated effort.

Frequently Asked Questions

What is a FAIR data point?

A FAIR Data Point is a metadata repository that exposes a dataset’s identifier, licence, creator and access route through a REST API, structured according to the DCAT2 vocabulary. It publishes metadata about data, not the data itself, so automated tools can discover and evaluate the dataset without human involvement.

What does FAIR data mean?

FAIR data meets the 2016 principles of Findability, Accessibility, Interoperability and Reusability, first formally published by Wilkinson et al. in Scientific Data. The principles apply to metadata as much as to the underlying files, which is why machine-readable metadata infrastructure, such as an FDP, is required to satisfy them at scale.

What are the four pillars of the FAIR data principles?

The four pillars are Findable (a persistent identifier and rich metadata exist), Accessible (metadata is retrievable via an open protocol, even if the data itself is restricted), Interoperable (metadata uses a shared, formal vocabulary such as DCAT2), and Reusable (a clear licence and provenance are attached).

What This Means for Data Stewards and Developers

Deploying a FAIR Data Point is an infrastructure decision, not a documentation exercise. In practice it requires three steps: agreeing a FAIR Implementation Profile with the relevant research community, mapping local repository metadata onto DCAT2 at the dataset and distribution layers, and registering the resulting endpoint so external harvesters and tools such as F-UJI can find it.

Pair persistent dataset identifiers from DataCite with the FDP’s dataset layer so citation and discovery metadata stay consistent
Use ROR identifiers for the institutional agent fields rather than free-text organisation names
Treat the FDP as complementary to, not a replacement for, an maDMP — one documents intent, the other serves the finished product
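The identifier recommendations above lend themselves to a simple pre-publication check. The sketch below is an assumption-laden illustration rather than an official validator: the two regular expressions only test the general shape of a DOI and a ROR ID, they do not verify the ROR checksum or confirm that either identifier resolves, and the sample values are placeholders.

```python
import re

# Loose shape checks only (assumptions, not official validation rules).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ROR_PATTERN = re.compile(r"^https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$")


def looks_like_doi(value: str) -> bool:
    """True if the value has the general shape of a DOI, with or without a resolver prefix."""
    bare = value.removeprefix("https://doi.org/") if hasattr(value, "removeprefix") else value
    if bare.startswith("https://doi.org/"):  # fallback for Python < 3.9
        bare = bare[len("https://doi.org/"):]
    return bool(DOI_PATTERN.match(bare))


def looks_like_ror(value: str) -> bool:
    """True if the value has the general shape of a ROR ID in URL form."""
    return bool(ROR_PATTERN.match(value))


def check_dataset_record(record: dict) -> list:
    """Return human-readable problems for a hypothetical dataset record."""
    problems = []
    if not looks_like_doi(record.get("identifier", "")):
        problems.append("identifier is not a DOI")
    if not looks_like_ror(record.get("publisher", "")):
        problems.append("publisher is free text, not a ROR ID")
    return problems


if __name__ == "__main__":
    print(check_dataset_record({
        "identifier": "https://doi.org/10.1234/example.5678",
        "publisher": "University of Example",
    }))
```

Running a check like this before metadata reaches the dataset layer catches free-text organisation names early, when they are still cheap to fix.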

Funders are moving in this direction: the UNESCO Recommendation on Open Science (2021) names FAIR data as a foundational pillar, and Horizon Europe grant conditions increasingly expect data to be discoverable by machines, not just listed in a repository catalogue. For institutions building research-data infrastructure now, a standards-conformant FAIR Data Point is a defensible way to demonstrate machine-actionability rather than assert it in a data management plan.


July 4, 2026

Materials Data Repository: NIST’s FAIR Approach

The NIST Materials Data Repository is a US federal, open-access archive that lets materials scientists deposit, describe and reuse research data files under the Materials Genome Initiative (MGI). It matters for research data management (RDM) because materials science has lagged biomedical and social-science fields in adopting FAIR data principles, and NIST’s infrastructure — built on the open-source DSpace platform — offers a concrete, working template for what FAIR looks like in a physical-science discipline.

A materials data repository is a structured digital archive purpose-built for storing, describing and sharing datasets specific to materials science: crystal structures, mechanical-property measurements, spectroscopy files, simulation outputs and processing metadata. Unlike a general-purpose institutional repository, it is organised around domain metadata schemas that make heterogeneous, often binary, materials data searchable and machine-actionable.

What is the NIST Materials Data Repository?
How does it support FAIR data principles?
How does it compare with other materials data infrastructure?
What does this mean for RDM programmes?
Answer-first Q&A
Where materials science RDM is heading

What is the NIST Materials Data Repository?

The NIST Materials Data Repository, hosted at materialsdata.nist.gov, is a file repository maintained by the US National Institute of Standards and Technology’s Material Measurement Laboratory. It accepts data in any format and pairs each deposit with descriptive metadata — title, author, ownership and, where available, richer domain fields — specifically to counter the “opacity” of binary materials files that would otherwise be unsearchable.

NIST states the repository was created to give the research community “a concrete mechanism for the interchange and re-use of research data on materials systems,” in direct support of the Materials Genome Initiative, the 2011 US federal effort to accelerate materials discovery through better data infrastructure. Content is organised into communities and collections, a structure that groups related datasets and makes them easier to browse for specific research teams or projects.

Technically, the repository runs on DSpace, an open-source repository platform widely used across academic libraries, which gives it three RDM-relevant capabilities out of the box: persistent identifiers for deposited files, a web-accessible API for machine-to-machine access, and federation with other repositories. NIST has used that API to feed repository references into the Materials Data Facility and a “root and rules” search algorithm, extending the data’s reach beyond the repository’s own interface.

How does the repository support FAIR data principles?

The FAIR data principles — Findable, Accessible, Interoperable, Reusable — were formalised in 2016 in Scientific Data by Wilkinson et al. as a shared standard for making research data machine-actionable, not just human-readable. NIST’s repository operationalises each element rather than treating FAIR as an abstract aspiration.

Findable: rich, mandatory metadata plus persistent identifiers make each dataset discoverable independent of where its underlying file happens to live.
Accessible: the majority of holdings are public and retrievable through a standard web browser or the repository’s API, with limited invitation-only collections reserved for pre-publication analysis.
Interoperable: structured metadata and DSpace’s federation capability let the repository exchange records with external systems such as the Materials Data Facility, rather than functioning as an isolated silo.
Reusable: depositor-selected licensing terms and descriptive context give downstream users the information they need to judge whether a dataset is fit for reuse in new research.

This matters because FAIR compliance in materials science carries a different technical burden than it does in genomics or clinical trials data. A single alloy characterisation dataset can combine imaging files, spectroscopy outputs and tabular composition data in incompatible native formats — which is precisely the interoperability problem a domain-specific repository, rather than a generic institutional one, is built to solve.

How does it compare with other materials data infrastructure?

NIST’s repository is one node in a small but growing international ecosystem of materials-specific data infrastructure. Research administrators advising physical-science departments should understand where each fits, since “materials data repository” covers genuinely different data types — deposited raw files versus computed, simulation-derived properties.

Repository	Steward	Data type	Notable FAIR feature
NIST Materials Data Repository	NIST (US federal)	Deposited experimental/research files, any format	Persistent IDs, API, DSpace federation
MDR (DICE)	National Institute for Materials Science, Japan	Data and publications, domain-tailored metadata	Metadata schemas tuned to materials disciplines
Materials Project	Lawrence Berkeley National Laboratory	Computed structure/property data	Open API for bulk computed-data queries
NOMAD	FAIRmat / open-source community	Simulation and computational materials data	Explicitly FAIR-by-design, free and open source

UK institutions have a domestic reference point too: the Henry Royce Institute, the UK’s national institute for advanced materials research, maintains a Digital Materials Foundry that curates links to major computational materials databases for UK researchers, positioning FAIR materials data as institutional infrastructure rather than a project-by-project afterthought.

Registries such as re3data.org — the DataCite-affiliated global registry of research data repositories — independently list the NIST repository, which gives it discoverability outside its own domain and is itself a small but real Findability signal under the FAIR framework.

What does this mean for RDM programmes?

Materials science RDM guidance remains thin relative to biomedical and social-science fields, where funder mandates, data-sharing plans and repository certification (CoreTrustSeal, for example) are comparatively mature. Research administrators supporting engineering and physical-science faculties can draw three practical lessons from NIST’s model.

Domain-specific metadata schemas matter more than generic institutional-repository templates for high-heterogeneity data such as materials characterisation files.
Persistent identifiers and API access are not optional extras — they are what converts a file dump into FAIR-compliant infrastructure.
Federation with discipline hubs such as the Materials Data Facility, together with listing in registries such as re3data.org, extends a dataset’s reach far beyond a single institutional URL.

For research administrators building data management plans that reference physical-science outputs, pointing PIs toward an established domain repository — rather than a generic institutional one — materially improves the odds that FAIR criteria in funder compliance reviews are actually met.

Answer-first Q&A

What is the purpose of a materials data repository?

A materials data repository exists to make heterogeneous, often binary materials science data — spectroscopy, imaging, composition and mechanical-property files — searchable, citable and reusable. It solves the specific problem that raw materials files are otherwise opaque to search engines and incompatible with generic institutional repository metadata schemas.

What are examples of materials data repositories besides NIST’s?

Beyond the NIST Materials Data Repository, notable examples include Japan’s NIMS MDR (via the DICE platform), the US Materials Project for computed structure data, and NOMAD, a European open-source repository explicitly built to FAIR specifications for computational materials science.

Is it costly to deposit data in a repository like NIST’s?

NIST’s Materials Data Repository is a federally funded, open-access service with no publicly advertised deposit fee, unlike some generalist commercial repositories that charge per gigabyte above a free tier. Costs for materials-specific deposit are therefore typically absorbed by the institution’s existing RDM infrastructure rather than billed per dataset.

What is the best materials data repository for FAIR compliance?

There is no single “best” repository; the right choice depends on data type. NOMAD and the Materials Project suit computed and simulation data, while the NIST Materials Data Repository and the NIMS MDR suit deposited experimental datasets. All four implement the core FAIR pillars, but through different metadata and access mechanisms.

Where materials science RDM is heading

Materials science FAIR infrastructure is converging on the same architecture that biomedical and social-science RDM adopted earlier: persistent identifiers, API-level machine access, domain-tuned metadata and cross-repository federation. NIST’s Materials Data Repository, updated as recently as March 2025 according to its own programme page, demonstrates that a federal physical-science agency can build FAIR-compliant infrastructure without waiting for a universal cross-discipline standard to arrive first. For research administrators, the practical task now is steering physical-science principal investigators toward these domain repositories in data management plans, rather than defaulting to generalist options that were never built for materials data’s particular complexity.

July 4, 2026

Research Data Management Policy: €10.2bn Case

A research data management policy that treats FAIR compliance as a line-item cost, rather than a reuse and reputation asset, is the wrong accounting model. PwC estimated in a 2018 study for the European Commission that the absence of FAIR (Findable, Accessible, Interoperable, Reusable) research data costs the European economy at least €10.2 billion a year, largely through duplicated data collection and wasted researcher time. That figure is the strongest evidence available that under-investment in research data management (RDM) infrastructure is a false economy, not a saving.

A research data management policy is an institutional document setting out the responsibilities of researchers and the institution for planning, storing, securing, sharing and preserving research data across its lifecycle. Most UK universities — Southampton, Birmingham, Manchester, Edinburgh and others — already publish one. The argument here is narrower and more contentious: most are drafted, funded and governed as compliance paperwork, when the evidence says they should be funded as reuse and reputation infrastructure.

Why RDM policy gets treated as a cost centre
What the evidence actually says about FAIR and avoided cost
How funder compliance requirements are changing the calculus
The case for investing in data stewardship, not just policy text
Answer-first Q&A
Implications for institutional leaders

Why RDM policy gets treated as a cost centre

Institutional budgets typically classify research data management as overhead: storage costs, repository subscriptions, a data steward’s salary, training time. Each appears as a debit with no offsetting credit line, because savings from avoided duplication and faster reuse accrue diffusely, across future researchers and grants, not to the budget holder who paid for the infrastructure.

This accounting mismatch is compounded by how the data management plan (DMP) requirement is handled in practice. Most funders now mandate one, but research offices frequently treat it as a box-ticking exercise completed at proposal stage and never revisited, rather than a live operational document. That framing under-serves the researcher, who gets no practical reuse benefit, and the institution, which under-recovers the true cost of good RDM from grants that would pay for it.

UK Research and Innovation (UKRI) explicitly states that costs associated with research data management — storage, curation, repository deposit — are eligible for recovery under its funding. Institutions treating RDM as unfunded overhead are frequently leaving recoverable grant money unclaimed rather than avoiding a cost.

What the evidence actually says about FAIR and avoided cost

The FAIR data principles were formalised in 2016 by Wilkinson et al. in Scientific Data as a guide for making digital assets Findable, Accessible, Interoperable and Reusable by both humans and machines. FAIR data is not a compliance checkbox; it is a design standard for making data usable by someone who was not present when it was collected.

The clearest attributed cost estimate comes from PwC’s 2018 cost-benefit analysis for the European Commission, which put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven by researcher time lost searching for data, recreation of data that already exists, and lost interdisciplinary reuse. A separate, frequently cited illustration is the University of Minnesota’s decades-long diet study, whose original data nearly disappeared into storage before being recovered and reanalysed — a reminder that data loss is a recurring, avoidable event when retention and documentation are afterthoughts.

Three mechanisms explain where the savings actually come from:

Avoided duplication. Findable, well-described data lets a second researcher build on an existing dataset instead of re-running a costly collection exercise.
Faster reuse cycles. Interoperable data in standard formats with persistent identifiers can be integrated into new analyses without reformatting or re-negotiating access.
Preserved institutional memory. Deposit in a certified repository protects data against the single most common loss vector: staff turnover and undocumented local storage.

None of this shows up as a saving on a university’s annual accounts, which is precisely why RDM investment is chronically under-prioritised relative to its documented return.

How funder compliance requirements are changing the calculus

Funder mandates are steadily converting FAIR data from voluntary good practice into a hard compliance gate, which changes the institutional risk calculus even for leaders unconvinced by the reuse argument. UKRI’s Common Principles on Research Data, and the underlying Concordat on Open Research Data, require a data management plan for funded research and state that data should be made openly available with as few restrictions as necessary. Horizon Europe applies comparable requirements, and cOAlition S’s Plan S pushes the same expectations into journal-level open-access policy.

A comparison of how three major funders frame the requirement illustrates the convergence:

Funder / framework	Core RDM requirement	FAIR reference
UKRI	Data management plan for funded research; RDM costs eligible for recovery	Endorses FAIR via the Concordat on Open Research Data
Horizon Europe	DMP required within six months of project start, updated across lifecycle	“As open as possible, as closed as necessary,” explicitly FAIR-aligned
cOAlition S (Plan S)	Underlying data should accompany open-access publications	References FAIR principles for supporting data

Institutions that fund RDM only to the minimum needed for a single grant’s DMP template are exposed twice: to duplicated administrative cost when infrastructure is rebuilt project by project, and to compliance risk as funders move toward auditing DMP adherence rather than merely requiring its submission.

The case for investing in data stewardship, not just policy text

A policy document alone does not create FAIR data. That requires people: a data steward function — a dedicated role, a network of disciplinary data champions, or a research data service embedded in the library — able to advise researchers on repository choice, metadata standards and licensing at the point where those decisions are actually made, not after the fact.

Institutions that fund this role tend to route researchers toward standards-based infrastructure rather than ad hoc local storage: a research data repository registered in re3data.org, ideally holding CoreTrustSeal certification, with persistent identifiers (DOIs) and standard metadata attached to every deposit. This is the practical, unglamorous mechanism by which the cost behind the €10.2 billion estimate above is actually avoided: not through a policy PDF, but through a person and a repository that make FAIR operational.

CASRAI’s relevance here is provenance and interoperability, not ownership. CASRAI originated the CRediT contributor role taxonomy in 2014, now stewarded by NISO as ANSI/NISO Z39.104-2022. CRediT makes the same underlying argument in a different domain: standardising who-did-what reduces duplicated verification effort, just as standardising data description reduces duplicated data collection. Institutions weighing their research administration infrastructure should treat RDM policy, contributor attribution and open data reuse as one reputational and efficiency system, not separate obligations.

Answer-first Q&A

What is a research data management policy?

A research data management policy is an institutional document defining responsibilities for planning, storing, securing, sharing, and archiving research data across its lifecycle. UK universities including Edinburgh and Manchester publish theirs publicly, typically requiring a data management plan at proposal stage and deposit in an approved repository after project completion.

What are the FAIR data principles?

The FAIR data principles — Findable, Accessible, Interoperable, Reusable — were published by Wilkinson et al. in 2016 in Scientific Data as guidance for making digital research assets usable by both humans and machines, through persistent identifiers, standard metadata, and clear licensing.

Do UK and EU funders require a data management plan?

Yes. UKRI requires a data management plan for funded research and treats RDM costs as eligible for recovery, while Horizon Europe requires a DMP within six months of project start under its “as open as possible, as closed as necessary” principle.

How much does poor research data management actually cost?

PwC’s 2018 analysis for the European Commission put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven primarily by duplicated data collection and researcher time lost searching for data that already exists elsewhere.

Implications for institutional leaders

The practical implication is a reframing exercise, not necessarily a large new budget line. Research offices should cost RDM infrastructure — repositories, data steward time, metadata training — against the funder-eligible recovery already available through DMP-linked grants, rather than absorbing it as unfunded overhead. Leaders reviewing their research data management policy should ask whether it funds a data steward with real authority over repository choice and metadata quality, or whether it is a document that satisfies a compliance checklist and stops there.

The evidence (a €10.2 billion EU-wide cost estimate, UKRI’s funding eligibility for RDM costs, and Horizon Europe’s escalating DMP requirements) points in one direction: institutions that keep treating FAIR compliance as a cost centre are choosing to keep paying the duplication tax FAIR data was designed to eliminate.

July 4, 2026

DDI Metadata Standard: FAIR Data Checklist for Survey Archives

The DDI metadata standard (Data Documentation Initiative) is an international, XML-based specification for documenting surveys, censuses, and other social, behavioural, and economic science microdata at both the study and variable level. It is the metadata backbone that most social science data archives use to make survey data findable, accessible, interoperable, and reusable (FAIR) — turning a raw data file plus a PDF codebook into a machine-readable, citable, cataloguable research object.

DDI is not a government mandate or a funder requirement; it is a community-maintained documentation standard. The DDI Alliance, an international collaboration established in 2003, maintains the specification and its schemas. This guide explains what the standard covers, who uses it, how it maps onto the FAIR principles, and the practical steps a repository or research team needs to adopt it.

What is the DDI metadata standard?
Who maintains DDI and which archives use it?
How does DDI support the FAIR data principles?
DDI-Codebook vs DDI-Lifecycle vs DDI-CDI
A practical checklist for adopting DDI
Answer-first Q&A
What this means for research data repositories

What is the DDI metadata standard?

The Data Documentation Initiative is a metadata standard for describing the full lifecycle of a research data collection: study design, sampling, data collection, processing, variables, and access conditions. It was built specifically for social, behavioural, and economic sciences data — surveys, censuses, panel studies, and administrative microdata — rather than as a general-purpose schema.

Records are encoded in Extensible Markup Language (XML), which makes them machine-readable and harvestable. A DDI catalogue record typically documents three layers: the study description (bibliographic citation, scope, geography, time period, methodology), the data file description (format, structure, missing-data conventions, weighting), and the variable description (question text, value labels, codes). This granularity is what separates DDI from simpler discovery schemas such as Dublin Core, which describe a resource but not its internal variable structure.
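To make the three layers concrete, the sketch below parses a tiny hand-written fragment in the style of DDI-Codebook 2.5 using Python's standard library. The fragment is illustrative only: the study, file and variable are invented, it has not been validated against the official schema, and a real record carries far more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace used by DDI-Codebook 2.5 documents.
NS = {"ddi": "ddi:codebook:2_5"}

# Invented, minimal fragment; not validated against the official schema.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation><titlStmt><titl>Example Household Survey 2020</titl></titlStmt></citation>
  </stdyDscr>
  <fileDscr>
    <fileTxt><fileName>household.csv</fileName></fileTxt>
  </fileDscr>
  <dataDscr>
    <var name="Q1">
      <labl>Employment status</labl>
      <qstn><qstnLit>Are you currently employed?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Yes</labl></catgry>
      <catgry><catValu>2</catValu><labl>No</labl></catgry>
    </var>
  </dataDscr>
</codeBook>"""


def summarise(xml_text: str) -> dict:
    """Pull one example field from each of the three description layers."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.findall("ddi:dataDscr/ddi:var", NS):
        variables.append({
            "name": var.get("name"),
            "label": var.findtext("ddi:labl", namespaces=NS),
            "question": var.findtext("ddi:qstn/ddi:qstnLit", namespaces=NS),
            # Map each category code to its value label.
            "categories": {
                cat.findtext("ddi:catValu", namespaces=NS): cat.findtext("ddi:labl", namespaces=NS)
                for cat in var.findall("ddi:catgry", NS)
            },
        })
    return {
        "title": root.findtext("ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl", namespaces=NS),
        "file_name": root.findtext("ddi:fileDscr/ddi:fileTxt/ddi:fileName", namespaces=NS),
        "variables": variables,
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

The question text and value labels recovered at the variable layer are exactly what a Dublin Core record cannot express, since Dublin Core stops at describing the resource as a whole.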

Who maintains DDI and which archives use it?

The DDI Alliance, an international collaboration of research institutions, statistical agencies, and data archives established in 2003, develops and maintains the specification. DDI is listed as a recognised research-data metadata standard in the Research Data Alliance Metadata Standards Catalog (entry m13), which documents its scope, schemas, and adoption.

According to the UK Data Service, DDI “is used by most social science data archives in the world” to structure catalogue records, and it forms the basis of the discovery metadata behind its own collection. The Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan and the members of CESSDA, the Consortium of European Social Science Data Archives, likewise build their cataloguing infrastructure on DDI. Records are exposed via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), so aggregators can harvest and index them without direct database access.
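The harvesting pattern behind OAI-PMH is small enough to sketch without a network call. The example below builds a `ListRecords` request URL and reads the resumption token from a sample response, which is how a harvester pages through a large catalogue. The endpoint URL, the `oai_ddi25` metadata prefix and the sample response are assumptions for illustration; each archive advertises its own supported prefixes.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(
    base_url: str,
    metadata_prefix: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token from a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS)
    return token.strip() if token and token.strip() else None


# Invented sample response standing in for what an archive would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:archive.example.org:study-1</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    base = "https://archive.example.org/oai"  # hypothetical endpoint
    print(list_records_url(base, metadata_prefix="oai_ddi25"))
    print(list_records_url(base, resumption_token=next_token(SAMPLE_RESPONSE)))
```

A harvester repeats the second request until `next_token` returns `None`, which is why aggregators need no knowledge of the archive's internal database.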

How does DDI support the FAIR data principles?

The FAIR Guiding Principles — findable, accessible, interoperable, reusable — were formalised for the research community in 2016. DDI operationalises each principle for survey and social science data specifically, rather than leaving them as abstract goals.

Findable: structured study-level metadata (title, creators, keywords, abstract, coverage) makes records indexable by catalogues and search engines, and DDI records are commonly assigned persistent identifiers, including DOIs registered through DataCite.
Accessible: standardised access-condition fields tell a would-be reuser exactly how to request or download the data, and harvesting via OAI-PMH gives repositories a predictable retrieval protocol.
Interoperable: a shared XML vocabulary and controlled thesauri — the European Language Social Science Thesaurus (ELSST), maintained by CESSDA, is one widely used example — let metadata move between archives and languages without semantic drift.
Reusable: variable-level documentation (question wording, value labels, derivation logic) and provenance information are what actually let a second researcher re-run or extend an analysis, which is the point FAIR exists to serve.

DDI-Codebook vs DDI-Lifecycle vs DDI-CDI: which do you need?

DDI is not a single schema. Three variants serve different documentation depths, and choosing the wrong one is a common early adoption mistake.

Variant	Best for	Documents	Status
DDI-Codebook (DDI-C)	A single finished dataset	Study, file, and variable description for one deposit	Simpler, widely used legacy format
DDI-Lifecycle (DDI-L)	Longitudinal or multi-wave studies	The full research lifecycle: concept, instrument, collection, processing, archiving, reuse	Comprehensive, versioned in the 3.x series
DDI-CDI (Cross-Domain Integration)	Integrating structured data across statistical and research domains	Model-driven descriptions that link datasets, variables, and classifications across systems	Developed by the DDI Alliance to work alongside other standards such as SDMX

A single-wave survey deposited once needs only DDI-Codebook. A cohort study revisited over years — the kind of resource the UK Data Service and ICPSR both hold in volume — needs DDI-Lifecycle to capture instrument changes between waves. DDI-CDI is aimed at repositories that need to align microdata with aggregate statistics (for example, linking a survey to official statistics published under SDMX), which is an emerging rather than default requirement.
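
The selection rule in the paragraph above can be written down as a small function. This is a simplified sketch of that rule only; the function name and its two parameters are illustrative, and a real decision would also weigh tooling and archive requirements.

```python
def choose_ddi_variant(waves, cross_domain=False):
    """Pick a DDI variant from the number of collection waves.

    cross_domain marks a repository that must align microdata with
    aggregate statistics published in another standard.
    """
    if waves < 1:
        raise ValueError("a study needs at least one collection wave")
    if cross_domain:
        return "DDI-CDI"
    if waves > 1:
        return "DDI-Lifecycle"
    return "DDI-Codebook"


print(choose_ddi_variant(1))                      # one-off survey
print(choose_ddi_variant(6))                      # cohort study revisited over years
print(choose_ddi_variant(1, cross_domain=True))   # survey linked to official statistics
```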

A practical checklist for adopting DDI

Repositories and research teams introducing DDI documentation for the first time should work through these steps in order:

Identify your lifecycle stage. A one-off dataset needs DDI-Codebook; a repeated or panel study needs DDI-Lifecycle.
Model metadata before ingest, not after. Capture study description, sampling, collection dates, and variable labels/codes at deposit time using a structured deposit form, as the UK Data Service does, rather than reverse-engineering them from a finished file.
Use a DDI-aware authoring tool (for example Colectica or Nesstar-derived CESSDA tooling) instead of hand-writing XML, which is error-prone at scale.
Register a persistent identifier. Crosswalk core fields to the DataCite metadata schema so the dataset gets a citable DOI alongside its DDI record.
Adopt a controlled vocabulary such as ELSST for subject keywords to keep records interoperable across languages and archives.
Enable OAI-PMH harvesting so catalogue aggregators and search services can index the record without bespoke integration work.
Validate against peer practice — check the record structure against the RDA Metadata Standards Catalog entry and against comparable ICPSR or CESSDA holdings before publishing.
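
The crosswalk in the persistent-identifier step might look like the sketch below. It maps a handful of study-level DDI fields onto the property names used in DataCite JSON metadata (titles, creators, publisher, publicationYear, types, subjects). The input dictionary keys, the sample study and the DOI are hypothetical, and a production crosswalk would cover many more fields.

```python
# Properties DataCite treats as mandatory alongside the DOI itself.
REQUIRED = ("titles", "creators", "publisher", "publicationYear", "types")


def ddi_to_datacite(study, doi):
    """Map study-level DDI fields to a DataCite-style attribute dict.

    `study` is a plain dict already extracted from a DDI record; its
    keys (title, creators, distributor, dist_date, keywords) are
    illustrative names, not part of either standard.
    """
    record = {
        "doi": doi,
        "titles": [{"title": study["title"]}] if study.get("title") else [],
        "creators": [{"name": name} for name in study.get("creators", [])],
        "publisher": study.get("distributor"),
        "publicationYear": int(study["dist_date"][:4]) if study.get("dist_date") else None,
        "types": {"resourceTypeGeneral": "Dataset"},
        "subjects": [{"subject": kw} for kw in study.get("keywords", [])],
    }
    missing = [key for key in REQUIRED if not record[key]]
    if missing:
        raise ValueError("cannot register DOI, missing: " + ", ".join(missing))
    return record


# Invented study; 10.5555 is used here as a placeholder DOI prefix.
example = {
    "title": "Example Household Survey 2020",
    "creators": ["Example Research Centre"],
    "distributor": "Example Data Archive",
    "dist_date": "2021-03-15",
    "keywords": ["EMPLOYMENT"],
}
print(ddi_to_datacite(example, "10.5555/example-study"))
```

Failing early on a missing mandatory property keeps incomplete records from reaching DOI registration, which is the practical reason to model metadata before ingest.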

Answer-first Q&A

What is the metadata standard DDI?

DDI (Data Documentation Initiative) is an international metadata standard for documenting socioeconomic surveys, censuses, and microdata. It is maintained by the DDI Alliance, encoded in XML, and used by most social science data archives worldwide to capture study, file, and variable-level documentation in one structured record.

What is the best metadata standard for survey data?

For general resource discovery, Dublin Core (ISO 15836) is the simplest and most widely implemented option. For social science survey and microdata specifically, DDI is the domain standard, because it documents variables and methodology in a depth Dublin Core does not attempt.

How does DDI support the FAIR data principles?

DDI supports FAIR by pairing structured, machine-readable metadata with persistent identifiers for findability, standardised access fields for accessibility, a shared XML vocabulary and thesauri for interoperability, and variable-level provenance for reusability — the depth needed to re-run a secondary analysis.

What is the difference between DDI-Codebook and DDI-Lifecycle?

DDI-Codebook documents a single finished dataset. DDI-Lifecycle documents the entire research process — instrument design, fieldwork, processing, and archiving — across multiple waves, making it the correct choice for longitudinal and panel studies rather than one-off deposits.

What this means for research data repositories

Funder and journal data-sharing policies increasingly ask for FAIR-compliant deposits, but “FAIR” is a set of principles, not a file format. DDI is one of the few domain standards that translates those principles into a concrete, testable schema for survey and social science data — which is why it underpins the cataloguing infrastructure at the UK Data Service, ICPSR, and CESSDA member archives rather than being a niche archival choice.

Institutions building or upgrading a research data repository for social science holdings should treat DDI adoption (Codebook or Lifecycle, as the study design requires), ELSST keywording, and DataCite DOI registration as a single connected workflow rather than three separate projects. Repositories that skip variable-level documentation still get a catalogue entry, but they do not get reuse, and reuse, not deposit, is the actual measure of FAIR success. Institutional research administration and data management guidance should reference DDI explicitly wherever survey or microdata deposit is in scope.

July 4, 2026

Australian Research Data Commons: FAIR Model

The Australian Research Data Commons (ARDC) is Australia’s national research data infrastructure body: formed in 2018 by merging three earlier programmes, it gives researchers shared, FAIR-aligned access to data discovery, compute, and identifier services so individual universities do not have to build this capability alone.

The ARDC is a public company limited by guarantee that operates Australia’s national research data commons, formed on 1 July 2018 from the merger of the Australian National Data Service (ANDS), Nectar, and Research Data Services (RDS). For research administrators and institutional leaders comparing centralised national investment against distributed, institution-by-institution research data management (RDM), the ARDC is the clearest working example of the centralised model operating at national scale.

What is the Australian Research Data Commons?
How is the ARDC funded and governed?
What infrastructure does the ARDC actually operate?
Centralised vs distributed: what does the ARDC model mean for institutions?
Answer-first questions on the ARDC
What this means for institutions and funders
Outlook

What is the Australian Research Data Commons?

The Australian Research Data Commons consolidates three predecessor national programmes into a single body responsible for research data infrastructure across all disciplines. Before 2018, the Australian National Data Service (ANDS, established 2008), Nectar (established 2009), and Research Data Services (RDS) each managed a separate piece of the national e-research landscape: discovery, compute, and storage respectively.

Merging them removed the seams between discovery, storage, and compute that researchers previously had to navigate across three separately governed programmes. The ARDC’s stated aim, per its own site, is to enable Australian researchers and industry to access “nationally significant” digital research infrastructure, skills, and data collections rather than each institution replicating this from scratch.

How is the ARDC funded and governed?

The ARDC is funded primarily through the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS), the same mechanism that underwrote its predecessor programmes. ANDS was originally funded via a 2008 agreement between the (then) Department of Innovation, Industry, Science and Research and Monash University, with further funding arriving through the Education Investment Fund under the government’s Super Science Initiative.

Governance sits with a board overseeing a public company limited by guarantee, headquartered in Melbourne with staff across Canberra, Adelaide, Perth, Ballarat, Brisbane, and Sydney. This is a materially different governance shape from a distributed RDM model, where each university’s research office, library, and IT division independently funds and governs its own data services against the institution’s own budget cycle.

What infrastructure does the ARDC actually operate?

The ARDC’s core, user-facing service is Research Data Australia, a discovery portal giving access to metadata records from over 100 Australian research organisations, cultural institutions, and government agencies. It also runs the Nectar Research Cloud, a shared national compute facility, and coordinates three Thematic Research Data Commons that target long-term, discipline-specific infrastructure needs, including health and medical research and the combined humanities, arts and social sciences (HASS) and Indigenous research domain.

Beyond discovery and compute, the ARDC’s remit extends to standards and skills work that a single institution would struggle to justify funding alone:

Coordinating Australia’s national persistent identifier (PID) strategy, encouraging consistent use of identifiers for people, organisations, and datasets
Publishing FAIR data guides and running structured training such as “FAIR Data 101”
Requiring FAIR-aligned practice from its own co-investment projects as a condition of funding
Supporting virtual laboratories for specific research communities on top of the Nectar Research Cloud (roughly 50,000 compute cores serving around 20,000 users, per historical ARDC/Nectar reporting)

Centralised vs distributed: what does the ARDC model mean for institutions?

A centralised national commons like the ARDC amortises the cost of discovery infrastructure, identifier strategy, and large-scale compute across an entire research system rather than each institution paying separately. The trade-off is that institutions cede some control over roadmap priorities and must align local practice with a national standard rather than an internally chosen one.

Dimension	Centralised national model (ARDC)	Distributed institutional model
Funding source	National programme (NCRIS)	Individual institutional budgets
Discovery layer	One shared portal (Research Data Australia)	Separate institutional repositories
Compute/storage	Shared national cloud (Nectar)	Institution-specific procurement
Standards consistency	Single national PID and FAIR policy	Varies by institution
Duplication risk	Low — infrastructure built once	Higher — each institution rebuilds similar tooling
Local control	Lower — national roadmap governs priorities	Higher — institution sets its own priorities

Institutions weighing this trade-off are not choosing between “good” and “bad” infrastructure; they are choosing where duplication cost and local autonomy sit on a single spectrum. The ARDC demonstrates that a national commons can deliver FAIR-aligned discovery and compute without every institution independently re-solving the same identifier and storage problems.

Answer-first questions on the ARDC

What is Research Data Australia?

Research Data Australia is the ARDC’s national discovery portal, giving researchers a single point of access to metadata describing datasets held across more than 100 Australian research organisations, cultural institutions, and government agencies. It descends from the earlier ANDS Collections Registry and remains the ARDC’s principal public-facing discovery service.

How is the ARDC funded?

The ARDC is funded chiefly through the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS), following on from funding arrangements that originally supported its predecessor programmes, ANDS and Nectar, including money from the Education Investment Fund under the Super Science Initiative.

What did the ARDC replace?

The ARDC replaced three separately governed programmes on 1 July 2018: the Australian National Data Service (ANDS), Nectar (National eResearch Collaboration Tools and Resources), and Research Data Services (RDS), consolidating discovery, compute, and storage under one national body.

What this means for institutions and funders

For institutions and funders outside Australia, the ARDC is a working case study rather than a template to copy wholesale — national research systems differ in scale, federal structure, and existing infrastructure maturity. What generalises is the underlying logic: discovery metadata, persistent identifiers, and baseline compute are commodity infrastructure that gains value from being shared rather than re-procured by every institution.

Institutions currently investing in distributed RDM should ask which of their own services are genuinely differentiating (subject-specific curation, disciplinary expertise) versus which are commodity infrastructure better funded once, nationally or consortially, than dozens of times over.

Outlook

The ARDC’s roadmap continues to run through Australia’s National Research Infrastructure planning cycle, with persistent identifiers and FAIR-by-default practice as recurring priorities. As more national and regional funders assess where to draw the line between centralised and distributed research administration infrastructure, the ARDC’s consolidation experience since 2018, together with the FAIR principles it operationalises via its data terminology and standards resources, offers a concrete reference point rather than an abstract framework.

July 4, 2026

National Data Repository Mandates: UK, US, EU

National data repository requirements now differ sharply by jurisdiction: the UK coordinates through UKRI’s Concordat on Open Research Data and a planned National Data Library, the US relies on agency-specific mandates such as the NIH Data Management and Sharing Policy layered on the OPEN Government Data Act, and the EU binds Horizon Europe funding to mandatory FAIR data management plans routed through the European Open Science Cloud. All three converge on the FAIR principles as the technical baseline, but they part ways on enforcement, centralisation and what “as open as possible” means in practice.

A national data repository is a government- or funder-endorsed infrastructure (or federated network of infrastructures) for depositing, curating and providing persistent access to datasets produced by publicly funded research, so that they meet the FAIR standard of being Findable, Accessible, Interoperable and Reusable. No single global rulebook defines what such a repository must look like — which is precisely why the UK, US and EU have built three structurally different systems around the same FAIR foundation.

What counts as a national data repository?
How does the UK mandate research data repositories?
How does the US enforce data-sharing requirements?
How does the EU mandate FAIR data through Horizon Europe?
Where do the three approaches converge and diverge?
Common questions on national data repository mandates
What this means for institutions and researchers

What counts as a national data repository?

A national data repository is infrastructure, endorsed at government or funder level, that stores research datasets with persistent identifiers, standardised metadata and defined reuse licences. The FAIR data principles — first formalised in Scientific Data in 2016 — define the technical bar: data and metadata must be findable via persistent identifiers, accessible over open protocols, interoperable through shared vocabularies, and reusable under clear provenance and licensing.

Crucially, FAIR does not mean unconditionally open. The dominant policy language across all three jurisdictions is some variant of “as open as possible, as closed as necessary” — datasets with legitimate privacy, security or intellectual-property constraints can remain FAIR while access to the raw data itself stays restricted, provided the metadata is still discoverable.

How does the UK mandate research data repositories?

The UK’s approach is coordinated centrally through UK Research and Innovation (UKRI) rather than fragmented across individual funders. The Concordat on Open Research Data, agreed by UK funders and sector bodies, sets the expectation that publicly funded research data should be made openly available with as few restrictions as possible, in a timely and responsible manner.

UKRI has been developing a harmonised open research data policy to replace the varying requirements previously set by its individual research councils, with a more explicit alignment to FAIR principles than the original Concordat text. The UK does not run one single mandatory repository for all disciplines; instead it combines a cross-disciplinary resource — the UK Data Service, holding the country’s largest collection of economic, population and social research data — with discipline-specific data centres. A National Data Library initiative is also under development. Enforcement runs through grant conditions rather than statute.

How does the US enforce data-sharing requirements?

The US combines a government-wide legal baseline with agency-specific enforcement, producing a federated rather than centralised system. The OPEN Government Data Act codifies the principle that federal government data — including federally funded research outputs captured by agencies — should be open and machine-readable by default, operationalised through the Data.gov catalogue.

The sharpest enforcement sits with individual funding agencies. Under the NIH Data Management and Sharing (DMS) Policy, effective since January 2023, NIH-funded researchers must submit a DMS Plan describing how scientific data will be managed and shared, with FAIR principles strongly encouraged. The National Science Foundation requires a Data Management Plan for all proposals and supports deposit through disciplinary repositories and its own NSF Public Access Repository (NSF-PAR). This gives communities flexibility to choose fitting repositories, at the cost of having no single, unified national research-data repository.

How does the EU mandate FAIR data through Horizon Europe?

The EU operates the most centrally binding framework of the three. The Directive on open data and the re-use of public sector information requires member states to establish national policies for open access to publicly funded research data on an “open by default” basis, explicitly aligned with FAIR principles. For research funded under Horizon Europe, making data FAIR is a mandatory grant condition, not a recommendation: funded projects must produce a Data Management Plan and comply with FAIR requirements as a condition of the award, under the same “as open as possible, as closed as necessary” test used elsewhere.

Infrastructure is built around the European Open Science Cloud (EOSC), described by the European Commission as a federated environment intended to become a “web of FAIR data and services” spanning all scientific disciplines. Within that federation, researchers commonly deposit through the general-purpose repository Zenodo — built and operated with CERN — while the Community Research and Development Information Service (CORDIS) serves as the EU’s public repository of record for funded project information.

Where do the three approaches converge and diverge?

All three jurisdictions treat FAIR as the technical baseline and all three qualify openness with a “necessary restriction” clause. The differences lie in enforcement mechanism, degree of centralisation, and whether a single flagship repository exists.

Feature	UK	US	EU
Primary instrument	UKRI Concordat on Open Research Data (evolving to a harmonised FAIR-explicit policy)	OPEN Government Data Act; NIH DMS Policy; NSF Public Access Policy	EU Open Data Directive; Horizon Europe grant conditions
Legal basis	Funder policy condition	Federal statute plus agency policy	Legally binding directive plus grant condition
FAIR status	Increasingly explicit in new UKRI policy	Encouraged, embedded in agency plans	Mandatory for Horizon Europe-funded projects
Data management plan required	Yes, for UKRI funding	Yes, for NIH and NSF funding	Yes, mandatory for Horizon Europe
Repository model	Centralised flagship (UK Data Service) plus disciplinary centres	Federated (Data.gov, NSF-PAR, disciplinary repositories)	Federated supranational (EOSC, Zenodo, CORDIS)

Common questions on national data repository mandates

What are the FAIR data principles required by UKRI?

UKRI requires funded researchers to make outputs Findable, Accessible, Interoperable and Reusable, aligned with its Concordat on Open Research Data. UKRI councils frame this as maximising the impact, visibility and citation of research while applying the “as open as possible, as restricted as necessary” test to data with legitimate sensitivities.

Does the NIH require a data management and sharing plan?

Yes. Since 25 January 2023, the NIH Data Management and Sharing (DMS) Policy requires funded researchers to submit a DMS Plan describing how scientific data will be preserved and shared. NIH strongly encourages applying FAIR principles when selecting repositories and structuring metadata for that plan.

Is FAIR data mandatory under Horizon Europe?

Yes. Unlike the UK’s evolving policy and the US’s encouraged-but-agency-specific approach, Horizon Europe makes FAIR data management a binding grant condition. Funded projects must submit a Data Management Plan and comply with FAIR requirements, subject to the same necessary-restriction exceptions used across all three jurisdictions.

Is there one single national data repository researchers must use?

No jurisdiction mandates a single universal repository. The UK combines a flagship service (UK Data Service) with disciplinary centres; the US runs a federated system across Data.gov and agency repositories such as NSF-PAR; the EU federates access through EOSC, Zenodo and CORDIS. Researchers typically choose the repository matching their discipline and funder requirements.

What this means for institutions and researchers

For research administrators managing multi-jurisdictional funding, a single data management plan template cannot satisfy all three regimes. Compliance teams must map deposit requirements per funder rather than assume FAIR-labelled data automatically meets every mandate’s specific repository, licensing and metadata conditions.

The trend line points toward convergence. The UK’s move to a harmonised, more explicitly FAIR-aligned UKRI policy and the EU’s EOSC federation both signal a shift from fragmented rules toward unified infrastructure. The US remains the outlier: its federal open-data statute operates largely independently of agency-specific mandates from NIH and NSF.

Institutions should treat “FAIR” and “open” as related but distinct compliance targets. A dataset can be fully FAIR — persistently identified, well-described, licensed — while remaining access-restricted for legitimate reasons in every jurisdiction covered here. Repository choice and data management plan content should be checked against the specific funder mandate, not a generic FAIR checklist.
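
One way a compliance team might hold the per-funder mapping is a simple lookup table. The sketch below transcribes only the plan requirement and FAIR status from the comparison table earlier in this article. The structure, field names and helper functions are illustrative, and the entries are a summary, not a substitute for each funder's current policy text.

```python
# Transcribed from the comparison table in this article; field names are illustrative.
MANDATES = {
    "UKRI": {"jurisdiction": "UK", "plan": "data management plan", "fair": "increasingly explicit"},
    "NIH": {"jurisdiction": "US", "plan": "DMS Plan", "fair": "encouraged"},
    "NSF": {"jurisdiction": "US", "plan": "Data Management Plan", "fair": "encouraged"},
    "Horizon Europe": {"jurisdiction": "EU", "plan": "Data Management Plan", "fair": "mandatory"},
}


def compliance_checklist(funders):
    """Return one checklist line per funder on a multi-funder project."""
    unknown = [f for f in funders if f not in MANDATES]
    if unknown:
        raise KeyError("no mandate recorded for: " + ", ".join(unknown))
    return [
        f"{f} ({MANDATES[f]['jurisdiction']}): submit a {MANDATES[f]['plan']}; FAIR is {MANDATES[f]['fair']}"
        for f in funders
    ]


def fair_is_binding(funders):
    """True when at least one funder makes FAIR a binding condition."""
    return any(MANDATES[f]["fair"] == "mandatory" for f in funders)


for line in compliance_checklist(["NIH", "Horizon Europe"]):
    print(line)
print(fair_is_binding(["NIH", "Horizon Europe"]))
```

Raising an error for an unrecorded funder is deliberate: a silent default would reproduce the "generic FAIR checklist" problem described above.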

July 3, 2026

Indigenous Data Sovereignty: Why FAIR Needs CARE

Indigenous data sovereignty is the right of Indigenous peoples and nations to govern the collection, ownership, interpretation, and application of data about their own communities, lands, and knowledge. Blanket “open by default” research-data mandates built on the FAIR Data Principles can override that right when they treat findability and accessibility as unconditional. The fix is not to abandon FAIR, but to add a CARE-informed consent layer — tiered access controls, negotiated data-sharing agreements, and governance authority held by the originating community — that sits inside FAIR’s own accessibility principle rather than outside it.

As funders push open-data compliance deeper into grant conditions, research offices increasingly have to reconcile a mandate to publish with a community’s right to say no, say later, or say “only under these conditions.”

What is Indigenous data sovereignty?
How do CARE principles relate to FAIR data principles?
Do open data mandates override Indigenous data sovereignty?
What does a CARE-informed consent layer look like in practice?
Data sharing agreement vs data processing agreement — which applies?
Implications for research administrators
Conclusion: consent is compatible with findability

What is Indigenous data sovereignty?

Indigenous data sovereignty describes the inherent right of Indigenous peoples to govern data about their own communities, resources, and lands — a right that derives from tribal and national self-determination rather than from any single data-protection statute. The Global Indigenous Data Alliance (GIDA) traces the movement’s institutional roots to country-specific networks: the Aotearoa New Zealand-based Te Mana Raraunga (Māori Data Sovereignty Network, formed 2015), Australia’s Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Collective (2017), Canada’s First Nations Information Governance Centre, and the US Indigenous Data Sovereignty Network.

These networks converged on a shared position: data collected about Indigenous peoples should remain subject to the governance of the nation or community it describes — including tribal law — not solely the policies of the funder, institution, or repository that hosts it. This is a governance claim, not merely a privacy preference, and it applies whether the data in question is health records, environmental monitoring, ceremonial knowledge, or genomic samples.

How do CARE principles relate to FAIR data principles?

The CARE Principles for Indigenous Data Governance — Collective Benefit, Authority to Control, Responsibility, and Ethics — were developed specifically to sit alongside the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), not to replace them. The Research Data Alliance’s International Indigenous Data Sovereignty Interest Group formalised CARE in 2019 to address what FAIR, on its own, does not: who benefits, who decides, and under what ethical obligations data circulates.

Principle set	Primary question it answers	Governing focus
FAIR (Findable, Accessible, Interoperable, Reusable)	How usable is the data, technically?	Data as an object
CARE (Collective Benefit, Authority to Control, Responsibility, Ethics)	Who benefits, and who decides?	Data as a relationship

Framing these as rivals misreads FAIR’s own text. FAIR principle A1.2 explicitly states that the accessibility protocol must “allow for an authentication and authorisation procedure, where necessary” — meaning FAIR was never a synonym for unconditional open access. Data can be fully findable, with rich metadata, a persistent identifier, and a documented access route, while the underlying content sits behind a governed permission gate. That gap between “discoverable” and “downloadable” is precisely where a CARE-informed consent layer belongs.

Do open data mandates override Indigenous data sovereignty?

Open data mandates do not automatically override Indigenous data sovereignty, but poorly designed ones can function that way in practice. Funder policies such as UKRI’s research data policy and cOAlition S’s Plan S commitments frame data availability in “as open as possible, as restricted as necessary” terms. That formulation already anticipates legitimate restriction, yet institutions frequently implement it as a default push toward maximal openness.

PLOS’s own editorial position, published in its EveryONE blog in October 2023, states plainly that Indigenous Data Sovereignty is the right of Indigenous peoples to own and govern data about their communities, resources, and lands — and that open-access publishing policies must accommodate, not override, that right through mechanisms such as data-access statements that explain restrictions rather than force disclosure. The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Code of Ethics for Aboriginal and Torres Strait Islander Research similarly requires researcher agreements on data ownership, access, and storage to be negotiated with communities before collection begins, not retrofitted at publication.

Where mandates and sovereignty align: both frameworks require documented data-management plans, clear provenance, and persistent identifiers.
Where friction emerges: “open by default” clauses that treat non-disclosure as an exception requiring justification, rather than a governance decision requiring respect.
The resolvable middle: metadata and access statements can be fully open even when the underlying dataset is access-controlled.

What does a CARE-informed consent layer look like in practice?

A consent layer is a set of governance and technical controls — inserted between data creation and data reuse — that lets a community set the terms under which its data is discovered, accessed, and re-used, without removing that data from the research record entirely. In practice this combines four elements research administrators already have tools for:

Tiered metadata: a public, FAIR-compliant record (title, abstract, provenance, persistent identifier via DataCite or Crossref) that is fully findable even when the dataset itself is restricted.
Governance-holder sign-off: a named Indigenous governance body (tribal council, iwi authority, data sovereignty collective) with authority to approve, condition, or decline each reuse request — not a one-time blanket consent captured at initial collection.
A trusted research environment (TRE): a controlled-access computing environment where approved researchers can analyse restricted data without exporting raw records, satisfying reusability without unconditional distribution.
Biocultural or Traditional Knowledge labels: machine-readable metadata tags (the Local Contexts initiative’s TK and BC Labels) that travel with a dataset to signal provenance, cultural protocols, and permitted uses wherever it is indexed or mirrored.

None of these four elements blocks findability. They condition access, which is exactly what FAIR’s accessibility principle already permits.
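
A minimal sketch of how the first two elements fit together in software is shown below: a metadata tier that is always public, and access that is granted per request by the named governance body. The class, its field names, the dataset and the label string are all hypothetical. Real deployments would sit on a repository's own access-control system and use actual Local Contexts labels.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedDataset:
    pid: str
    title: str
    governance_body: str
    labels: list = field(default_factory=list)          # placeholder for TK/BC Label names
    approved_requests: set = field(default_factory=set)

    def public_record(self):
        """Metadata tier: always discoverable, even though the data is gated."""
        return {
            "pid": self.pid,
            "title": self.title,
            "access": "restricted",
            "governed_by": self.governance_body,
            "labels": list(self.labels),
        }

    def approve(self, request_id, decided_by):
        """Record a decision on one reuse request; only the governance body may decide."""
        if decided_by != self.governance_body:
            raise PermissionError(f"{decided_by} has no authority over {self.pid}")
        self.approved_requests.add(request_id)

    def can_access(self, request_id):
        """Access is checked per request, never granted as blanket consent."""
        return request_id in self.approved_requests


dataset = GovernedDataset(
    pid="doi:10.5555/example-dataset",
    title="Example environmental monitoring dataset",
    governance_body="Example Community Council",
    labels=["example TK label"],
)
print(dataset.public_record())
dataset.approve("request-1", decided_by="Example Community Council")
print(dataset.can_access("request-1"), dataset.can_access("request-2"))
```

The public record never changes with the access decision, which is the sense in which the consent layer conditions access without reducing findability.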

Data sharing agreement vs data processing agreement — which applies?

A data sharing agreement (DSA) and a data processing agreement (DPA) serve different legal functions, and conflating them is a common source of failure in Indigenous data governance. A DSA governs the transfer of data between two parties who each have independent authority over how it is subsequently used — the correct instrument for Indigenous data sovereignty, because it lets the originating community retain and exercise ongoing authority to control, per CARE’s second principle.

A DPA, by contrast, is used when one party (a processor) handles data strictly on behalf of another (the controller) with no independent decision-making rights — the model built into contract templates under UK GDPR. Using a DPA where a DSA is required strips the originating community of ongoing authority.

Instrument	Who holds decision authority	Fit for Indigenous data sovereignty
Data Sharing Agreement (DSA)	Both parties, independently	Appropriate — preserves community authority to control
Data Processing Agreement (DPA)	Controller only; processor has none	Inappropriate as a standalone instrument — reduces community to data subject

Implications for research administrators

Research data management (RDM) policy templates written purely around funder compliance checklists will systematically under-serve Indigenous data governance unless they build in a consent layer as a standard clause, not an exception process. Institutions should require, at the data-management-plan stage, an explicit question: does this dataset describe an Indigenous community, and if so, has a governance body with authority to control been identified and consulted before collection?

Research data repositories that host Indigenous-derived datasets should support tiered access controls and TK/BC Label metadata natively, rather than treating restricted-access as a bespoke workaround bolted onto an open-by-default platform. Institutions building or procuring a trusted research environment for sensitive data should evaluate whether it can enforce community-set reuse conditions per dataset, not merely per project.

Conclusion: consent is compatible with findability

Indigenous data sovereignty and the FAIR Data Principles are not opposed frameworks competing for the same ground — FAIR governs how data is described and discovered, while CARE and a CARE-informed consent layer govern who decides what happens next. A research data management policy that hard-codes this distinction, uses the right agreement type for the right relationship, and gives Indigenous governance bodies a standing role rather than a one-off consultation, satisfies funder open-data requirements and Indigenous data sovereignty at the same time. The two are compatible by design; the mandates just need to stop assuming otherwise.

July 3, 2026

F-UJI FAIR Evaluator: What It Actually Scores

The F-UJI FAIR evaluator is an automated web service that checks whether a dataset’s metadata — not its actual data quality — satisfies a fixed set of machine-readable tests built from the FAIRsFAIR Data Object Assessment Metrics. A high F-UJI percentage means a dataset’s landing page, identifiers and schema exposed enough structured signals for a script to find and parse; it does not certify that a human researcher can actually understand, trust or reuse the data inside.

F-UJI is one of several tools now used to operationalise the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), alongside FAIRshake, FAIR-Checker, FAIR Aware and the FAIR Data Point specification promoted by the GO FAIR Initiative. This article explains what each type of tool actually scores, where automated scoring diverges from manual FAIR maturity review, and why institutions and research data repositories should treat a high machine score as a floor, not a finish line.

What is the F-UJI FAIR evaluator?
How F-UJI’s automated scoring actually works
F-UJI vs FAIRshake vs manual maturity frameworks
What a high FAIR score does not prove
Common questions about automated FAIR scoring
Implications for repositories and funders

What is the F-UJI FAIR evaluator?

F-UJI (FAIRsFAIR Research Data Object Assessment Service) is a web service and REST API that assesses a research data object against 16 core FAIR metrics. A user submits a persistent identifier — typically a DOI — and F-UJI queries external infrastructure including the DataCite API, re3data, schema.org JSON-LD embedded on the landing page, and DCAT or Dublin Core fields to determine whether each metric passes.
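As a rough illustration of the submission step, the sketch below builds (but does not send) an HTTP request to an F-UJI instance. The host, port, endpoint path and JSON field names are assumptions based on a typical self-hosted deployment and should be checked against the API documentation of the instance in use; the DOI is a placeholder.

```python
import json
import urllib.request

# Assumed endpoint of a locally hosted F-UJI instance; adjust as needed.
FUJI_ENDPOINT = "http://localhost:1071/fuji/api/v1/evaluate"


def build_fuji_request(identifier, endpoint=FUJI_ENDPOINT):
    """Prepare a POST request asking F-UJI to assess one identifier.

    The body field names are assumptions; confirm them in the instance's
    OpenAPI description before relying on them.
    """
    payload = {
        "object_identifier": identifier,  # DOI or landing-page URL
        "use_datacite": True,             # assumed flag: consult DataCite
        "test_debug": False,              # assumed flag: include test log
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder DOI for illustration only; nothing is sent over the network.
request = build_fuji_request("https://doi.org/10.1234/example")
body = json.loads(request.data)
```

Sending the request would be a matter of passing it to urllib.request.urlopen, together with whatever authentication the instance requires.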

The metrics were developed under the EU Horizon 2020 FAIRsFAIR project (2019–2022) and are now maintained and versioned by its successor, the FAIR-IMPACT project, with the metric set published as a citable release (DOI 10.5281/zenodo.15045911). F-UJI’s source code is maintained on GitHub by the PANGAEA data publisher, and the tool is offered as a free public assessment service and API.

How F-UJI’s automated scoring actually works

F-UJI does not read the dataset’s content. It inspects the metadata surrounding the dataset — the landing page markup, the identifier’s resolution behaviour, declared licences, and machine-readable provenance fields — and scores each of the 16 metrics as pass, partial or fail. The overall percentage is a weighted sum across the Findable, Accessible, Interoperable and Reusable metric groups.

Findable metrics check for a persistent identifier, whether the metadata is indexable by search engines, and whether the identifier resolves to rich metadata.
Accessible metrics check that metadata remains retrievable even if the data itself becomes unavailable, and that access protocols are standard.
Interoperable metrics check for structured vocabularies declared in a JSON-LD @context (schema.org, DCAT, PROV-O) and for qualified references to related resources.
Reusable metrics check for a machine-readable licence, provenance statements, and a community-recognised file format for the data’s actual distribution.
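The aggregation described above can be sketched in a few lines. This is a simplified model, not F-UJI's published scoring: the per-metric weights and the half credit awarded for a partial result are assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed share of a metric's points awarded for each outcome.
OUTCOME_FRACTION = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


def fair_percentages(results):
    """Aggregate (group, weight, outcome) tuples into percentages.

    Returns (overall_percent, {group: percent}), each rounded to one decimal.
    """
    earned = defaultdict(float)
    total = defaultdict(float)
    for group, weight, outcome in results:
        earned[group] += weight * OUTCOME_FRACTION[outcome]
        total[group] += weight
    per_group = {g: round(100 * earned[g] / total[g], 1) for g in total}
    overall = round(100 * sum(earned.values()) / sum(total.values()), 1)
    return overall, per_group


# Hypothetical results for five metrics across the four FAIR groups.
example = [
    ("F", 2, "pass"),
    ("F", 2, "partial"),
    ("A", 1, "pass"),
    ("I", 2, "fail"),
    ("R", 1, "pass"),
]
overall, per_group = fair_percentages(example)
```

Because the overall figure is a ratio of weighted points, a single failed high-weight metric can pull the percentage down more than several failed low-weight ones.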

A documented example from the FAIR Data Innovations Hub illustrates how mechanical this scoring is in practice: a dataset scored 67% on its first F-UJI run, with the Findable, Interoperable and Reusable metrics flagged for missing JSON-LD context, missing PROV-O provenance fields and an undeclared distribution format. After the maintainers added a single enriched schema.org/PROV-O JSON-LD block to the landing page — without changing the underlying data at all — the same dataset scored 100% on re-assessment. The data did not become more reusable in that interval; its metadata simply became more machine-legible.
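To make the point concrete, the sketch below extracts a JSON-LD block from a landing page and checks for the kinds of signals mentioned above. The sample pages, the field choices and the function names are illustrative assumptions; F-UJI's own tests are more extensive and are not reproduced here.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks in this sketch


def metadata_signals(html):
    """Report which machine-readable signals the first JSON-LD block exposes."""
    parser = JsonLdExtractor()
    parser.feed(html)
    block = parser.blocks[0] if parser.blocks else {}
    if not isinstance(block, dict):
        block = {}
    distribution = block.get("distribution", {})
    return {
        "has_context": "@context" in block,
        "has_license": "license" in block,
        "has_provenance": any(key.startswith("prov:") for key in block),
        "declares_format": isinstance(distribution, dict)
        and "encodingFormat" in distribution,
    }


# Hypothetical landing pages: one bare, one with an enriched JSON-LD block.
BARE = "<html><head><title>Example dataset</title></head><body></body></html>"
ENRICHED = """<html><head>
<script type="application/ld+json">
{"@context": {"@vocab": "https://schema.org/",
              "prov": "http://www.w3.org/ns/prov#"},
 "@type": "Dataset",
 "name": "Example dataset",
 "license": "https://creativecommons.org/licenses/by/4.0/",
 "prov:wasGeneratedBy": {"@type": "prov:Activity", "name": "Field survey"},
 "distribution": {"@type": "DataDownload", "encodingFormat": "text/csv"}}
</script></head><body></body></html>"""

before = metadata_signals(BARE)
after = metadata_signals(ENRICHED)
```

The two pages describe the same dataset; only the second gives a script something to parse, which is the whole difference an automated evaluator sees.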

F-UJI vs FAIRshake vs manual maturity frameworks

F-UJI is not the only FAIR assessment approach in circulation, and the three main categories differ in what they actually test and who defines “FAIR” for the purpose of the test.

Dimension	F-UJI	FAIRshake	Manual maturity review
Method	Fully automated, no human input	Hybrid — automated tests plus human-scored rubrics	Fully manual, questionnaire/checklist-based
Basis of criteria	Fixed FAIRsFAIR/FAIR-IMPACT metric set	Community-defined rubrics per research domain	Institution- or project-specific checklist
Input required	A persistent identifier (e.g. DOI)	A URL, via web interface or browser extension	The dataset, documentation and reviewer time
Output	Percentage score per metric and overall	Nine-square “FAIR insignia” visualisation	Narrative report with recommendations
Scalability	High — suited to bulk repository audits	Moderate	Low — resource-intensive
Contextual nuance	Low — rigid, rule-based	Moderate — rubrics can be domain-tailored	High — accounts for discipline-specific reuse

FAIRshake was originally developed by the Ma’ayan Laboratory at the Icahn School of Medicine at Mount Sinai under the US National Institutes of Health’s Big Data to Knowledge (BD2K) programme. Rather than one fixed metric set, it lets research communities author their own rubrics and score resources — manually, automatically, or both — against them, then renders the result as a colour-coded insignia rather than a single number.

The GO FAIR Initiative takes a different, upstream approach: instead of scoring existing datasets after the fact, it promotes the FAIR Data Point (FDP) specification — a layered REST API (FAIR Data Point → Catalog → Dataset → Distribution) that a research data repository implements so that FAIRness is built into how metadata is served, rather than retrofitted and then measured.
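The layered structure can be pictured as a simple tree walk. The sketch below uses a toy in-memory stand-in rather than a live service: the URLs and the "children" key are hypothetical, and a real FAIR Data Point would serve RDF using DCAT properties instead of plain dictionaries.

```python
# Toy stand-in for an FDP: each URL maps to its layer name and child links.
RESOURCES = {
    "https://fdp.example.org/": {
        "layer": "FAIR Data Point",
        "children": ["https://fdp.example.org/catalog/1"],
    },
    "https://fdp.example.org/catalog/1": {
        "layer": "Catalog",
        "children": ["https://fdp.example.org/dataset/1"],
    },
    "https://fdp.example.org/dataset/1": {
        "layer": "Dataset",
        "children": ["https://fdp.example.org/distribution/1"],
    },
    "https://fdp.example.org/distribution/1": {
        "layer": "Distribution",
        "children": [],
    },
}


def walk(url, resources, depth=0):
    """Yield (depth, layer, url) for a resource and everything beneath it."""
    record = resources[url]
    yield depth, record["layer"], url
    for child in record.get("children", []):
        yield from walk(child, resources, depth + 1)


visited = list(walk("https://fdp.example.org/", RESOURCES))
layers = [layer for _, layer, _ in visited]
```

A harvester following this pattern needs only the root URL; every deeper layer is discovered by following links rather than by repository-specific configuration.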

What a high FAIR score does not prove

A 100% F-UJI score is a statement about metadata exposure, not about data quality, ethical provenance, statistical validity, or whether another researcher can actually rerun the analysis. This distinction matters because automated tools are increasingly cited in funder and repository policy discussions as if they were a proxy for genuine reusability.

A perfectly scored dataset can still contain undocumented preprocessing steps, missing sample metadata, or errors that no metadata check can catch.
F-UJI cannot verify that a licence field is legally accurate — only that a machine-readable licence field exists.
None of F-UJI, FAIRshake or FAIR Aware assess whether the underlying research methodology or data collection itself was sound; that remains a peer-review and domain-expert function.
Scores are not comparable across tools: a dataset scoring 67% on F-UJI is not equivalent to 67% “FAIR” on any absolute scale, since each tool’s metric weighting differs.

A study in the journal Patterns (Devaraju and Huber, 2021, cited more than 90 times) frames this precisely, describing F-UJI-based measurement as “centred on core metrics” that apply until domain- or community-specific FAIR criteria are agreed. That is an explicit acknowledgement that the automated baseline is deliberately generic, not a final word on reusability.

Common questions about automated FAIR scoring

What does F-UJI actually measure?

F-UJI measures whether a dataset’s metadata — its identifier, landing-page markup, licence declaration and provenance fields — meets 16 machine-testable criteria drawn from the FAIRsFAIR/FAIR-IMPACT metric set. It does not inspect or validate the dataset’s actual content, methodology or scientific accuracy.

Is a high F-UJI score the same as genuinely FAIR data?

No. A high score confirms that metadata is machine-readable and complete according to a fixed rule set. Genuine reusability additionally depends on documentation quality, data integrity and domain-specific context that automated tools are structurally unable to evaluate.

How does FAIRshake differ from F-UJI?

FAIRshake combines automated tests with human-scored, community-defined rubrics, whereas F-UJI applies one fixed metric set with no human input. FAIRshake reports results as a visual “FAIR insignia” rather than F-UJI’s single percentage score.

Do funders formally require automated FAIR scores?

No major funder currently mandates a specific F-UJI or FAIRshake score as a compliance threshold. Funder and institutional policies (for example under Horizon Europe and UKRI) reference the FAIR Data Principles as a qualitative expectation, with automated tools used voluntarily to self-check progress.

Implications for repositories and funders

For research data repositories, the practical use of F-UJI is diagnostic, not evaluative: it flags specific, fixable metadata gaps — a missing JSON-LD block, an undeclared licence field, an absent provenance statement — far faster than a manual audit could. Repositories improving their F-UJI scores should treat each metric failure as a discrete engineering task, not as a proxy for a broader data-quality programme.

For institutions and funders assessing compliance, the more defensible approach combines automated metadata scoring as a first-pass filter with a manual or community-rubric review for anything reused in decision-relevant research. Relying on one automated percentage to certify “FAIR” data risks the same error as equating a spellchecker’s clean pass with a well-argued essay: necessary, not sufficient.

As the GO FAIR Initiative’s FAIR Data Point specification gains adoption, the balance may shift from retrospective scoring toward FAIRness built into repository infrastructure from the point of deposit — making after-the-fact tools like F-UJI a verification step rather than the primary mechanism for achieving reusable research data.

July 3, 2026

UK Data Service vs ICPSR: Choosing an Archive

The UK Data Service and ICPSR are the two largest social-science data archives in the English-speaking research world, and the right choice usually depends on jurisdiction and funder mandate rather than feature parity. The UK Data Service is the ESRC-funded national repository for UK social, economic and population data, while ICPSR is a US-based, membership-funded consortium archive at the University of Michigan. Researchers outside the biomedical repository ecosystem — where PubMed-linked mandates dominate — need to weigh deposit workflow, restricted-access tiers and citation practice before picking either as a home for a dataset.

The UK Data Service is the largest digital repository for quantitative and qualitative social science and humanities research data in the United Kingdom, formed in October 2012 when the Economic and Social Research Council (ESRC) consolidated the UK Data Archive — established at the University of Essex in 1967 — with several university partners. ICPSR, by contrast, is a membership consortium of academic and research institutions that has archived social and behavioural science data since 1962. Both are listed in re3data.org, the global Registry of Research Data Repositories, and both hold CoreTrustSeal certification for trustworthy digital repositories.

What the UK Data Service and ICPSR actually are
How deposit workflows compare
How restricted-access tiers differ
How citation practices compare
Which archive fits your project
Answer-first questions researchers ask

What Are the UK Data Service and ICPSR?

The UK Data Service is a national data repository funded through UKRI’s Economic and Social Research Council (ESRC) and led by the UK Data Archive at the University of Essex, in partnership with the University of Manchester, Jisc, EDINA and University College London. It holds more than 6,000 datasets, including UK Census data, the Labour Force Survey, the Millennium Cohort Study and cross-national surveys such as the European Social Survey.

ICPSR — the Inter-university Consortium for Political and Social Research — is a membership-funded archive based at the University of Michigan, serving several hundred member institutions worldwide alongside non-member depositors and users. Its holdings span large-scale US and international surveys, criminal justice, education and ageing data, and it runs openICPSR as a self-publishing companion repository for rapid dissemination.

How Do Deposit Workflows Compare?

Both archives run a curated deposit model rather than a bare upload form: staff review documentation, check disclosure risk and enhance metadata before release. The UK Data Service’s ESRC funding creates a contractual hook, because grant holders are required to offer their data for archiving as a condition of the ESRC Research Data Policy. ICPSR’s membership model does not replicate that hook for non-US funders.

UK Data Service: two routes — the main curated collection for large, complex or sensitive studies, and ReShare, a lighter self-deposit repository for smaller datasets, code and syntax files.
ICPSR: two routes — the standard curated deposit process, and openICPSR, a self-publishing repository for researchers who want faster turnaround with lighter-touch review.

Depositors submitting to either service should expect a documentation checklist covering variable-level metadata, consent and ethics evidence, and a data management plan — the same categories UKRI and NSF grant terms typically require regardless of which archive receives the deposit.

How Do Restricted-Access Tiers Differ?

Access tiering is where the two services diverge most for researchers working with confidential or disclosive social-science data. The UK Data Service operates a published three-tier model; ICPSR uses a comparable but differently named structure built around its Virtual Data Enclave.

Access dimension	UK Data Service	ICPSR
Open tier	No registration; Open Government Licence data	Public-use files via free MyData account
Standard tier	Safeguarded — registration plus End User Licence	Member-institution access under consortium terms
Restricted tier	Controlled — SecureLab, requiring accredited-researcher training under the Five Safes Framework	Restricted-use data via secure Virtual Data Enclave or encrypted physical media, subject to a data security plan
Governance standard	Accredited under the Digital Economy Act 2017 by the UK Statistics Authority (2020)	Institutional Review Board and data-use-agreement based review

The UK Data Service’s Five Safes Framework — safe people, projects, settings, data and outputs — was developed with HMRC DataLab and the Office for National Statistics Secure Research Services, and now underpins the SafePod Network launched in 2021 for wider geographical access to sensitive data. ICPSR’s restricted-data pathway achieves an equivalent security outcome through its enclave model but does not use the Five Safes terminology, which matters for UK researchers writing data management plans against ESRC or UKRI templates that reference it explicitly.

How Do Citation Practices Compare?

Both archives assign persistent identifiers and expect formal data citation, but their machinery differs. The UK Data Service works with DataCite and the British Library to issue DOIs and promotes an easy-to-use citation tool, framing its approach around the FAIR data principles — Findable, Accessible, Interoperable, Reusable — and its open-source QAMyData tool, which gives depositors a health check for numeric data before release.

ICPSR similarly issues persistent identifiers for deposited studies and expects citation in publications that reuse its data, but its emphasis sits more on bibliography-style study citations tied to its own numbering system than on a dedicated public FAIR-compliance tool. For researchers publishing in journals that enforce data-availability statements — a growing requirement under funder open-science mandates — the practical difference is smaller than the access-tier gap: both produce a citable, resolvable record, but only the UK Data Service publishes a named QA tool for pre-citation data quality.

Which Archive Should Researchers Outside Biomedicine Choose?

For most projects the decision is jurisdictional rather than qualitative. Where a funder mandate names a repository, the choice is effectively settled: ESRC-funded UK researchers must offer data to the UK Data Service, while NSF- or NIH-adjacent US social-science grants more commonly point toward ICPSR or openICPSR.

Choose the UK Data Service if your funder is UKRI/ESRC, your data concerns UK administrative, census or longitudinal panel data, or you need SecureLab/Five Safes access to controlled government microdata.
Choose ICPSR if your institution is a consortium member, your data is US-focused or cross-national with US partners, or you want the faster openICPSR self-publishing route.
Consult both catalogues before depositing internationally comparable survey data (e.g. European Social Survey, Eurobarometer) — coverage overlaps, and the UK Data Service can facilitate UK-based access to ICPSR holdings.

Institutions building or reviewing a data management plan should treat repository choice as a funder-compliance question first and a discoverability question second: a technically excellent dataset deposited in the wrong repository for its funder mandate creates avoidable rework at grant closeout.

Answer-First Questions Researchers Ask

What Is the UK Data Service?

The UK Data Service is the ESRC-funded national repository for UK economic, population and social research data, led by the UK Data Archive at the University of Essex. It holds over 6,000 datasets, including census, survey and longitudinal study data, and operates under the OAIS digital-preservation reference model.

How Do You Access Data on the UK Data Service?

Access runs through three published tiers: Open data requiring no registration, Safeguarded data requiring registration and an End User Licence, and Controlled data requiring SecureLab accreditation under the Five Safes Framework. Most researchers start with the free data catalogue and register once they identify a specific study.

Is the UK Data Service Free?

Yes — the service is free to data owners depositing studies and free at the point of use for non-commercial research and teaching. Commercial users may incur administrative fees, and controlled-tier access requires accredited-researcher training rather than a monetary charge.

Implications for Research Administrators

Data management plans reviewed by institutional research offices, ARMA and INORMS-aligned research administrators, and funder compliance teams increasingly treat repository choice as an auditable field, not a footnote. A UK-funded study archived outside the UK Data Service without documented justification can trigger ESRC compliance queries at final reporting; a US consortium study left undeposited with ICPSR can weaken an institution’s case for renewed membership funding. Neither archive competes with domain-specific biomedical repositories governed by NISO, ICMJE or COPE norms — this comparison sits squarely in the national data repository space for social science, distinct from that ecosystem.

As open-science mandates from UKRI, cOAlition S and equivalent US funders converge on FAIR-by-default expectations, the operational gap between the UK Data Service and ICPSR is narrowing to jurisdiction, access-tier terminology and citation tooling rather than underlying trustworthiness — both hold CoreTrustSeal certification and both sit inside the CESSDA/re3data recognised-repository landscape that funders now check by default.

July 3, 2026

Genomic Data Repository Guide: ENA vs GEO vs SRA

Choosing a genomic data repository comes down to three questions: what type of data you have, whether it is identifiable human data, and what your funder or journal mandates. Raw sequencing reads generally go to the European Nucleotide Archive (ENA) or the Sequence Read Archive (SRA) — two mirrored nodes of the same international collaboration — while processed gene-expression data belongs in the Gene Expression Omnibus (GEO). A genomic data repository is a persistent, publicly accessible database that assigns stable identifiers to deposited sequence or expression datasets so they can be cited, retrieved and reused under FAIR data principles.

ENA, GEO and SRA are the three repositories researchers encounter most often when funder or journal data-sharing policies require deposition of sequencing output. They are not interchangeable: each has a different primary data type, a different metadata standard, and a different position in the international data-sharing infrastructure. This guide compares them on deposit requirements, metadata standards and journal acceptance so research administrators and authors can make a defensible, mandate-compliant choice.

What is a genomic data repository?
ENA vs GEO vs SRA: how do they differ?
What are the deposit requirements for each repository?
Which metadata standards apply?
Do journals and funders accept all three equally?
Frequently asked questions
What this means for research administrators

What is a genomic data repository?

A genomic data repository is a curated, publicly accessible database that archives DNA or RNA sequence data — raw reads, assembled genomes, or processed expression tables — and assigns each dataset a stable accession number for permanent citation. Repositories exist because journals and funders increasingly require that sequence data underlying a publication be deposited somewhere reviewers, readers and future researchers can retrieve it, rather than held privately by the authors.

The three most consulted repositories for sequencing output are the European Nucleotide Archive (ENA), the Sequence Read Archive (SRA), and the Gene Expression Omnibus (GEO). ENA and SRA both belong to the International Nucleotide Sequence Database Collaboration (INSDC), alongside the DNA Data Bank of Japan (DDBJ); records submitted to any one of the three INSDC nodes are mirrored across all of them, typically within 24 to 48 hours.

ENA vs GEO vs SRA: how do they differ?

The single biggest distinction is data type: ENA and SRA hold raw sequence reads (FASTQ, BAM, CRAM), while GEO holds processed functional genomics results — expression matrices, normalised counts and the experimental metadata describing them — and links out to SRA for the underlying raw reads. Geography and stewardship differ too: ENA is maintained by EMBL-EBI in the UK/Europe, while SRA and GEO are both maintained by the US National Center for Biotechnology Information (NCBI).

Feature	ENA	GEO	SRA
Steward	EMBL-EBI (Europe)	NCBI (US)	NCBI (US)
Primary data type	Raw reads, assemblies, annotated sequences	Processed expression data + metadata	Raw sequencing reads
INSDC member	Yes	No (links to SRA)	Yes
Metadata standard	ENA checklists	MINSEQE / MIAME	INSDC submission schema
Access model	Open (controlled tier via EGA for identifiable human data)	Open	Open (controlled tier via dbGaP)

A frequently overlooked distinction is access control. None of ENA, SRA or GEO is designed to hold identifiable human genomic or phenotypic data. That category of data belongs in a controlled-access archive — the European Genome-phenome Archive (EGA), jointly run by EMBL-EBI and the CRG, or NCBI’s database of Genotypes and Phenotypes (dbGaP) — where access is granted through a data access committee rather than opened to the public. Depositing identifiable clinical genomic data in an open repository such as ENA or SRA would breach both the repositories’ own policies and, in most jurisdictions, data protection law.

What are the deposit requirements for each repository?

Each repository sets its own submission checklist, but all three require a structured description of the experiment alongside the sequence files themselves.

ENA requires a study, sample, experiment and run object for each submission, described against one of ENA’s checklist templates (for example, the pathogen or invertebrate checklists), plus the raw read files.
SRA requires equivalent BioProject and BioSample records, submitted through NCBI’s submission portal, with reads in FASTQ or BAM/CRAM format.
GEO requires a MINSEQE-compliant description of the experimental design (samples, protocols, processed data matrix) and will route the corresponding raw reads to SRA as part of the same submission, generating a linked SRA accession automatically.

Because ENA and SRA mirror each other, a dataset submitted to one is not normally resubmitted to the other — submitting twice creates duplicate, unlinked accessions rather than better coverage.
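The per-repository requirements listed above lend themselves to a simple pre-submission check. The sketch below is an assumption-laden illustration: the component labels are shorthand invented here, not field names from any repository's submission schema, and each repository's real checklist is considerably more detailed.

```python
# Shorthand labels for the components each repository expects (hypothetical).
REQUIRED_COMPONENTS = {
    "ENA": {"study", "sample", "experiment", "run", "raw_reads"},
    "SRA": {"bioproject", "biosample", "raw_reads"},
    "GEO": {"sample_descriptions", "protocols", "processed_matrix", "raw_reads"},
}


def missing_components(repository, provided):
    """Return the required components not yet prepared, sorted by name."""
    if repository not in REQUIRED_COMPONENTS:
        raise ValueError(f"No checklist defined for {repository!r}")
    return sorted(REQUIRED_COMPONENTS[repository] - set(provided))


# A draft ENA submission that still lacks its experiment and run objects.
ena_gaps = missing_components("ENA", ["study", "sample", "raw_reads"])
sra_gaps = missing_components("SRA", ["bioproject", "biosample", "raw_reads"])
```

A research office could run a check like this at the data-management-plan stage, well before a submission portal rejects an incomplete deposit.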

Which metadata standards apply?

Metadata quality, not just file deposition, is what makes a dataset FAIR — Findable, Accessible, Interoperable and Reusable, per the FAIR data principles first published by Wilkinson et al. in 2016. GEO submissions are assessed against MIAME (Minimum Information About a Microarray Experiment) for array data and MINSEQE (Minimum Information about a high-throughput Nucleotide Sequencing Experiment) for sequencing-based expression studies. ENA and SRA submissions follow INSDC’s shared sample and experiment metadata schema, supplemented by checklist-specific fields for the sample type in question.

Consistent metadata is also what allows a dataset to be discovered through cross-repository registries such as re3data and FAIRsharing, both of which index genomic repositories alongside thousands of other subject and generalist repositories.

Do journals and funders accept all three equally?

Most journal data-availability policies name an INSDC-compliant repository — ENA, SRA or DDBJ — as the acceptable destination for raw sequence data, and GEO or ArrayExpress for expression data. PLOS, for example, states that authors should select field-appropriate repositories and lists ENA, SRA, GEO and DDBJ among its recommended sequencing repositories, while also pointing authors to re3data and FAIRsharing when no field-specific option exists.

Funder policy is generally repository-agnostic within the INSDC family: the NIH Genomic Data Sharing Policy and the 2023 NIH Data Management and Sharing Policy both accept SRA, dbGaP or an equivalent controlled-access archive for human data, without mandating SRA specifically over ENA. UK and European funders operating under UKRI or Horizon Europe open-science requirements similarly accept any INSDC-affiliated repository, reflecting the FAIR data principles rather than naming a single preferred database.

Frequently asked questions

What is the difference between ENA, GEO and SRA?

ENA and SRA both archive raw sequencing reads and mirror each other as INSDC members, differing mainly in which institution — EMBL-EBI or NCBI — hosts the submission. GEO instead archives processed gene-expression results and metadata, forwarding the associated raw reads to SRA automatically during submission.

Do I need to submit data to both GEO and SRA?

Not separately. When you submit a gene-expression study to GEO, the platform generates a linked SRA accession for the raw reads as part of the same workflow, so a single submission satisfies both repositories without duplicate uploads.

Is ENA the same as SRA?

No — they are separate databases run by different organisations that mirror the same underlying INSDC data. A dataset submitted to ENA in Europe becomes visible through SRA in the US within roughly one to two days, and vice versa, so researchers choose one, not both.

Which repository do funders require for genomic data?

Most funder policies, including NIH’s Genomic Data Sharing Policy and UKRI’s open research requirements, accept any INSDC-affiliated repository — ENA, SRA or DDBJ — for raw sequence data, plus GEO for expression data, rather than mandating one specific database.

What this means for research administrators

For institutions building data-management-plan templates or compliance checklists, the practical rule is to map deposition guidance to data type and access sensitivity rather than to a single named repository: raw non-identifiable reads to ENA or SRA, expression matrices to GEO, and any identifiable human genomic or clinical data to a controlled-access archive such as EGA or dbGaP. Framing repository choice this way keeps research administration guidance aligned with funder and journal policy regardless of which INSDC node an individual researcher prefers to use.

As funder mandates increasingly cite FAIR data principles explicitly rather than naming individual repositories, the durable compliance strategy is to select any INSDC-affiliated repository appropriate to the data type, document the accession number in the manuscript, and reserve controlled-access archives strictly for identifiable human data. Research offices that build this decision logic into deposit checklists now will need far less rework as funder policy language continues to converge on FAIR terminology rather than named databases.
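The decision logic described in this section can be written down directly. The function below is a minimal sketch: the data-type labels and the function name are hypothetical, and a real checklist would also need to account for funder- and journal-specific conditions.

```python
def suggest_archives(data_type, identifiable_human=False):
    """Suggest candidate archives from data type and access sensitivity.

    Identifiable human data always routes to controlled-access archives,
    whatever the data type.
    """
    if identifiable_human:
        return ["EGA", "dbGaP"]
    routes = {
        "raw_reads": ["ENA", "SRA"],   # mirrored INSDC nodes: pick one
        "expression": ["GEO"],         # processed expression results
    }
    if data_type not in routes:
        raise ValueError(f"No routing rule for data type {data_type!r}")
    return routes[data_type]


open_reads = suggest_archives("raw_reads")
clinical_reads = suggest_archives("raw_reads", identifiable_human=True)
expression = suggest_archives("expression")
```

Checking sensitivity before data type is deliberate: it ensures identifiable human data can never fall through to an open repository because of how the data type was labelled.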

July 3, 2026
