“FAIR data” is one of the most cited phrases in modern research data management, and one of the most frequently misunderstood. It is invoked in funder policies, journal requirements, and data management plans, often as a synonym for “put it online”, which is not what it means. Understanding what Findable, Accessible, Interoperable and Reusable actually require in practice is what turns FAIR from a slogan into a set of concrete actions. FAIR is a foundational concern of the data-infrastructure domain. The background is set out in the companion explainer on what FAIR data is; this article is about doing it.
Where FAIR came from, and what it is for
The FAIR Guiding Principles for scientific data management and stewardship were published in 2016 by a broad group of researchers, publishers, and funders, and have since been adopted across the research landscape. Their purpose is specific and worth holding onto: FAIR is about making data usable by machines as well as people. A dataset that a human can eventually make sense of after emailing the author is not FAIR; a dataset that automated systems can find, access, combine, and reuse with minimal human intervention is. The principles describe four properties a dataset and, crucially, its metadata should have.
The four principles, in practice
Findable. Data cannot be reused if it cannot be found. In practice this means the dataset is deposited somewhere with a search index, is described by rich metadata, and — the linchpin — is assigned a globally unique, persistent identifier, typically a DOI. The metadata should be indexed and searchable, and should itself record the identifier. Findability is the property a hard drive or a personal website fundamentally cannot provide; a trusted repository is what supplies it.
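As a minimal sketch of the identifier requirement, the Python below checks that a metadata record carries a syntactically plausible DOI and stores it in resolvable URL form. The field names (`title`, `identifier`, `keywords`) and the DOI itself are hypothetical placeholders, and the pattern is only a loose shape check: it cannot tell whether a DOI has actually been registered.

```python
import re

# Loose shape check: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
# This tests syntax only; it does not confirm the DOI exists.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalise_doi(value: str) -> str:
    """Strip common prefixes and return the bare DOI, or raise ValueError."""
    doi = value.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {value!r}")
    return doi


def doi_url(value: str) -> str:
    """Return the DOI as a resolvable https://doi.org/ URL."""
    return "https://doi.org/" + normalise_doi(value)


# Hypothetical metadata record; the DOI is a placeholder, not a real dataset.
record = {
    "title": "Example river temperature survey",
    "identifier": "doi:10.5555/example-dataset",
    "keywords": ["hydrology", "temperature"],
}

# Store the identifier inside the metadata itself, in resolvable form.
record["identifier"] = doi_url(record["identifier"])
```

In practice the repository mints the DOI and writes it into the record; a check like this is mainly useful for catching a missing or mistyped identifier before metadata is published elsewhere.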
Accessible. Once found, the data, or at least their metadata, must be retrievable by identifier through a standard, open protocol such as HTTPS. Accessibility is the principle most often misread, so it is worth being precise: it does not mean the data must be open to all. It means the conditions of access are explicit and the retrieval mechanism is standardised. Sensitive data may be available only under controlled access, with an authentication and authorisation procedure, and that is still FAIR, provided the rules are clear and the metadata remains accessible even when the data themselves are not. The principle also asks that metadata persist even after the data are no longer available, so that a record of what existed survives.
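One widely used standardised retrieval mechanism is HTTP content negotiation against the DOI resolver: a client asks `https://doi.org/<doi>` for machine-readable metadata by sending an `Accept` header. The sketch below only builds such a request and does not send it. The DOI is a hypothetical placeholder, and whether a given DOI answers with CSL JSON depends on its registration agency, so treat the media type as an assumption to verify for your repository.

```python
import urllib.request

# Media type for citation metadata as CSL JSON (assumed to be supported
# by the DOI's registration agency; check before relying on it).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str, media_type: str = CSL_JSON) -> urllib.request.Request:
    """Build (but do not send) a GET request for a DOI's metadata."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": media_type},
        method="GET",
    )


# Hypothetical placeholder DOI.
request = build_metadata_request("10.5555/example-dataset")

# Sending it needs network access and a registered DOI, for example:
#   import json
#   with urllib.request.urlopen(request) as response:
#       metadata = json.load(response)
```

The point of the design is that the same identifier serves both audiences: a browser gets the landing page, while a program gets structured metadata, even when the data files themselves sit behind controlled access.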
Interoperable. Data are interoperable when they can be combined with other data and processed by tools without bespoke translation. In practice this means using standard, open file formats rather than proprietary ones; using shared vocabularies, ontologies, and standards to describe variables, so that “sex” or “temperature” mean the same thing across datasets; and including qualified references to other data and metadata, so a dataset declares its relationships rather than leaving them implicit. Interoperability is what lets datasets be aggregated and analysed at scale.
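To illustrate, the sketch below harmonises a locally coded table into shared terms and an open format. The vocabulary IRIs under `example.org`, the column names, and the sample rows are all hypothetical; a real dataset would use the terms its community standard defines.

```python
import csv
import io

# Hypothetical controlled vocabulary. Replace with the term IRIs
# defined by the standard your community actually uses.
SEX_TERMS = {
    "m": "https://example.org/vocab/sex/male",
    "male": "https://example.org/vocab/sex/male",
    "f": "https://example.org/vocab/sex/female",
    "female": "https://example.org/vocab/sex/female",
}


def harmonise_row(row: dict) -> dict:
    """Map local codes to shared terms and state the unit in the column name."""
    raw_sex = row["sex"].strip().lower()
    if raw_sex not in SEX_TERMS:
        raise ValueError(f"unmapped sex code: {row['sex']!r}")
    celsius = (float(row["temp_f"]) - 32) * 5 / 9  # Fahrenheit to Celsius
    return {"sex": SEX_TERMS[raw_sex], "temperature_celsius": round(celsius, 2)}


def to_csv(rows: list) -> str:
    """Serialise harmonised rows as CSV, an open plain-text format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["sex", "temperature_celsius"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(harmonise_row(row))
    return buffer.getvalue()


# Hypothetical local data with inconsistent codes and an implicit unit.
local_rows = [{"sex": "F", "temp_f": "98.6"}, {"sex": "male", "temp_f": "32"}]
csv_text = to_csv(local_rows)
```

Raising an error on an unmapped code is a deliberate choice here: silently passing unknown values through is how incompatible codings end up in aggregated datasets.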
Reusable. The ultimate goal. For data to be genuinely reusable, three things are needed: a clear, accessible licence that states what may be done with the data; rich provenance describing where the data came from and how they were processed; and documentation that meets the relevant community standards, so a new user can understand the data well enough to use them correctly. A dataset with no licence is, in practice, not reusable — a cautious researcher will not build on data whose terms are unstated.
FAIR is not the same as open
The single most important clarification is this: FAIR is not a synonym for open. The principles are deliberately silent on whether data must be free to all; they are about how data are described, identified, and licensed, not about removing all access controls. This is precisely what makes FAIR workable for sensitive data (clinical, personal, commercially confidential, or culturally protected) that cannot ethically or legally be made fully open. Such data can be made findable through open metadata and a DOI, accessible under a documented controlled-access procedure, interoperable through standards, and reusable under an explicit licence. The watchword usually paired with FAIR, which comes from European Commission research-data guidance rather than from the principles themselves, is “as open as possible, as closed as necessary.” Conflating FAIR with open leads people either to over-share data they should protect, or to dismiss FAIR as impossible for their field; both are mistakes.
A practical path to FAIR
- Deposit in a trusted repository — a generalist repository such as Zenodo, Figshare, or Dryad, or a discipline-specific one — rather than a lab server or “available on request.” This delivers findability and a persistent identifier in one step.
- Write rich metadata. Describe what the data are, how and when they were collected, and what each variable means, using community standards and vocabularies where they exist. The metadata is what machines read; thin metadata is the most common FAIR failure.
- Use open, standard formats in preference to proprietary ones, so the data can be opened and combined without specialist software.
- Apply an explicit licence. State clearly what may be done with the data; without this, the dataset is not reusable however well it scores on the other principles.
- Record provenance and version. Document the data’s origin and processing, and pin versions so that a citation can identify exactly what was used.
- Set access deliberately. Open where you can, controlled where you must — and keep the metadata accessible either way.
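The steps above can be sketched as a simple pre-deposit check. This is a rough illustration, not a FAIR assessment tool: the field names and messages are hypothetical, and a non-empty field says nothing about whether its content is any good.

```python
# Each entry pairs a hypothetical metadata field with the gap it signals.
CHECKS = [
    ("identifier", "Findable: no persistent identifier"),
    ("description", "Findable: no descriptive metadata"),
    ("access_conditions", "Accessible: access conditions are not stated"),
    ("variables", "Interoperable: variables are not documented"),
    ("file_format", "Interoperable: file format is not stated"),
    ("license", "Reusable: no explicit licence"),
    ("provenance", "Reusable: provenance is not recorded"),
    ("version", "Reusable: version is not pinned"),
]


def fair_gaps(record: dict) -> list:
    """Return a message for every field that is missing or empty."""
    return [message for field, message in CHECKS if not record.get(field)]


# Hypothetical draft record: documented, but with no licence or version yet.
draft = {
    "identifier": "https://doi.org/10.5555/example-dataset",
    "description": "Hourly river temperature readings, 2021 to 2023.",
    "access_conditions": "open",
    "variables": {"temperature_celsius": "water temperature in degrees Celsius"},
    "file_format": "text/csv",
    "provenance": "Logger exports, cleaned with the project's processing script.",
}

# Completing the record closes the remaining gaps.
complete = dict(draft, license="CC-BY-4.0", version="1.0.0")

for gap in fair_gaps(draft):
    print(gap)
```

Running a check like this before deposit catches the omissions that are cheapest to fix early, the licence in particular.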
Crediting the work behind FAIR data
Making data FAIR is itself substantial, skilled labour — curating, documenting, standardising, and stewarding a dataset is real intellectual work that too often goes unrecognised. Contributor-role metadata can record it: the CRediT taxonomy includes a dedicated Data curation role, covering the management activities to annotate, scrub, and maintain data for use and reuse. Recording that role on the associated output ensures that the person who did the unglamorous work of making data reusable is credited for it, rather than that effort vanishing into a methods section.
Where shared vocabulary fits
“FAIR”, “findable”, “accessible”, “interoperable”, “reusable”, “metadata”, and “provenance” are used loosely — and “FAIR” is routinely conflated with “open” — which undermines the very interoperability the principles call for. A shared, federated vocabulary that defines these terms precisely is what lets a FAIR claim made in one community be understood in another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the data-infrastructure domain.