What is metadata?

Metadata is structured information that describes, explains, or locates a resource — in research, the data about a dataset, publication, or other output that makes it findable, understandable, and reusable. Without metadata, a file is just bytes; with it, the resource can be discovered, interpreted, and cited.

CASRAI plain-language explainers — clear answers to recurring research-administration questions

Descriptive, administrative and structural metadata

Metadata is commonly divided into three broad types. Descriptive metadata supports discovery and identification — title, author or creator, abstract, keywords, subject. Administrative metadata supports management — technical details such as file format and size, rights and licensing information, and preservation data such as provenance. Structural metadata describes how a compound resource is organised — for instance how chapters relate to a book, or how files relate within a dataset. Together they let a resource be found, understood, managed, and preserved.
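
As a rough illustration of the three types, a record can be modelled as three groups of fields. This is a minimal sketch, not a standard schema: the class names, field names, and sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DescriptiveMetadata:
    """Supports discovery and identification."""
    title: str
    creators: list[str]
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Supports management: technical, rights, and preservation details."""
    file_format: str
    size_bytes: int
    license: str
    provenance: str = ""


@dataclass
class StructuralMetadata:
    """Describes how the parts of a compound resource relate to the whole."""
    # Maps each file name to its role within the dataset.
    parts: dict[str, str] = field(default_factory=dict)


@dataclass
class MetadataRecord:
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    structural: StructuralMetadata


# All values below are invented for illustration.
record = MetadataRecord(
    descriptive=DescriptiveMetadata(
        title="Example river temperature survey",
        creators=["A. Researcher"],
        keywords=["hydrology", "temperature"],
    ),
    administrative=AdministrativeMetadata(
        file_format="text/csv",
        size_bytes=20480,
        license="CC-BY-4.0",
        provenance="Exported from field loggers",
    ),
    structural=StructuralMetadata(
        parts={"readings.csv": "data", "README.txt": "documentation"},
    ),
)

# A nested dictionary is easy to serialise or map onto a real schema later.
record_dict = asdict(record)
```

Keeping the three groups separate makes it easier to see which fields serve discovery, which serve management, and which describe the relationships between files.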

Schemas and standards

For metadata to be interpreted consistently, it follows agreed schemas and standards that define which fields exist and what they mean. Dublin Core provides a small, widely used core set of descriptive elements. The DataCite Metadata Schema underpins the metadata supplied when minting DOIs for datasets and other research outputs. schema.org provides vocabularies for marking up resources on the web so search engines can interpret them. Disciplines also maintain their own domain-specific standards for specialised description.
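
To make the idea of a shared schema concrete, the sketch below builds a schema.org description of a dataset and serialises it as JSON-LD, the format commonly embedded in web pages for search engines. The function name dataset_jsonld and its parameters are hypothetical, the DOI is a made-up placeholder, and the ORCID iD is the fictitious sample researcher used in ORCID's own documentation. The choice of properties is one reasonable selection, not a required or complete set.

```python
import json


def dataset_jsonld(name, description, creator_name, orcid, doi, license_url, keywords):
    """Return a schema.org Dataset description serialised as JSON-LD."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {
            "@type": "Person",
            "name": creator_name,
            # Persistent identifier for the person, expressed as a URL.
            "identifier": f"https://orcid.org/{orcid}",
        },
        # Persistent identifier for the dataset, expressed as a URL.
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "keywords": keywords,
    }
    return json.dumps(record, indent=2)


# Placeholder values for illustration only.
example = dataset_jsonld(
    name="Example river temperature survey",
    description="Hourly water temperature readings from a hypothetical survey.",
    creator_name="Josiah Carberry",
    orcid="0000-0002-1825-0097",
    doi="10.1234/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["hydrology", "temperature"],
)
print(example)
```

Because every field name comes from a published vocabulary, any consumer that knows schema.org can interpret the record without documentation specific to this dataset.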

Metadata, discoverability and FAIR

Metadata is what makes a resource discoverable: search and indexing systems rely on it to surface relevant results, and persistent-identifier systems use it to describe and link records. It is central to the FAIR principles (Findable, Accessible, Interoperable, Reusable). Findability depends on rich metadata assigned a persistent identifier, and metadata supports the other three principles by documenting how a resource can be accessed, using shared vocabularies, and recording provenance and licensing for reuse. Crucially, metadata can remain accessible even when the underlying data are restricted.

Good metadata practice

Effective metadata is complete, accurate, and uses controlled vocabularies and standard formats rather than free text alone, so that values are consistent and machine-actionable. Recording metadata as data are created — rather than reconstructing it later — improves quality, as does using persistent identifiers (DOIs, ORCID iDs, ROR IDs) to unambiguously reference people, organisations, and outputs. Well-formed, standards-based metadata is what turns a stored file into a findable, citable, and reusable research output.
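
The practices above can be partly checked by machine. The sketch below assumes a flat record with three hypothetical field names (resource_type, doi, creator_orcid). It tests the resource type against a small controlled list, checks that the DOI has the general shape of a DOI, and verifies the ORCID iD's check character using the ISO 7064 MOD 11-2 calculation that ORCID documents. The vocabulary is an illustrative subset and the DOI pattern is a heuristic, so treat this as a starting point, not a full validator.

```python
import re

# Illustrative subset of a controlled vocabulary for resource types.
RESOURCE_TYPES = {"Dataset", "Software", "Text"}

# Heuristic for the general shape of a DOI; it does not prove the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid):
    """Check the format and the final check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    # ISO 7064 MOD 11-2 over the first 15 digits.
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


def validate(record):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if record.get("resource_type") not in RESOURCE_TYPES:
        problems.append("resource_type is not in the controlled vocabulary")
    if not DOI_PATTERN.match(record.get("doi", "")):
        problems.append("doi does not look like a DOI")
    if not orcid_checksum_ok(record.get("creator_orcid", "")):
        problems.append("creator_orcid fails the format or checksum test")
    return problems


# Placeholder records: the DOI is invented, the ORCID iD is ORCID's sample iD.
good = {
    "resource_type": "Dataset",
    "doi": "10.1234/example-dataset",
    "creator_orcid": "0000-0002-1825-0097",
}
bad = {
    "resource_type": "spreadsheet",
    "doi": "example-dataset",
    "creator_orcid": "0000-0002-1825-0098",
}

good_problems = validate(good)
bad_problems = validate(bad)
```

Running checks like these when the metadata is first recorded catches inconsistent values early, which is cheaper than reconstructing or correcting them later.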

Key facts

At a glance

Definition: Structured "data about data" that describes a resource
Main types: Descriptive, administrative, and structural metadata
Standards: Dublin Core, DataCite Metadata Schema, schema.org
Enables: Discoverability, citation, management, preservation
FAIR link: Rich metadata underpins the "Findable" principle
Best practice: Controlled vocabularies and persistent identifiers

Common misconceptions

What people often get wrong

Often heard: Metadata is the same as the data itself.

Actually: No — metadata describes the data (title, creator, format, rights). It can remain openly accessible and searchable even when the underlying data are restricted.

Often heard: Any free-text description counts as good metadata.

Actually: No — effective metadata follows shared schemas and controlled vocabularies (e.g. Dublin Core, DataCite) so it is consistent and machine-actionable, not just human-readable prose.

Often heard: Metadata only matters for librarians.

Actually: No — metadata drives search engines, repositories, and persistent-identifier systems, and is essential for any researcher who wants their outputs found, cited, and reused.

Going deeper