v2026.1714 entries · CC-BY 4.0
CASRAI

Explainer · Plain-language

OAI-PMH: Definition, Meaning & Examples | CASRAI

OAI-PMH — the Open Archives Initiative Protocol for Metadata Harvesting — is a long-established standard for sharing metadata between repositories and the services that aggregate them. Released in 2001, it lets harvesters systematically collect descriptive metadata from repositories so that distributed content can be discovered through a single search. Despite its age, OAI-PMH remains a foundational piece of open-access discovery infrastructure, underpinning major aggregators that index millions of records.

CASRAI plain-language explainers — clear answers to recurring research-administration questions

What OAI-PMH is for

OAI-PMH was created to solve the problem of discovery across many separate repositories. Each institutional or subject repository holds its own records; without a common way to gather their metadata, finding content scattered across them would be impractical. OAI-PMH provides that common mechanism. It is purely a metadata-harvesting protocol: it moves descriptions of resources, not the resources themselves. A harvester collects metadata records from many repositories and builds a combined index, which a discovery service then searches. The full text or data stays in the original repository; the metadata is what travels.
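The aggregation idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two repository names, the record contents, and the `build_index` and `search` helpers are invented, and a real discovery service would use a proper search engine rather than a dictionary of words.

```python
from collections import defaultdict

# Invented metadata records, as two hypothetical repositories might expose them.
# Only descriptions are held here; the documents stay in their repositories.
HARVESTED = {
    "repo-a.example.org": [
        {"identifier": "oai:repo-a.example.org:1", "title": "Open access in Latin America"},
    ],
    "repo-b.example.org": [
        {"identifier": "oai:repo-b.example.org:7", "title": "Measuring open access uptake"},
        {"identifier": "oai:repo-b.example.org:9", "title": "Soil carbon datasets"},
    ],
}


def build_index(harvested):
    """Map each lower-cased title word to the (repository, identifier) pairs using it."""
    index = defaultdict(set)
    for repository, records in harvested.items():
        for record in records:
            for word in record["title"].lower().split():
                index[word].add((repository, record["identifier"]))
    return index


def search(index, word):
    """Return the sorted (repository, identifier) pairs whose title contains the word."""
    return sorted(index.get(word.lower(), set()))


index = build_index(HARVESTED)
for repository, identifier in search(index, "open"):
    print(repository, identifier)
```

A single query against the combined index finds records from both repositories, and each hit carries the identifier needed to go back to the source.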

The six verbs

OAI-PMH defines exactly six request types, called verbs. Three describe the repository:

  • Identify returns information about the repository itself.
  • ListMetadataFormats lists the metadata formats the repository can supply.
  • ListSets lists any "sets", the logical groupings used to selectively harvest part of a collection.

The remaining three move records:

  • ListIdentifiers returns just the headers (identifiers and datestamps) of records, useful for working out what to fetch.
  • ListRecords returns full metadata records in bulk, supporting incremental and selective harvesting by date range or set.
  • GetRecord retrieves a single record by its identifier.

Together these six verbs are the entire vocabulary of the protocol.
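In practice each verb is sent as an ordinary HTTP request with the verb and its arguments as query parameters. The sketch below builds one request URL per verb. The base URL, the set name `theses`, the record identifier, and the `build_request` helper are assumptions made for illustration; `oai_dc` is the metadata prefix the protocol uses for simple Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def build_request(verb, **arguments):
    """Return the GET URL for one OAI-PMH request.

    Arguments set to None are left out. A trailing underscore is stripped so
    that the protocol argument `from` (a Python keyword) can be passed as `from_`.
    """
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(params)}"


requests_by_verb = {
    "Identify": build_request("Identify"),
    "ListMetadataFormats": build_request("ListMetadataFormats"),
    "ListSets": build_request("ListSets"),
    "ListIdentifiers": build_request(
        "ListIdentifiers", metadataPrefix="oai_dc", set="theses"
    ),
    "ListRecords": build_request(
        "ListRecords", metadataPrefix="oai_dc", from_="2024-01-01", until="2024-01-31"
    ),
    "GetRecord": build_request(
        "GetRecord", identifier="oai:repository.example.org:1234", metadataPrefix="oai_dc"
    ),
}

for verb, url in requests_by_verb.items():
    print(f"{verb}: {url}")
```

The `from`, `until`, and `set` arguments are what make date-range and set-based selective harvesting possible.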

Dublin Core as the common baseline

For interoperability, OAI-PMH mandates that every repository be able to deliver its metadata in unqualified (simple) Dublin Core. Dublin Core is a small, widely understood set of descriptive elements — title, creator, subject, date, and so on — and requiring it guarantees that any harvester can read any compliant repository, regardless of what richer formats that repository may also offer. Repositories are free to expose additional, more expressive metadata formats through ListMetadataFormats, but Dublin Core is the lowest common denominator that makes cross-repository harvesting possible in the first place.
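To make the baseline concrete, here is a minimal sketch of reading one simple Dublin Core record with Python's standard library. The sample record is invented, not taken from any real repository, and `parse_dublin_core` is a hypothetical helper. The namespace URIs are the ones used by OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample record for illustration only.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2024-01-15</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Thesis</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
      <dc:subject>Metadata</dc:subject>
      <dc:date>2023</dc:date>
      <dc:identifier>https://repository.example.org/items/1234</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_dublin_core(record_xml):
    """Return the header fields and Dublin Core elements of one record."""
    record = ET.fromstring(record_xml)
    fields = {}
    dc_root = record.find("oai:metadata/oai_dc:dc", NS)
    if dc_root is not None:
        for element in dc_root:
            # Tags look like "{namespace}title"; keep only the local name.
            name = element.tag.split("}", 1)[-1]
            # Elements may repeat (two creators here), so collect values in lists.
            fields.setdefault(name, []).append((element.text or "").strip())
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
        "fields": fields,
    }


parsed = parse_dublin_core(SAMPLE_RECORD)
print(parsed["identifier"], parsed["fields"]["title"])
```

Because the elements are few and generic, the same parsing code can be pointed at any compliant repository. The `dc:identifier` value in the sample also shows how a metadata record can point back to the resource in its home repository.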

Harvesting model and its place among protocols

The OAI-PMH model is one of aggregation: large discovery services harvest from many repositories and present a unified search. Aggregators such as BASE (Bielefeld Academic Search Engine), CORE, and OpenAIRE rely on OAI-PMH to build their indexes of open-access content from repositories worldwide. OpenDOAR, often mentioned alongside them, is a directory of repositories rather than a harvester of their records.

OAI-PMH should not be confused with related standards. SWORD handles deposit (pushing content into repositories), the opposite direction to harvesting. ResourceSync, a later Open Archives Initiative standard, addresses synchronisation, that is, keeping a copy continuously up to date, rather than periodic metadata harvesting. Although OAI-PMH dates from 2001 and newer approaches exist, its simplicity and ubiquity mean it remains widely supported and continues to underpin repository discovery today.
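A periodic, pull-based harvest can be sketched as a loop that asks for everything changed since the last run and keeps requesting further pages while the repository supplies a resumption token, the protocol's flow-control mechanism for large result lists. The sketch is a simplified illustration under stated assumptions: `harvest_headers` and `fetch_xml` are hypothetical helpers, and it ignores retries, deleted records, and rate limiting, all of which a production harvester would need.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_xml(url):
    """Default fetcher: GET the URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest_headers(base_url, since=None, set_spec=None, fetch=fetch_xml):
    """Yield (identifier, datestamp) for each header returned by ListIdentifiers.

    `since` limits the harvest to records changed on or after that date;
    `fetch` can be replaced by a stub so the loop can be tried offline.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    if since:
        params["from"] = since
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing changed since the last harvest
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier"), header.findtext(f"{OAI}datestamp")
        token = root.findtext(f"{OAI}ListIdentifiers/{OAI}resumptionToken")
        if not (token or "").strip():
            return  # a missing or empty token marks the last page
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.strip()}
```

Calling `harvest_headers("https://repository.example.org/oai", since="2024-01-01")` against a real endpoint (the URL here is a placeholder) would list what changed since that date; the harvester can then fetch those records and store the date of this run for the next one.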

Key facts

At a glance

  • Full name: Open Archives Initiative Protocol for Metadata Harvesting
  • Released: 2001 — and still widely used today
  • Function: Harvesting metadata (not content) from repositories
  • Six verbs: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord
  • Mandatory format: Unqualified Dublin Core (richer formats optional)
  • Aggregators: BASE, CORE, and OpenAIRE harvest via OAI-PMH

Common misconceptions

What people often get wrong

Often heard: OAI-PMH transfers the actual documents or datasets.

Actually: No — OAI-PMH harvests only metadata (descriptions of resources). The full text or data remains in the original repository; the metadata is what is collected so that resources can be discovered and then retrieved from their source.

Often heard: OAI-PMH and SWORD are alternatives for the same task.

Actually: No — they work in opposite directions. OAI-PMH pulls metadata out for aggregation (harvesting); SWORD pushes content in (deposit). A repository commonly supports both, each for its own purpose.

Often heard: OAI-PMH is obsolete because it dates from 2001.

Actually: No — despite its age, OAI-PMH remains widely implemented and continues to underpin major open-access aggregators such as BASE, CORE, and OpenAIRE. Its simplicity is precisely why it has endured.
