bioRxiv API: A Developer's Guide to Metadata

The bioRxiv API is a free, unauthenticated REST interface at api.biorxiv.org that returns structured JSON or XML metadata — DOI, version number, posting date, subject category, licence, and author list — for any bioRxiv or medRxiv preprint, queryable by date range or by DOI, with no API key required. This guide sets out the endpoints, pagination rules, and field-level detail a developer needs to wire preprint metadata into a CRIS, discovery layer, or citation tool.

A preprint DOI in this ecosystem is a Digital Object Identifier issued under the 10.1101 prefix, registered with Crossref by Cold Spring Harbor Laboratory Press, and it resolves to a specific, versioned manuscript record — the same identifier the bioRxiv API uses as its primary lookup key.

What is the bioRxiv API?
Which endpoints return DOIs, versions, and subject categories?
How does the medRxiv API differ for integrators?
What are the rate limits and pagination rules?
Answer-first Q&A
Implications for CRIS and discovery-tool integrators

What is the bioRxiv API?

The bioRxiv API is a read-only HTTP interface, hosted at api.biorxiv.org, that exposes preprint metadata as JSON, XML (OAI-PMH), or HTML. It was built to support text and data mining, discovery-tool indexing, and institutional repository harvesting without scraping the public website. It requires no registration, no API key, and no OAuth flow — a plain HTTPS GET request is sufficient.

Because bioRxiv and medRxiv share the same underlying submission platform, the same API structure serves both servers; you select the server with a path segment (biorxiv or medrxiv) rather than a different base domain. This matters for CRIS and discovery-tool developers who need one integration pattern to cover both the life-sciences and health-sciences preprint corpora.
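As a minimal sketch of that single integration pattern, the helper below builds a URL for either server and fetches it with Python's standard library. The names `api_url` and `fetch_json` are illustrative and not part of any official client; the date-range route follows the pattern shown in this guide's endpoint table.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.biorxiv.org"
SERVERS = ("biorxiv", "medrxiv")


def api_url(endpoint: str, server: str, *segments: object) -> str:
    """Build an API URL; the server is a path segment, not a separate host."""
    if server not in SERVERS:
        raise ValueError(f"unknown server: {server!r}")
    path = "/".join(str(segment) for segment in segments)
    return f"{BASE_URL}/{endpoint}/{server}/{path}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Issue a plain HTTPS GET (no API key, no auth header) and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Same code path for both corpora; only the server segment changes.
biorxiv_url = api_url("details", "biorxiv", "2025-03-21", "2025-03-28", 0)
medrxiv_url = api_url("details", "medrxiv", "2025-03-21", "2025-03-28", 0)
```

Calling `fetch_json(biorxiv_url)` would then return the decoded response for that date range; the call is left out of the sketch so that importing it does not touch the network.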

Which endpoints return DOIs, versions, and subject categories?

Metadata retrieval is split across five endpoint families. Each returns a JSON response with a messages block (cursor position, total count) and a collection array of preprint records.

Endpoint	Purpose	Example
`/details/[server]/[DOI]/na/[format]`	Full metadata for one preprint by DOI, including every posted version	`api.biorxiv.org/details/biorxiv/10.1101/339747`
`/details/[server]/[start]/[end]/[cursor]`	Metadata for all preprints posted in a date range, paginated 30 per call	`api.biorxiv.org/details/biorxiv/2025-03-21/2025-03-28/0`
`/details/…?category=`	Filters the date-range endpoint by subject category (e.g. cell_biology)	`?category=cell_biology`
`/pubs/[server]/[DOI]/na/[format]`	Links a preprint DOI to its eventual published-journal DOI, once available	`api.biorxiv.org/pubs/medrxiv/10.1101/2021.04.29.21256344`
`/funder/[server]/[interval]/[ROR ID]/[cursor]`	Filters preprint metadata by funder, using a ROR identifier	ROR ID `00k4n6c32` (European Commission)
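The patterns in the table can be captured in a few small URL builders. The sketch below follows the example URLs in the table, which omit the trailing `/na/[format]` segments, and assumes the category filter is passed as an ordinary query string. The function names are hypothetical, and the funder route is left out because the table does not show a full example URL for it.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.biorxiv.org"


def details_by_doi(server: str, doi: str) -> str:
    """URL for the full metadata of one preprint, including every posted version."""
    return f"{BASE_URL}/details/{server}/{doi}"


def details_by_range(
    server: str,
    start: str,
    end: str,
    cursor: int = 0,
    category: Optional[str] = None,
) -> str:
    """URL for one page of a date-range query, optionally filtered by category."""
    url = f"{BASE_URL}/details/{server}/{start}/{end}/{cursor}"
    if category:
        # Assumption: the filter is appended as ?category=<value>.
        url += "?" + urlencode({"category": category})
    return url


def pubs_by_doi(server: str, doi: str) -> str:
    """URL for the preprint-to-published-DOI link record of one preprint."""
    return f"{BASE_URL}/pubs/{server}/{doi}"
```

Keeping URL construction in one place means a change to a route only has to be handled once in the integration.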

Each preprint record returned by the /details endpoint carries the following core fields:

doi — the versionless preprint DOI (prefix 10.1101)
version — an integer indicating which revision of the manuscript this record represents
date — the posting date of that specific version
category — the subject category assigned at submission (e.g. Bioinformatics, Genomics, Epidemiology)
title, authors, author_corresponding, abstract, license — standard bibliographic and rights fields

Because each version of a preprint is returned as a separate array entry under the same DOI, a CRIS integration must group records by doi and sort by version to reconstruct a manuscript’s full revision history — the API does not collapse versions for you.
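The grouping step might look like the following sketch. It assumes each record is a dict with `doi` and `version` keys as listed above, and it converts `version` with `int()` in case the value arrives as a string; `group_versions` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_versions(collection: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by DOI and order each group from first to latest version."""
    history: Dict[str, List[dict]] = defaultdict(list)
    for record in collection:
        history[record["doi"]].append(record)
    for versions in history.values():
        # int() guards against versions serialised as strings ("10" < "2" as text).
        versions.sort(key=lambda record: int(record["version"]))
    return dict(history)


def latest_version(versions: List[dict]) -> dict:
    """Return the most recent revision from an already sorted group."""
    return versions[-1]
```

A CRIS would typically store the whole list for provenance and display `latest_version(...)` as the current record.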

How does the medRxiv API differ for integrators?

Structurally, the medRxiv API is not a separate product — it is the same api.biorxiv.org (or api.medrxiv.org, which mirrors the same routes) interface with medrxiv substituted as the server path segment. The field schema, pagination behaviour, and DOI prefix are identical.

The practical differences developers should code for are:

Subject categories differ in vocabulary: bioRxiv uses life-science categories (Cell Biology, Genomics, Neuroscience); medRxiv uses clinical and public-health categories (Cardiovascular Medicine, Infectious Diseases, Epidemiology).
medRxiv, co-founded in 2019 by Cold Spring Harbor Laboratory, Yale University, and BMJ, carries additional clinical-trial registration and conflict-of-interest declaration fields relevant to health-research governance that bioRxiv records omit.
medRxiv posts fewer preprints, less often, than bioRxiv, so a date-range query of a given width returns fewer 30-record pages; medRxiv polling can therefore use wider intervals for the same number of calls.
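One way to make the polling interval configurable per server is to split a harvest period into fixed-width date windows. The sketch below assumes both ends of a date range are inclusive, which this guide does not confirm, and the window widths in the usage comment are placeholders rather than recommendations.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[str, str]]:
    """Yield consecutive (start, end) ISO date pairs covering start..end.

    Assumption: the API treats both ends of a range as inclusive, so
    consecutive windows do not share a day.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


# Hypothetical configuration: a narrower window for the busier server.
WINDOW_DAYS = {"biorxiv": 1, "medrxiv": 7}
```

Each yielded pair slots directly into the `[start]/[end]` segments of the date-range endpoint.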

What are the rate limits and pagination rules?

bioRxiv does not publish a formal rate limit for the metadata API, but pagination is fixed: the /details family returns 30 records per call and the /pubs, /pub, /publisher, and /funder families return 100 records per call. Advance through results with the cursor parameter until the messages block reports no records remaining.

The community-maintained rbiorxiv R client enforces a self-imposed one-second delay between paginated calls as good-citizen practice. Developers building bulk harvesters for a CRIS or discovery index should adopt the same throttle even though the server does not enforce it.
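A throttled pagination loop can be sketched as below. To stay independent of the exact field names inside the messages block, this version stops when a page comes back with an empty collection, which is a simplification of the stopping rule described above. `fetch_page` is a hypothetical callable that takes a cursor and returns the decoded response for that page.

```python
import time
from typing import Callable, Iterator


def harvest(
    fetch_page: Callable[[int], dict],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every record of a paginated query, pausing between calls.

    The cursor advances by the number of records actually received, so the
    same loop works for 30-record and 100-record endpoint families.
    """
    cursor = 0
    while True:
        page = fetch_page(cursor)
        records = page.get("collection", [])
        if not records:
            return  # Simplified stop condition: an empty page ends the harvest.
        yield from records
        cursor += len(records)
        sleep(delay_seconds)  # Self-imposed throttle; not enforced by the server.
```

Passing `sleep` as a parameter keeps the throttle testable without real delays; in production the default `time.sleep` applies the one-second pause.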

For full-text or PDF-scale mining rather than metadata alone, bioRxiv and medRxiv separately publish bulk corpora via Amazon Web Services’ Open Data programme — a route the metadata API is not designed to serve and that sits outside the scope of this guide.

Answer-first Q&A

What is the rate limit for the bioRxiv API?

No official rate limit is published for api.biorxiv.org. In practice, pagination caps each call at 30 records for detail endpoints and 100 for publication and funder endpoints, and the community rbiorxiv client self-throttles to one request per second — a sensible default for any automated harvester.

Is bioRxiv open access?

Yes. bioRxiv provides free and unrestricted access to every posted article, for both human readers and machine analysis via the API. This applies equally to medRxiv, and neither server charges a fee to read, download, or programmatically query preprint metadata.

Is it okay to cite bioRxiv?

Yes. Every manuscript posted to bioRxiv or medRxiv receives a DOI under the 10.1101 Crossref prefix, making it a citable, versioned part of the scientific record. A correct bioRxiv citation should reference the specific version number returned by the API, since the content behind a DOI can change across revisions.
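A version-aware citation string could be assembled from a /details record roughly as follows. The layout is illustrative only and is not an official bioRxiv or journal citation style; it assumes `authors` arrives as a single preformatted string.

```python
def format_citation(record: dict, server_name: str = "bioRxiv") -> str:
    """Build a citation that pins the version and posting date of the record."""
    return (
        f"{record['authors']}. {record['title']}. {server_name}, "
        f"version {record['version']}, posted {record['date']}. "
        f"https://doi.org/{record['doi']}"
    )
```

Because the version and date come from the same record, the citation stays accurate even after later revisions are posted under the same DOI.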

Who operates bioRxiv and medRxiv?

Since March 2025, both servers have been operated by openRxiv, an independent nonprofit spun out of Cold Spring Harbor Laboratory and backed by a $16 million grant from the Chan Zuckerberg Initiative. Its board includes CSHL President Bruce Stillman and medRxiv co-founder Harlan Krumholz — a governance change developers should note when citing the API’s institutional provenance.

Implications for CRIS and discovery-tool integrators

The March 2025 move to openRxiv governance is more than an institutional footnote for anyone building a research information system. openRxiv’s stated mandate is to expand — not just sustain — API access and machine-readable metadata as preprint volume grows, which means the endpoint contract described here should be treated as stable but not frozen; integrators should build a thin adapter layer rather than hard-coding field names.
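A thin adapter can be as small as one mapping table plus one function. In the sketch below the API field names on the left come from the field list in this guide, while the internal names on the right are hypothetical stand-ins for whatever a local CRIS schema uses.

```python
from typing import Any, Dict, Mapping

# API field name -> internal field name (internal names are hypothetical).
FIELD_MAP: Dict[str, str] = {
    "doi": "identifier",
    "version": "revision",
    "date": "posted_on",
    "category": "subject",
    "title": "title",
    "license": "rights",
}


def to_internal(
    record: Mapping[str, Any], field_map: Mapping[str, str] = FIELD_MAP
) -> Dict[str, Any]:
    """Translate one API record into the internal schema.

    Raises KeyError when an expected API field is absent, so a change to the
    endpoint contract surfaces at the adapter instead of deep in the pipeline.
    """
    missing = [name for name in field_map if name not in record]
    if missing:
        raise KeyError(f"API record is missing expected fields: {missing}")
    return {internal: record[api_name] for api_name, internal in field_map.items()}
```

If a field is renamed upstream, only `FIELD_MAP` needs to change; the rest of the integration keeps using the internal names.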

For CRIS platforms harvesting outputs for institutional repositories, the /funder endpoint’s ROR-based filtering is the highest-value addition since the API’s original release: it lets an institution pull every preprint that declares a specific funder without post-hoc text matching. Combined with the /pubs endpoint’s preprint-to-published-DOI linking, a discovery layer can track a manuscript from first preprint version through to its eventual journal-of-record entry using DOIs alone.
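The DOI-only tracking described here amounts to a join between /details records and /pubs records. The sketch below assumes the /pubs records expose the preprint and journal DOIs under keys named `biorxiv_doi` and `published_doi`; these key names are assumptions to verify against a live response, which is why they are parameters rather than constants.

```python
from typing import Dict, Iterable, List


def published_doi_index(
    pubs_records: Iterable[dict],
    preprint_key: str = "biorxiv_doi",      # assumed key name
    published_key: str = "published_doi",   # assumed key name
) -> Dict[str, str]:
    """Map each preprint DOI to its journal DOI, skipping incomplete records."""
    index: Dict[str, str] = {}
    for record in pubs_records:
        preprint = record.get(preprint_key)
        published = record.get(published_key)
        if preprint and published:
            # DOIs are case-insensitive, so normalise before using them as keys.
            index[preprint.lower()] = published.lower()
    return index


def attach_published(preprints: Iterable[dict], index: Dict[str, str]) -> List[dict]:
    """Return copies of preprint records with a published_doi entry (or None)."""
    return [
        {**preprint, "published_doi": index.get(preprint["doi"].lower())}
        for preprint in preprints
    ]
```

Records without a match keep `published_doi` as `None`, which lets a discovery layer distinguish "not yet published" from "not yet checked".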

Developers integrating author identity alongside this metadata should pair bioRxiv’s author_corresponding field with ORCID resolution rather than name-string matching, consistent with broader authorship attribution practice; teams building the CRIS side of this pipeline may also find it useful to cross-reference definitions in the research administration pillar and the CASRAI dictionary when mapping preprint metadata fields to internal schemas.
