Tag: biorxiv api

medRxiv API: Endpoints, Limits and Use Cases

The medRxiv API is a free, public REST interface — operated by openRxiv, the non-profit behind medRxiv and bioRxiv — that returns structured JSON or XML metadata for health-sciences preprints: titles, authors, abstracts, DOIs, dates, licences and subject categories. Developers query it by date range, by a fixed number of recent posts, or by a single DOI, with results paginated in batches of 100 via a cursor parameter. There is no API key requirement and no separate registration step.

A preprint, in this context, is a complete but not-yet-peer-reviewed manuscript; the medRxiv API is the machine-readable route to the metadata that describes it, distinct from the full-text PDF/XML mining pipeline hosted separately on Amazon S3.

What is the medRxiv API?
medRxiv API endpoints explained
Rate limits, formats and bulk access
Practical use cases for developers
Answer-first Q&A
Implications and what’s next

What is the medRxiv API?

The medRxiv API is a lightweight metadata service, not a search engine or a full-text repository. It shares its underlying infrastructure with the bioRxiv API — both are addressable via api.biorxiv.org or the medRxiv-branded mirror at api.medrxiv.org — because both preprint servers are co-managed by the same non-profit.

medRxiv and bioRxiv are operated by openRxiv, an independent non-profit that took over governance of both servers from Cold Spring Harbor Laboratory (CSHL) in March 2025; medRxiv itself was co-founded in 2019 by CSHL, Yale University and BMJ. openRxiv’s funders include the Chan Zuckerberg Initiative, Imperial College London and Stanford University. This governance detail matters for developers assessing service continuity: the API is not a commercial product with an SLA, but a grant- and institution-funded public good, run under the same operational umbrella that screens and posts the preprints themselves.

medRxiv API endpoints explained

There are two distinct endpoint families. Confusing them is the most common integration mistake developers make, since both accept the same server, interval and cursor parameters but return different content.

Endpoint	Purpose	Format
`api.medrxiv.org/details/[server]/[interval]/[cursor]/[format]`	Preprint metadata — title, authors, abstract, DOI, posting date, category, licence	json or xml
`api.medrxiv.org/details/[server]/[DOI]/na/[format]`	Single-manuscript lookup by DOI	json or xml
`api.biorxiv.org/pubs/[server]/[interval]/[cursor]`	Published-version linkage — which journal, when, and the published DOI	json

server takes the value medrxiv or biorxiv. interval accepts three forms: two YYYY-MM-DD dates separated by a slash; a plain number for the N most recent posts; or a number suffixed d for the most recent N days. cursor defaults to 0 and advances in steps of 100, matching the fixed page size medRxiv documents for both endpoint families.
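As a minimal sketch of the path rules above, the helper below assembles a details URL and rejects intervals that match none of the three documented forms. The function names `build_details_url` and `is_valid_interval` are hypothetical, and the validation is an assumption about what the service accepts rather than a statement of what it enforces.

```python
import re
from datetime import date

BASE = "https://api.medrxiv.org"  # base URL as shown in the endpoint table

_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_interval(interval: str) -> bool:
    """Return True for 'YYYY-MM-DD/YYYY-MM-DD', 'N' or 'Nd' (N a positive integer)."""
    if "/" in interval:
        parts = interval.split("/")
        if len(parts) != 2 or not all(_DATE.fullmatch(p) for p in parts):
            return False
        try:
            start, end = (date.fromisoformat(p) for p in parts)
        except ValueError:  # e.g. month 13 or day 32
            return False
        return start <= end
    return re.fullmatch(r"[1-9]\d*d?", interval) is not None


def build_details_url(server: str, interval: str, cursor: int = 0, fmt: str = "json") -> str:
    """Assemble /details/[server]/[interval]/[cursor]/[format]."""
    if server not in ("medrxiv", "biorxiv"):
        raise ValueError(f"unknown server: {server!r}")
    if fmt not in ("json", "xml"):
        raise ValueError(f"unknown format: {fmt!r}")
    if cursor < 0:
        raise ValueError("cursor must be zero or positive")
    if not is_valid_interval(interval):
        raise ValueError(f"unrecognised interval: {interval!r}")
    return f"{BASE}/details/{server}/{interval}/{cursor}/{fmt}"


print(build_details_url("medrxiv", "10d"))
```

Validating locally before sending a request keeps malformed intervals from reaching a shared public service.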

A working example: https://api.medrxiv.org/details/medrxiv/2020-03-21/2020-03-24/45 returns up to 100 medRxiv records posted in that window, starting at cursor offset 45 (that is, skipping the first 45 results). Each response includes a messages array reporting the cursor position and total count, which developers should use to drive pagination rather than hard-coding offsets.
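The pagination advice can be sketched as a generator that advances the cursor by however many records each page actually returned, and stops when the reported total is reached. This is a sketch under assumptions: the `collection` key and the `total` field name describe the response layout as assumed here, and `iter_records` and `fetch_json` are hypothetical helper names.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTPS and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_records(
    server: str,
    interval: str,
    fetch: Callable[[str], dict] = fetch_json,
    base: str = "https://api.medrxiv.org",
) -> Iterator[dict]:
    """Yield every record for an interval, one sequential page at a time."""
    cursor = 0
    while True:
        page = fetch(f"{base}/details/{server}/{interval}/{cursor}/json")
        records = page.get("collection", [])  # assumed key for the record list
        if not records:
            return
        yield from records
        # Advance by what was actually returned, not by an assumed page size.
        cursor += len(records)
        message = (page.get("messages") or [{}])[0]
        total = message.get("total")  # assumed field; may arrive as a string
        if total is not None and cursor >= int(total):
            return
```

Passing `fetch` as a parameter keeps the loop testable offline; a caller would normally leave the default in place and iterate with `for record in iter_records("medrxiv", "2020-03-21/2020-03-24")`.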

Rate limits, formats and bulk access

medRxiv does not publish a numeric requests-per-second quota for the metadata API. In practice, the only built-in brake is the cursor-based pagination itself, capped at 100 records per call; there is no documented rate-limit header to read. Developers building high-volume harvesters should paginate sequentially rather than firing parallel requests, since the API is a shared public resource funded by academic and philanthropic contributions, not a commercially provisioned endpoint.

For anything beyond metadata, such as full-text PDF and XML mining at scale, medRxiv explicitly separates that workload onto a dedicated Amazon S3 requester-pays bucket (s3://medrxiv-src-monthly), rather than serving it through the JSON API. This is a deliberate architectural boundary: the metadata API is for discovery and linkage, S3 is for bulk text and data mining (TDM), and pointing a bulk-mining workload at the metadata endpoints is a common cause of developer friction.

Metadata endpoints return JSON (default) or XML in OAI-PMH format.
Bulk full-text files are delivered monthly as MECA-format zip packages containing manifest XML, full-text XML and PDF.
All preprints are permanently archived at Portico, independent of the live API.

Practical use cases for developers

The API’s realistic use cases cluster around discovery, linkage and bibliometrics rather than full-text analysis:

Systematic review screening — pulling all preprints in a date range and subject category to feed a title/abstract screening pipeline before formal database searches.
Publication-linkage tracking — using the /pubs/ endpoint to detect when a tracked preprint has since been formally published, and in which journal.
Institutional repository harvesting — research offices ingesting metadata for preprints affiliated with their institution to populate internal dashboards.
Bibliometric and disease-surveillance research — reproducing analyses of preprinting velocity by subject category, a pattern widely used during infectious-disease outbreaks.

A minimal Python request against the details endpoint needs no authentication:

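import requests

# Details endpoint: medRxiv preprints from the last 10 days, first page, JSON.
url = "https://api.medrxiv.org/details/medrxiv/10d/0/json"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
data = resp.json()

# messages[0] carries the cursor position and total count for the query.
print(data["messages"][0]["total"], "preprints in the last 10 days")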

The R package medrxivr (rOpenSci) wraps the same endpoints for R users who prefer not to handle pagination and JSON parsing directly.

Answer-first Q&A

Does medRxiv have a public API?

Yes. medRxiv provides a free, unrestricted public API at api.medrxiv.org and api.biorxiv.org, returning JSON or XML preprint metadata by date range, recent-post count, or DOI. No API key or account is required, and the same infrastructure also serves bioRxiv metadata.

What is the difference between the bioRxiv and medRxiv APIs?

They are the same API distinguished only by a server parameter set to medrxiv or biorxiv. Both preprint servers are co-managed by openRxiv and share identical endpoint syntax, pagination and response schemas, though each server applies separate scope and screening policies to submissions.

How do I cite a medRxiv preprint retrieved via the API?

Cite the preprint by its DOI, exactly as medRxiv’s own guidance specifies: author names, year, title, and “medRxiv doi: 10.1101/…”. For a specific revision, append the version-specific URL, since each version keeps the same DOI but a distinct version suffix in its address.
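As an illustration only, the elements listed above (authors, year, title, DOI) can be assembled from an API record as follows. The punctuation is one plausible layout, not an official medRxiv style; the record field names and the version-URL pattern are assumptions, and the DOI in the usage line is a made-up placeholder.

```python
def format_citation(record: dict, include_version_url: bool = False) -> str:
    """Build 'Authors (Year). Title. medRxiv doi: ...' from one metadata record."""
    year = record["date"][:4]  # assumes an ISO 'YYYY-MM-DD' posting date
    title = record["title"].rstrip(".")
    citation = f"{record['authors']} ({year}). {title}. medRxiv doi: {record['doi']}"
    if include_version_url:
        # Assumed address pattern for a specific revision; verify before relying on it.
        citation += f" (https://www.medrxiv.org/content/{record['doi']}v{record['version']})"
    return citation


example = {
    "authors": "Doe, J.; Roe, A.",
    "date": "2099-01-01",
    "title": "An illustrative title.",
    "doi": "10.1101/2099.01.01.00000000",  # placeholder, not a real preprint
    "version": "2",
}
print(format_citation(example))
```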

Does medRxiv have an Impact Factor?

No. medRxiv’s own FAQ states plainly: “medRxiv is not a journal and so it has no Impact Factor.” Metadata pulled via the API includes posting dates, categories and licences, but never a journal-style Impact Factor field, because none exists for preprints on the server.

Implications and what’s next

For institutions and research administration offices, the practical implication is that medRxiv metadata is genuinely free to harvest at scale for compliance dashboards, preprint-to-publication tracking, and REF-style output monitoring — no licensing negotiation is required, unlike many commercial bibliographic APIs. The trade-off is that developers must build their own resilience: there is no documented rate-limit contract, no SLA, and no formal versioning notice channel beyond the help pages themselves.

The clearest forward risk is architectural drift rather than access restriction. Because the API is maintained by a philanthropically funded non-profit rather than a commercial vendor, integrations should be built defensively: drive pagination from the documented cursor rather than assumptions about page size, and route bulk text-mining workloads to the S3 bucket rather than the JSON endpoints, where undocumented load is most likely to strain openRxiv’s infrastructure over time.

July 3, 2026

bioRxiv Alerts: Email, RSS or API Options

bioRxiv alerts let researchers and developers track newly posted preprints in a chosen subject area without manually rechecking the site — the three core options are subject-category email alerts, per-category RSS/Atom feeds, and the public bioRxiv API, each suited to a different workflow. bioRxiv is the preprint server for biology operated by openRxiv, a nonprofit dedicated to advancing scientific communication, and it exposes the same underlying content through all three channels plus social feeds on Bluesky, Mastodon and X.

This guide compares the four practical ways to follow new bioRxiv postings — email alerts, RSS/Atom feeds, the REST API, and social feeds — so you can pick the right combination for a literature-monitoring workflow, a lab dashboard, or an automated pipeline.

What are bioRxiv’s alert options?
How do bioRxiv email alerts work?
How do bioRxiv RSS feeds work?
What can the bioRxiv API do that alerts and RSS can’t?
Should you follow bioRxiv on Bluesky, Mastodon or X?
Which option should you choose?
Common questions about bioRxiv alerts

What are bioRxiv’s alert options?

bioRxiv is a preprint server for the biological sciences; a preprint is a complete scientific manuscript posted online before, or without, formal peer review. Because thousands of preprints are posted every week across dozens of subject categories, bioRxiv publishes the same feed of new content through four distinct channels rather than a single notification system.

Each channel trades off timeliness, filtering precision and technical effort differently. Email alerts and RSS feeds are built for passive monitoring by individual researchers; the API is built for developers who need structured metadata inside another tool; social feeds suit anyone already working inside those platforms.

How do bioRxiv email alerts work?

Email alerts are the lowest-effort option for an individual researcher who wants a periodic digest. You sign up on the bioRxiv Alerts page, select one or more of bioRxiv’s roughly 30 subject categories — from Bioinformatics to Zoology — and bioRxiv emails you when matching preprints are posted.

Alerts can be scoped to a subject category, a keyword search, or a specific author.
You can add or remove subject-area alerts at any time from the same sign-up page, without deleting your account.
No bioRxiv account or login is required simply to receive category alerts — the sign-up form only asks for an email address.

This makes email alerts the right default for anyone who wants new preprints in their inbox without building or maintaining anything.

How do bioRxiv RSS feeds work?

bioRxiv’s Alerts/RSS page publishes an Atom 1.0 feed for each subject category, plus a combined feed across all categories. Each feed returns only the most recent 30 posts for that category — a hard limit set by bioRxiv, not a filter you can extend — so an RSS reader that checks infrequently can silently miss older items once more than 30 new preprints accumulate.

Feeds can be combined by chaining subject categories with a plus sign in the URL, and multi-word category names use an underscore in place of a space. For example, a feed combining Genomics and Bioinformatics takes the form:

http://connect.biorxiv.org/biorxiv_xml.php?subject=genomics+bioinformatics

This lets a single feed reader subscription cover several adjacent subject areas — useful for interdisciplinary groups — without needing separate subscriptions per category.
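The URL rules above (plus sign between categories, underscore for spaces, lower-case names as in the example) can be sketched as a small helper. The function name `feed_url` is hypothetical, and the helper does not check that a category actually exists on bioRxiv.

```python
from typing import Iterable

FEED_BASE = "http://connect.biorxiv.org/biorxiv_xml.php"  # base taken from the example URL


def feed_url(categories: Iterable[str]) -> str:
    """Combine one or more subject categories into a single feed URL."""
    # Lower-case each name and replace runs of whitespace with an underscore.
    slugs = ["_".join(name.lower().split()) for name in categories]
    if not slugs or not all(slugs):
        raise ValueError("at least one non-empty category is required")
    # A literal plus sign chains categories, as in the documented example.
    return f"{FEED_BASE}?subject=" + "+".join(slugs)


print(feed_url(["Genomics", "Bioinformatics"]))
```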

What can the bioRxiv API do that alerts and RSS can’t?

The bioRxiv API is a pull-based REST interface returning structured JSON metadata — DOI, title, authors, category, posting date and abstract — for preprints on bioRxiv and medRxiv. Unlike email alerts or RSS, it has no built-in subject-category filter parameter and no push/webhook mechanism: a developer must query by date interval or DOI and filter the returned category field client-side.

That distinction matters for anyone building automated tooling:

The API suits scheduled polling jobs, institutional repository harvesters, and research-tool dashboards that need structured metadata, not just a headline and link.
RSS and email alerts remain the simpler choice for a single researcher who only wants to read new titles as they appear.
Because the API is pull-based, any “alert” built on top of it requires you to run your own polling schedule and de-duplication logic.
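A polling job built on these points needs two pieces of client-side logic: a category filter and a record of what has already been reported. The sketch below shows one way to do both, assuming each record carries `doi`, `version` and `category` fields; the function name `new_matches` and the in-memory `seen` set are illustrative choices (a real job would persist that set between runs).

```python
from typing import Iterable, List, Set, Tuple


def new_matches(
    records: Iterable[dict],
    wanted_categories: Iterable[str],
    seen: Set[Tuple[str, str]],
) -> List[dict]:
    """Return records in the wanted categories that have not been reported before."""
    wanted = {c.casefold() for c in wanted_categories}
    fresh = []
    for rec in records:
        if rec.get("category", "").casefold() not in wanted:
            continue  # client-side category filter
        key = (rec["doi"], str(rec.get("version", "1")))
        if key in seen:
            continue  # already alerted on this exact version
        seen.add(key)
        fresh.append(rec)
    return fresh
```

Keying on both DOI and version means a revised manuscript triggers a fresh alert, while an unchanged one returned by an overlapping poll does not.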

Detailed field definitions and endpoint syntax are published in bioRxiv’s own API documentation, which developers should consult directly before building a production integration.

Should you follow bioRxiv on Bluesky, Mastodon or X?

bioRxiv also mirrors new postings to social platforms, and this is where the biggest recent change sits, one that generic alert guides tend to miss. Beyond the long-standing X/Twitter account (@biorxivpreprint, over 140,000 followers, plus a dedicated account per subject category), bioRxiv now runs an equivalent set of per-category streams on Bluesky (e.g. biorxiv-bioinfo.bsky.social) and Mastodon (e.g. biorxiv_bioinfo on biologists.social).

This matters because X restricted free API access in 2023, which reduced the reliability of X-based bots and dashboards that many labs had built to watch subject feeds. Bluesky and Mastodon’s open, API-friendly protocols make them a more dependable base for anyone building a custom preprint-monitoring bot today, rather than a nice-to-have alternative.

Which option should you choose?

The right channel depends on how much filtering precision you need and how much technical effort you are willing to invest.

Channel	Best for	Filtering	Setup effort	Key limitation
Email alerts	Individual researchers wanting a digest	Subject, keyword, author	None (email only)	Digest cadence isn’t real time
RSS/Atom feed	Feed-reader users, interdisciplinary groups	Subject category, combinable	Low (add feed URL)	Capped at the most recent 30 posts per category
REST API	Developers, institutional tools, dashboards	None built-in; filter client-side	High (build a polling job)	Pull-based only, no webhook/push
Bluesky/Mastodon/X	Social monitoring, bot-building	Per subject-category account	Low–Medium	X reach reduced since 2023 API restrictions

For most individual researchers, subject-category email alerts remain the simplest reliable option. Developers building institutional or lab-wide monitoring tools should combine the API for structured metadata with RSS as a lightweight fallback.

Common questions about bioRxiv alerts

Why are my bioRxiv email alerts not working?

Missed bioRxiv alerts are usually caused by an out-of-date subject-category selection, an alert email landing in a spam or promotions folder, or an expired confirmation link. Re-visiting the bioRxiv Alerts page and re-confirming your chosen categories resolves most cases.

Do I need a bioRxiv account to receive alerts?

No. Basic email alerts need only an email address, not an account or login. A bioRxiv account is only needed for actions like submitting a manuscript, posting a comment, or managing an author profile, not for receiving subject-area notifications.

Does bioRxiv have a public API for developers?

Yes. bioRxiv publishes a public REST API returning JSON metadata — including DOI, title, category and abstract — for content on bioRxiv and medRxiv. It is pull-based, so developers must schedule their own queries rather than receive push notifications.

Should I track bioRxiv or arXiv for my subject area?

Choose based on discipline, not preference: bioRxiv covers biology-specific subject categories, while arXiv covers physics, mathematics, computer science and quantitative biology. Researchers working across both fields — for example in computational biology — often need alerts from both servers rather than treating them as interchangeable.

What this means for research-monitoring workflows

Preprint volume keeps growing across biology subject categories, and no single channel covers every use case. A researcher who only needs a daily digest is well served by email alerts; a developer building a literature-surveillance tool for an institution needs the API’s structured metadata and should plan for its pull-based, polling architecture from the outset. Teams that previously relied solely on X-based bots should treat the 2023 API restrictions as a prompt to add Bluesky or Mastodon, or the official RSS feed, as a more durable foundation.

Research administrators supporting open-scholarship workflows can pair these tracking methods with broader terminology in the CASRAI Dictionary when documenting how preprints fit into an institution’s research-administration processes.

July 3, 2026

Research Square vs bioRxiv: Ownership & Fees

Research Square vs bioRxiv is, at its core, a nonprofit-versus-commercial question: Research Square is a preprint platform owned by the for-profit publisher Springer Nature, while bioRxiv and medRxiv are nonprofit servers now governed by openRxiv, an independent 501(c)(3) that took over from Cold Spring Harbor Laboratory (CSHL) in March 2025. Both are free for authors to use, but the ownership structure behind each one shapes fees, licensing control, data governance and long-term archival continuity in ways that matter for anyone advising authors on where to post.

A preprint server is an online platform where researchers deposit manuscripts before, or independently of, formal peer review. Research Square, bioRxiv and medRxiv are three of the most widely used servers in the life, health and biomedical sciences, and authors are increasingly asked to choose between them without understanding what sits behind each brand.

What is the core difference between Research Square and bioRxiv?
Who pays, and how is each platform funded?
Who owns and controls author data?
What long-term archival guarantees does each model offer?
Common questions about Research Square and bioRxiv
What this means for authors and research administrators

What Is the Core Difference Between Research Square and bioRxiv?

The core difference is legal ownership and mission accountability, not scope or screening rigour. Research Square traces to American Journal Experts (AJE); Springer Nature took a minority stake in the Research Square platform in 2018, became majority owner in 2020, and completed full acquisition of Research Square Company in 2022. It is, today, a wholly commercial subsidiary of a for-profit publishing group.

bioRxiv was founded in 2013 by John Inglis and Richard Sever at CSHL, a nonprofit research institution. medRxiv followed in 2019 as a partnership between CSHL, Yale University and BMJ. In March 2025, governance of both servers passed from CSHL to openRxiv, a newly formed independent nonprofit whose stated mission is “creating opportunities for sharing, discovering, and advancing preprints in the life and health sciences” — with a dedicated board and a Scientific and Medical Advisory Board of researchers overseeing policy.

Feature	Research Square	bioRxiv / medRxiv (via openRxiv)
Governing entity	Springer Nature (for-profit publisher)	openRxiv (independent nonprofit, 501(c)(3))
Platform launched	2016, under Research Square Company	bioRxiv 2013; medRxiv 2019
Ownership shift	Minority stake 2018 → majority 2020 → full acquisition 2022	Transitioned from CSHL to independent nonprofit, March 2025
Author posting fee	Free	Free
Sustainability model	Cross-subsidised by Springer Nature publishing and AJE author-services revenue	Philanthropic and institutional grants (Chan Zuckerberg Initiative, Sergey Brin Family Foundation, Robert Lourie Foundation, partner universities)
Default licence	CC-BY 4.0 required for all preprints	Author’s choice: CC0, CC-BY, CC-BY-NC, CC-BY-ND, CC-BY-NC-ND, or no reuse without permission
Journal integration	In Review, tied to 1,000+ participating journals	No equivalent journal-submission integration
Bulk text-and-data-mining access	No published bulk TDM programme; access via Crossref metadata and the site	Monthly XML/PDF corpus via a requester-pays AWS S3 bucket, plus a public metadata API
Long-term preservation	Portico	Portico

Who Pays, and How Is Each Platform Funded?

Neither model charges authors to post a preprint — that much is identical. What differs is where the money to run the platform comes from, and what that implies about future incentives. Research Square’s operating costs are absorbed by Springer Nature’s commercial publishing business and by AJE’s paid author-services division (editing, translation and related products), which Research Square continues to cross-sell alongside free preprint posting.

openRxiv, by contrast, depends on renewable philanthropic and institutional grants rather than a parent company’s revenue. Its principal funders include the Chan Zuckerberg Initiative, the Sergey Brin Family Foundation, the Robert Lourie Foundation and a consortium of supporting universities including Caltech, MIT, Stanford, Yale and the University of Washington. That is a genuine trade-off, not a straightforward win for either side:

Research Square’s commercial backing gives it predictable, revenue-linked funding, but ties its long-term direction to Springer Nature’s corporate strategy.
openRxiv’s nonprofit funding is mission-locked by governance structure, but depends on grant renewal cycles rather than a guaranteed revenue stream.

Who Owns and Controls Author Data?

Ownership of the underlying manuscript stays with authors on both platforms — this is not a copyright grab by either side. The meaningful difference is licensing control and third-party data access. Research Square requires every posted preprint to carry a CC-BY 4.0 licence, which is the most permissive open licence and maximises reuse rights for readers, but leaves authors no choice in the matter.

bioRxiv and medRxiv give authors a menu of licence options — CC0, CC-BY, CC-BY-NC, CC-BY-ND, CC-BY-NC-ND, or a “no reuse without permission” setting — and authors can change the licence on an existing preprint after posting. That is more author control, though funders that mandate CC-BY (a growing norm, including under several cOAlition S-aligned policies) require authors to actively select it rather than receiving it by default.

The two models also diverge sharply on bulk data access. openRxiv publishes a full monthly XML/PDF text-and-data-mining corpus through a requester-pays AWS S3 bucket, alongside a public metadata API — an open-infrastructure commitment consistent with nonprofit, grant-funded governance. Research Square does not publish an equivalent bulk TDM feed; third-party discovery of Research Square content runs through Crossref DOI metadata and the platform’s own search interface rather than a dedicated open corpus.

What Long-Term Archival Guarantees Does Each Model Offer?

Both platforms use the same third-party preservation service: Portico provides perpetual-access archiving for preprints posted to Research Square, bioRxiv and medRxiv alike, so the archive itself is not where the two models diverge.

The real difference is organisational continuity risk. A commercial platform’s archival commitments are ultimately corporate policy that could change with ownership or strategy; a nonprofit platform’s commitments are set by a mission-bound board, though it carries the separate risk of grant-funding renewal. Advising authors on a multi-decade preprint record means treating “who governs the archive” as distinct from “where is the archive stored.”

Common Questions About Research Square and bioRxiv

Is bioRxiv reputable?

Yes. bioRxiv is widely cited across molecular and cell biology, screens submissions for plagiarism and non-scientific content, and is now governed by openRxiv, an independent nonprofit with a Scientific and Medical Advisory Board. Its reputation rests on community adoption and transparent, nonprofit governance rather than commercial incentives.

Does bioRxiv count as published?

No. A bioRxiv or medRxiv preprint is not peer-reviewed and does not constitute formal publication. The ICMJE treats preprints as legitimate scholarly communication, not duplicate publication, but funders and REF-style assessment exercises generally still require the peer-reviewed version for compliance credit.

Is bioRxiv a preprint?

bioRxiv is not itself a preprint — it is the server that hosts preprints. A preprint is the individual manuscript version posted before or independent of peer review; bioRxiv is the nonprofit infrastructure, now under openRxiv, that makes that posting possible for life-science research.

What are the alternatives to bioRxiv?

Alternatives include medRxiv for clinical and public-health research, Research Square for multidisciplinary and journal-integrated posting, and repository-style options such as arXiv, the Open Science Framework, Figshare and Zenodo. The right choice depends on discipline, human-subjects status and whether journal-integrated posting matters.

What This Means for Authors and Research Administrators

For most authors, the nonprofit-versus-commercial distinction will not change whether posting is free — it usually is, on both models. It should change how administrators frame the advice they give:

Explain that Research Square’s mandatory CC-BY licence maximises reuse but removes licensing choice, while bioRxiv/medRxiv give authors more control over which licence applies.
Flag that researchers doing large-scale corpus analysis will find far richer bulk access through openRxiv’s TDM feeds than through Research Square.
Note that archival preservation (Portico) is equivalent across models — the open question is who controls future platform policy, not the archive.
Treat commercial ownership as a disclosure point, not a disqualifier: Springer Nature’s backing gives Research Square’s In Review workflow journal-integration value a nonprofit model does not replicate.

As more research administration offices build formal preprint guidance into their researcher-facing documentation, the originating business model behind a server deserves the same disclosure as its discipline coverage or screening depth. Authors are entitled to know not just where their manuscript will sit, but who ultimately governs the platform holding it — a nonprofit board answerable to a research mission, or a commercial parent answerable to shareholders.

Last updated: 3 July 2026.

July 3, 2026

bioRxiv API: A Developer’s Guide to Metadata

The bioRxiv API is a free, unauthenticated REST interface at api.biorxiv.org that returns structured JSON or XML metadata — DOI, version number, posting date, subject category, licence, and author list — for any bioRxiv or medRxiv preprint, queryable by date range or by DOI, with no API key required. This guide sets out the endpoints, pagination rules, and field-level detail a developer needs to wire preprint metadata into a CRIS, discovery layer, or citation tool.

A preprint DOI in this ecosystem is a Digital Object Identifier issued under the 10.1101 prefix, registered with Crossref by Cold Spring Harbor Laboratory Press, and it resolves to a specific, versioned manuscript record — the same identifier the bioRxiv API uses as its primary lookup key.

What is the bioRxiv API?
Which endpoints return DOIs, versions, and subject categories?
How does the medRxiv API differ for integrators?
What are the rate limits and pagination rules?
Answer-first Q&A
Implications for CRIS and discovery-tool integrators

What is the bioRxiv API?

The bioRxiv API is a read-only HTTP interface, hosted at api.biorxiv.org, that exposes preprint metadata as JSON, XML (OAI-PMH), or HTML. It was built to support text and data mining, discovery-tool indexing, and institutional repository harvesting without scraping the public website. It requires no registration, no API key, and no OAuth flow — a plain HTTPS GET request is sufficient.

Because bioRxiv and medRxiv share the same underlying submission platform, the same API structure serves both servers; you select the server with a path segment (biorxiv or medrxiv) rather than a different base domain. This matters for CRIS and discovery-tool developers who need one integration pattern to cover both the life-sciences and health-sciences preprint corpora.

Which endpoints return DOIs, versions, and subject categories?

Metadata retrieval is split across five endpoint families. Each returns a defined JSON schema with a messages block (cursor position, total count) and a collection array of preprint records.

Endpoint	Purpose	Example
`/details/[server]/[DOI]/na/[format]`	Full metadata for one preprint by DOI, including every posted version	`api.biorxiv.org/details/biorxiv/10.1101/339747`
`/details/[server]/[start]/[end]/[cursor]`	Metadata for all preprints posted in a date range, paginated 30 per call	`api.biorxiv.org/details/biorxiv/2025-03-21/2025-03-28/0`
`/details/…?category=`	Filters the date-range endpoint by subject category (e.g. cell_biology)	`?category=cell_biology`
`/pubs/[server]/[DOI]/na/[format]`	Links a preprint DOI to its eventual published-journal DOI, once available	`api.biorxiv.org/pubs/medrxiv/10.1101/2021.04.29.21256344`
`/funder/[server]/[interval]/[ROR ID]/[cursor]`	Filters preprint metadata by funder, using a ROR identifier	`?ROR=00k4n6c32` (European Commission)

Each preprint record returned by the /details endpoint carries the following core fields:

doi — the versionless preprint DOI (prefix 10.1101)
version — an integer indicating which revision of the manuscript this record represents
date — the posting date of that specific version
category — the subject category assigned at submission (e.g. Bioinformatics, Genomics, Epidemiology)
title, authors, author_corresponding, abstract, license — standard bibliographic and rights fields

Because each version of a preprint is returned as a separate array entry under the same DOI, a CRIS integration must group records by doi and sort by version to reconstruct a manuscript’s full revision history — the API does not collapse versions for you.
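The grouping step can be sketched in a few lines. It assumes each record exposes `doi` and `version` as described in the field list, and it casts `version` to an integer in case the value arrives as a string; the function name `revision_histories` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def revision_histories(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by DOI and order each group from first to latest version."""
    by_doi: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        by_doi[rec["doi"]].append(rec)
    return {
        doi: sorted(recs, key=lambda r: int(r["version"]))
        for doi, recs in by_doi.items()
    }
```

The last element of each list is then the latest posted version, which is usually the one a CRIS wants to display by default.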

How does the medRxiv API differ for integrators?

Structurally, the medRxiv API is not a separate product — it is the same api.biorxiv.org (or api.medrxiv.org, which mirrors the same routes) interface with medrxiv substituted as the server path segment. The field schema, pagination behaviour, and DOI prefix are identical.

The practical differences developers should code for are:

Subject categories differ in vocabulary: bioRxiv uses life-science categories (Cell Biology, Genomics, Neuroscience); medRxiv uses clinical and public-health categories (Cardiovascular Medicine, Infectious Diseases, Epidemiology).
medRxiv, co-founded in 2019 by Cold Spring Harbor Laboratory, Yale University, and BMJ, carries additional clinical-trial registration and conflict-of-interest declaration fields relevant to health-research governance that bioRxiv records omit.
medRxiv content volumes and posting cadence are lower than bioRxiv’s, so date-range polling for medRxiv can safely use wider intervals without hitting the 30-record-per-page ceiling as often.

What are the rate limits and pagination rules?

bioRxiv does not publish a formal rate limit for the metadata API, but pagination is fixed: the /details family returns 30 records per call and the /pubs, /pub, /publisher, and /funder families return 100 records per call, advanced via the cursor parameter until the messages block reports no records remaining.

The community-maintained rbiorxiv R client, a widely used third-party wrapper for this API, enforces a self-imposed one-second delay between paginated calls as good-citizen practice. Developers building bulk harvesters for a CRIS or discovery index should adopt the same throttle even though it is not server-enforced.
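A self-imposed throttle of this kind can be sketched as a small class that guarantees a minimum gap between calls. This is an illustrative Python equivalent, not the rbiorxiv implementation; the class name `Throttle` is hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block as needed so successive wait() calls are at least min_interval apart."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a harvester, calling `throttle.wait()` immediately before each paginated request is enough; the first call returns at once and later calls sleep only for the part of the second that has not already elapsed.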

For full-text or PDF-scale mining rather than metadata alone, bioRxiv and medRxiv separately publish bulk corpora via Amazon Web Services’ Open Data programme — a route the metadata API is not designed to serve and that sits outside the scope of this guide.

Answer-first Q&A

What is the rate limit for the bioRxiv API?

No official rate limit is published for api.biorxiv.org. In practice, pagination caps each call at 30 records for detail endpoints and 100 for publication and funder endpoints, and the community rbiorxiv client self-throttles to one request per second — a sensible default for any automated harvester.

Is bioRxiv open access?

Yes. bioRxiv provides free and unrestricted access to every posted article, for both human readers and machine analysis via the API. This applies equally to medRxiv, and neither server charges a fee to read, download, or programmatically query preprint metadata.

Is it okay to cite bioRxiv?

Yes. Every manuscript posted to bioRxiv or medRxiv receives a DOI under the 10.1101 Crossref prefix, making it a citable, versioned part of the scientific record. A correct bioRxiv citation should reference the specific version number returned by the API, since the content behind a DOI can change across revisions.
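As an illustration, a version-aware citation string can be assembled directly from a metadata record. The sketch below assumes the record exposes `authors`, `title`, `date`, `version`, `server` and `doi` keys; the sample record is invented, and the output layout is one plausible style rather than a format mandated by either server.

```python
SERVER_NAMES = {"biorxiv": "bioRxiv", "medrxiv": "medRxiv"}


def format_citation(record: dict) -> str:
    """Return a citation that names the specific preprint version."""
    year = record["date"][:4]  # dates are assumed to be ISO 'YYYY-MM-DD' strings
    server = SERVER_NAMES.get(record["server"].lower(), record["server"])
    return (
        f'{record["authors"]} ({year}). {record["title"]}. '
        f'{server}, version {record["version"]}. https://doi.org/{record["doi"]}'
    )


if __name__ == "__main__":
    # Invented record for illustration only.
    sample = {
        "doi": "10.1101/2099.01.01.000000",
        "title": "An example preprint",
        "authors": "Doe, J.; Roe, R.",
        "date": "2099-01-02",
        "version": "2",
        "server": "medrxiv",
    }
    print(format_citation(sample))
```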

Who operates bioRxiv and medRxiv?

Since March 2025, both servers have been operated by openRxiv, an independent nonprofit spun out of Cold Spring Harbor Laboratory and backed by a $16 million grant from the Chan Zuckerberg Initiative. Its board includes CSHL President Bruce Stillman and medRxiv co-founder Harlan Krumholz — a governance change developers should note when citing the API’s institutional provenance.

Implications for CRIS and discovery-tool integrators

The March 2025 move to openRxiv governance is more than an institutional footnote for anyone building a research information system. openRxiv’s stated mandate is to expand — not just sustain — API access and machine-readable metadata as preprint volume grows, which means the endpoint contract described here should be treated as stable but not frozen; integrators should build a thin adapter layer rather than hard-coding field names.
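A thin adapter of that kind might look like the sketch below. Wire-format names live in a single mapping, so a renamed field becomes a one-line change. The `Preprint` class, the `to_preprint` function and the specific field names in `DEFAULT_FIELD_MAP` are illustrative assumptions; check them against the live response before relying on them.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Internal name -> assumed wire-format name. Edit here if the API renames a field.
DEFAULT_FIELD_MAP = {
    "doi": "doi",
    "title": "title",
    "version": "version",
    "category": "category",
    "posted": "date",
}


@dataclass(frozen=True)
class Preprint:
    """Internal representation, decoupled from the API's field names."""
    doi: str
    title: str
    version: int
    category: Optional[str] = None
    posted: Optional[str] = None


def to_preprint(raw: Mapping[str, object],
                field_map: Mapping[str, str] = DEFAULT_FIELD_MAP) -> Preprint:
    """Translate one raw API record into the internal type, ignoring unknown keys."""
    def pick(name: str):
        return raw.get(field_map[name])

    doi = pick("doi")
    if not doi:
        raise ValueError("record has no DOI")
    return Preprint(
        doi=str(doi),
        title=str(pick("title") or ""),
        version=int(pick("version") or 1),  # versions may arrive as strings
        category=pick("category"),
        posted=pick("posted"),
    )
```

Downstream code then depends only on `Preprint`, so additions to the API response pass through harmlessly and renames are absorbed in the mapping.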

For CRIS platforms harvesting outputs for institutional repositories, the /funder endpoint’s ROR-based filtering is the highest-value addition since the API’s original release: it lets an institution pull every preprint that declares a specific funder without post-hoc text matching. Combined with the /pubs endpoint’s preprint-to-published-DOI linking, a discovery layer can track a manuscript from first preprint version through to its eventual journal-of-record entry using DOIs alone.
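The DOI-only linkage can be sketched as a simple index from preprint DOI to published DOI. The key names `preprint_doi` and `published_doi`, and the `NA` placeholder for unpublished items, are assumptions about the /pubs response shape, and the sample records are invented.

```python
from typing import Dict, Iterable, Mapping


def published_doi_index(pubs_records: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Map each preprint DOI to its journal DOI, skipping records with no link yet."""
    index: Dict[str, str] = {}
    for record in pubs_records:
        # DOIs are case-insensitive, so normalise before using them as keys.
        preprint = str(record.get("preprint_doi") or "").strip().lower()
        published = str(record.get("published_doi") or "").strip().lower()
        if preprint and published and published != "na":
            index[preprint] = published
    return index


if __name__ == "__main__":
    # Invented records for illustration only.
    sample = [
        {"preprint_doi": "10.1101/2099.01.01.000001", "published_doi": "10.9999/journal.1"},
        {"preprint_doi": "10.1101/2099.01.01.000002", "published_doi": "NA"},
    ]
    print(published_doi_index(sample))
```

A discovery layer could join this index against its harvested preprint records to flag which institutional outputs have reached a journal of record.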

Developers integrating author identity alongside this metadata should pair bioRxiv’s author_corresponding field with ORCID resolution rather than name-string matching, consistent with broader authorship attribution practice. Teams building the CRIS side of this pipeline may also find it useful to cross-reference definitions in the research administration pillar and the CASRAI dictionary when mapping preprint metadata fields to internal schemas.

July 3, 2026

openRxiv Explained: Why bioRxiv and medRxiv Went Independent

openRxiv is the independent, researcher-led nonprofit that has run bioRxiv and medRxiv since March 2025, replacing Cold Spring Harbor Laboratory’s institutional stewardship with a six-member board, diversified funding, and a mandate to keep both preprint servers free to read and free to post. The spin-off was designed to insulate two of biomedicine’s most-used pieces of open-research infrastructure from dependence on any single institution or funder — a governance question every standards body and infrastructure provider eventually has to answer.

openRxiv is the independent nonprofit, launched on 11 March 2025, that now stewards the bioRxiv and medRxiv preprint servers on behalf of the global research community, rather than as a programme of a single host institution.

What is openRxiv, and what does it actually run?
Why did bioRxiv and medRxiv leave Cold Spring Harbor Laboratory?
Who governs openRxiv, and who pays for it?
What is openRxiv Labs, and what launched in June 2026?
Answer-first questions people are asking about openRxiv
What the openRxiv spin-off means for research-infrastructure stewardship

What is openRxiv, and what does it actually run?

openRxiv is the organisational and legal home of two preprint servers: bioRxiv, covering life sciences, and medRxiv, covering health and clinical research. Neither server changed its submission process, screening policy, or URL when the transition happened — researchers post to biorxiv.org and medrxiv.org exactly as before.

What changed is who is accountable for the platforms’ survival. bioRxiv was founded in 2013 at Cold Spring Harbor Laboratory (CSHL); medRxiv followed in 2019 as a joint initiative between CSHL, Yale University, and BMJ. Both grew into the dominant preprint venues for biomedicine, and by 2025 they had outgrown what a single laboratory could reasonably be expected to sustain indefinitely.

Why did bioRxiv and medRxiv leave Cold Spring Harbor Laboratory?

CSHL’s own account of the move calls it a “natural evolution,” not a rupture. Bruce Stillman, CSHL’s President and CEO, joined openRxiv’s board rather than severing ties, and co-founders John Inglis and Richard Sever moved with the platforms into the new entity.

The stated rationale centres on three risks of concentrating stewardship inside one institution:

Sustainability risk: a single laboratory’s budget cycle is not designed to guarantee decades of continuity for global research infrastructure.
Governance risk: decisions about screening policy, features, and funding priorities would benefit from a board drawn from beyond CSHL alone.
Funder-concentration risk: the platforms needed a structure that could accept diversified funding without any one funder gaining outsized influence.

openRxiv formally launched as an independent nonprofit on 11 March 2025, with the Chan Zuckerberg Initiative (CZI) providing three years of seed funding for the transition, according to openRxiv’s own governance Q&A published that May. In October 2025, arXiv — the physics, mathematics, and computer science preprint server run by Cornell University — joined openRxiv in submitting a joint response to a National Institutes of Health Request for Information on preprints, signalling a wider coalition forming around shared preprint-infrastructure interests, though arXiv itself remains a separate service.

Who governs openRxiv, and who pays for it?

openRxiv is governed by a six-member board of directors: Scott Fraser (University of Southern California and the CZI Imaging Institute), Edith Heard (Francis Crick Institute), Jeff Huber (Triatomic Capital), Harlan Krumholz (Yale School of Medicine; medRxiv co-founder), Bruce Stillman (CSHL), and Shirley Tilghman (Princeton University). A separate Scientific and Medical Advisory Board, chaired by John Inglis with medRxiv co-founder Theo Bloom as deputy, advises on content policy.

The funding question is where most scrutiny has landed, given CZI’s long involvement with both servers before the spin-off:

Question	openRxiv’s public answer (governance Q&A, May 2025)
How long has CZI funded the servers?	Eight years for bioRxiv, four years for medRxiv, plus three years of dedicated seed funding for the openRxiv transition itself.
Does CZI have editorial or operational control?	No. openRxiv states funding agreements carry no stipulations affecting editorial or operational independence.
How much board influence does CZI hold?	One of six directors (Scott Fraser) has a CZI affiliation; the board is not CZI-appointed as a bloc.
Is openRxiv against traditional peer review?	No — openRxiv reports roughly 75% of bioRxiv and medRxiv preprints go on to formal peer-reviewed publication, with direct-submission links to 350 journals.

openRxiv itself frames the governance model as a direct answer to funder-concentration concerns: the organisation states its mission is to be “governed by and for the research community, not a single funder, founder, or any one stakeholder.” Whether a nonprofit designed to resist single-funder capture can credibly have a philanthropic vehicle tied to one tech-sector family as its largest funder is a debate that predates this spin-off, and it will likely recur as openRxiv pursues its stated goal of diversifying revenue further.

What is openRxiv Labs, and what launched in June 2026?

openRxiv Labs launched on 1 June 2026 as a structured experimentation programme sitting on top of the core bioRxiv and medRxiv infrastructure. Rather than running many small tests at once, openRxiv committed to a small number of larger, hypothesis-driven pilots with predefined success metrics and durations, publishing results — including failures — openly on a dedicated Labs blog.

The first Labs pilot, built with the platform Curvenote, tests an interactive preprint-reading interface layered onto openRxiv’s existing corpus of preprints, figures, and metadata. openRxiv named a broad partner list for the programme, including CZI, CSHL, the Sergey Brin Family Foundation, Caltech, CNRS, Fred Hutchinson Cancer Center, Imperial College London, MIT, Stanford, the University of Washington, and Vrije Universiteit Amsterdam — underscoring that the funder-diversification effort begun at launch has continued into 2026 rather than stalling after the initial CZI seed grant.

Answer-first questions people are asking about openRxiv

Who is the CEO of openRxiv?

Dr Tracy Teal is openRxiv’s first Chief Executive Officer, appointed on 18 August 2025 after serving as interim COO since the March 2025 launch. She previously led The Carpentries and Dryad, two established open-research infrastructure nonprofits, giving her direct prior experience running community-governed scientific platforms.

Who owns medRxiv?

No single institution “owns” medRxiv today. It was founded in 2019 by Cold Spring Harbor Laboratory, Yale University, and BMJ, but operational and governance responsibility now sits with openRxiv, the independent nonprofit created specifically to steward it and bioRxiv without institutional or single-funder control.

Is medRxiv a credible source?

medRxiv preprints are screened but not peer-reviewed, so they should be cited with that caveat clearly stated. openRxiv reports around 75% of postings eventually complete formal peer review; until then, findings represent unverified claims from qualified researchers, useful for rapid awareness but not equivalent to a published, peer-reviewed article.

What is openRxiv, in one line?

openRxiv is the independent 501(c)(3) nonprofit, launched 11 March 2025, that operates bioRxiv and medRxiv under a six-member board and a diversified-funding mandate, replacing their prior status as programmes hosted by Cold Spring Harbor Laboratory.

What the openRxiv spin-off means for research-infrastructure stewardship

The openRxiv case is a useful reference point for any organisation weighing how to govern shared research infrastructure once it outgrows its founding institution. The pattern — an originating body incubates a tool, the tool becomes essential community infrastructure, and stewardship then transfers to an independent, multi-stakeholder body — is not unique to preprints.

CASRAI originated the CRediT contributor role taxonomy in 2014. The standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. That is the same “originator, not owner” pattern openRxiv is now navigating in public: CSHL originated bioRxiv and medRxiv, and stewardship has since passed to a body structured explicitly to prevent any one funder, founder, or institution from controlling research infrastructure the whole field depends on.

For research administrators and institutional leaders, the practical takeaway is to watch governance structure, not just funding source, when assessing an infrastructure provider’s long-term reliability. A named, multi-institutional board; published funding-independence commitments; and open reporting of pilot outcomes (as with openRxiv Labs) are the concrete signals worth checking — independent of who wrote the first cheque.

July 3, 2026
