Tag: openrxiv

bioRxiv PubMed Indexing: How the NIH Pilot Works

PubMed does not index every bioRxiv preprint. Preprints reach PubMed through a single federal mechanism, the NIH Preprint Pilot run by the U.S. National Library of Medicine (NLM), which pulls in preprints that acknowledge direct NIH funding or carry an NIH-affiliated author, provided they were posted on or after 1 January 2023 under the pilot’s current phase.

The NIH Preprint Pilot is an NLM programme that makes NIH-funded preprints from eligible servers — bioRxiv, medRxiv, arXiv, and Research Square — discoverable through PubMed Central (PMC) and PubMed ahead of formal peer review, with a corresponding citation added on a weekly cycle.

What is the NIH Preprint Pilot?
How a preprint moves from bioRxiv to PubMed
Which preprint servers qualify
What this means for discoverability, DOIs, and citation
Answer-first questions about bioRxiv and PubMed
Why other funders are watching the pilot

What is the NIH Preprint Pilot?

The NIH Preprint Pilot began in June 2020 as a narrow, COVID-19-only initiative. NLM made more than 3,300 preprints reporting NIH-supported SARS-CoV-2 research discoverable in PMC and PubMed between June 2020 and June 2022, testing whether preprint records could accelerate discovery during a public-health emergency.

Phase 2 launched on 30 January 2023 and dropped the COVID-only restriction. It now covers any preprint that acknowledges direct NIH support and/or lists an NIH-affiliated author, posted to an eligible server on or after 1 January 2023. Eligible preprints are added to PMC on a weekly basis and receive a corresponding PubMed citation automatically — authors do not submit anything separately.

How a preprint moves from bioRxiv to PubMed

The pipeline is largely invisible to authors and runs on a fixed weekly cadence. NLM does not wait for a submission; it identifies eligible content and pulls it in automatically, then layers PubMed on top of the PMC record.

Identification: NLM text-mines new bioRxiv and medRxiv postings for NIH-support acknowledgements and cross-checks the NIH Office of Portfolio Analysis tool for NIH-affiliated authors.
PMC ingestion: Citation and abstract metadata are pulled from the preprint server’s machine-readable feed to build an “article header” record, and a PMCID is assigned immediately to enable rapid discovery.
PubMed record creation: Once the PMC record exists, NLM generates the corresponding PubMed citation the same week, tagged with publication type “Preprint.”
Full-text conversion: Preprints posted under a Creative Commons licence enter a separate workflow to produce archival full-text XML, a process NLM says takes a few days and enables full-text search within PMC.

Every record carries a prominent yellow information panel confirming the work has not been peer-reviewed, and NLM runs weekly checks — against the bioRxiv API, the Crossref API, and the Europe PMC API — to link a preprint to its eventual journal version, updating the PubMed status to “Updated” once that link is confirmed.

Which preprint servers qualify

Only four servers currently feed the pilot. NLM evaluates candidate servers against a published checklist — clear non-peer-review labelling, transparent versioning, open licensing information, machine-readable metadata, and a public archiving policy — modelled on NIH’s 2017 interim-research-products guidance (NOT-OD-17-050) and COPE’s preprint discussion document.

Server	Subject scope	Operator	DOI registration
bioRxiv	Life sciences	openRxiv (independent nonprofit, formerly a Cold Spring Harbor Laboratory service)	Crossref
medRxiv	Health and clinical sciences	openRxiv, with Yale University and BMJ as founding partners	Crossref
arXiv	Physics, mathematics, computer science, quantitative biology	Cornell University	DataCite
Research Square	Multidisciplinary	Research Square Company	Crossref

bioRxiv and medRxiv are the two servers most relevant to biomedical research administrators, since both fall under openRxiv, the independent nonprofit that took over operation of both platforms from Cold Spring Harbor Laboratory. openRxiv’s separation from a single host institution was framed explicitly around long-term sustainability for the two servers NIH now indexes directly — a governance detail that matters for anyone assessing the pilot’s durability, since NLM’s own eligibility criteria require a “publicly stated archiving strategy to ensure long-term access.”

What this means for discoverability, DOIs, and citation

PubMed indexing changes where a preprint can be found, not whether it can be cited. Every bioRxiv preprint already receives a DOI registered through Crossref at posting, which is what makes it part of the citable scientific record regardless of NIH eligibility.
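A citation tool can confirm that a DOI points at a preprint rather than a journal article by reading its Crossref record. The sketch below assumes the record has already been fetched from the Crossref REST API (for example from `https://api.crossref.org/works/<DOI>`) and that preprints are typed `posted-content` with subtype `preprint`. The function names and the sample record are illustrative only.

```python
def is_crossref_preprint(work: dict) -> bool:
    """Return True if a Crossref work record describes a preprint."""
    return work.get("type") == "posted-content" and work.get("subtype") == "preprint"


def doi_url(doi: str) -> str:
    """Build the resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


# Hand-written sample shaped like the "message" object of a Crossref response.
sample = {"DOI": "10.1101/339747", "type": "posted-content", "subtype": "preprint"}

print(is_crossref_preprint(sample), doi_url(sample["DOI"]))
```

The check relies only on Crossref metadata, so it works the same way whether or not the preprint is eligible for the NIH pilot.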

According to bioRxiv’s own FAQ, preprints are indexed by “Google, all other search engines, Google Scholar, Crossref, Semantic Scholar, Europe PubMed Central, and Preprint Citation Index (connected to the Web of Science)” independent of the NIH pilot — PubMed indexing is an additional, funder-gated channel layered on top of that baseline discoverability.

One clarification worth making explicitly: bioRxiv and medRxiv do not carry a Scimago Journal Rank or an impact factor. Both metrics are journal-level indicators computed from peer-reviewed citation data; a preprint server is a distribution platform, not a journal, so no SJR score exists for bioRxiv as a whole, and any figure circulating under “bioRxiv impact factor” searches is not an NLM, Crossref, or Scimago-sourced metric.

Indexing also does not substitute for compliance. NLM is explicit that even when a preprint sits in PMC under the pilot, the NIH Public Access Policy still requires the peer-reviewed, accepted author manuscript to be separately deposited via NIHMS, with its own PMCID reported as proof of compliance.

Answer-first questions about bioRxiv and PubMed

Does bioRxiv show up in PubMed?

Yes, but only conditionally. A bioRxiv preprint appears in PubMed only if it acknowledges direct NIH funding or lists an NIH-affiliated author and was posted under Phase 2 of the NIH Preprint Pilot (from 1 January 2023). Non-NIH preprints stay discoverable via Google Scholar, Crossref, and Europe PMC instead.

What is a preprint in PubMed?

In PubMed, a preprint is a record carrying the publication type “Preprint,” which separates it from peer-reviewed literature in search filters. It displays a yellow information panel stating the work has not undergone peer review, and PubMed links it automatically to the journal version once one is published.
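Because the publication type is searchable, a script can include or exclude preprints when querying PubMed. The sketch below builds a request URL for the NCBI E-utilities ESearch service using the `preprint[pt]` publication-type filter. The function name and the default `retmax` are assumptions made for illustration, and the code only constructs the URL without sending it.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_preprint_search_url(query: str, only_preprints: bool = True, retmax: int = 20) -> str:
    """Build an ESearch URL that keeps or drops records typed as Preprint."""
    pt_filter = "preprint[pt]"
    if only_preprints:
        term = f"({query}) AND {pt_filter}"
    else:
        term = f"({query}) NOT {pt_filter}"
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(pubmed_preprint_search_url("crispr"))
print(pubmed_preprint_search_url("crispr", only_preprints=False))
```

Passing `only_preprints=False` produces the complementary search, which is useful when a literature review needs peer-reviewed records only.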

Does bioRxiv count as published?

No. bioRxiv distributes complete but unpublished manuscripts, so posting there is not equivalent to journal publication. A preprint carries a DOI and is part of the citable record, but it lacks the peer-review certification that ICMJE and COPE norms attach to a published article.

Is it okay to cite bioRxiv?

Yes. bioRxiv preprints receive a DOI through Crossref, making them formally citable, and are indexed by Google Scholar, Crossref, Semantic Scholar, and Europe PMC. Authors citing them should flag that the underlying findings have not yet completed peer review.

Why other funders are watching the pilot

NIH’s approach is unusual because it is infrastructural rather than a mandate: it does not require authors to preprint, it simply makes eligible preprints easier to find once posted. That distinction is why other funders are studying it rather than replicating it wholesale.

cOAlition S, the funder coalition behind Plan S, already treats preprints as an acceptable route to satisfying immediate open-access requirements, but no cOAlition S member currently operates an equivalent centralised indexing pipeline into a national biomedical database. UKRI’s open access policy similarly recognises preprints as compliant interim outputs without building comparable PMC-style ingestion.

For research administrators, the practical takeaway is that discoverability infrastructure and funder mandates remain two separate policy levers. NIH has built the first at meaningful scale; whether other national funders follow with their own PMC-equivalent indexing pipeline — rather than policy language alone — is the open question institutions tracking preprint compliance should watch through 2026 and beyond.

July 3, 2026

bioRxiv API: A Developer’s Guide to Metadata

The bioRxiv API is a free, unauthenticated REST interface at api.biorxiv.org that returns structured JSON or XML metadata — DOI, version number, posting date, subject category, licence, and author list — for any bioRxiv or medRxiv preprint, queryable by date range or by DOI, with no API key required. This guide sets out the endpoints, pagination rules, and field-level detail a developer needs to wire preprint metadata into a CRIS, discovery layer, or citation tool.

A preprint DOI in this ecosystem is a Digital Object Identifier issued under the 10.1101 prefix, registered with Crossref by Cold Spring Harbor Laboratory Press, and it resolves to a specific, versioned manuscript record — the same identifier the bioRxiv API uses as its primary lookup key.

What is the bioRxiv API?
Which endpoints return DOIs, versions, and subject categories?
How does the medRxiv API differ for integrators?
What are the rate limits and pagination rules?
Answer-first Q&A
Implications for CRIS and discovery-tool integrators

What is the bioRxiv API?

The bioRxiv API is a read-only HTTP interface, hosted at api.biorxiv.org, that exposes preprint metadata as JSON, XML (OAI-PMH), or HTML. It was built to support text and data mining, discovery-tool indexing, and institutional repository harvesting without scraping the public website. It requires no registration, no API key, and no OAuth flow — a plain HTTPS GET request is sufficient.

Because bioRxiv and medRxiv share the same underlying submission platform, the same API structure serves both servers; you select the server with a path segment (biorxiv or medrxiv) rather than a different base domain. This matters for CRIS and discovery-tool developers who need one integration pattern to cover both the life-sciences and health-sciences preprint corpora.

Which endpoints return DOIs, versions, and subject categories?

Metadata retrieval is split across five endpoint families. Each returns a defined JSON schema with a messages block (cursor position, total count) and a collection array of preprint records.

Endpoint	Purpose	Example
`/details/[server]/[DOI]/na/[format]`	Full metadata for one preprint by DOI, including every posted version	`api.biorxiv.org/details/biorxiv/10.1101/339747`
`/details/[server]/[start]/[end]/[cursor]`	Metadata for all preprints posted in a date range, paginated 30 per call	`api.biorxiv.org/details/biorxiv/2025-03-21/2025-03-28/0`
`/details/…?category=`	Filters the date-range endpoint by subject category (e.g. cell_biology)	`?category=cell_biology`
`/pubs/[server]/[DOI]/na/[format]`	Links a preprint DOI to its eventual published-journal DOI, once available	`api.biorxiv.org/pubs/medrxiv/10.1101/2021.04.29.21256344`
`/funder/[server]/[interval]/[ROR ID]/[cursor]`	Filters preprint metadata by funder, using a ROR identifier	ROR ID `00k4n6c32` (European Commission)
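The URL patterns in the table can be wrapped in small builder functions so that the server segment and path order live in one place. This is a minimal sketch that follows the patterns exactly as listed above. The function names are hypothetical, and the `json` default for the format segment is an assumption.

```python
from datetime import date

BASE = "https://api.biorxiv.org"
SERVERS = {"biorxiv", "medrxiv"}


def _check(server: str) -> None:
    """Reject anything other than the two server path segments."""
    if server not in SERVERS:
        raise ValueError(f"unknown server: {server!r}")


def details_by_doi(server: str, doi: str, fmt: str = "json") -> str:
    """URL for every posted version of one preprint."""
    _check(server)
    return f"{BASE}/details/{server}/{doi}/na/{fmt}"


def details_by_range(server: str, start: date, end: date, cursor: int = 0) -> str:
    """URL for one page of preprints posted in a date range."""
    _check(server)
    return f"{BASE}/details/{server}/{start.isoformat()}/{end.isoformat()}/{cursor}"


def pubs_by_doi(server: str, doi: str, fmt: str = "json") -> str:
    """URL for the preprint-to-published-article link of one preprint."""
    _check(server)
    return f"{BASE}/pubs/{server}/{doi}/na/{fmt}"


print(details_by_doi("biorxiv", "10.1101/339747"))
print(details_by_range("biorxiv", date(2025, 3, 21), date(2025, 3, 28)))
print(pubs_by_doi("medrxiv", "10.1101/2021.04.29.21256344"))
```

Validating the server segment up front turns a typo into an immediate error rather than a request to a route that does not exist.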

Each preprint record returned by the /details endpoint carries the following core fields:

doi — the versionless preprint DOI (prefix 10.1101)
version — an integer indicating which revision of the manuscript this record represents
date — the posting date of that specific version
category — the subject category assigned at submission (e.g. Bioinformatics, Genomics, Epidemiology)
title, authors, author_corresponding, abstract, license — standard bibliographic and rights fields

Because each version of a preprint is returned as a separate array entry under the same DOI, a CRIS integration must group records by doi and sort by version to reconstruct a manuscript’s full revision history — the API does not collapse versions for you.
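The grouping step can be sketched in a few lines. The example assumes each record is a dictionary with `doi` and `version` keys as described above, and it converts `version` with `int()` in case the value arrives as a string. The sample records use placeholder DOIs and dates, not real preprints.

```python
from collections import defaultdict


def revision_history(collection: list[dict]) -> dict[str, list[dict]]:
    """Group records by DOI and order each group from first to latest version."""
    by_doi: dict[str, list[dict]] = defaultdict(list)
    for record in collection:
        by_doi[record["doi"]].append(record)
    for versions in by_doi.values():
        versions.sort(key=lambda r: int(r["version"]))
    return dict(by_doi)


# Hand-written sample, deliberately out of order.
collection = [
    {"doi": "10.1101/example.1", "version": "3", "date": "2025-05-01"},
    {"doi": "10.1101/example.2", "version": "1", "date": "2025-03-22"},
    {"doi": "10.1101/example.1", "version": "1", "date": "2025-03-21"},
    {"doi": "10.1101/example.1", "version": "2", "date": "2025-04-10"},
]

history = revision_history(collection)
for doi, versions in history.items():
    print(doi, [v["version"] for v in versions])
```

The last entry in each list is the latest revision, which is usually the one a CRIS should display by default.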

How does the medRxiv API differ for integrators?

Structurally, the medRxiv API is not a separate product — it is the same api.biorxiv.org (or api.medrxiv.org, which mirrors the same routes) interface with medrxiv substituted as the server path segment. The field schema, pagination behaviour, and DOI prefix are identical.

The practical differences developers should code for are:

Subject categories differ in vocabulary: bioRxiv uses life-science categories (Cell Biology, Genomics, Neuroscience); medRxiv uses clinical and public-health categories (Cardiovascular Medicine, Infectious Diseases, Epidemiology).
medRxiv, co-founded in 2019 by Cold Spring Harbor Laboratory, Yale University, and BMJ, carries additional clinical-trial registration and conflict-of-interest declaration fields relevant to health-research governance that bioRxiv records omit.
medRxiv content volumes and posting cadence are lower than bioRxiv’s, so date-range polling for medRxiv can safely use wider intervals without hitting the 30-record-per-page ceiling as often.
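One way to act on the last point is to make the polling interval a parameter, so each server can use its own window width. The sketch below splits a date range into consecutive windows. It assumes the API treats both ends of a date range as inclusive, which is why the windows do not overlap, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator


def date_windows(start: date, end: date, days: int) -> Iterator[tuple[date, date]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    one_day = timedelta(days=1)
    cursor = start
    while cursor <= end:
        # Each window spans `days` calendar days, clipped at the overall end.
        window_end = min(cursor + timedelta(days=days) - one_day, end)
        yield cursor, window_end
        cursor = window_end + one_day


for window in date_windows(date(2025, 3, 1), date(2025, 3, 10), days=7):
    print(window[0].isoformat(), window[1].isoformat())
```

A harvester could call this with a short window for bioRxiv and a longer one for medRxiv while sharing the rest of the code.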

What are the rate limits and pagination rules?

bioRxiv does not publish a formal rate limit for the metadata API, but pagination is fixed: the /details family returns 30 records per call and the /pubs, /pub, /publisher, and /funder families return 100 records per call. Each call is advanced via the cursor parameter until the messages block reports no records remaining.

The community-maintained rbiorxiv R client, a third-party wrapper for this API, enforces a self-imposed one-second delay between paginated calls as good-citizen practice. Developers building bulk harvesters for a CRIS or discovery index should adopt the same throttle even though it is not server-enforced.
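A cursor loop with a fixed delay might look like the sketch below. It advances the cursor by the number of records actually returned instead of hard-coding a page size, and it stops at the first empty page. This stopping rule is a simplification that relies only on the `collection` array. `fetch_details_page` shows one possible real fetcher and is not called here, while `fake_fetch` stands in for the network so the loop can be run offline. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator


def harvest(fetch_page: Callable[[int], dict], delay: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield every record, pausing `delay` seconds between paginated calls."""
    cursor = 0
    while True:
        records = fetch_page(cursor).get("collection", [])
        if not records:
            return
        yield from records
        cursor += len(records)  # works for 30- and 100-record pages alike
        sleep(delay)


def fetch_details_page(cursor: int) -> dict:
    """Example fetcher for one date range (performs a real HTTP request)."""
    url = f"https://api.biorxiv.org/details/biorxiv/2025-03-21/2025-03-28/{cursor}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fake_fetch(cursor: int) -> dict:
    """Offline stand-in: 70 placeholder records served 30 per page."""
    data = [{"doi": f"10.1101/example.{i}"} for i in range(70)]
    return {"collection": data[cursor:cursor + 30]}


records = list(harvest(fake_fetch, sleep=lambda seconds: None))
print(len(records))  # 70
```

Injecting the fetcher and the sleep function keeps the throttle testable without waiting and without touching the live service.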

For full-text or PDF-scale mining rather than metadata alone, bioRxiv and medRxiv separately publish bulk corpora via Amazon Web Services’ Open Data programme — a route the metadata API is not designed to serve and that sits outside the scope of this guide.

Answer-first Q&A

What is the rate limit for the bioRxiv API?

No official rate limit is published for api.biorxiv.org. In practice, pagination caps each call at 30 records for detail endpoints and 100 for publication and funder endpoints, and the community rbiorxiv client self-throttles to one request per second — a sensible default for any automated harvester.

Is bioRxiv open access?

Yes. bioRxiv provides free and unrestricted access to every posted article, for both human readers and machine analysis via the API. This applies equally to medRxiv, and neither server charges a fee to read, download, or programmatically query preprint metadata.

Is it okay to cite bioRxiv?

Yes. Every manuscript posted to bioRxiv or medRxiv receives a DOI under the 10.1101 Crossref prefix, making it a citable, versioned part of the scientific record. A correct bioRxiv citation should reference the specific version number returned by the API, since the content behind a DOI can change across revisions.
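A citation tool can carry the version through to the formatted reference. The sketch below assumes a record with the `doi`, `version`, `date`, `title`, and `authors` fields and treats `authors` as a single pre-formatted string. The output layout is purely illustrative and does not follow any particular citation style, and the sample record is a placeholder.

```python
def cite_preprint(record: dict, server_name: str = "bioRxiv") -> str:
    """Format a reference that states which version of the preprint is cited."""
    version = int(record["version"])
    return (
        f"{record['authors']} {record['title']}. {server_name} "
        f"(version {version}, posted {record['date']}). "
        f"https://doi.org/{record['doi']}"
    )


# Hand-written placeholder record.
record = {
    "doi": "10.1101/example.1",
    "version": "2",
    "date": "2025-03-21",
    "title": "An example preprint",
    "authors": "Doe, J.; Roe, R.",
}

print(cite_preprint(record))
```

Stating the version and posting date lets a reader tell which revision was cited even though the DOI itself is versionless.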

Who operates bioRxiv and medRxiv?

Since March 2025, both servers have been operated by openRxiv, an independent nonprofit spun out of Cold Spring Harbor Laboratory and backed by a $16 million grant from the Chan Zuckerberg Initiative. Its board includes CSHL President Bruce Stillman and medRxiv co-founder Harlan Krumholz — a governance change developers should note when citing the API’s institutional provenance.

Implications for CRIS and discovery-tool integrators

The March 2025 move to openRxiv governance is more than an institutional footnote for anyone building a research information system. openRxiv’s stated mandate is to expand — not just sustain — API access and machine-readable metadata as preprint volume grows, which means the endpoint contract described here should be treated as stable but not frozen; integrators should build a thin adapter layer rather than hard-coding field names.
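A thin adapter can be as small as one mapping table and one conversion function. In the sketch below, `FIELD_MAP` is the only place that names API fields, so a renamed field means a one-line change. The internal `Preprint` class and its attribute names are hypothetical, and the API field names are those listed for the /details endpoint.

```python
from dataclasses import dataclass

# internal attribute -> API field name; edit here if the API contract changes
FIELD_MAP = {
    "doi": "doi",
    "version": "version",
    "posted": "date",
    "category": "category",
    "licence": "license",
    "title": "title",
}


@dataclass(frozen=True)
class Preprint:
    doi: str
    version: int
    posted: str
    category: str
    licence: str
    title: str


def to_preprint(record: dict, field_map: dict = FIELD_MAP) -> Preprint:
    """Translate one API record into the internal model, failing loudly on drift."""
    missing = [api for api in field_map.values() if api not in record]
    if missing:
        raise KeyError(f"API record is missing fields: {missing}")
    values = {internal: record[api] for internal, api in field_map.items()}
    values["version"] = int(values["version"])
    return Preprint(**values)


# Hand-written placeholder record.
sample = {"doi": "10.1101/example.1", "version": "2", "date": "2025-03-21",
          "category": "genomics", "license": "cc_by", "title": "An example preprint"}

print(to_preprint(sample))
```

Raising on a missing field surfaces schema changes at ingestion time, before incomplete records reach the CRIS.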

For CRIS platforms harvesting outputs for institutional repositories, the /funder endpoint’s ROR-based filtering is the highest-value addition since the API’s original release: it lets an institution pull every preprint that declares a specific funder without post-hoc text matching. Combined with the /pubs endpoint’s preprint-to-published-DOI linking, a discovery layer can track a manuscript from first preprint version through to its eventual journal-of-record entry using DOIs alone.
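The DOI-to-DOI tracking can be sketched as a lookup table built from /pubs records. The key names `preprint_doi` and `published_doi` are assumptions and are exposed as parameters so they can be adjusted to the actual response. Treating the string `NA` as "not yet published" is likewise an assumption, and all DOIs shown are placeholders.

```python
def published_links(pubs_records: list[dict],
                    preprint_key: str = "preprint_doi",
                    published_key: str = "published_doi") -> dict[str, str]:
    """Map each preprint DOI to its journal DOI, skipping unpublished preprints."""
    links: dict[str, str] = {}
    for record in pubs_records:
        published = record.get(published_key)
        if published and published != "NA":
            links[record[preprint_key]] = published
    return links


# Hand-written placeholder records.
pubs_records = [
    {"preprint_doi": "10.1101/example.1", "published_doi": "10.1000/journal.example.1"},
    {"preprint_doi": "10.1101/example.2", "published_doi": "NA"},
    {"preprint_doi": "10.1101/example.3"},
]

links = published_links(pubs_records)
print(links)
```

A discovery layer can re-run this on each harvest and update a record's status when its preprint DOI first appears in the table.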

Developers integrating author identity alongside this metadata should pair bioRxiv’s author_corresponding field with ORCID resolution rather than name-string matching, consistent with broader authorship attribution practice; teams building the CRIS side of this pipeline may also find it useful to cross-reference definitions in the research administration pillar and the CASRAI dictionary when mapping preprint metadata fields to internal schemas.

July 3, 2026

openRxiv Explained: Why bioRxiv and medRxiv Went Independent

openRxiv is the independent, researcher-led nonprofit that has run bioRxiv and medRxiv since March 2025, replacing Cold Spring Harbor Laboratory’s institutional stewardship with a six-member board, diversified funding, and a mandate to keep both preprint servers free to read and free to post. The spin-off was designed to insulate two of biomedicine’s most-used pieces of open-research infrastructure from dependence on any single institution or funder — a governance question every standards body and infrastructure provider eventually has to answer.

openRxiv is the independent nonprofit, launched on 11 March 2025, that now stewards the bioRxiv and medRxiv preprint servers on behalf of the global research community, rather than as a programme of a single host institution.

What is openRxiv, and what does it actually run?
Why did bioRxiv and medRxiv leave Cold Spring Harbor Laboratory?
Who governs openRxiv, and who pays for it?
What is openRxiv Labs, and what launched in June 2026?
Answer-first questions people are asking about openRxiv
What the openRxiv spin-off means for research-infrastructure stewardship

What is openRxiv, and what does it actually run?

openRxiv is the organisational and legal home of two preprint servers: bioRxiv, covering life sciences, and medRxiv, covering health and clinical research. Neither server changed its submission process, screening policy, or URL when the transition happened — researchers post to biorxiv.org and medrxiv.org exactly as before.

What changed is who is accountable for the platforms’ survival. bioRxiv was founded in 2013 at Cold Spring Harbor Laboratory (CSHL); medRxiv followed in 2019 as a joint initiative between CSHL, Yale University, and BMJ. Both grew into the dominant preprint venues for biomedicine, and by 2025 they had outgrown what a single laboratory could be expected to sustain indefinitely.

Why did bioRxiv and medRxiv leave Cold Spring Harbor Laboratory?

CSHL’s own account of the move calls it a “natural evolution,” not a rupture. Bruce Stillman, CSHL’s President and CEO, joined openRxiv’s board rather than severing ties, and co-founders John Inglis and Richard Sever moved with the platforms into the new entity.

The stated rationale centres on three risks of concentrating stewardship inside one institution:

Sustainability risk — a single laboratory’s budget cycle is not designed to guarantee decades of continuity for global research infrastructure.
Governance risk: decisions about screening policy, features, and funding priorities would benefit from a board drawn from beyond CSHL alone.
Funder-concentration risk — the platforms needed a structure that could accept diversified funding without any one funder gaining outsized influence.

openRxiv formally launched as an independent nonprofit on 11 March 2025, with the Chan Zuckerberg Initiative (CZI) providing three years of seed funding for the transition, according to openRxiv’s own governance Q&A published that May. In October 2025, arXiv — the physics, mathematics, and computer science preprint server run by Cornell University — joined openRxiv in submitting a joint response to a National Institutes of Health Request for Information on preprints, signalling a wider coalition forming around shared preprint-infrastructure interests, though arXiv itself remains a separate service.

Who governs openRxiv, and who pays for it?

openRxiv is governed by a six-member board of directors: Scott Fraser (University of Southern California and the CZI Imaging Institute), Edith Heard (Francis Crick Institute), Jeff Huber (Triatomic Capital), Harlan Krumholz (Yale School of Medicine; medRxiv co-founder), Bruce Stillman (CSHL), and Shirley Tilghman (Princeton University). A separate Scientific and Medical Advisory Board, chaired by John Inglis with medRxiv co-founder Theo Bloom as deputy, advises on content policy.

The funding question is where most scrutiny has landed, given CZI’s long involvement with both servers before the spin-off:

Question	openRxiv’s public answer (governance Q&A, May 2025)
How long has CZI funded the servers?	Eight years for bioRxiv, four years for medRxiv, plus three years of dedicated seed funding for the openRxiv transition itself.
Does CZI have editorial or operational control?	No. openRxiv states funding agreements carry no stipulations affecting editorial or operational independence.
How much board influence does CZI hold?	One of six directors (Scott Fraser) has a CZI affiliation; the board is not CZI-appointed as a bloc.
Is openRxiv against traditional peer review?	No — openRxiv reports roughly 75% of bioRxiv and medRxiv preprints go on to formal peer-reviewed publication, with direct-submission links to 350 journals.

openRxiv itself frames the governance model as a direct answer to funder-concentration concerns: the organisation states its mission is to be “governed by and for the research community, not a single funder, founder, or any one stakeholder.” Whether a philanthropic vehicle tied to a single tech-sector family remains structurally sufficient as the largest funder of a nonprofit intended to resist single-funder capture is a debate that predates this specific spin-off and will likely recur as openRxiv pursues its stated goal of diversifying revenue further.

What is openRxiv Labs, and what launched in June 2026?

openRxiv Labs launched on 1 June 2026 as a structured experimentation programme sitting on top of the core bioRxiv and medRxiv infrastructure. Rather than running many small tests at once, openRxiv committed to a small number of larger, hypothesis-driven pilots with predefined success metrics and durations, publishing results — including failures — openly on a dedicated Labs blog.

The first Labs pilot, built with the platform Curvenote, tests an interactive preprint-reading interface layered onto openRxiv’s existing corpus of preprints, figures, and metadata. openRxiv named a broad partner list for the programme, including CZI, CSHL, the Sergey Brin Family Foundation, Caltech, CNRS, Fred Hutchinson Cancer Center, Imperial College London, MIT, Stanford, the University of Washington, and Vrije Universiteit Amsterdam — underscoring that the funder-diversification effort begun at launch has continued into 2026 rather than stalling after the initial CZI seed grant.

Answer-first questions people are asking about openRxiv

Who is the CEO of openRxiv?

Dr Tracy Teal is openRxiv’s first Chief Executive Officer, appointed on 18 August 2025 after serving as interim COO since the March 2025 launch. She previously led The Carpentries and Dryad, two established open-research infrastructure nonprofits, giving her direct prior experience running community-governed scientific platforms.

Who owns medRxiv?

No single institution “owns” medRxiv today. It was founded in 2019 by Cold Spring Harbor Laboratory, Yale University, and BMJ, but operational and governance responsibility now sits with openRxiv, the independent nonprofit created specifically to steward it and bioRxiv without institutional or single-funder control.

Is medRxiv a credible source?

medRxiv preprints are screened but not peer-reviewed, so they should be cited with that caveat clearly stated. openRxiv reports around 75% of postings eventually complete formal peer review; until then, findings represent unverified claims from qualified researchers, useful for rapid awareness but not equivalent to a published, peer-reviewed article.

What is openRxiv, in one line?

openRxiv is the independent 501(c)(3) nonprofit, launched 11 March 2025, that operates bioRxiv and medRxiv under a six-member board and a diversified-funding mandate, replacing their prior status as programmes hosted by Cold Spring Harbor Laboratory.

What the openRxiv spin-off means for research-infrastructure stewardship

The openRxiv case is a useful reference point for any organisation weighing how to govern shared research infrastructure once it outgrows its founding institution. The pattern — an originating body incubates a tool, the tool becomes essential community infrastructure, and stewardship then transfers to an independent, multi-stakeholder body — is not unique to preprints.

CASRAI originated the CRediT contributor role taxonomy in 2014. The standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. That is the same “originator, not owner” pattern openRxiv is now navigating in public: CSHL originated bioRxiv and medRxiv, and stewardship has since passed to a body structured explicitly to prevent any one funder, founder, or institution from controlling research infrastructure the whole field depends on.

For research administrators and institutional leaders, the practical takeaway is to watch governance structure, not just funding source, when assessing an infrastructure provider’s long-term reliability. A named, multi-institutional board; published funding-independence commitments; and open reporting of pilot outcomes (as with openRxiv Labs) are the concrete signals worth checking — independent of who wrote the first cheque.

July 3, 2026
