PubMed does not index every bioRxiv preprint. Preprints reach PubMed through a single federal mechanism, the NIH Preprint Pilot run by the U.S. National Library of Medicine (NLM), which pulls in preprints that acknowledge direct NIH funding or carry an NIH-affiliated author, provided they were posted to an eligible server on or after 1 January 2023, the cutoff for the pilot’s current phase.
The NIH Preprint Pilot is an NLM programme that makes NIH-funded preprints from eligible servers — bioRxiv, medRxiv, arXiv, and Research Square — discoverable through PubMed Central (PMC) and PubMed ahead of formal peer review, with a corresponding citation added on a weekly cycle.
- What is the NIH Preprint Pilot?
- How a preprint moves from bioRxiv to PubMed
- Which preprint servers qualify
- What this means for discoverability, DOIs, and citation
- Answer-first questions about bioRxiv and PubMed
- Why other funders are watching the pilot
What is the NIH Preprint Pilot?
The NIH Preprint Pilot began in June 2020 as a narrow, COVID-19-only initiative. NLM made more than 3,300 preprints reporting NIH-supported SARS-CoV-2 research discoverable in PMC and PubMed between June 2020 and June 2022, testing whether preprint records could accelerate discovery during a public-health emergency.
Phase 2 launched on 30 January 2023 and dropped the COVID-only restriction. It now covers any preprint that acknowledges direct NIH support and/or lists an NIH-affiliated author, posted to an eligible server on or after 1 January 2023. Eligible preprints are added to PMC on a weekly basis and receive a corresponding PubMed citation automatically — authors do not submit anything separately.
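The Phase 2 rule reduces to three conditions that must all hold: an eligible server, a posting date on or after 1 January 2023, and at least one NIH link (funding acknowledgement or affiliated author). The sketch below is only an illustration of that logic; the `Preprint` class, its field names, and `is_phase2_eligible` are hypothetical and do not describe NLM's actual tooling.

```python
from dataclasses import dataclass
from datetime import date

# Servers and cutoff date as described in this article.
ELIGIBLE_SERVERS = {"biorxiv", "medrxiv", "arxiv", "research square"}
PHASE2_CUTOFF = date(2023, 1, 1)


@dataclass
class Preprint:
    """Hypothetical record; the field names are assumptions for illustration."""
    server: str
    posted: date
    acknowledges_nih_support: bool
    has_nih_affiliated_author: bool


def is_phase2_eligible(p: Preprint) -> bool:
    """Return True when all three Phase 2 conditions hold."""
    on_eligible_server = p.server.strip().lower() in ELIGIBLE_SERVERS
    posted_in_window = p.posted >= PHASE2_CUTOFF
    # "and/or": either NIH link is sufficient on its own.
    has_nih_link = p.acknowledges_nih_support or p.has_nih_affiliated_author
    return on_eligible_server and posted_in_window and has_nih_link


if __name__ == "__main__":
    example = Preprint("bioRxiv", date(2023, 3, 14), True, False)
    print(is_phase2_eligible(example))  # True
```

A preprint posted on 31 December 2022 fails the date test even with NIH funding, which is why older non-COVID preprints do not appear in PubMed through this route.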
How a preprint moves from bioRxiv to PubMed
The pipeline is largely invisible to authors and runs on a fixed weekly cadence. NLM does not wait for a submission; it identifies eligible content and pulls it in automatically, then layers PubMed on top of the PMC record.
- Identification: NLM text-mines new bioRxiv and medRxiv postings for NIH-support acknowledgements and cross-checks the NIH Office of Portfolio Analysis tool for NIH-affiliated authors.
- PMC ingestion: Citation and abstract metadata are pulled from the preprint server’s machine-readable feed to build an “article header” record, and a PMCID is assigned immediately to enable rapid discovery.
- PubMed record creation: Once the PMC record exists, NLM generates the corresponding PubMed citation the same week, tagged with publication type “Preprint.”
- Full-text conversion: Preprints posted under a Creative Commons licence enter a separate workflow to produce archival full-text XML, a process NLM says takes a few days and enables full-text search within PMC.
Every record carries a prominent yellow information panel confirming the work has not been peer-reviewed. NLM also runs weekly checks against the bioRxiv API, the Crossref API, and the Europe PMC API to link a preprint to its eventual journal version, and updates the PubMed status to “Updated” once that link is confirmed.
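The linking check can be pictured as a lookup on each preprint's metadata. The following is a minimal sketch, assuming records shaped like the entries the bioRxiv API returns for a preprint, where a `published` field holds the journal DOI or the string "NA". That field name, the sample DOIs, and both function names are assumptions made for illustration, not NLM's implementation.

```python
from typing import Dict, List, Optional


def journal_doi(record: Dict[str, str]) -> Optional[str]:
    """Return the journal DOI from one version record, or None if unlinked."""
    value = str(record.get("published") or "").strip()
    if not value or value.upper() == "NA":
        return None
    return value


def link_status(versions: List[Dict[str, str]]) -> str:
    """Report 'linked' if any version of the preprint names a journal DOI."""
    # Check the newest version first; later versions are likelier to be linked.
    for record in reversed(versions):
        if journal_doi(record) is not None:
            return "linked"
    return "preprint-only"


if __name__ == "__main__":
    # Hypothetical payload: both DOIs are placeholders, not real records.
    versions = [
        {"doi": "10.1101/0000.00.00.000000", "version": "1", "published": "NA"},
        {"doi": "10.1101/0000.00.00.000000", "version": "2",
         "published": "10.1000/example.journal.1"},
    ]
    print(link_status(versions))  # linked
```

In practice a match from one source would presumably be confirmed against the others before the PubMed record changes, which is consistent with the three APIs named above.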
Which preprint servers qualify
Only four servers currently feed the pilot. NLM evaluates candidate servers against a published checklist — clear non-peer-review labelling, transparent versioning, open licensing information, machine-readable metadata, and a public archiving policy — modelled on NIH’s 2017 interim-research-products guidance (NOT-OD-17-050) and COPE’s preprint discussion document.
| Server | Subject scope | Operator | DOI registration |
|---|---|---|---|
| bioRxiv | Life sciences | openRxiv (independent nonprofit, formerly a Cold Spring Harbor Laboratory service) | Crossref |
| medRxiv | Health and clinical sciences | openRxiv, with Yale University and BMJ as founding partners | Crossref |
| arXiv | Physics, mathematics, computer science, quantitative biology | Cornell University | DataCite |
| Research Square | Multidisciplinary | Research Square Company | Crossref |
bioRxiv and medRxiv are the two servers most relevant to biomedical research administrators, since both fall under openRxiv, the independent nonprofit that took over operation of both platforms from Cold Spring Harbor Laboratory. openRxiv’s separation from a single host institution was framed explicitly around long-term sustainability for the two servers NIH now indexes directly — a governance detail that matters for anyone assessing the pilot’s durability, since NLM’s own eligibility criteria require a “publicly stated archiving strategy to ensure long-term access.”
What this means for discoverability, DOIs, and citation
PubMed indexing changes where a preprint can be found, not whether it can be cited. Every bioRxiv preprint already receives a DOI registered through Crossref at posting, which is what makes it part of the citable scientific record regardless of NIH eligibility.
According to bioRxiv’s own FAQ, preprints are indexed by “Google, all other search engines, Google Scholar, Crossref, Semantic Scholar, Europe PubMed Central, and Preprint Citation Index (connected to the Web of Science)” independent of the NIH pilot — PubMed indexing is an additional, funder-gated channel layered on top of that baseline discoverability.
One clarification worth making explicitly: bioRxiv and medRxiv do not carry a SCImago Journal Rank (SJR) or an impact factor. Both metrics are journal-level indicators computed from citation data for indexed journals; a preprint server is a distribution platform, not a journal, so no SJR score exists for bioRxiv as a whole, and any figure circulating under “bioRxiv impact factor” searches does not come from NLM, Crossref, or SCImago.
Indexing also does not substitute for compliance. NLM is explicit that even when a preprint sits in PMC under the pilot, the NIH Public Access Policy still requires the peer-reviewed, accepted author manuscript to be separately deposited via NIHMS, with its own PMCID reported as proof of compliance.
Answer-first questions about bioRxiv and PubMed
Does bioRxiv show up in PubMed?
Yes, but only conditionally. A bioRxiv preprint appears in PubMed only if it acknowledges direct NIH funding or lists an NIH-affiliated author and was posted under Phase 2 of the NIH Preprint Pilot (from 1 January 2023). Non-NIH preprints stay discoverable via Google Scholar, Crossref, and Europe PMC instead.
What is a preprint in PubMed?
In PubMed, a preprint is a record carrying the publication type “Preprint,” which separates it from peer-reviewed literature in search filters. It displays a yellow information panel stating the work has not undergone peer review, and PubMed links it automatically to the journal version once one is published.
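Because the publication type is a searchable field, preprints can be isolated or excluded in a query. The sketch below builds an NCBI E-utilities ESearch URL, assuming PubMed's `[pt]` publication-type field tag applies to the “Preprint” type; the function name and its `preprints` parameter are hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(terms: str, preprints: str = "only", retmax: int = 20) -> str:
    """Build an ESearch URL that keeps, drops, or ignores preprint records."""
    if preprints == "only":
        query = f"({terms}) AND preprint[pt]"
    elif preprints == "exclude":
        query = f"({terms}) NOT preprint[pt]"
    elif preprints == "include":
        query = terms
    else:
        raise ValueError("preprints must be 'only', 'exclude', or 'include'")
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes brackets and parentheses and joins words with '+'.
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(pubmed_search_url("CRISPR"))
    print(pubmed_search_url("CRISPR", preprints="exclude"))
```

The `exclude` mode is the one a systematic reviewer would likely want when the search should return peer-reviewed literature only.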
Does bioRxiv count as published?
No. bioRxiv distributes complete but unpublished manuscripts, so posting there is not equivalent to journal publication. A preprint carries a DOI and is part of the citable record, but it lacks the peer-review certification that ICMJE and COPE norms attach to a published article.
Is it okay to cite bioRxiv?
Yes. bioRxiv preprints receive a DOI through Crossref, making them formally citable, and are indexed by Google Scholar, Crossref, Semantic Scholar, and Europe PMC. Authors citing them should flag that the underlying findings have not yet completed peer review.
Why other funders are watching the pilot
NIH’s approach is unusual because it is infrastructural rather than a mandate: it does not require authors to preprint, it simply makes eligible preprints easier to find once posted. That distinction is why other funders are studying it rather than replicating it wholesale.
cOAlition S, the funder coalition behind Plan S, already treats preprints as an acceptable route to satisfying immediate open-access requirements, but no cOAlition S member currently operates an equivalent centralised indexing pipeline into a national biomedical database. UKRI’s open access policy similarly recognises preprints as compliant interim outputs without building comparable PMC-style ingestion.
For research administrators, the practical takeaway is that discoverability infrastructure and funder mandates remain two separate policy levers. NIH has built the first at meaningful scale; whether other national funders follow with their own PMC-equivalent indexing pipeline — rather than policy language alone — is the open question institutions tracking preprint compliance should watch through 2026 and beyond.