Tag: biorxiv journal

  • bioRxiv Microbiology: 2026 Subject Growth

    bioRxiv’s microbiology collection holds more than 41,000 preprints as of July 2026, making it the platform’s third-largest subject area behind neuroscience (over 90,000) and bioinformatics (nearly 43,000). Together, these three fields account for close to two-fifths of every preprint ever posted to bioRxiv since its 2013 launch — a concentration that says as much about where biology’s fastest-moving fields are as it does about the platform itself.

    bioRxiv is a free, non-profit preprint repository for the biological sciences, now operated by openRxiv, on which authors post manuscripts before or independent of journal peer review, sorted into 27 subject-specific collections spanning everything from paleontology to synthetic biology.

    What is bioRxiv, and how are preprints organised by subject?

    bioRxiv was co-founded by John Inglis and Richard Sever in November 2013 as an open-access preprint repository hosted by Cold Spring Harbor Laboratory. In March 2025, bioRxiv and its clinical-sciences counterpart medRxiv transferred to openRxiv, a newly formed non-profit created specifically to steward both platforms, as reported by Science.

    Every submission is placed into one of 27 subject collections at the point of posting. There is no fee to submit to bioRxiv, and authors self-select the collection that best matches their manuscript. This subject taxonomy is what makes volume comparisons across fields possible — and what this analysis draws on directly.

    One structural exception worth noting: the Epidemiology collection is now closed to new submissions following the completion of bioRxiv’s clinical-research pilot project, meaning its growth curve has effectively flattened while other collections continue to expand.

    How does bioRxiv microbiology compare to neuroscience and other subjects by volume?

    Based on a live count of bioRxiv’s own subject-collection pages taken on 3 July 2026, neuroscience is the platform’s largest single collection at 90,290 preprints — a 19.4% share of the roughly 465,700 preprints posted across all 27 collections to date. Bioinformatics follows at 42,825 (9.2%), with microbiology close behind at 41,133 (8.8%).

    Cell biology, evolutionary biology, genomics and biophysics round out the next tier, each holding between roughly 21,000 and 26,000 preprints. At the other end of the scale, paleontology (678) and clinical trials (138) remain niche collections by comparison, while epidemiology’s 2,067 total is now largely fixed given its closure to new submissions.

    Full ranking of bioRxiv’s largest subject collections

    Rank | Subject collection   | Cumulative preprints | Share of total
    1    | Neuroscience         | 90,290               | 19.4%
    2    | Bioinformatics       | 42,825               | 9.2%
    3    | Microbiology         | 41,133               | 8.8%
    4    | Cell Biology         | 25,753               | 5.5%
    5    | Evolutionary Biology | 24,737               | 5.3%
    6    | Genomics             | 22,868               | 4.9%
    7    | Biophysics           | 21,837               | 4.7%
    8    | Ecology              | 20,284               | 4.4%
    9    | Cancer Biology       | 18,775               | 4.0%
    10   | Biochemistry         | 18,098               | 3.9%

    Source: CASRAI analysis of live bioRxiv subject-collection article counts, recorded 3 July 2026. These are cumulative totals since bioRxiv’s 2013 launch, not annual submission rates, so they reflect sustained field-level adoption of preprinting rather than a single year’s activity.
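
    The share figures above are simple ratios of each collection's count to the platform total. A minimal sketch of that arithmetic follows, assuming the approximate total of 465,700 quoted in the text; the variable and function names are illustrative and not part of any bioRxiv tooling.

```python
# Counts and the approximate total are the figures quoted in the article.
# Names such as TOTAL_PREPRINTS and share_of_total are illustrative only.
TOTAL_PREPRINTS = 465_700  # rough total across all 27 collections

counts = {
    "Neuroscience": 90_290,
    "Bioinformatics": 42_825,
    "Microbiology": 41_133,
}


def share_of_total(count: int, total: int = TOTAL_PREPRINTS) -> float:
    """Return a collection's share of all preprints, as a percentage."""
    return 100 * count / total


# Per-collection shares, rounded to one decimal place as in the table.
shares = {name: round(share_of_total(n), 1) for name, n in counts.items()}

# Combined share of the three largest collections.
top_three_share = round(share_of_total(sum(counts.values())), 1)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Top three combined: {top_three_share}%")
```

    Because the total is itself rounded, the shares are only as precise as that figure; the combined top-three value is what underlies the "close to two-fifths" description.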

    Microbiology’s position just behind bioinformatics is notable given how differently the two fields work: bioinformatics preprints are often fast, computational and low-cost to produce, while microbiology preprints typically follow wet-lab experimental cycles. That microbiology has nonetheless built a corpus within roughly 1,700 papers of bioinformatics (41,133 against 42,825) points to a strong, sustained preprinting culture within the field, likely reinforced by its pandemic-era experience with rapid-dissemination norms.

    Why does subject-level concentration matter for research administrators?

    For institutional leaders and research-administration teams, subject-level preprint concentration is a proxy for where scholarly communication norms are shifting fastest. A field with tens of thousands of preprints has, in effect, normalised pre-peer-review dissemination as a routine step in its publication workflow — with direct implications for how institutions track outputs, credit early dissemination in tenure and promotion review, and advise researchers on preprint policy.

    • Grant and promotion committees increasingly need clear policy on whether preprints count as citable outputs, particularly in high-volume fields like neuroscience and microbiology.
    • Research offices supporting microbiology, bioinformatics or genomics groups should expect preprint-first workflows to already be the norm, not the exception, among active researchers.
    • Fields with low preprint volume (pathology, zoology, clinical trials) may need different guidance, since preprinting culture there remains comparatively immature.

    Open-research norms also diffuse unevenly across disciplines, and subject-level data of this kind gives institutions a concrete basis for assessing that spread rather than relying on anecdote.

    Common questions about bioRxiv preprints

    Is bioRxiv a preprint server?

    Yes. bioRxiv is a dedicated preprint server for the biological sciences, distributing manuscripts before or alongside formal peer review. It is operated by openRxiv, a non-profit created in 2025 specifically to run bioRxiv and medRxiv, and hosts subject collections spanning microbiology, neuroscience, genomics and 24 other biology-related fields.

    Can anyone submit to bioRxiv?

    Authors can deposit a manuscript in draft or final form provided it concerns a relevant scientific field, is unpublished at the time of submission, and all co-authors have consented. Authors must first register on the platform. bioRxiv screens submissions for basic scope and ethical compliance before posting, but does not conduct peer review.

    How much does it cost to publish in bioRxiv?

    There is no fee to submit a preprint to bioRxiv. This free-to-post model is a key driver of its growth across every subject collection, including the microbiology and neuroscience volumes analysed above, since it removes the cost barrier that applies to many open-access journal publication routes.

    Does bioRxiv count as published?

    A bioRxiv preprint is not equivalent to a peer-reviewed publication. It establishes a timestamped, citable public record of the work, and many journals allow later submission of the same manuscript, but it has not undergone formal peer review at the point of posting. Institutions and funders vary in how they weight preprints in assessment.

    Implications and outlook for scholarly communication

    The concentration of preprint volume in neuroscience, bioinformatics and microbiology is likely to persist rather than reverse. These fields combine large, active researcher populations with production cycles well suited to rapid dissemination, and none shows structural barriers comparable to epidemiology’s now-closed pilot pathway.

    For research-administration teams, the practical takeaway is to treat preprint-volume data by subject as a planning input: policy on preprint citation, researcher guidance, and repository integration should be calibrated to each discipline’s actual adoption level rather than applied uniformly across an institution’s full research portfolio.

  • bioRxiv Link to Published Paper: What the New Linkage Dataset Shows

    bioRxiv links a preprint to its published paper automatically, usually within two weeks of journal publication, once its matching system confirms that the two records describe the same work, matching on details such as title and author list. A newly published dataset, PreprintToPaper, has now mapped this process across 145,517 bioRxiv preprints, showing how long that journey takes and how much titles, abstracts and author lists change along the way.

    The PreprintToPaper dataset is an openly available metadata collection — created by researchers Fidan Badalova, Julian Sienkiewicz, and Philipp Mayr and published in Scientific Data in 2026 — that connects bioRxiv preprints to their eventual journal publications using automated title-similarity, author-similarity, and DOI matching.

    What is the PreprintToPaper dataset?

    PreprintToPaper is a metadata dataset covering 145,517 bioRxiv preprints across two periods: 34,246 preprints from 2016–2018 (pre-pandemic) and 111,271 from 2020–2022 (pandemic era). Records were built by querying the bioRxiv API for preprint metadata and the Crossref API for journal-publication metadata, then linking the two sets algorithmically.

    The dataset sorts every preprint into one of three categories:

    Category | Definition | Count | Share
    Published | Formally linked to a journal article on bioRxiv, with a DOI to the version of record | 90,614 | 62.3%
    Preprint Only | No matching journal publication identified | 35,813 | 24.6%
    Gray Zone | Highly likely published, based on title and author matching, but with no DOI link recorded on bioRxiv | 19,090 | 13.1%

    The Gray Zone category is the dataset’s key methodological contribution. Earlier work relied only on bioRxiv’s own DOI links, including Abdill and Blekhman’s 2019 analysis in eLife, which found that 15,797 bioRxiv preprints (42.0% of those analysed) had been formally linked to a published version. PreprintToPaper shows that a further 13.1% of the preprints in its sample were very likely published but never picked up by that automatic link.
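
    The category shares, and the effect of counting Gray Zone records as published, can be reproduced from the three counts in the table. This is a small sketch of that arithmetic only; the variable names are illustrative, and it is not the dataset authors' code.

```python
# Category counts as reported in the table above.
categories = {
    "Published": 90_614,
    "Preprint Only": 35_813,
    "Gray Zone": 19_090,
}

total = sum(categories.values())

# Share of each category, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in categories.items()}

# Publication rate if only bioRxiv's own DOI links are trusted.
linked_only_rate = shares["Published"]

# Publication rate if Gray Zone records are also treated as published.
with_gray_zone_rate = round(
    100 * (categories["Published"] + categories["Gray Zone"]) / total, 1
)

print(total, shares)
print(f"Linked only: {linked_only_rate}%  |  incl. Gray Zone: {with_gray_zone_rate}%")
```

    The gap between the two rates is the undercount an analysis would inherit by relying on bioRxiv's "published" link alone.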

    How does bioRxiv link a preprint to its published paper?

    bioRxiv’s own linking mechanism is largely automatic. According to bioRxiv’s official FAQ, the platform “will usually automatically add a link to the published version within approximately two (2) weeks of journal publication,” after which the corresponding author receives a confirmation email.

    Matching fails occasionally — usually when the title, author list, or venue changes substantially between versions. bioRxiv advises authors to wait two to three weeks after publication before contacting staff directly if no link appears. PreprintToPaper formalises this same matching logic for research purposes, using:

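    • A title-similarity score (the ratio from Python’s difflib.SequenceMatcher, which is built from the longest contiguous matching blocks of two strings rather than a true longest common subsequence) with a 0.75 threshold for a probable match;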
    • An author-similarity score and an author-count difference to validate borderline cases;
    • Human annotation of 299 borderline records by two independent reviewers, reaching a Cohen’s kappa of 0.86 — a strong agreement level for a manual validation exercise.

    Records with an author-match score above 0.47 were used to reclassify apparent non-publications into the Gray Zone, which is what allows the dataset to correct for bioRxiv’s own linking gaps rather than simply repeating them.
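
    The matching logic described above can be sketched roughly as follows. The two thresholds (0.75 and 0.47) come from the text, and difflib.SequenceMatcher is the standard-library class it names. Everything else is an assumption made for illustration: the surname-overlap author score, the way the two scores are combined in classify, and the toy records are hypothetical and should not be read as the dataset authors' implementation.

```python
from difflib import SequenceMatcher

# Thresholds quoted in the article; how they are combined below is an assumption.
TITLE_THRESHOLD = 0.75
AUTHOR_THRESHOLD = 0.47


def title_similarity(preprint_title: str, paper_title: str) -> float:
    """Case-insensitive SequenceMatcher ratio between two titles (0.0 to 1.0)."""
    a = preprint_title.strip().lower()
    b = paper_title.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def _surnames(authors):
    """Lower-cased last token of each non-empty author name."""
    return {name.split()[-1].lower() for name in authors if name.split()}


def author_similarity(preprint_authors, paper_authors) -> float:
    """Hypothetical author score: Jaccard overlap of surnames."""
    a, b = _surnames(preprint_authors), _surnames(paper_authors)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(has_doi_link: bool, title_score: float, author_score: float) -> str:
    """Assign one of the three categories named in the article."""
    if has_doi_link:
        return "Published"
    if title_score >= TITLE_THRESHOLD and author_score > AUTHOR_THRESHOLD:
        return "Gray Zone"
    return "Preprint Only"


# Hypothetical records, invented purely to exercise the functions.
preprint = {"title": "Gut microbiome shifts under antibiotic stress",
            "authors": ["Ana Ruiz", "Bo Chen"]}
paper = {"title": "Gut microbiome shifts under antibiotic stress in mice",
         "authors": ["Ana Ruiz", "Bo Chen", "Cy Diaz"]}

t_score = title_similarity(preprint["title"], paper["title"])
a_score = author_similarity(preprint["authors"], paper["authors"])
print(classify(False, t_score, a_score))
```

    Keeping the title and author scores as separate numeric fields, rather than a single yes/no flag, is what lets borderline cases be pulled out for human review.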

    What publication delays does the dataset reveal?

    Publication rates were not stable across the study window. PreprintToPaper’s authors report that the confirmed publication rate ranged from 71% for preprints posted in 2016 down to 49% for those posted in 2022 — an apparent decline that is substantially narrowed once Gray Zone cases with an author-match score above 0.47 are counted as published rather than unlinked.

    This pattern is consistent with independent findings on preprint-to-publication timing. Earlier tracking studies of bioRxiv preprints reported a pre-pandemic median delay of around 166 days between posting and journal publication, while pandemic-era analyses of COVID-19 preprints found a much shorter median lag, reflecting accelerated peer review for urgent public-health findings. The apparent fall in 2022 publication rates most likely reflects a right-censoring effect — recent preprints simply have not yet had time to complete peer review and appear as “published” in the dataset’s snapshot — rather than a genuine drop in eventual publication.
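
    The right-censoring effect can be illustrated with a toy calculation: if the publication rate is computed only over preprints that have had a minimum follow-up period before the snapshot, recent postings no longer drag the rate down. The sketch below is an illustration under stated assumptions, not the dataset authors' method; the records, the snapshot date and the 365-day window are all hypothetical.

```python
from datetime import date


def publication_rate(records, snapshot, min_followup_days=0):
    """Share of preprints marked published, among those posted at least
    min_followup_days before the snapshot date. Returns None if none qualify."""
    eligible = [
        r for r in records
        if (snapshot - r["posted"]).days >= min_followup_days
    ]
    if not eligible:
        return None
    return sum(1 for r in eligible if r["published"]) / len(eligible)


# Hypothetical records: late-2022 postings have had little time to be published.
records = [
    {"posted": date(2022, 1, 10), "published": True},
    {"posted": date(2022, 3, 1), "published": True},
    {"posted": date(2022, 11, 20), "published": False},
    {"posted": date(2022, 12, 15), "published": False},
]
snapshot = date(2023, 3, 1)  # hypothetical snapshot date

naive_rate = publication_rate(records, snapshot)
adjusted_rate = publication_rate(records, snapshot, min_followup_days=365)
print(naive_rate, adjusted_rate)
```

    In this toy sample the naive rate is pulled down entirely by preprints posted in the last few months before the snapshot, which is the pattern the text attributes to the 2022 cohort.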

    How much do titles and abstracts change before publication?

    PreprintToPaper stores both the initial submitted metadata and the final published metadata for each linked record — title, abstract, author list, journal name, and publication date — explicitly to support research on linguistic and structural change between preprint and published versions, including title reformulations and author-order shifts.

    This matters because bioRxiv’s own FAQ already flags a related, more mundane source of variation: metadata such as the manuscript title, author list, and abstract are initially supplied by the author at submission, then replaced with metadata extracted from the PDF once full-text HTML is generated. Small differences can therefore appear before any journal sees the paper. Distinguishing that housekeeping-level drift from substantive, peer-review-driven revision is the analytical opportunity the dataset’s version-history subset opens up, and is why its authors built author-count-difference and title-similarity fields as first-class, machine-readable variables rather than leaving them buried in free text.

    Answer-first Q&A: common preprint-linkage questions

    How do I link a bioRxiv preprint to its published paper?

    For bioRxiv preprints, no manual action is normally required: bioRxiv’s system detects the journal publication and adds the link automatically, typically within two weeks of publication. If no link appears after two to three weeks, authors should contact bioRxiv staff directly so the match can be verified and added manually.

    Does bioRxiv count as published?

    No. A bioRxiv preprint is not peer-reviewed, edited, or certified by a journal, so it does not count as a formal publication. It is, however, a citable, DOI-bearing scholarly record that is indexed by Crossref, Google Scholar, Semantic Scholar, and Europe PMC, and NIH explicitly encourages citing preprints as interim research products.

    Can I cite a preprint in my paper?

    Yes. bioRxiv preprints should be cited by their DOI, in the format “Author AN, Author BT. Year. Title. bioRxiv doi: 10.1101/…”. If citing a specific revision, the version-specific URL should be added, since each preprint version remains permanently accessible under the same DOI.
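
    For anyone generating citations programmatically, the format quoted above can be assembled with a small helper. This is a sketch only: the function name is made up, the DOI in the example is a placeholder rather than a real preprint, and appending the version-specific URL in parentheses is an assumption, since the text does not say where in the citation that URL should go.

```python
def format_biorxiv_citation(authors, year, title, doi, version_url=None):
    """Build a citation in the pattern
    'Author AN, Author BT. Year. Title. bioRxiv doi: 10.1101/...'."""
    author_part = ", ".join(authors)
    title_part = title.rstrip(".")  # avoid a doubled full stop after the title
    citation = f"{author_part}. {year}. {title_part}. bioRxiv doi: {doi}"
    if version_url:
        # Placement of the version URL is an assumption for illustration.
        citation += f" ({version_url})"
    return citation


# Placeholder values; the DOI suffix below is not a real preprint.
example = format_biorxiv_citation(
    ["Author AN", "Author BT"], 2026, "Title", "10.1101/000000"
)
print(example)
```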

    How do I update bioRxiv with a published paper if the automatic link fails?

    Authors should first wait two to three weeks past journal publication, since matching runs on a delay. If the link still has not appeared, the corresponding author should email bioRxiv staff or leave a comment on the preprint page; bioRxiv states it will verify all such requests before manually linking the record.

    What are the implications for institutions and publishers?

    For research administrators tracking outputs, PreprintToPaper’s Gray Zone category is a practical warning: relying solely on bioRxiv’s own “published” flag will undercount real publication rates by roughly 13 percentage points in this sample. Institutional repositories and research-information systems that harvest bioRxiv metadata directly should therefore treat unlinked-but-matched preprints as a distinct, reviewable category rather than as simply unpublished.

    For publishers and editors, the dataset’s version-history subset offers a reusable framework for auditing how much a manuscript’s core claims shift between preprint and version of record — separating genuine post-review revision from routine metadata clean-up. That distinction is directly relevant to authorship practice, where author-order and contributor-list changes between preprint and publication are common but rarely tracked systematically, and to broader definitional work maintained in the CASRAI Dictionary of scholarly-communication terms.

    The dataset itself, along with its code, is openly deposited on Zenodo, giving any institution the means to replicate or extend the analysis against its own output list rather than treating bioRxiv’s publication status as a black box.

  • bioRxiv ISSN Explained: Why It’s Not a Journal

    bioRxiv holds ISSN 2692-8205, but an ISSN is a serial-registration number, not proof of peer review. bioRxiv is a preprint repository, not a peer-reviewed journal: it has no Scimago Journal Rank, no Scopus record and no impact factor, because those metrics apply only to indexed journals and bioRxiv does not perform peer review.

    bioRxiv is an open-access preprint repository for the biological sciences, launched in November 2013 by John Inglis and Richard Sever and now operated by the nonprofit openRxiv. Confusion about its status is common because bioRxiv looks and behaves like a journal platform (its preprints carry citable DOIs, and it has a formal ISSN and a Wikipedia entry) while lacking the editorial infrastructure that “indexing” actually measures.

    Does bioRxiv have an ISSN, and what does that prove?

    bioRxiv is registered with ISSN 2692-8205, listed in the ISSN Portal and cross-referenced in the NLM Catalog under record ID 101680187, where the U.S. National Library of Medicine lists its electronic ISSN and title abbreviation “bioRxiv: the preprint server for biology”. An ISSN is issued by the ISSN International Centre to any continuing resource, including journals, newspapers, monograph series, and repositories that publish serially.

    Holding an ISSN confirms only that a publication is a recognised, ongoing serial with a stable identity. It carries no implication about peer review, editorial oversight, or scholarly indexing. Many predatory journals and informal newsletters also carry valid ISSNs, which is precisely why the number is frequently mistaken for a quality signal.
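
    The point can be made concrete: the only thing that can be checked mechanically about an ISSN is that it is well formed, using the standard modulus-11 check digit (weights 8 down to 2 on the first seven digits, with "X" standing for 10). A minimal sketch follows; the function names are illustrative, and passing the check says nothing about peer review or indexing.

```python
def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    return issn_check_digit(chars[:7]) == chars[7]


# bioRxiv's ISSN as quoted in the article.
print(is_valid_issn("2692-8205"))
```

    A valid check digit confirms only that the identifier was transcribed correctly, which mirrors the article's argument that an ISSN is an identity marker rather than a quality signal.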

    Is bioRxiv indexed in Scimago or Scopus?

    No. Scimago Journal & Country Rank derives its rankings exclusively from the Scopus citation database, which indexes peer-reviewed journals, conference proceedings and book series — not preprint servers. Because bioRxiv preprints are not peer-reviewed at the point of posting, they fall outside Scopus’s inclusion criteria, and bioRxiv correspondingly has no Scimago Journal Rank (SJR) or quartile ranking.

    Search results that appear to show “bioRxiv” scientometric profiles, such as third-party aggregator pages listing publication and citation counts, are counting citations to the individual preprints hosted on the platform, not a journal-level metric assigned to bioRxiv itself. This distinction matters for anyone assessing where a piece of research sits in the scholarly record.

    ISSN record vs. Scimago-indexed journal
    Attribute | bioRxiv (ISSN 2692-8205) | Typical Scimago/Scopus-indexed journal
    Peer review before posting | No (basic screening only) | Yes (mandatory)
    ISSN | Yes | Yes
    Scopus/Scimago listing | No | Yes (if indexed)
    Impact factor / SJR | None | Assigned annually
    Editorial board with reject/accept decisions | No | Yes
    DOI registration | Yes, via Crossref (prefix 10.1101) | Yes, via Crossref or DataCite

    What does bioRxiv’s Wikipedia entry actually describe?

    The Wikipedia article for bioRxiv describes it plainly as “an open access preprint repository for the biological sciences”, founded by John Inglis and Richard Sever in November 2013 and inspired by arXiv, the physics and mathematics preprint server launched by Paul Ginsparg in 1991. The entry documents bioRxiv’s ownership history in detail: it was hosted by Cold Spring Harbor Laboratory (CSHL) until 11 March 2025, when ownership transferred to openRxiv, a newly formed nonprofit created to run bioRxiv and its clinical-sciences counterpart, medRxiv.

    Nowhere does the entry describe bioRxiv as a peer-reviewed journal. It explicitly notes that submissions “undergo a basic scrutinisation process, which includes safeguarding checks, an automated plagiarism screening and an assessment of appropriateness” — a moderation gate, not editorial peer review. The article also cites a 2019 eLife meta-research study (Abdill and Blekhman) finding that roughly two-thirds of bioRxiv preprints are subsequently published in peer-reviewed journals, underscoring that bioRxiv functions as a pre-publication staging ground rather than a publication venue in its own right.

    Is bioRxiv a journal, and what does “indexing” really mean?

    bioRxiv is not a journal. In scholarly-communication terms, “indexing” means a database such as Scopus, Web of Science, PubMed or the Directory of Open Access Journals has evaluated a title against inclusion criteria (regular publication schedule, peer review, editorial governance, ethical standards) and added its articles to a searchable, citation-tracked index. bioRxiv preprints are discoverable and citable via Google Scholar, PubMed Central (in some cases) and their own DOIs, but that is discovery, not journal indexing.

    • ISSN registration confirms serial identity only.
    • DOI registration (via Crossref) confirms a persistent, citable identifier for a specific preprint version.
    • Scopus/Web of Science indexing confirms a journal has passed a database’s editorial and peer-review vetting process.
    • SJR and impact factor are journal-level citation metrics computed only for indexed journals; bioRxiv has neither.

    The bioRxiv-to-Journals (B2J) initiative, which by May 2020 let authors submit a posted preprint directly into the manuscript systems of 177 participating journals, illustrates the actual relationship: bioRxiv is a feeder and archive that sits upstream of formal, indexed publication, not a substitute for it. For definitions of related scholarly-communication terms, see the CASRAI Dictionary.

    Answer-first Q&A

    Does bioRxiv have an ISSN?

    Yes. bioRxiv holds ISSN 2692-8205, registered with the ISSN International Centre and cross-listed in the NLM Catalog (record 101680187). An ISSN is a serial-identification number confirming bioRxiv is a continuing publication series; it does not certify that content has passed peer review or editorial vetting.

    Is bioRxiv considered a journal?

    No. bioRxiv is a preprint repository, not a peer-reviewed journal. Submissions undergo only basic screening for plagiarism, safeguarding and appropriateness, not scientific peer review. A 2019 eLife study found roughly two-thirds of bioRxiv preprints are later published in peer-reviewed journals.

    Is bioRxiv a publisher?

    bioRxiv describes itself as an archive and distribution service, operated by the nonprofit openRxiv since March 2025 (previously hosted by Cold Spring Harbor Laboratory). It distributes manuscripts rather than publishing them editorially; authors remain free to submit the same work to a journal afterwards.

    How do you cite bioRxiv?

    Cite bioRxiv preprints using their DOI (prefix 10.1101, registered via Crossref), per bioRxiv’s own FAQ guidance. If citing a specific version, add the version-specific URL. ICMJE-aligned journals typically require the citation to flag the work explicitly as a preprint, to distinguish it from a peer-reviewed, indexed article.

    What this means for authors and institutions

    For research administrators and institutional leaders verifying publication records, the practical takeaway is clear: a bioRxiv deposit is not equivalent to a peer-reviewed, indexed publication for the purposes of research assessment exercises, promotion dossiers, or funder reporting, however citable or ISSN-bearing the platform is. For compliance purposes, a bioRxiv ISSN or DOI should be treated as evidence of deposit and discoverability, not as evidence of peer review or journal-level standing.

    Authors should continue citing bioRxiv preprints by DOI, clearly labelled as preprints, and should track whether a peer-reviewed version has since appeared in an indexed journal — since roughly two-thirds eventually do. Terminology precision matters here: conflating “has an ISSN” with “is indexed” or “is a journal” produces avoidable errors in CVs, grant reports and library catalogues. As preprint servers proliferate across disciplines, the ISSN-versus-indexing distinction bioRxiv illustrates will only become more relevant to how research administrators, publishers and funders classify the scholarly record.