
  • bioRxiv Microbiology: 2026 Subject Growth

    bioRxiv’s microbiology collection holds more than 41,000 preprints as of July 2026, making it the platform’s third-largest subject area behind neuroscience (over 90,000) and bioinformatics (nearly 43,000). Together, these three fields account for roughly 37% of all preprints posted to bioRxiv since its 2013 launch. That concentration says as much about where biology’s fastest-moving fields are as it does about the platform itself.

    bioRxiv is a free, non-profit preprint repository for the biological sciences, now operated by openRxiv. Authors post manuscripts there before, or independently of, journal peer review, and each preprint is sorted into one of 27 subject-specific collections spanning everything from paleontology to synthetic biology.

    What is bioRxiv, and how are preprints organised by subject?

    bioRxiv was co-founded by John Inglis and Richard Sever in November 2013 as an open-access preprint repository hosted by Cold Spring Harbor Laboratory. In March 2025, bioRxiv and its clinical-sciences counterpart medRxiv transferred to openRxiv, a newly formed non-profit created specifically to steward both platforms, as reported by Science.

    Every submission is placed into one of 27 subject collections at the point of posting. There is no fee to submit to bioRxiv, and authors self-select the collection that best matches their manuscript. This subject taxonomy is what makes volume comparisons across fields possible — and what this analysis draws on directly.

    One structural exception worth noting: the Epidemiology collection is now closed to new submissions following the completion of bioRxiv’s clinical-research pilot project, meaning its growth curve has effectively flattened while other collections continue to expand.

    How does bioRxiv microbiology compare to neuroscience and other subjects by volume?

    Based on a live count of bioRxiv’s own subject-collection pages taken on 3 July 2026, neuroscience is the platform’s largest single collection at 90,290 preprints — a 19.4% share of the roughly 465,700 preprints posted across all 27 collections to date. Bioinformatics follows at 42,825 (9.2%), with microbiology close behind at 41,133 (8.8%).

    Cell biology, evolutionary biology, genomics and biophysics round out the next tier, each holding between roughly 21,000 and 26,000 preprints. At the other end of the scale, paleontology (678) and clinical trials (138) remain niche collections by comparison, while epidemiology’s 2,067 total is now largely fixed given its closure to new submissions.

    Full ranking of bioRxiv’s largest subject collections

    Rank | Subject collection | Cumulative preprints | Share of total
    1 | Neuroscience | 90,290 | 19.4%
    2 | Bioinformatics | 42,825 | 9.2%
    3 | Microbiology | 41,133 | 8.8%
    4 | Cell Biology | 25,753 | 5.5%
    5 | Evolutionary Biology | 24,737 | 5.3%
    6 | Genomics | 22,868 | 4.9%
    7 | Biophysics | 21,837 | 4.7%
    8 | Ecology | 20,284 | 4.4%
    9 | Cancer Biology | 18,775 | 4.0%
    10 | Biochemistry | 18,098 | 3.9%

    Source: CASRAI analysis of live bioRxiv subject-collection article counts, recorded 3 July 2026. These are cumulative totals since bioRxiv’s 2013 launch, not annual submission rates, so they reflect sustained field-level adoption of preprinting rather than a single year’s activity.
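    The shares in the table can be reproduced directly from the raw counts. The minimal Python sketch below assumes the article's rounded all-collection total of 465,700 as the denominator (the exact live total is not given), and also derives the combined top-three share and the gap between bioinformatics and microbiology.

```python
# Cumulative bioRxiv preprint counts by collection, as recorded on 3 July 2026.
COUNTS = {
    "Neuroscience": 90_290,
    "Bioinformatics": 42_825,
    "Microbiology": 41_133,
    "Cell Biology": 25_753,
    "Evolutionary Biology": 24_737,
    "Genomics": 22_868,
    "Biophysics": 21_837,
    "Ecology": 20_284,
    "Cancer Biology": 18_775,
    "Biochemistry": 18_098,
}
# Assumed denominator: the article's rounded total across all 27 collections.
TOTAL_ALL_COLLECTIONS = 465_700


def share_pct(count: int, total: int = TOTAL_ALL_COLLECTIONS) -> float:
    """Share of all bioRxiv preprints, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: share_pct(n) for name, n in COUNTS.items()}
top_three = COUNTS["Neuroscience"] + COUNTS["Bioinformatics"] + COUNTS["Microbiology"]
gap = COUNTS["Bioinformatics"] - COUNTS["Microbiology"]

print(shares["Microbiology"])           # 8.8
print(top_three, share_pct(top_three))  # 174248 37.4
print(gap)                              # 1692
```

    Every recomputed share matches the table to one decimal place. Because the denominator is itself rounded, the figures should not be read more precisely than that.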

    Microbiology’s position just behind bioinformatics is notable given how differently the two fields work: bioinformatics preprints are often fast, computational and low-cost to produce, while microbiology preprints typically follow wet-lab experimental cycles. That microbiology has nonetheless built a corpus within about 1,700 papers of bioinformatics points to a strong, sustained preprinting culture within the field, likely reinforced by its pandemic-era experience with rapid-dissemination norms.

    Why does subject-level concentration matter for research administrators?

    For institutional leaders and research-administration teams, subject-level preprint concentration is a proxy for where scholarly communication norms are shifting fastest. A field with tens of thousands of preprints has, in effect, normalised pre-peer-review dissemination as a routine step in its publication workflow — with direct implications for how institutions track outputs, credit early dissemination in tenure and promotion review, and advise researchers on preprint policy.

    • Grant and promotion committees increasingly need clear policy on whether preprints count as citable outputs, particularly in high-volume fields like neuroscience and microbiology.
    • Research offices supporting microbiology, bioinformatics or genomics groups should expect preprint-first workflows to already be the norm, not the exception, among active researchers.
    • Fields with low preprint volume (pathology, zoology, clinical trials) may need different guidance, since preprinting culture there remains comparatively immature.

    Because open-research norms diffuse unevenly across disciplines, subject-level data of this kind gives research administrators and institutional leaders a concrete basis for judging where those norms have taken hold, rather than relying on anecdote.

    Common questions about bioRxiv preprints

    Is bioRxiv a preprint server?

    Yes. bioRxiv is a dedicated preprint server for the biological sciences, distributing manuscripts before or alongside formal peer review. It is operated by openRxiv, a non-profit created in 2025 specifically to run bioRxiv and medRxiv, and hosts subject collections spanning microbiology, neuroscience, genomics and 24 other biology-related fields.

    Can anyone submit to bioRxiv?

    Authors can deposit a manuscript in draft or final form provided it concerns a relevant scientific field, is unpublished at the time of submission, and all co-authors have consented. Authors must first register on the platform. bioRxiv screens submissions for basic scope and ethical compliance before posting, but does not conduct peer review.

    How much does it cost to publish in bioRxiv?

    There is no fee to submit a preprint to bioRxiv. This free-to-post model is a key driver of its growth across every subject collection, including the microbiology and neuroscience volumes analysed above, since it removes the cost barrier that applies to many open-access journal publication routes.

    Does bioRxiv count as published?

    A bioRxiv preprint is not equivalent to a peer-reviewed publication. It establishes a timestamped, citable public record of the work, and many journals allow later submission of the same manuscript, but it has not undergone formal peer review at the point of posting. Institutions and funders vary in how they weight preprints in assessment.

    Implications and outlook for scholarly communication

    The concentration of preprint volume in neuroscience, bioinformatics and microbiology is likely to persist rather than reverse. These fields combine large, active researcher populations with production cycles well suited to rapid dissemination, and none shows structural barriers comparable to epidemiology’s now-closed pilot pathway.

    For research-administration teams, the practical takeaway is to treat preprint-volume data by subject as a planning input: policy on preprint citation, researcher guidance, and repository integration should be calibrated to each discipline’s actual adoption level rather than applied uniformly across an institution’s full research portfolio.

  • bioRxiv or medRxiv? Choosing the Right Server for Clinical vs Basic Research

    bioRxiv or medRxiv? Choose bioRxiv for basic, non-clinical life-sciences research such as genetics, microbiology or neuroscience, and medRxiv for clinical, epidemiological or public-health research that could influence patient care. The two preprint servers do not overlap: posting the same manuscript to both is prohibited and can result in withdrawal.

    A preprint server is an open-access repository where researchers post a scientific manuscript publicly before it has completed formal peer review. bioRxiv and medRxiv are the two sibling servers operated by openRxiv for the life and health sciences respectively, and the correct choice between them depends on subject scope, not on which sounds more prestigious.


    What is the difference between bioRxiv and medRxiv?

    bioRxiv launched in 2013 at Cold Spring Harbor Laboratory (CSHL) as a preprint server for basic biology; medRxiv followed in 2019 as a dedicated server for clinical and health-sciences manuscripts. In March 2025, CSHL transferred governance of both platforms to openRxiv, a newly formed independent nonprofit, marking the most significant structural change since bioRxiv’s founding.

    Neither server is a journal: each is a repository, and nothing posted on either has been peer reviewed or certified. Both platforms carry explicit caution notices stating that preprints should not guide clinical practice, inform health-related behaviour, or be reported as established findings.

    The practical distinction authors need is scope, not scale: bioRxiv covers fundamental biological research with new data, while medRxiv is reserved for work that could plausibly influence a clinical decision, a public-health response, or patient behaviour.

    Where should clinical trials and health research go?

    Any manuscript reporting a clinical trial, an epidemiological study, or research with direct implications for diagnosis, treatment or public-health policy belongs on medRxiv. bioRxiv’s own submission guidance is explicit that new clinical trial reports and most epidemiology submissions must now go to medRxiv rather than bioRxiv.

    medRxiv applies stricter screening than bioRxiv precisely because misinterpreted clinical claims carry public-harm risk. One detail authors frequently miss: medRxiv does not accept case reports or case series, so single-patient or small-series clinical write-ups need a different outlet even when the subject matter is unambiguously medical. Manuscripts that typically belong on medRxiv include:

    • Randomised controlled trials and other interventional studies
    • Epidemiological and public-health surveillance research
    • Studies involving patient-level clinical or health-behaviour data
    • Clinical infectious disease, oncology, cardiovascular medicine and psychiatry manuscripts

    Where should microbiology, neuroscience and basic biology go?

    bioRxiv is the correct venue when the research advances fundamental biological understanding without a direct clinical application. Its subject categories include microbiology, neuroscience, genetics, immunology, cell biology and bioinformatics, among others, and submissions are screened by volunteer bioRxiv Affiliates chiefly for scope, plagiarism and public-harm potential.

    A microbiology paper characterising a novel bacteriophage, or a neuroscience paper mapping neural circuitry in a model organism, sits comfortably on bioRxiv provided it does not extend into patient data or treatment recommendations. The moment a microbiology study becomes an infectious-disease outbreak analysis, or a neuroscience study becomes a neurology or psychiatry treatment study, the correct server changes to medRxiv.

    How do you decide when a study sits on the border?

    Most submission confusion happens in a handful of predictable grey zones where a basic-science category on bioRxiv has a clinical counterpart on medRxiv. openRxiv’s own subject-category lists make the pairing explicit, and mapping them side by side is the fastest way to resolve a borderline decision.

    bioRxiv category (basic science) | medRxiv category (clinical counterpart) | Decision rule
    Genetics / Genomics | Genetic and Genomic Medicine | Patient-directed diagnosis or therapy → medRxiv
    Neuroscience | Neurology / Psychiatry and Clinical Psychology | Patient treatment or behaviour outcomes → medRxiv
    Microbiology | Infectious Diseases | Outbreak, surveillance or patient-cohort data → medRxiv
    Pharmacology and Toxicology | Pharmacology and Therapeutics | Human dosing, trial or therapeutic outcome data → medRxiv

    As a working test: if the manuscript’s conclusion could reasonably change what a clinician does at the bedside, or what a public-health body recommends, it belongs on medRxiv regardless of how “basic” the underlying technique feels. If it reports mechanism, model-organism data or method development with no direct patient or population-health claim, bioRxiv is the right home.
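    The working test can be encoded as a pre-submission checklist helper. This is a hypothetical sketch, not an openRxiv tool: the `Manuscript` fields and the `choose_server` function are illustrative names, and the category pairs come from the decision table. Any clinical-trial report, patient or population data, or practice-changing conclusion routes the manuscript to medRxiv; case reports are flagged because medRxiv does not accept them.

```python
from dataclasses import dataclass

# bioRxiv category -> medRxiv clinical counterpart, from the decision table.
CLINICAL_COUNTERPART = {
    "Genetics": "Genetic and Genomic Medicine",
    "Genomics": "Genetic and Genomic Medicine",
    "Neuroscience": "Neurology / Psychiatry and Clinical Psychology",
    "Microbiology": "Infectious Diseases",
    "Pharmacology and Toxicology": "Pharmacology and Therapeutics",
}


@dataclass
class Manuscript:
    """Hypothetical checklist answers; not an openRxiv submission form."""
    subject: str                          # closest bioRxiv subject category
    reports_clinical_trial: bool = False
    patient_or_population_data: bool = False
    could_change_practice: bool = False   # bedside or public-health decisions
    is_case_report: bool = False


def choose_server(m: Manuscript) -> str:
    if m.is_case_report:
        # Clinical in scope, but medRxiv does not accept case reports or series.
        return "another outlet (case reports are not accepted by medRxiv)"
    if m.reports_clinical_trial or m.patient_or_population_data or m.could_change_practice:
        counterpart = CLINICAL_COUNTERPART.get(m.subject)
        return f"medRxiv: {counterpart}" if counterpart else "medRxiv"
    # Mechanism, model-organism or method work with no patient-level claim.
    return f"bioRxiv: {m.subject}"


phage_study = Manuscript("Microbiology")
outbreak_study = Manuscript("Microbiology", patient_or_population_data=True)
print(choose_server(phage_study))     # bioRxiv: Microbiology
print(choose_server(outbreak_study))  # medRxiv: Infectious Diseases
```

    A genuinely borderline manuscript still needs human judgement; the helper only makes the checklist questions explicit and consistent across a team.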

    Under the International Committee of Medical Journal Editors’ recommendations, posting to a recognised preprint server does not count as prior or duplicate publication and does not preclude subsequent journal submission — but authors should still confirm the target journal’s own preprint policy before posting either version.

    Questions authors ask

    Is bioRxiv reputable?

    Yes. bioRxiv is a well-established, widely used life-sciences preprint server operated by openRxiv, screened by volunteer affiliates for plagiarism, scope and biosafety concerns. It is not peer reviewed, but it is recognised across academic biology as a legitimate venue for early-stage research dissemination.

    Is medRxiv trustworthy?

    Within the limits of a preprint server, yes: it is a screened repository, not a peer-reviewed venue. medRxiv applies a stricter, additional screening layer beyond bioRxiv’s because of the public-harm risk in clinical and health content. Every posted manuscript carries a prominent caution notice stating it has not been certified by peer review and should not guide clinical practice, making its scope and limitations transparent to readers.

    What is the difference between bioRxiv and medRxiv?

    bioRxiv covers basic, non-clinical life sciences; medRxiv is reserved for clinical, epidemiological and health-sciences research with potential patient or public-health impact. Screening intensity, disclaimer wording and accepted article types differ accordingly, and a single manuscript cannot be posted to both servers.

    What are the alternatives to bioRxiv?

    Depending on field, authors also use arXiv for quantitative and computational biology work, Research Square or journal-integrated “In Review” services, and discipline-specific repositories such as ChemRxiv. None of these substitute for medRxiv when a manuscript is clinically actionable.

    What this means for authors and institutions

    For individual authors, the server choice is a compliance decision, not a branding one: submitting a clinical manuscript to the wrong server risks a request to withdraw and resubmit, delaying the timestamp priority a preprint is meant to secure. Research administrators, for whom tracking institutional preprint activity is increasingly routine, should build the bioRxiv/medRxiv scope test into pre-submission checklists rather than leaving it to individual author judgement.

    For institutions and publishers, the March 2025 move to independent openRxiv governance is worth tracking: it signals that preprint infrastructure for biology and medicine is now managed as permanent scholarly-communication infrastructure rather than a single laboratory’s side project, with implications for long-term archival stability and policy planning. Definitions of related terms, including preprint, postprint and version of record, are maintained in the CASRAI Research Administration Dictionary.

    The practical rule holds regardless of field: match the manuscript’s real-world consequence, not its disciplinary label, to the server’s scope, and treat the bioRxiv/medRxiv boundary as a public-harm question rather than a prestige one.

  • scGPT bioRxiv: AI Biology Models Bypass Review

    The scGPT bioRxiv preprint, alongside those for ESM3, AlphaFold-Multimer, Geneformer, EvolvePro and Chai-2, illustrates a pattern that is clear by 2026: AI foundation models for biology now reach bioRxiv months or years before, and sometimes instead of, formal peer review, shifting scrutiny onto the research community itself.

    A foundation model in biology is a large neural network pretrained on a broad corpus of sequence, structure or single-cell data, then fine-tuned for specific downstream tasks such as cell-type annotation, protein design or complex-structure prediction. bioRxiv is the open-access preprint server, now operated by the nonprofit openRxiv, where most of these models first appear.

    What is the bioRxiv wave of AI biology preprints?

    Since 2021, a cluster of high-profile AI foundation models for biology has appeared first as bioRxiv preprints rather than journal articles. Five of the six models discussed here (scGPT, ESM3, AlphaFold-Multimer, EvolvePro and Chai-2) reported their methods and benchmark results on bioRxiv before, or without, completing formal peer review.

    This is not unique to biology, but the scale is notable. bioRxiv’s bioinformatics collection alone now holds over 42,000 preprints, and many of the field’s most-cited foundation-model papers circulated in preprint form for months, and in some cases years, before any journal version existed.

    Which models are driving this trend?

    Each model targets a different layer of biology, from single cells to protein complexes, and their paths from preprint to formal review vary widely.

    Model | Domain | bioRxiv preprint date | Peer-review status | Headline result
    scGPT | Single-cell multi-omics | 1 May 2023 | Nature Methods, 2024 | Pretrained on over 10 million cells; preprint drew 1,490+ citations before formal publication
    ESM3 | Protein sequence/structure/function | 2 July 2024 | Science, January 2025 | Generated esmGFP, a novel fluorescent protein only 58% identical to its nearest known relative
    AlphaFold-Multimer | Protein complex structure | 4 October 2021 | Still bioRxiv-only | 67% success rate on heteromeric interfaces despite ubiquitous structural-biology use
    Geneformer | Single-cell network biology | No precursor preprint; v2 update posted August 2024 | Nature, 31 May 2023 | Pretrained on Genecorpus-30M, 29.9 million single-cell transcriptomes
    EvolvePro | Protein engineering | 17 July 2024 | Still bioRxiv-only | 2- to 515-fold activity gains across five therapeutic proteins
    Chai-2 | Antibody and miniprotein design | 6 July 2025 | Still bioRxiv-only | 16% hit rate in de novo antibody design, over 100x prior computational methods

    Two patterns stand out. First, Geneformer’s core 2023 paper went directly to Nature without a bioRxiv precursor, showing the pattern is not universal. Second, AlphaFold-Multimer, EvolvePro and Chai-2 remain, as of mid-2026, without any confirmed journal record despite wide citation and, in AlphaFold-Multimer’s case, routine use across thousands of downstream structural-biology studies.

    Why publish before peer review?

    Competitive priority and speed dominate. Posting to bioRxiv creates a timestamped, public record of a result the moment it exists, which matters in a field where multiple labs often chase the same architecture within weeks of each other.

    • Immediate community stress-testing of code, weights and benchmark claims, often faster than a journal’s reviewer pool can respond.
    • Priority establishment ahead of competing labs working on the same problem class.
    • Faster onward use: downstream researchers can build on and cite a preprint immediately rather than waiting through a multi-month review cycle.

    Journals have adapted to this reality. Many now formally accept bioRxiv-posted work, and scGPT’s own trajectory — a 2023 preprint that drew over 1,490 citations before its 2024 Nature Methods publication — shows how much scientific traffic a foundation model can carry while still formally unreviewed.

    What are the research-integrity and attribution risks?

    The lack of independent review before wide reuse is the core risk. A 2026 bioRxiv preprint on researcher perceptions found that scientists rely heavily on author reputation, rather than review status, as their main heuristic for judging a preprint’s credibility — a fragile substitute for structured peer review, particularly for tools other labs adopt wholesale.

    Attribution is a related, distinct problem. When a foundation model like Chai-2 or ESM3 generates a candidate sequence that a human team then validates experimentally, contributor-credit questions arise: who conceived the method, who ran validation, and who is accountable for the claim. Both the International Committee of Medical Journal Editors and the Committee on Publication Ethics have stated that AI tools cannot be listed as authors, because they cannot take responsibility for the work’s accuracy or integrity.

    Structured contributor-role frameworks help resolve this. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. Applying CRediT roles to preprint co-authorship — distinguishing methodology, software, validation and formal analysis — gives institutions a documented way to assign human accountability even when an AI foundation model contributed materially to the output. See the broader CRediT framework overview and CASRAI’s authorship resources for related guidance.
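    A contributor record built on CRediT roles can make human accountability checkable before a preprint is posted. The sketch below is hypothetical: the `Contributor` record and `check_contributions` function are illustrative, and the rule that a human must hold Methodology and Validation is an assumed institutional policy, not part of the CRediT standard. It does reflect the ICMJE and COPE position that AI tools cannot be listed as authors.

```python
from dataclasses import dataclass, field

# The 14 contributor roles defined by CRediT (ANSI/NISO Z39.104-2022).
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization",
    "Writing – original draft", "Writing – review & editing",
})

# Assumed local policy (not part of CRediT): these roles need a human holder.
HUMAN_REQUIRED_ROLES = ("Methodology", "Validation")


@dataclass
class Contributor:
    name: str
    is_human: bool
    roles: set[str] = field(default_factory=set)


def check_contributions(contributors: list[Contributor]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for c in contributors:
        unknown = c.roles - CREDIT_ROLES
        if unknown:
            problems.append(f"{c.name}: not CRediT roles: {sorted(unknown)}")
        if not c.is_human:
            # ICMJE and COPE: AI tools cannot be listed as authors.
            problems.append(f"{c.name}: AI tool listed as contributor; credit it in methods instead")
    humans = [c for c in contributors if c.is_human]
    for role in HUMAN_REQUIRED_ROLES:
        if not any(role in c.roles for c in humans):
            problems.append(f"no human contributor holds {role}")
    return problems


team = [
    Contributor("Researcher A", True, {"Conceptualization", "Methodology"}),
    Contributor("Researcher B", True, {"Validation", "Formal analysis"}),
    Contributor("Hypothetical model", False, {"Software"}),
]
for problem in check_contributions(team):
    print(problem)
```

    Here the only problem reported is the AI model's entry; removing it leaves a record in which named humans hold every role the assumed policy requires.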

    Answer-first Q&A

    Has the scGPT bioRxiv preprint been peer reviewed?

    Yes. The original scGPT preprint was posted to bioRxiv on 1 May 2023 and later passed formal peer review, publishing in Nature Methods in 2024. The preprint itself had already drawn more than 1,490 citations while still formally unreviewed.

    Why do AI foundation models for biology publish on bioRxiv before peer review?

    Competitive pressure and pace drive it. Posting to bioRxiv establishes priority and lets the wider research community stress-test claims, code and weights immediately, rather than waiting the months or years a formal peer-review cycle can take in a fast-moving field.

    Is AlphaFold-Multimer peer reviewed?

    No confirmed journal record exists for AlphaFold-Multimer itself; DeepMind’s preprint has remained on bioRxiv since 4 October 2021. It is nonetheless used routinely across structural biology, a stark example of a foundational tool that has not completed formal peer review.

    Who owns bioRxiv?

    bioRxiv is operated by openRxiv, an independent nonprofit that assumed ownership from Cold Spring Harbor Laboratory in March 2025. The transfer placed bioRxiv and its sibling server medRxiv under independent governance intended to secure their long-term stability, which matters more as foundation-model research increasingly appears there first.

    Implications for institutions and publishers

    Research offices and publishers now need explicit policy on how preprinted AI foundation models are cited, credited and re-used before formal review completes. Institutional research-integrity offices should treat a bioRxiv-only model — such as AlphaFold-Multimer, EvolvePro or Chai-2 — as provisionally validated, not settled science, when it underpins funded work or clinical-adjacent claims.

    Research administrators managing grant compliance and output tracking should build preprint-status checks into their reporting workflows; CASRAI’s research administration resources outline how contributor-role and output-tracking practices adapt to fast-moving, preprint-first fields. As more foundation models follow this path, the distinction between “published” and “peer reviewed” will matter more, not less, for research integrity.
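    Preprint-status checks of this kind can be partly automated. The Python sketch below rests on an assumption: it expects a JSON record shaped like the output of bioRxiv's public details API, with a `collection` list of version records whose `published` field holds a journal DOI or `"NA"`. That shape should be verified against the API documentation before use; the `preprint_status` function and the sample DOI are hypothetical, and no network call is made.

```python
import json


def preprint_status(details_json: str) -> str:
    """Classify a bioRxiv record as 'peer-reviewed', 'preprint-only' or 'unknown'.

    Assumed input shape: {"collection": [{"version": "1", "published": "NA"}, ...]},
    where "published" carries a journal DOI once a journal version is linked.
    """
    versions = json.loads(details_json).get("collection", [])
    if not versions:
        return "unknown"
    # Any version carrying a journal DOI counts as linked to a reviewed article.
    journal_dois = {v.get("published", "NA") for v in versions} - {"NA", ""}
    return "peer-reviewed" if journal_dois else "preprint-only"


# Hypothetical records; the DOI below is a placeholder, not a real article.
unreviewed = json.dumps({"collection": [{"version": "1", "published": "NA"}]})
reviewed = json.dumps({"collection": [
    {"version": "1", "published": "NA"},
    {"version": "2", "published": "10.0000/example-journal-doi"},
]})
print(preprint_status(unreviewed))  # preprint-only
print(preprint_status(reviewed))    # peer-reviewed
```

    A reporting workflow could run a check like this over each tracked output and route every "preprint-only" item, such as a bioRxiv-only model underpinning funded work, to the provisional-validation treatment described above.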