Tag: alphafold-multimer biorxiv

  • eLife’s bioRxiv Model: How Review After Posting Changes Peer Review

    eLife’s review of bioRxiv preprints runs in the reverse order to a conventional journal’s. The paper is posted publicly on bioRxiv first, and eLife’s editors and reviewers evaluate it only after it is already visible to the world. The result is published as a “Reviewed Preprint” rather than as an accept-or-reject verdict.

    A Reviewed Preprint is a bioRxiv or medRxiv manuscript that has been through eLife’s editorial and peer-review process and is published, alongside public reviews and an eLife Assessment, without a binary publication decision attached to it.

    What Is eLife’s Preprint-Only Review Model?

    eLife requires every submission to already exist as a preprint, typically on bioRxiv or medRxiv, before its editors will consider it. Editors — themselves active researchers — screen incoming preprints and select a subset for full review. In 2023, eLife formalised this into its Publish, Review, Curate model, removing the accept/reject gate entirely: any preprint that goes through full review is published as a Reviewed Preprint, regardless of how favourable the assessment turns out to be.

    This inverts the journal’s traditional role. Instead of deciding whether a paper reaches readers, eLife’s reviewers now help readers interpret a paper they can already see. They do this through public reviews and a standardised eLife Assessment that describes the significance of the findings and the strength of the evidence.

    How Does eLife Review a Preprint Already on bioRxiv?

    The workflow eLife uses is consultative rather than adversarial, and it produces a single, consolidated verdict rather than several disconnected reviewer reports. In practice it runs through six stages:

    1. The author posts the manuscript to bioRxiv or medRxiv as a preprint.
    2. The author submits the same preprint to eLife for consideration.
    3. A reviewing editor screens the preprint and decides whether to send it for full review; many submissions are declined at this stage.
    4. Two or three external reviewers and the reviewing editor discuss the work together and produce one consolidated set of comments, rather than separate, sometimes conflicting reports.
    5. eLife publishes the preprint, with authorship and contribution details carried over from the original posting, together with the public reviews and an eLife Assessment as a Reviewed Preprint.
    6. The author chooses whether, and when, to revise the work, resubmit it for further review, or declare it a Version of Record.
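    The six stages above can be read as a small state machine. The sketch below is a minimal illustration that assumes hypothetical stage labels of our own (`Stage`, `TRANSITIONS`, `advance`, `follow`). It is not eLife’s software or data model, and it simplifies stage 6 by treating a resubmitted revision as a return to review. Its point is structural: once a preprint has been reviewed, there is no rejected state, only a Reviewed Preprint.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical labels for the six-stage workflow (not eLife's own terms)."""
    POSTED = auto()             # 1. preprint posted to bioRxiv/medRxiv
    SUBMITTED = auto()          # 2. same preprint submitted to eLife
    SCREENED = auto()           # 3. reviewing editor screens it
    DECLINED = auto()           # 3a. not sent for full review
    REVIEWED = auto()           # 4. consolidated reviews produced
    REVIEWED_PREPRINT = auto()  # 5. published with reviews + eLife Assessment
    VERSION_OF_RECORD = auto()  # 6. author declares the final version


# Allowed next stages. Note that REVIEWED has no path to DECLINED:
# every preprint that is fully reviewed gets published.
TRANSITIONS = {
    Stage.POSTED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.SCREENED},
    Stage.SCREENED: {Stage.REVIEWED, Stage.DECLINED},
    Stage.DECLINED: set(),
    Stage.REVIEWED: {Stage.REVIEWED_PREPRINT},
    # Author's choice (simplified): resubmit for review, or declare a VoR.
    Stage.REVIEWED_PREPRINT: {Stage.REVIEWED, Stage.VERSION_OF_RECORD},
    Stage.VERSION_OF_RECORD: set(),
}


def advance(current: Stage, nxt: Stage) -> Stage:
    """Move to the next stage, refusing transitions the workflow does not allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"{current.name} -> {nxt.name} is not allowed")
    return nxt


def follow(path: list) -> Stage:
    """Walk a sequence of stages and return the final one."""
    current = path[0]
    for nxt in path[1:]:
        current = advance(current, nxt)
    return current
```

    A path that jumps from screening straight to publication fails the check in `advance`, because in this sketch every Reviewed Preprint must pass through review first.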

    This builds on a service eLife had already run since May 2020, when it launched “Preprint Review” to bring peer review to manuscripts already on bioRxiv, and on a submission pathway available since 2017 that let authors upload to bioRxiv while submitting to eLife in parallel.

    How Does This Differ From Traditional Pre-Publication Peer Review?

    The core difference is sequencing: in a conventional journal, review happens before the public ever sees the manuscript, and the outcome of that review is a gatekeeping decision. In eLife’s model, the manuscript is already public, and review adds an evaluative layer on top of it rather than deciding whether it exists at all.

    Feature | eLife’s model | Traditional pre-publication review
    --- | --- | ---
    Timing | Publish first, review second | Review first, publish only if accepted
    Outcome | No accept/reject; all reviewed work is published as a Reviewed Preprint | Binary accept/reject decision
    Transparency | Reviews and eLife Assessment published openly | Reviewer identities and comments usually confidential
    Author control | Author decides when to revise or declare a Version of Record | Author must satisfy editor/reviewers to be published at all
    Unit of evaluation | Article-level assessment | Journal-level acceptance, often read as a proxy for quality

    The trade-off is real, not just structural. Because Clarivate’s Journal Impact Factor methodology requires an indexed journal to publish only papers that editors have formally validated as acceptable, eLife’s decision to publish every reviewed preprint — regardless of the assessment’s verdict — led Clarivate to discontinue eLife’s Journal Impact Factor from its 2025 Journal Citation Reports release, ending a metric that had stood at 6.4.

    Where Does bioRxiv Fit Among Preprint Servers?

    bioRxiv (pronounced “bio-archive”) is a free preprint server for the life sciences, operated by openRxiv, a nonprofit dedicated to advancing scientific communication. It sits within a wider ecosystem of subject-specific preprint servers, several of which are frequently confused with one another or with journal-run review platforms such as Research Square’s In Review.

    Server | Field | Screening model
    --- | --- | ---
    bioRxiv | Life sciences | Basic screening only; operated by nonprofit openRxiv
    medRxiv | Health sciences / clinical | Additional screening for clinical risk; also run by openRxiv
    arXiv | Physics, maths, computer science | Moderated but not peer-reviewed; run by Cornell University
    Research Square | Multidisciplinary | Preprint posting plus optional “In Review” integrated peer review, tied to Springer Nature journals
    SSRN | Social sciences, economics, law | Basic screening; owned by Elsevier
    ChemRxiv | Chemistry | Basic screening; run by chemical societies

    The distinction that matters for the “bioRxiv or arXiv” question is disciplinary scope, not rigour. arXiv predates bioRxiv by more than two decades and serves physics, mathematics and computer science, while bioRxiv (launched 2013) was purpose-built for biology. Neither performs peer review itself, and that is precisely the gap eLife’s model was designed to fill for bioRxiv content.

    What Does This Mean for Research Administrators and Institutions?

    For research administration offices, the practical question is no longer whether a preprint has been reviewed, but whether assessment, promotion, and funding-reporting processes recognise a Reviewed Preprint as equivalent to a conventional accepted article. That question is not yet uniformly answered.

    • The US National Institutes of Health has permitted preprints to be cited in grant applications and biosketches since 2017, establishing precedent that funders can recognise unpublished-but-posted work.
    • eLife reports that a growing number of funders now explicitly recognise Reviewed Preprints, rather than only the eventual Version of Record, in research assessment.
    • Institutions that have signed the San Francisco Declaration on Research Assessment (DORA) already commit to evaluating research on its own merits rather than on journal-level metrics. That commitment is directly compatible with article-level eLife Assessments, and for eLife there is no longer a Clarivate Impact Factor to fall back on in any case.
    • Research administrators handling REF-style exercises, tenure dossiers, or grant reports need local guidance on whether the Reviewed Preprint, the eLife Assessment, or the Version of Record is the citable unit — under the 2023 model, all three can exist for one piece of work, each with its own DOI in a single version log.

    A data point often missing from commentary on the model: a 2019 eLife study by Abdill and Blekhman tracking bioRxiv preprint outcomes found that in 2018 eLife published 394 bioRxiv preprints, almost as many as any other single journal and over a third of the 1,172 articles it published that year. That was years before the 2023 model made this the default route.

    Common Questions About eLife and bioRxiv

    Is eLife a preprint?

    No. eLife is a journal, not a preprint server. It reviews manuscripts that authors have already posted as preprints on bioRxiv or medRxiv and publishes the result as a Reviewed Preprint — the preprint plus public reviews and an eLife Assessment, distinct from the original unreviewed posting.

    What is bioRxiv used for?

    bioRxiv is used to share life-sciences research immediately, before or independent of journal peer review. Researchers post manuscripts to establish priority, gather early feedback, and make findings available while formal review — at eLife or elsewhere — is still under way, sometimes for months.

    Why did eLife lose its impact factor?

    Clarivate discontinued eLife’s Journal Impact Factor because eLife now publishes every peer-reviewed submission as a Reviewed Preprint regardless of the review outcome, rather than issuing conventional accept/reject decisions. Clarivate’s indexing rules require journals to publish only editorially validated papers, so eLife’s model fell outside that requirement from the 2025 Journal Citation Reports release.

    Is eLife a high-impact journal?

    eLife’s citation performance was historically strong — its last Journal Impact Factor was 6.4 — but it no longer carries a Clarivate-assigned Impact Factor. Its standing is now judged through article-level eLife Assessments and public reviews rather than a single journal-wide citation metric.

    As more funders and institutions formalise how they treat Reviewed Preprints, Public Reviews, and eLife Assessments in research assessment, eLife’s model looks less like an isolated experiment and more like an early test case for peer review as a layer added on top of open preprints, rather than a gate placed in front of them. Research offices that decide this now — before it becomes a routine dossier question — will have a real advantage over those that wait for a funder mandate to force the issue.

  • Preprint Servers List by Discipline: 2026 Guide

    The right preprint server depends entirely on discipline: bioRxiv and medRxiv serve biomedicine, arXiv still dominates physics, mathematics and computer science, TechRxiv and engrXiv cover engineering, PsyArXiv leads psychology, and Preprints.org is one of the few platforms that formally accepts review articles alongside original research. This preprint servers list compares scope, governance, screening rules and 2026 policy changes across each field, so researchers and research offices can match a manuscript to the right platform rather than defaulting to the best-known name.

    A preprint server is an online repository where researchers deposit a complete but not-yet-peer-reviewed manuscript so it becomes citable and publicly readable before formal journal publication. Coverage, screening rigour and accepted article types vary sharply by field, which is why a single “best preprint server” answer is misleading.

    What is a preprint server, and why does discipline matter?

    A preprint server is a repository that posts a complete scholarly manuscript before it has undergone formal peer review, giving it a timestamp, a DOI and open readability. Screening is typically limited to checking that a submission is genuinely scholarly, complete and does not pose a public-health or safety risk — it is not equivalent to peer review.

    Disciplines differ in what servers screen for and which article types they accept. A biology preprint about a novel protein structure and a psychology preprint reporting a null replication result face entirely different moderation standards, which is why choosing the right server for your field matters more than choosing the largest or most famous platform.

    Which preprint server should biomedical and clinical researchers use?

    Biomedicine is served by two related but distinct platforms, both operated by openRxiv, the nonprofit spun out of Cold Spring Harbor Laboratory. bioRxiv covers basic life-sciences research, while medRxiv — described on its own site as “the preprint server for Health Sciences” — is reserved for clinical, epidemiological and public-health manuscripts and applies stricter screening because its content can influence clinical practice.

    • A manuscript cannot be posted to both bioRxiv and medRxiv simultaneously.
    • medRxiv states plainly in its FAQ that “there is no fee to submit manuscripts.”
    • medRxiv screening includes clinicians who check for content that could mislead patients or clinical decision-making.

    Which preprint server leads for physics, mathematics and computer science?

    arXiv, founded in 1991 and hosting more than a million articles, remains the dominant server for physics, mathematics, computer science, quantitative biology, statistics and quantitative finance. Its moderation relies on volunteer subject-area moderators rather than paid editorial staff.

    Two 2026 developments matter for anyone comparing arXiv to newer platforms. First, arXiv formally declared operational independence from Cornell University in March 2026, a governance shift reported by Science that separates its stewardship from a single host institution. Second, arXiv tightened its new-author policy: as of January 2026, first-time submitters in all categories need either an institutional email address plus a prior publication record on arXiv, or a personal endorsement from an established arXiv author — and in computer science categories specifically, review articles and position papers must already be accepted by a recognised journal or conference before they can be posted.

    Which preprint servers cover engineering and psychology?

    Engineering does not have a single dominant server in the way physics or biology do. TechRxiv, backed by the Institute of Electrical and Electronics Engineers (IEEE), and engrXiv, supported by the Center for Open Science, both accept a broad range of engineering and technology manuscripts, alongside arXiv’s own electrical-engineering and systems-science categories.

    PsyArXiv, hosted on the Open Science Framework and managed by the Society for the Improvement of Psychological Science, is the closest thing psychology has to a discipline-wide default. It moderates submissions for scholarly relevance and, in 2026, moved to stricter verification of authors’ publication records for certain submission types, alongside its existing encouragement of preregistration and data-availability statements.

    Server | Primary discipline | Governing body | Accepts review articles | Notable 2026 development
    --- | --- | --- | --- | ---
    bioRxiv | Biology / life sciences | openRxiv (nonprofit) | Not as a standalone article type | –
    medRxiv | Medicine / health sciences | openRxiv (nonprofit) | No | No submission fee (confirmed in FAQ)
    arXiv | Physics, maths, CS, stats | Independent nonprofit (formerly Cornell-hosted) | Restricted; CS reviews need prior journal/conference acceptance | Declared independence from Cornell, March 2026
    TechRxiv | Engineering & technology | IEEE | Yes | –
    engrXiv | Engineering sciences | Center for Open Science | Yes | –
    PsyArXiv | Psychology | Society for the Improvement of Psychological Science / OSF | Yes | Stricter author-verification moderation, 2026
    Preprints.org | Multidisciplinary | MDPI | Yes, explicit “Review” article type | Passed 124,000+ hosted preprints

    Which preprint server accepts review articles — Preprints.org vs arXiv?

    This is where discipline-agnostic platforms diverge sharply from field-specific ones. Preprints.org, governed by MDPI and hosting over 124,000 preprints, explicitly lists “Review” as one of its recognised submission types alongside original articles, communications and data descriptors — making it one of the more accommodating multidisciplinary choices for authors of literature reviews and systematic reviews.

    arXiv, by contrast, treats review and position papers as a special case rather than a default article type: in its computer science categories, such papers must already have been accepted by a recognised journal or conference before arXiv will host them. bioRxiv similarly does not treat “review article” as a standard submission category — its FAQ describes comment-based peer discussion, not narrative reviews, as the mechanism for post-publication critique.

    For authors specifically searching for where to deposit a review manuscript, this is a genuine and under-reported distinction: Preprints.org and general-purpose repositories such as SSRN or Research Square are structurally more open to review articles than the flagship subject-specific servers.

    Frequently asked questions

    What is a preprint server?

    A preprint server is an online repository where researchers deposit a complete, unpublished manuscript before peer review, so it receives a timestamp, a citable DOI and open access. It performs basic scholarly and safety screening but does not certify the findings the way peer review does.

    Is medRxiv free to use?

    Yes. medRxiv’s own FAQ states there is no fee to submit manuscripts. Authors do not pay to post, and readers access preprints without a paywall, consistent with its role as an open, nonprofit health-sciences repository operated by openRxiv.

    Does bioRxiv accept review papers?

    Not as a standard submission type. bioRxiv is built around original research reports, and its FAQ describes structured comments — not narrative or systematic review articles — as its mechanism for post-posting critique. Authors of review manuscripts typically use Preprints.org or a discipline-general server instead.

    What are the disadvantages of preprints?

    Preprints have not been peer-reviewed, so findings can be incomplete, later revised, or misreported by media before formal validation. Negative public comments on a preprint may also influence subsequent peer review, and some journals still restrict submissions that overlap heavily with an already-public preprint.

    Implications for research administrators and institutions

    Research offices advising authors on open-access compliance need a discipline-aware view, not a single institutional default. A biomedical clinical trial preprint belongs on medRxiv given its clinician screening; a systematic review destined for a multidisciplinary audience is far more likely to be accepted on Preprints.org than on arXiv or bioRxiv. Institutions building preprint guidance pages should map manuscript type and discipline to platform before recommending “post it on arXiv” as a blanket instruction.
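    The discipline-and-type mapping recommended here can be sketched as a lookup table. This is a simplified, hypothetical helper (`suggest_server`, `BY_DISCIPLINE` and the discipline keys are our own labels) distilled from the comparison table in this guide. It is not any server’s official rule set. It also conservatively applies the acceptance rule the text describes for arXiv computer science reviews to all arXiv categories, which is an assumption, so always check each server’s current policy pages.

```python
from typing import NamedTuple, Optional


class Server(NamedTuple):
    name: str
    # True = accepts reviews; False = does not;
    # None = restricted (reviews only after journal/conference acceptance).
    accepts_reviews: Optional[bool]


# Hypothetical lookup distilled from the comparison table; not official policy.
BY_DISCIPLINE = {
    "life sciences": Server("bioRxiv", False),
    "clinical": Server("medRxiv", False),
    # Assumption: the CS acceptance rule is applied to every arXiv category.
    "physics": Server("arXiv", None),
    "mathematics": Server("arXiv", None),
    "computer science": Server("arXiv", None),
    "engineering": Server("TechRxiv or engrXiv", True),
    "psychology": Server("PsyArXiv", True),
}

# Multidisciplinary fallback with an explicit "Review" article type.
FALLBACK = "Preprints.org"


def suggest_server(discipline: str, is_review: bool = False,
                   accepted_at_venue: bool = False) -> str:
    """Suggest a preprint server for a manuscript's discipline and type."""
    server = BY_DISCIPLINE.get(discipline.strip().lower())
    if server is None:
        return FALLBACK  # unknown discipline: use the multidisciplinary option
    if not is_review or server.accepts_reviews:
        return server.name  # original research, or a server that takes reviews
    if server.accepts_reviews is None and accepted_at_venue:
        return server.name  # restricted server, but the review is already accepted
    return FALLBACK
```

    For example, `suggest_server("clinical")` points a trial preprint to medRxiv, while `suggest_server("life sciences", is_review=True)` routes a review manuscript to Preprints.org rather than bioRxiv.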

    Funders and publishers referencing preprint policy should also note governance changes such as arXiv’s 2026 separation from Cornell, since institutional affiliation and stewardship arrangements can affect long-term archiving guarantees that research administrators rely on when advising on data-management and preservation plans.

    Conclusion: choosing by discipline, not by brand

    There is no universal “best” preprint server. bioRxiv and medRxiv fit biomedicine, arXiv still defines physics, mathematics and computer science despite tightened 2026 submission rules, TechRxiv and engrXiv split the engineering space, PsyArXiv anchors psychology, and Preprints.org stands out as the multidisciplinary option most open to review articles. Authors and research offices get the best outcome by treating this preprint servers list as a field-by-field decision, not a single default choice.

  • Chai-2 bioRxiv: Comparing AI Biology Preprints Ahead of Peer Review

    The Chai-2 bioRxiv preprint, posted by Chai Discovery on 5 July 2025, reports a 16% hit rate in fully de novo antibody design, more than 100-fold above prior computational methods. Unlike the ESM3 and Geneformer foundation models it sits alongside, both of which later appeared in peer-reviewed journals, that claim has not yet cleared peer review. All three models are part of a wider pattern: AI biology foundation models are increasingly disseminated as bioRxiv preprints first and as journal articles later (if at all), which changes how institutions, publishers, and funders must scrutinise their claims.

    A bioRxiv preprint is a manuscript posted to bioRxiv, the biology preprint server founded at Cold Spring Harbor Laboratory and now operated by openRxiv, before, or instead of, formal peer review. This article compares how Chai-2, ESM3, Geneformer, EvolvePro, and AlphaFold-Multimer have each used that route, and what the differences mean for reproducibility.

    What is Chai-2, and why was it posted as a bioRxiv preprint?

    Chai-2 is a multimodal generative model from Chai Discovery that designs antibodies and nanobodies from scratch, taking a target structure and epitope as input and returning a complete antibody design. The original preprint, “Zero-shot antibody design in a 24-well plate”, reported a 16% success rate in de novo design against 52 diverse targets, with the full cycle from AI design to wet-lab validation completed in under two weeks.

    Chai Discovery followed with an updated bioRxiv preprint on 29 November 2025, “Drug-like antibody design against challenging targets”, reporting that more than 86% of designed full-length monoclonal antibodies showed developability profiles comparable to approved therapeutics. Neither preprint has yet been published in a peer-reviewed journal. The company has since raised a $130 million Series B round, taking total funding above $225 million at a $1.3 billion valuation, according to Genetic Engineering & Biotechnology News.

    How do ESM3 and Geneformer differ from Chai-2 in preprint dissemination?

    ESM3 and Geneformer address different problems from Chai-2, at different biological scales, and their publication paths diverge from Chai-2’s in an instructive way. ESM3, from EvolutionaryScale, is a general-purpose protein language model trained on roughly 2.78 billion protein sequences with a 98-billion-parameter flagship configuration. It was posted as a preprint before its 2025 publication in Science, meaning it eventually completed the peer-review cycle that Chai-2’s antibody preprints have not yet reached.

    Geneformer operates at the cellular level rather than the molecular level. Built on a transformer-encoder architecture pretrained across tens of millions of single-cell RNA-sequencing profiles, it classifies cell types and predicts disease-relevant genes. Its foundational description, credited to Christina Theodoris and colleagues, circulated as a preprint before formal publication in Nature in 2023.

    EvolvePro and AlphaFold-Multimer extend the comparison further. EvolvePro is a few-shot protein-engineering framework that uses language-model embeddings to guide directed evolution from very few labelled variants, disseminated via bioRxiv. AlphaFold-Multimer, Google DeepMind’s extension of AlphaFold2 for multi-chain complex prediction, is the starkest case: its 2021 bioRxiv preprint (Evans et al.) has been cited thousands of times and underpins structural biology workflows worldwide, yet it has never been published in a peer-reviewed journal.

    Model | Domain | bioRxiv posting | Weight access | Peer-review status
    --- | --- | --- | --- | ---
    Chai-2 | De novo antibody design | v1 Jul 2025; updated Nov 2025 | Platform/API access, not fully open weights | Preprint only
    ESM3 | General protein sequence/structure/function | Preprint, then Science (2025) | Smaller checkpoints open; 98B flagship gated via Forge API | Peer-reviewed
    Geneformer | Single-cell transcriptomics | Preprint, then Nature (2023) | Fully open-weight release | Peer-reviewed
    EvolvePro | Few-shot directed protein evolution | bioRxiv preprint | Open code/model release | Preprint at time of posting
    AlphaFold-Multimer | Multi-chain complex structure prediction | bioRxiv preprint (2021) | Code and weights open-sourced | Never published in a peer-reviewed journal

    Why does preprint-first publication intensify reproducibility scrutiny?

    Preprint-first publication compresses the interval between a headline result and its public citation, which is valuable for fast-moving fields but removes a layer of independent verification before claims circulate. AlphaFold-Multimer shows this can persist indefinitely: a preprint can become de facto infrastructure without ever completing formal review.

    • Model weight access varies sharply: Geneformer and AlphaFold-Multimer are fully open, while Chai-2 and ESM3’s largest configuration require platform or API access, limiting independent replication of the exact reported result.
    • Benchmark scale differs: Chai-2’s 16% hit rate is drawn from a company-run benchmark across 52 targets, not an externally adjudicated challenge such as CASP or CAPRI.
    • Versioning matters: Chai-2’s updated November 2025 preprint extends claims to full-length monoclonal antibodies, meaning readers must track which version underlies any given statistic.

    For research administrators and institutional evaluators, the practical implication is that a citation to “Chai-2” or “ESM3” is not self-evidently a citation to peer-reviewed work — the preprint status, version, and weight-access terms all need checking before the claim is treated as settled.

    Common questions about AI biology preprints on bioRxiv

    Is the Chai-2 bioRxiv preprint peer-reviewed?

    No. As of publication, both Chai-2 preprints — the July 2025 original and the November 2025 update — remain bioRxiv preprints. Neither has completed formal peer review, so the reported 16% hit rate and 86% developability figures should be read as company-reported, not journal-vetted, results.

    Has ESM3 been published in a peer-reviewed journal?

    Yes. ESM3 was first circulated as a preprint before EvolutionaryScale’s results were published in Science in 2025, giving it a completed peer-review path that Chai-2’s antibody-design claims currently lack.

    What is Geneformer used for?

    Geneformer analyses single-cell RNA-sequencing data to classify cell types, model gene regulatory networks, and identify disease-relevant genes, using a transformer architecture trained on large single-cell transcriptome corpora rather than protein or antibody sequences.

    What is the difference between Chai-2 and AlphaFold-Multimer?

    AlphaFold-Multimer predicts the 3D structure of a multi-chain protein complex from the sequences of its component chains, while Chai-2 generates entirely new antibody sequences and structures for a chosen target. The difference is structure prediction versus de novo generative design.

    What are the implications for institutions, publishers, and funders?

    Research administrators citing Chai-2, ESM3, Geneformer, or comparable models in grant reports, technology assessments, or institutional communications should distinguish preprint claims from peer-reviewed findings explicitly, note the exact preprint version, and record whether model weights are open or platform-gated. Publishers and editors evaluating manuscripts that build on these models should likewise verify which version of the underlying preprint is cited, since headline metrics can shift between versions.
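    One way to make this record-keeping routine is a small citation record that surfaces caveats automatically. The sketch below assumes a hypothetical `ModelCitation` structure with fields of our own choosing; the example values are taken from the comparison table above, and the caveat wording is illustrative rather than any institution’s standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelCitation:
    """Hypothetical record of the checks recommended for AI-biology citations."""
    model: str
    preprint_version: str          # which posting underlies the cited figures
    journal: Optional[str] = None  # None while the work is preprint-only
    open_weights: bool = False     # False if weights are platform/API-gated

    def caveats(self) -> List[str]:
        notes = []
        if self.journal is None:
            notes.append("preprint only: claims not peer-reviewed")
        if not self.open_weights:
            notes.append("weights gated: exact result hard to replicate")
        return notes


chai2 = ModelCitation("Chai-2", preprint_version="Nov 2025 update")
geneformer = ModelCitation("Geneformer", preprint_version="pre-2023 preprint",
                           journal="Nature (2023)", open_weights=True)
afm = ModelCitation("AlphaFold-Multimer", preprint_version="2021 preprint",
                    open_weights=True)

for citation in (chai2, geneformer, afm):
    print(citation.model, citation.caveats())
```

    AlphaFold-Multimer still carries the preprint-only caveat despite its open weights, which mirrors the point that a preprint can become infrastructure without completing review.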

    The broader lesson is structural rather than model-specific: as AI biology moves faster than journal review cycles, the preprint-to-journal gap itself becomes a due-diligence checkpoint that institutions, funders, and publishers now need to track as routinely as they track the results themselves.

  • CRediT Authorship Taxonomy: The Preprint Gap

    The CRediT (Contributor Roles Taxonomy) authorship framework is largely absent from arXiv and bioRxiv preprints because neither platform has an editorial office empowered to enforce it, neither offers a dedicated contribution-metadata field, and a preprint is not yet a fixed version of record. CRediT statements are collected later, when a manuscript reaches a journal that mandates them.

    CRediT is a controlled vocabulary of 14 defined contributor roles used to describe, role by role, what each named author actually did on a research output. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022.


    What Is the CRediT Authorship Taxonomy?

    CRediT (Contributor Roles Taxonomy) assigns one or more of 14 standard role labels — Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, and Writing – review & editing — to each named contributor on a research output.

    • CASRAI originated the taxonomy in 2014 to complement, not replace, traditional authorship bylines.
    • NISO approved it as ANSI/NISO Z39.104-2022, the current formal reference standard.
    • It is licensed CC-BY 4.0 and is distinct from the ICMJE authorship criteria, which govern who qualifies as an author at all rather than what each author contributed.
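    Because CRediT is a closed vocabulary, a contribution statement can be checked mechanically against the 14 role names. The sketch below is a minimal, hypothetical validator (`CREDIT_ROLES`, `credit_statement` and the author names are illustrative). It is not NISO’s or any publisher’s tooling, and the one-line-per-author output format is an assumption rather than a standard layout.

```python
from typing import Dict, List

# The 14 CRediT role names as listed in ANSI/NISO Z39.104-2022.
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
})


def credit_statement(contributions: Dict[str, List[str]]) -> str:
    """Render one line per author, rejecting roles outside the vocabulary."""
    lines = []
    for author, roles in contributions.items():
        unknown = [role for role in roles if role not in CREDIT_ROLES]
        if unknown:
            raise ValueError(f"{author}: not CRediT roles: {unknown}")
        lines.append(f"{author}: {', '.join(roles)}.")
    return "\n".join(lines)


print(credit_statement({
    "A. Author": ["Conceptualization", "Software"],  # hypothetical names
    "B. Author": ["Investigation"],
}))
```

    A free-text role such as “Lab work” is rejected, which is exactly the structure that a free-text author field on a preprint server cannot enforce.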

    The taxonomy is now embedded in the submission systems of major publishers, including Elsevier, Wiley, Taylor & Francis, Sage and Nature Portfolio journals — almost always at the point of formal peer-reviewed submission or acceptance, not at the preprint stage.

    Why Don’t arXiv and bioRxiv Require CRediT Statements?

    Preprint servers skip CRediT largely because they have no editorial office analogous to a journal’s. arXiv and bioRxiv operate a lightweight moderation or screening check — confirming the submission is on-topic and not obviously unscientific — rather than the editorial and peer-review workflow that gives journals a natural checkpoint at which to demand a structured contributorship disclosure.

    A second reason is version-of-record ambiguity. A preprint can be revised multiple times before, or instead of, formal publication, and co-authorship or individual roles can change between versions — for example when a reviewer at the eventual journal requests new experiments performed by a newly added contributor. Locking a CRediT statement to an early preprint version risks misrepresenting the contributions behind the paper that ultimately gets cited.

    Neither arXiv nor bioRxiv has published an official policy explaining the omission; the absence reflects infrastructure and governance gaps rather than a stated objection to the taxonomy itself.

    The Submission and Metadata Gap Behind the Absence

    The practical blocker is metadata architecture. arXiv collects author information as a single free-text field with no dedicated structure for role-level contribution data. bioRxiv and medRxiv, now operated by openRxiv, the nonprofit spun out of Cold Spring Harbor Laboratory, capture somewhat richer structured metadata, including funder information, but likewise have no CRediT field in their submission forms.

    This differs from what happens downstream. Crossref’s deposit schema supports embedding CRediT contributor-role metadata against a published journal article’s DOI record, which is how a reader can eventually see machine-readable contribution data attached to the version of record. Preprint DOI records typically carry no equivalent CRediT element, because the preprint servers do not populate it and have no requirement to.

    Feature | arXiv / bioRxiv (preprint) | Typical CRediT-mandating journal
    --- | --- | ---
    Screening body | Moderators (topic/scope check) | Editorial board + peer reviewers
    Author metadata field | Free-text author list | Structured CRediT role fields in submission system
    Version status | Multiple revisable versions | Single accepted version of record
    CRediT statement required | No | Often yes, per publisher policy
    DOI metadata (CRediT roles) | Generally absent | Supported via Crossref deposit schema

    What Changes When a Preprint Reaches a CRediT-Mandating Journal?

    Once a manuscript that began life as an arXiv or bioRxiv preprint is accepted by a journal that mandates CRediT, the contribution statement is captured during that journal’s own submission or production workflow — not retrofitted onto the preprint record itself.

    Authors typically complete role selections in the publisher’s manuscript system (for example, at revision or acceptance stage), and the resulting statement appears on the published article page and, where supported, in the article’s Crossref-deposited metadata. bioRxiv and medRxiv link out to the published version once available, but the CRediT statement itself lives with the publisher’s version of record, not the earlier preprint.

    Answer-First Q&A

    What is the CRediT taxonomy?

    The CRediT taxonomy is a standardised, 14-role controlled vocabulary — covering roles such as Conceptualization, Investigation, and Writing – original draft — used to describe each named author’s specific contribution to a research output, distinct from authorship order or byline position.

    What are the 14 roles of the CRediT taxonomy?

    The 14 roles are Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, and Writing – review & editing, as defined under ANSI/NISO Z39.104-2022.
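    Because CRediT is a closed, 14-term vocabulary, a research office can check a submitted contribution statement mechanically. The sketch below is a minimal illustration, assuming statements arrive as a hypothetical mapping of author name to a list of role strings; `validate_contributions` is an illustrative helper, not part of any NISO, Crossref or publisher API.

```python
from typing import Dict, List

# The 14 CRediT roles as listed under ANSI/NISO Z39.104-2022.
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


def validate_contributions(statement: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per author, any roles that fall outside the CRediT vocabulary.

    Hypothetical helper. Authors with no roles at all are reported with a
    placeholder marker so that missing entries are not silently accepted.
    """
    problems: Dict[str, List[str]] = {}
    for author, roles in statement.items():
        if not roles:
            problems[author] = ["<no roles assigned>"]
            continue
        unknown = [role for role in roles if role not in CREDIT_ROLES]
        if unknown:
            problems[author] = unknown
    return problems


if __name__ == "__main__":
    example = {
        "A. Author": ["Conceptualization", "Writing – original draft"],
        "B. Author": ["Software", "Data analysis"],  # "Data analysis" is not a CRediT role
        "C. Author": [],
    }
    print(validate_contributions(example))
    # {'B. Author': ['Data analysis'], 'C. Author': ['<no roles assigned>']}
```

    The check is an exact string match, so a real workflow would likely need to normalise spelling variants (for example, a hyphen in place of the en dash in the two "Writing" roles) before validating.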

    Do preprints need a CRediT statement?

    No. Neither arXiv nor bioRxiv currently requires a CRediT statement, since neither maintains the editorial enforcement mechanism or the structured metadata field that journals use to collect this information at submission or acceptance.

    What happens to author contributions when a preprint is later published?

    The CRediT statement is generated at the journal stage, through the publisher’s own submission system, and appears on the published version of record — it is not added retroactively to the original preprint page on arXiv or bioRxiv.

    Implications for Research Administrators and Institutions

    Institutions relying on contributorship data for research assessment, promotion cases, or authorship-dispute resolution should treat preprints as an incomplete contributorship record. CASRAI's CRediT contributor roles hub and its broader authorship guidance both point research offices toward the published, CRediT-tagged version, rather than the preprint, when contributorship needs to be verified or cited formally.

    • Do not assume a preprint’s author order reflects final contribution roles — roles can shift before formal publication.
    • Check the journal’s published version, and its Crossref metadata where available, for the authoritative CRediT statement.
    • Use CASRAI’s research administration dictionary to confirm terminology when drafting institutional authorship policy.

    Outlook: Will Preprint Servers Adopt CRediT?

    Momentum toward richer preprint metadata is real but has so far concentrated on discoverability and version-linking rather than contributorship. Until arXiv or bioRxiv add a structured contribution field, and until a body with editorial standing is prepared to enforce it, CRediT statements will remain a journal-stage artefact rather than a preprint-stage one. Research offices and funders that want contributor-level accountability earlier in the research lifecycle will need to look to journal policy, not preprint infrastructure, for now.

  • scGPT bioRxiv: AI Biology Models Bypass Review

    scGPT bioRxiv preprints, alongside ESM3, AlphaFold-Multimer, Geneformer, EvolvePro and Chai-2, illustrate a 2026 pattern: AI foundation models for biology now reach bioRxiv months or years before — and sometimes instead of — formal peer review, shifting scrutiny onto the research community itself.

    A foundation model in biology is a large neural network pretrained on a broad corpus of sequence, structure or single-cell data, then fine-tuned for specific downstream tasks such as cell-type annotation, protein design or complex-structure prediction. bioRxiv is the open-access preprint server, now operated by the nonprofit openRxiv, where most of these models first appear.

    What is the bioRxiv wave of AI biology preprints?

    Since 2021, a cluster of high-profile AI foundation models for biology has appeared first as bioRxiv preprints rather than journal articles. Five of the six models discussed here (scGPT, ESM3, AlphaFold-Multimer, EvolvePro and Chai-2) first described their architectures, training data and benchmark results on bioRxiv, before or without completing formal peer review.

    This is not unique to biology, but the scale is notable. bioRxiv’s bioinformatics collection alone now holds over 42,000 preprints, and many of the field’s most-cited foundation-model papers spent a year or more circulating in preprint form before any journal version existed.

    Which models are driving this trend?

    Each model targets a different layer of biology, from single cells to protein complexes, and most followed the same preprint-first disclosure pattern, with varying paths to formal review.

    Model | Domain | bioRxiv preprint date | Peer-review status | Headline result
    scGPT | Single-cell multi-omics | 1 May 2023 | Nature Methods, 2024 | Pretrained on over 10 million cells; preprint drew 1,490+ citations before formal publication
    ESM3 | Protein sequence/structure/function | 2 July 2024 | Science, January 2025 | Generated esmGFP, a novel fluorescent protein only 58% identical to its nearest known relative
    AlphaFold-Multimer | Protein complex structure | 4 October 2021 | Still bioRxiv-only | 67% success rate on heteromeric interfaces despite ubiquitous structural-biology use
    Geneformer | Single-cell network biology | No precursor preprint; v2 update posted August 2024 | Nature, 31 May 2023 | Pretrained on Genecorpus-30M, 29.9 million single-cell transcriptomes
    EvolvePro | Protein engineering | 17 July 2024 | Still bioRxiv-only | 2- to 515-fold activity gains across five therapeutic proteins
    Chai-2 | Antibody and miniprotein design | 6 July 2025 | Still bioRxiv-only | 16% hit rate in de novo antibody design, over 100x prior computational methods

    Two patterns stand out. First, Geneformer's core 2023 paper went directly to Nature without a bioRxiv precursor, showing the pattern is not universal. Second, AlphaFold-Multimer, EvolvePro and Chai-2 remain, as of mid-2026, without any confirmed journal record, despite being widely cited and used in downstream studies.

    Why publish before peer review?

    Competitive priority and speed dominate. Posting to bioRxiv creates a timestamped, public record of a result the moment it exists, which matters in a field where multiple labs often chase the same architecture within weeks of each other.

    • Immediate community stress-testing of code, weights and benchmark claims, often faster than a journal’s reviewer pool can respond.
    • Priority establishment ahead of competing labs working on the same problem class.
    • Faster onward use: downstream researchers can build on and cite a preprint immediately rather than waiting through a multi-month review cycle.

    Journals have adapted to this reality. Many now formally accept bioRxiv-posted work, and scGPT’s own trajectory — a 2023 preprint that drew over 1,490 citations before its 2024 Nature Methods publication — shows how much scientific traffic a foundation model can carry while still formally unreviewed.

    What are the research-integrity and attribution risks?

    The lack of independent review before wide reuse is the core risk. A 2026 bioRxiv preprint on researcher perceptions found that scientists rely heavily on author reputation, rather than review status, as their main heuristic for judging a preprint’s credibility — a fragile substitute for structured peer review, particularly for tools other labs adopt wholesale.

    Attribution is a related, distinct problem. When a foundation model like Chai-2 or ESM3 generates a candidate sequence that a human team then validates experimentally, contributor-credit questions arise: who conceived the method, who ran validation, and who is accountable for the claim. Both the International Committee of Medical Journal Editors and the Committee on Publication Ethics have stated that AI tools cannot be listed as authors, because they cannot take responsibility for the work’s accuracy or integrity.

    Structured contributor-role frameworks help resolve this. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. Applying CRediT roles to preprint co-authorship — distinguishing methodology, software, validation and formal analysis — gives institutions a documented way to assign human accountability even when an AI foundation model contributed materially to the output. See the broader CRediT framework overview and CASRAI’s authorship resources for related guidance.

    Answer-first Q&A

    Has the scGPT bioRxiv preprint been peer reviewed?

    Yes. The original scGPT preprint was posted to bioRxiv on 1 May 2023 and later passed formal peer review, publishing in Nature Methods in 2024. The preprint itself had already drawn more than 1,490 citations while still formally unreviewed.

    Why do AI foundation models for biology publish on bioRxiv before peer review?

    Competitive pressure and pace drive it. Posting to bioRxiv establishes priority and lets the wider research community stress-test claims, code and weights immediately, rather than waiting the months or years a formal peer-review cycle can take in a fast-moving field.

    Is AlphaFold-Multimer peer reviewed?

    No confirmed journal record exists for AlphaFold-Multimer itself; DeepMind's preprint has remained on bioRxiv since 4 October 2021. It is nonetheless used routinely across structural biology, a stark example of a foundational tool that has not completed formal peer review.

    Who owns bioRxiv?

    bioRxiv is operated by openRxiv, an independent nonprofit that assumed ownership from Cold Spring Harbor Laboratory in March 2025. The transfer was intended to place the server under independent, long-term governance, which matters more as a growing share of AI foundation model research appears there first.

    Implications for institutions and publishers

    Research offices and publishers now need explicit policy on how preprinted AI foundation models are cited, credited and re-used before formal review completes. Institutional research-integrity offices should treat a bioRxiv-only model — such as AlphaFold-Multimer, EvolvePro or Chai-2 — as provisionally validated, not settled science, when it underpins funded work or clinical-adjacent claims.

    Research administrators managing grant compliance and output tracking should build preprint-status checks into their reporting workflows; CASRAI’s research administration resources outline how contributor-role and output-tracking practices adapt to fast-moving, preprint-first fields. As more foundation models follow this path, the distinction between “published” and “peer reviewed” will matter more, not less, for research integrity.
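    A preprint-status check of this kind can be as simple as classifying each tracked output by whether a journal version of record has been linked to its preprint. The sketch below is a minimal illustration under stated assumptions: `ResearchOutput`, its `preprint_doi` and `journal_doi` fields, and the status labels are hypothetical stand-ins for an institution's own tracking record, and the sketch does not query bioRxiv, Crossref or any other external service.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative reporting categories (hypothetical labels).
VERSION_OF_RECORD = "peer-reviewed version of record"
PREPRINT_ONLY = "preprint only: provisionally validated"
NO_RECORD = "no public record"


@dataclass
class ResearchOutput:
    # Hypothetical fields for an institution's output-tracking record.
    title: str
    preprint_doi: Optional[str] = None
    journal_doi: Optional[str] = None


def review_status(output: ResearchOutput) -> str:
    """Classify an output by the most formal public version linked to it."""
    if output.journal_doi:
        return VERSION_OF_RECORD
    if output.preprint_doi:
        return PREPRINT_ONLY
    return NO_RECORD


def flag_unreviewed(outputs: List[ResearchOutput]) -> List[str]:
    """Return titles of outputs that are public only as preprints."""
    return [o.title for o in outputs if review_status(o) == PREPRINT_ONLY]


if __name__ == "__main__":
    # Placeholder identifiers, not real DOIs.
    outputs = [
        ResearchOutput("Model A", preprint_doi="<preprint DOI>", journal_doi="<journal DOI>"),
        ResearchOutput("Model B", preprint_doi="<preprint DOI>"),
    ]
    print(flag_unreviewed(outputs))  # ['Model B']
```

    Keying the check on a linked journal record, rather than on citation counts or author reputation, keeps the "published" versus "peer reviewed" distinction explicit in reporting.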