bioRxiv and medRxiv preprints are free for anyone to read online, but free access is not the same as an open licence. Each preprint carries an author-selected licence, ranging from CC0 to an all-rights-reserved “no reuse” option, and that licence, not the free posting, governs mining, translation and redistribution.
A preprint licence is the copyright permission an author selects on submission, separate from the server’s free hosting, that determines whether others may copy, mine, translate or commercially reuse the manuscript. Treating free preprints as automatically open-licensed is one of the most common compliance errors made by research offices, publishers and developers building text-mining pipelines.
- What licences do bioRxiv and medRxiv actually use?
- Free to read is not free to reuse
- Text-and-data-mining: what “free access” really permits
- Translation and derivative works
- Answer-first questions on preprint licensing
- Implications for institutions, publishers and funders
- The bottom line
What licences do bioRxiv and medRxiv actually use?
Authors who post to bioRxiv or medRxiv retain copyright of their manuscript and choose one licence from a fixed menu at submission. According to bioRxiv’s own FAQ, the options are CC0, CC-BY, CC-BY-NC, CC-BY-ND, CC-BY-NC-ND, and a non-Creative-Commons option labelled “no reuse/adaptation without permission.” The chosen licence, and the copyright holder’s name, are displayed beneath the abstract and on the Info/History tab of every preprint page.
bioRxiv and medRxiv are co-managed by openRxiv, a non-profit founded by Cold Spring Harbor Laboratory and funded by organisations including the Chan Zuckerberg Initiative, Caltech, Fred Hutchinson Cancer Center, Imperial College London, MIT, Stanford, the University of Edinburgh and the University of Washington. medRxiv launched in June 2019, following what openRxiv describes as a “restricted preprint pilot for clinical research” run on bioRxiv, and now handles clinical trial and most epidemiology submissions that bioRxiv no longer accepts.
| Licence | Share unaltered | Adapt or translate | Commercial reuse | Bulk TDM redistribution |
|---|---|---|---|---|
| CC0 | Yes, no credit required | Yes | Yes | Yes |
| CC-BY | Yes, with credit | Yes, with credit | Yes | Yes, with credit |
| CC-BY-NC | Yes, with credit | Yes, with credit | No | Non-commercial only |
| CC-BY-ND | Yes, with credit | No – permission required | Yes, unaltered copies only | No |
| CC-BY-NC-ND | Yes, with credit, non-commercial | No – permission required | No | No |
| No reuse without permission | No – permission required | No – permission required | No | No (fair-use TDM only) |
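The table above can be expressed as a small lookup so that a pipeline asks a yes/no question instead of relying on memory. The following is a minimal sketch in Python; the licence labels, the `Action` names and the `is_permitted` helper are illustrative choices for this example, not values taken from bioRxiv or medRxiv metadata.

```python
from enum import Enum


class Action(Enum):
    SHARE_UNALTERED = "share unaltered"
    ADAPT = "adapt or translate"
    COMMERCIAL = "commercial reuse"


S, A, C = Action.SHARE_UNALTERED, Action.ADAPT, Action.COMMERCIAL

# Permissions transcribed from the first three columns of the table.
# Keys are labels chosen for this sketch, not official metadata strings.
PERMISSIONS = {
    "CC0": frozenset({S, A, C}),
    "CC-BY": frozenset({S, A, C}),
    "CC-BY-NC": frozenset({S, A}),
    "CC-BY-ND": frozenset({S, C}),
    "CC-BY-NC-ND": frozenset({S}),
    "NO-REUSE": frozenset(),
}

# Every Creative Commons option except CC0 requires credit.
CREDIT_REQUIRED = {"CC-BY", "CC-BY-NC", "CC-BY-ND", "CC-BY-NC-ND"}


def is_permitted(licence: str, *actions: Action) -> bool:
    """True only if every requested action is allowed without asking the author.

    Unknown licence labels are treated like the most restrictive option.
    """
    granted = PERMISSIONS.get(licence, frozenset())
    return all(action in granted for action in actions)
```

For example, `is_permitted("CC-BY-NC", Action.ADAPT)` is true, while `is_permitted("CC-BY-NC", Action.ADAPT, Action.COMMERCIAL)` is false. Defaulting unknown labels to "no permission" is a design choice that fails closed rather than open.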
Free to read is not free to reuse
Every preprint on bioRxiv and medRxiv can be read and downloaded at no cost. That access right is fixed and identical for every article on the servers. Reuse rights are not fixed — they vary preprint by preprint, set individually by the corresponding author.
This distinction matters most for the “no reuse/adaptation without permission” option, which openRxiv’s FAQ defines as requiring anyone who wants to “share, reuse, remix, or adapt this material” to first contact the corresponding author. A preprint under this licence is fully open to read but functionally closed to redistribution, quotation beyond fair dealing, or reuse in a derivative dataset, review or AI training corpus without direct author permission.
Funders that require CC-BY licensing for the outputs they fund add a further compliance layer. UKRI’s open access policy requires a CC-BY licence for the peer-reviewed journal articles it funds; a preprint of that same work posted under CC-BY-NC-ND would not carry the same reuse terms as the eventual published version, creating a mismatch that research offices must reconcile before claiming policy compliance.
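A research office could make this mismatch visible with a simple comparison of what each licence grants. The sketch below is one way to do that in Python, under a loud assumption: it treats a preprint licence as compatible when it grants at least every permission the required licence grants. A real funder policy may instead name specific licences, so the `reuse_gap` helper and its permission labels are hypothetical and would need adapting to the policy text.

```python
# What each licence grants without author contact (illustrative labels).
GRANTS = {
    "CC0": frozenset({"share", "adapt", "commercial"}),
    "CC-BY": frozenset({"share", "adapt", "commercial"}),
    "CC-BY-NC": frozenset({"share", "adapt"}),
    "CC-BY-ND": frozenset({"share", "commercial"}),
    "CC-BY-NC-ND": frozenset({"share"}),
    "NO-REUSE": frozenset(),
}


def reuse_gap(preprint_licence: str, required_licence: str = "CC-BY") -> list[str]:
    """List the permissions the required licence grants but the preprint withholds.

    An empty list means the preprint is at least as permissive as required.
    Unknown preprint licences are treated as granting nothing.
    """
    required = GRANTS[required_licence]
    granted = GRANTS.get(preprint_licence, frozenset())
    return sorted(required - granted)
```

Under these assumptions, `reuse_gap("CC-BY-NC-ND")` reports that adaptation and commercial reuse are missing relative to CC-BY, which is the mismatch described above.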
Text-and-data-mining: what “free access” really permits
bioRxiv operates a dedicated Amazon S3 resource offering bulk full-text access for text-and-data-mining (TDM). This is a separate mechanism from ordinary free reading, and it is governed by its own rules, not by the reader-facing licence badge alone.
Two facts define the limits of this access. First, bulk TDM access is delivered through a requester-pays S3 bucket — the user, not bioRxiv, covers Amazon’s retrieval charges. Second, and more consequential for developers, bioRxiv states explicitly that the TDM repository “is not intended as a source for further redistribution of articles posted on bioRxiv, or their derivatives, nor does it grant others permission to re-host content.” Bulk access is granted under a fair-use rationale that authors consent to on submission; it does not override the individual reuse licence selected for each manuscript.
- Indexing tools built on the TDM feed must link back to the article on bioRxiv or medRxiv rather than re-host the text.
- Redistributing or building a derivative dataset from an individual article still requires checking that article’s specific licence in its metadata.
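- Where the licence is CC0 or CC-BY, redistribution of that article’s content is permitted (with credit, for CC-BY). CC-BY-NC limits redistribution to non-commercial use, and CC-BY-ND and CC-BY-NC-ND allow only unaltered copies, so altered or derived content needs the corresponding author’s prior permission. Under “no reuse,” any redistribution requires that permission, regardless of TDM access.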
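The three rules above can be combined in the way an indexing tool builds its records: always link back, and carry article text only where the licence clearly allows it. The following Python sketch assumes the tool already has a DOI, a licence label and the extracted text; `IndexEntry`, `build_entry` and the licence labels are hypothetical names for illustration, and the choice to carry text only for CC0 and CC-BY is a deliberately conservative assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Licences under which this sketch carries article text in the index.
# Deliberately conservative: every other licence is link-only.
TEXT_ALLOWED = {"CC0", "CC-BY"}


@dataclass
class IndexEntry:
    doi: str
    link: str               # always points back to the hosted article
    licence: str
    credit_required: bool   # only CC0 waives attribution
    text: Optional[str]     # None means "link only, do not re-host"


def build_entry(doi: str, licence: str, full_text: str) -> IndexEntry:
    """Create an index entry that links back and includes text only if licensed."""
    carry_text = licence in TEXT_ALLOWED
    return IndexEntry(
        doi=doi,
        link=f"https://doi.org/{doi}",
        licence=licence,
        credit_required=licence != "CC0",
        text=full_text if carry_text else None,
    )
```

Keeping the licence on every entry means a later permission from an author, or a wider policy for non-commercial use, can be applied without re-reading the source metadata.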
Translation and derivative works
Translation is a derivative work under copyright law, so the same licence table that governs adaptation governs translation. A CC0 preprint can be translated and republished without conditions. A CC-BY or CC-BY-NC preprint can be translated and republished (non-commercially, in the NC case) provided the original authors and source are credited. A CC-BY-ND, CC-BY-NC-ND, or “no reuse” preprint cannot be translated and redistributed without first obtaining the corresponding author’s written permission, because a translation alters the work.
bioRxiv’s submission rules already anticipate this gap: preprints must be submitted in English because screening staff can only review content in that language, and authors wishing to make a non-English version available are directed to provide it themselves or point readers to the corresponding author. There is no server-run translation service on either platform — translation always sits downstream of the licence the author chose.
Answer-first questions on preprint licensing
Are preprints free to read?
Yes. bioRxiv and medRxiv provide free, unrestricted reading and downloading of every posted preprint, with no paywall, subscription or registration requirement for readers. Free access applies uniformly regardless of which reuse licence the author has selected for that manuscript.
What are the disadvantages of restrictive preprint licences?
A restrictive licence — CC-BY-ND, CC-BY-NC-ND, or “no reuse without permission” — blocks translation, text-mining redistribution, and reuse in derivative datasets or reviews without direct author contact. This slows systematic reviews, AI training-data curation, and cross-language dissemination, even though the underlying manuscript remains free to read.
Can I text-and-data-mine bioRxiv preprints in bulk?
Yes, via a requester-pays Amazon S3 bucket that bioRxiv operates specifically for bulk TDM. This access is granted under a fair-use rationale for mining, not for re-hosting; redistributing extracted content still depends on each article’s individual licence, checked in its metadata.
Can I translate a bioRxiv or medRxiv preprint into another language?
Only if the licence permits it. CC0 preprints can be translated without conditions. CC-BY and CC-BY-NC preprints allow translation with attribution (and non-commercially, for NC). CC-BY-ND, CC-BY-NC-ND, and “no reuse” preprints require the corresponding author’s written permission before any translated version can be shared.
Implications for institutions, publishers and funders
Research offices tracking open-access compliance cannot rely on a preprint’s presence on a free repository as proof of open licensing; the licence field in the article metadata is the only reliable signal, and it must be checked per manuscript, not per server. Institutional repositories harvesting bioRxiv or medRxiv metadata should surface the licence alongside the DOI so that downstream reuse decisions do not default to an assumption of openness.
Publishers operating the growing bioRxiv-to-journal and journal-to-bioRxiv integration — openRxiv reports 249 participating journals and counting — need workflows that carry the preprint’s original licence forward or reconcile it against the published article’s licence, since the two need not match. Funders requiring CC-BY for funded outputs should confirm that grantees apply a compatible licence at the preprint stage, not only at final publication, to avoid a compliance gap that only surfaces at audit.
Developers building large-scale training or mining corpora should treat the S3 TDM feed as an access mechanism, not a licensing determination, and filter by the per-article reuse licence before including preprint text in any redistributed dataset.
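As a rough illustration of that filtering step, the Python sketch below splits a list of article records into those that can go into a redistributed dataset and those that need author permission or manual review. The record shape (a dict with a `licence` key), the spelling normalisation and the `split_corpus` helper are all assumptions made for this example; real metadata may spell licences differently and should be mapped explicitly. No-derivatives licences are routed to review here, on the cautious assumption that a processed dataset counts as an altered work.

```python
def normalise(raw: str) -> str:
    """Fold common spelling variants, e.g. 'CC_BY' or 'cc by', to 'cc-by'."""
    return raw.strip().lower().replace("_", "-").replace(" ", "-")


def split_corpus(records: list[dict], commercial: bool) -> tuple[list[dict], list[dict]]:
    """Return (keep, review) lists based on each record's own licence.

    Records with a missing or unrecognised licence go to review, never to keep.
    """
    allowed = {"cc0", "cc-by"}
    if not commercial:
        allowed.add("cc-by-nc")  # non-commercial datasets may also use NC text
    keep, review = [], []
    for record in records:
        licence = normalise(record.get("licence", ""))
        (keep if licence in allowed else review).append(record)
    return keep, review
```

Called with `commercial=True`, only CC0 and CC-BY records are kept; with `commercial=False`, CC-BY-NC records are kept as well. Everything else lands in the review list, so that nothing is included by default.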
The bottom line
Free preprints and open-licensed preprints are not the same category, and the gap between them is now a standard compliance question for research offices, publishers, funders and toolmakers working across bioRxiv and medRxiv. As TDM and generative-AI use of the scholarly record grows, expect indexers, funders and repositories to push harder for licence metadata to be checked automatically rather than assumed — the free-to-read badge on a preprint page will keep meaning exactly what openRxiv says it means: access, not reuse.