MedRxiv Goldacre: OpenSAFELY’s Preprint Audit Trail

“MedRxiv Goldacre” is shorthand for the publication route pioneered by Ben Goldacre’s OpenSAFELY platform: NHS data analyses are run inside a secure trusted research environment (TRE), cleared through disclosure control and institutional sign-off, and then posted as citable preprints on medRxiv, often weeks, not years, after the underlying question was asked. This route matters because it creates an auditable, dated record connecting a specific dataset, a specific analysis, and a specific claim before formal peer review has occurred.

OpenSAFELY is a trusted research environment operated by the Bennett Institute for Applied Data Science at the University of Oxford, built to analyse NHS primary care data without that data ever leaving NHS-controlled infrastructure. Since 2020, OpenSAFELY has used medRxiv as its default destination for rapid-turnaround findings, and understanding how it gets there reveals a governance model that other TRE operators, funders, and publishers are now studying closely.

What is OpenSAFELY and why does it publish through medRxiv?
How does an OpenSAFELY analysis become a medRxiv preprint?
How does this route compare with other rapid-turnaround platforms?
Answer-first Q&A: medRxiv, Goldacre, and preprint trust
What this means for the audit trail between analysis and citation

What is OpenSAFELY and why does it publish through medRxiv?

OpenSAFELY is a secure analytics platform, not a data warehouse: researchers write and test analysis code against artificial “dummy data”, with the code hosted publicly on the code-sharing platform GitHub, then submit that same code to run inside the secure environment against real, pseudonymised patient records. No researcher ever downloads or directly views raw patient-level data.
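The develop-on-dummy-data pattern described above can be illustrated with a minimal Python sketch. It is not OpenSAFELY's own API: the record shape, the `age_band` field, and the function name `count_by_age_band` are all hypothetical, chosen only to show one analysis function that returns aggregates and can be exercised against artificial records before it is ever run elsewhere.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical record shape: one mapping per patient with an "age_band" field.
Record = Mapping[str, str]


def count_by_age_band(records: Iterable[Record]) -> dict[str, int]:
    """Return aggregate counts only; no patient-level row is returned."""
    return dict(Counter(r["age_band"] for r in records))


# Artificial dummy data used while developing and testing the analysis.
dummy_records = [
    {"patient_id": "d1", "age_band": "40-49"},
    {"patient_id": "d2", "age_band": "40-49"},
    {"patient_id": "d3", "age_band": "50-59"},
]

dummy_result = count_by_age_band(dummy_records)
print(dummy_result)
```

The design point is that the function depends only on the shape of the records, not on where they live, so the code tested locally is the same code that would be submitted to run against real records.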

The platform emerged from the same policy pressure that produced the 2022 Goldacre Review, commissioned by the Department of Health and Social Care (DHSC), which recommended trusted research environments as the default access model for NHS data rather than one-off data extracts sent to individual research teams. medRxiv became the natural publication venue because it accepts clinical and health-services research preprints without requiring the months-long peer-review cycle that a TRE’s rapid-response use cases — outbreak monitoring, prescribing-safety signals, vaccine coverage gaps — cannot wait for.

How does an OpenSAFELY analysis become a medRxiv preprint?

The route from a locked-down NHS dataset to a public, citable preprint follows a fixed sequence of checkpoints, each of which leaves a documented trace. This is the audit trail that distinguishes a TRE-originated preprint from an ordinary manuscript upload.

Stage	What happens	Who checks it
Code development	Analysis code is written and version-controlled on GitHub against dummy data	Study authors (publicly logged commit history)
Secure execution	The same code runs against real pseudonymised records inside the TRE	OpenSAFELY platform
Disclosure control	Only aggregated tables and figures may leave the environment	At least two trained reviewers
Clinical DATAPAST review	Manuscript interpretation and clinical soundness are checked	Bennett Institute reviewers
Sponsor approval	Confirms the analysis matches its originally approved purpose	Department of Health and Social Care
Preprint submission	Manuscript is submitted to medRxiv and publicly logged on opensafely.org	Study authors

Two features make this chain unusually strong as evidence. First, the code that produced the result is public and version-dated on GitHub before it ever ran against real data, so it cannot be quietly rewritten after the fact to fit a conclusion. Second, disclosure control is performed by at least two trained and qualified individuals independently of the study authors, closing the gap between “the analyst says this is safe to release” and “an independent reviewer confirmed it.”

Only after Clinical DATAPAST review and DHSC sign-off can a manuscript go to medRxiv. This ordering matters for research administrators building provenance records: the preprint’s medRxiv timestamp is not the start of the audit trail but the end of it.
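For administrators who keep provenance records, the checkpoint sequence can be treated as an ordered list and checked mechanically. The sketch below is an illustration under stated assumptions, not an OpenSAFELY or medRxiv tool: the stage identifiers, the `Checkpoint` record, the `audit_problems` function, and every name and date in the example are invented placeholders. It checks three things the text describes: the stages appear in the stated order, the dates run forward so that preprint submission comes last, and disclosure control was done by at least two people who are not study authors.

```python
from dataclasses import dataclass
from datetime import date

# Stage identifiers are hypothetical labels for the six stages in the table.
STAGE_ORDER = [
    "code_development",
    "secure_execution",
    "disclosure_control",
    "clinical_review",
    "sponsor_approval",
    "preprint_submission",
]


@dataclass(frozen=True)
class Checkpoint:
    stage: str
    completed_on: date
    checked_by: tuple[str, ...]


def audit_problems(trail: list[Checkpoint], authors: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the trail is consistent."""
    problems = []
    if [c.stage for c in trail] != STAGE_ORDER:
        problems.append("stages missing or out of order")
    dates = [c.completed_on for c in trail]
    if dates != sorted(dates):
        problems.append("dates are not in chronological order")
    for c in trail:
        if c.stage == "disclosure_control":
            # Reviewers who are also authors do not count as independent.
            independent = set(c.checked_by) - authors
            if len(independent) < 2:
                problems.append("fewer than two independent disclosure reviewers")
    return problems


# Placeholder people and dates, for illustration only.
authors = {"author_a", "author_b"}
trail = [
    Checkpoint("code_development", date(2024, 1, 8), ("author_a",)),
    Checkpoint("secure_execution", date(2024, 1, 15), ("platform",)),
    Checkpoint("disclosure_control", date(2024, 1, 22), ("reviewer_1", "reviewer_2")),
    Checkpoint("clinical_review", date(2024, 1, 29), ("reviewer_3",)),
    Checkpoint("sponsor_approval", date(2024, 2, 5), ("sponsor",)),
    Checkpoint("preprint_submission", date(2024, 2, 12), ("author_a",)),
]

print(audit_problems(trail, authors))  # prints []
```

Returning a list of problems rather than a single pass/fail flag keeps the record useful for a later inquiry, because it says which link in the chain is weak.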

How does this route compare with other rapid-turnaround platforms?

OpenSAFELY is not the only initiative that has used medRxiv as a staging post for time-critical clinical findings. The REMAP-CAP adaptive platform trial, an international collaboration testing COVID-19 treatments across intensive care units, posted interim domain results, such as its tocilizumab and corticosteroid findings, as preprints ahead of full journal publication, so that treatment guidance could move faster than a conventional submission-to-print cycle permits. Industry-sponsored trials have followed a similar pattern: the Oxford–AstraZeneca vaccine programme’s later-stage US trial data also moved through preprint posting before appearing in peer-reviewed form, illustrating that even regulator-scrutinised, commercially sponsored research increasingly uses preprint servers for rapid, dated disclosure rather than waiting on the peer-review queue.

OpenSAFELY: TRE-based NHS primary care and hospital data; governed by Bennett Institute DATAPAST review plus DHSC approval; destination is medRxiv.
REMAP-CAP: multi-country adaptive ICU trial; governed by its international trial steering committee; interim domain results posted as preprints, typically to medRxiv.
Oxford–AstraZeneca trials: industry-sponsored vaccine trials; governed by trial sponsors and independent data safety monitoring boards; later data disclosed via preprint ahead of journal publication.

What differentiates OpenSAFELY is that its governance sits inside the data infrastructure itself — disclosure control and clinical review happen before a manuscript exists in a shareable form — rather than being layered on afterwards by a trial sponsor or committee.

Answer-first Q&A: medRxiv, Goldacre, and preprint trust

Who owns medRxiv?

medRxiv was founded in June 2019 by John Inglis and Richard Sever of Cold Spring Harbor Laboratory, Theodora Bloom and Claire Rawlinson of BMJ, and Joseph Ross and Harlan Krumholz of Yale University. Cold Spring Harbor Laboratory operated the server until 11 March 2025, when ownership transferred to the newly formed non-profit openRxiv, which also runs the sister site bioRxiv.

Is medRxiv peer reviewed?

No. medRxiv preprints are screened for basic completeness and potential harm before posting, but they are not formally peer reviewed. medRxiv itself states that preprints are preliminary reports whose content and results should not be used to guide clinical practice until they have passed through certified peer review.

Is medRxiv trustworthy?

medRxiv’s trustworthiness depends on the rigour applied before posting, not on the server itself. Findings that pass through a governed pathway — such as OpenSAFELY’s disclosure control, Clinical DATAPAST review, and DHSC approval — carry a stronger provenance signal than an unreviewed independent submission, even though both appear on the same platform.

Is medRxiv a preprint server?

Yes. medRxiv is a disciplinary preprint repository for the health sciences, distributing unpublished manuscripts free of charge. As of December 2024 it held more than 61,000 preprints and is indexed by Crossref, Google Scholar, Semantic Scholar, Europe PMC, and Web of Science’s Preprint Citation Index; NIH-funded preprints are additionally indexed in PubMed.

What this means for the audit trail between analysis and citation

For research administrators and institutional data governance teams, the OpenSAFELY-to-medRxiv route offers a template worth studying regardless of discipline: version-controlled code, independent disclosure control, named institutional review, and sponsor approval, each completed before the manuscript becomes public, rather than reconstructed afterwards for an inquiry or correction.

This has direct implications for how contributor roles and provenance are recorded around preprints. A CRediT-style breakdown of who wrote the code, who ran the disclosure check, and who gave clinical sign-off would make an already strong audit trail explicit at the point of publication rather than leaving it implicit in institutional records. CASRAI originated the CRediT contributor role taxonomy in 2014; the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, and applying its role vocabulary to TRE-originated preprints is a natural extension of the model medRxiv and OpenSAFELY have already built.
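A CRediT-style breakdown of this kind could be recorded as structured data alongside the preprint. The sketch below is one possible shape, not an established practice of medRxiv or OpenSAFELY. The fourteen role names are those of the CRediT taxonomy, but the mapping from TRE tasks to roles in `TASK_TO_ROLE` is an assumption made for illustration: CRediT has no role specific to disclosure control, so "Validation" is used here as the nearest fit, and "Supervision" is likewise only an approximate home for clinical sign-off. Contributor names are placeholders.

```python
# The fourteen roles of the CRediT contributor role taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
})

# Assumed, illustrative mapping from TRE tasks to CRediT roles.
TASK_TO_ROLE = {
    "wrote analysis code": "Software",
    "ran disclosure check": "Validation",
    "gave clinical sign-off": "Supervision",
}


def credit_statement(contributions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group contributor names by CRediT role, rejecting unknown roles."""
    by_role: dict[str, list[str]] = {}
    for person, tasks in contributions.items():
        for task in tasks:
            role = TASK_TO_ROLE[task]
            if role not in CREDIT_ROLES:
                raise ValueError(f"{role!r} is not a CRediT role")
            by_role.setdefault(role, []).append(person)
    return by_role


# Placeholder contributors, for illustration only.
statement = credit_statement({
    "author_a": ["wrote analysis code"],
    "reviewer_1": ["ran disclosure check"],
    "reviewer_2": ["ran disclosure check"],
    "reviewer_3": ["gave clinical sign-off"],
})
print(statement)
```

Validating against the fixed role vocabulary is what would make such statements comparable across preprints; a TRE operator wanting finer detail would need to record the task-level description separately.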

As more TRE operators — across the NHS’s Secure Data Environment network and equivalent bodies internationally — adopt comparable review chains, the distinguishing question for readers and funders will shift from “is this on a preprint server?” to “can I see the governance chain that got it there?”. OpenSAFELY’s public logging of its approval steps on opensafely.org, alongside medRxiv’s own indexing infrastructure, points toward that more transparent norm becoming standard practice rather than an exception.
