The systematic review is one of the most demanding forms of scholarship. Its authority rests not on a single clever insight but on a method so transparent that another team could repeat it and reach the same conclusions. Every decision — which databases were searched, with which terms, how records were screened and data extracted, how risk of bias was judged — is meant to be reported in enough detail to be checked. It is precisely this commitment to method that makes the arrival of artificial intelligence in evidence synthesis so consequential. Review teams now use AI and machine-learning tools to help screen thousands of abstracts, classify studies, extract data and even draft text. These tools save effort, but they introduce decisions that must be reported with the same rigour as every other step, or the chain of transparency breaks. This article examines how disclosure norms are forming, drawing on the generative-AI disclosure domain of the CASRAI Dictionary.
Why AI in evidence synthesis is different
Disclosing AI use in an ordinary research article is largely about honesty and attribution. In a systematic review the stakes are higher, because the AI is not merely an aid to writing but a participant in the method itself. When a machine-learning classifier helps decide which abstracts are worth full-text screening, it shapes which evidence enters the review and which is excluded. An undisclosed, undocumented automated screening step is a hole in the method through which bias and error can enter unseen. The reader cannot judge a review’s reliability without knowing that part of the screening was automated, which tool was used, how it was configured, and how its decisions were checked. Transparency about AI is therefore not an optional courtesy; it is part of the reproducibility that gives a review its standing.
PRISMA 2020 and reporting completeness
The dominant standard for reporting systematic reviews is PRISMA 2020, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses. PRISMA does not tell authors how to conduct a review; it specifies what they should report so that readers can assess and reproduce it. Its items cover the search, the selection process, the data-collection process and much else, and its flow diagram tracks records from identification through screening to inclusion. The logic of PRISMA maps naturally onto the question of AI, and in places the statement is explicit: the items on the selection and data-collection processes ask for details of any automation tools used, and the flow diagram template includes a count of records marked as ineligible by automation tools. Wherever an automated tool participated in identification, screening, data extraction or synthesis, the completeness PRISMA demands implies that this participation be described. The reporting community has been extending this thinking, with guidance and PRISMA-style extensions clarifying how the use of automation tools should appear in the methods and in the flow of records, so that an AI-assisted review is documented to the same standard as a wholly manual one.
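To make the record-tracking idea concrete, here is a minimal sketch of how a team might reconcile flow counts when an automation tool removed records before human screening. The class name, field names and numbers are illustrative assumptions, and the flow is simplified (it ignores, for example, reports that could not be retrieved); it is not a PRISMA artefact.

```python
from dataclasses import dataclass


@dataclass
class FlowCounts:
    """Record counts for a simplified PRISMA-style flow (hypothetical field names)."""
    identified: int           # records retrieved from all searches
    duplicates_removed: int   # duplicates removed before screening
    automation_excluded: int  # marked ineligible by an automation tool before screening
    human_excluded: int       # excluded by reviewers at title/abstract screening
    fulltext_excluded: int    # excluded after full-text assessment

    def screened(self) -> int:
        # Records that reached human title/abstract screening.
        return self.identified - self.duplicates_removed - self.automation_excluded

    def assessed_fulltext(self) -> int:
        # Records that survived title/abstract screening.
        return self.screened() - self.human_excluded

    def included(self) -> int:
        # Records remaining after full-text exclusions.
        return self.assessed_fulltext() - self.fulltext_excluded

    def check(self) -> None:
        # A negative stage means the reported counts do not reconcile.
        stages = {
            "screened": self.screened(),
            "assessed_fulltext": self.assessed_fulltext(),
            "included": self.included(),
        }
        for name, value in stages.items():
            if value < 0:
                raise ValueError(f"{name} is negative ({value}); counts do not reconcile")


# Made-up numbers, for illustration only.
flow = FlowCounts(
    identified=5000,
    duplicates_removed=800,
    automation_excluded=1500,
    human_excluded=2400,
    fulltext_excluded=250,
)
flow.check()
print(flow.screened(), flow.assessed_fulltext(), flow.included())  # 2700 300 50
```

Keeping the automation count as its own field, rather than folding it into a general exclusion total, is what lets a reader see how much of the selection was decided by a tool.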
Cochrane and the careful adoption of automation
Few organisations have engaged with automation in reviews as deeply as Cochrane, whose systematic reviews are a benchmark for the field. Cochrane has cautiously adopted machine-learning tools for tasks such as study classification and screening prioritisation, while insisting they be used in ways that preserve the rigour and transparency reviews require. The consistent themes are instructive: automation may assist human reviewers but should not silently replace human judgement on consequential decisions; a tool’s performance and limitations must be understood; and its use must be reported. Cochrane’s measured approach offers a model for the field, demonstrating that the answer to AI in evidence synthesis is neither prohibition nor uncritical enthusiasm but disciplined, transparent use.
The RAISE recommendations
As AI tools have proliferated, the community has worked towards shared recommendations for using them responsibly in evidence synthesis. This work is gathered under the banner of RAISE (Responsible use of AI in evidence SynthEsis). Its thrust is to articulate principles that let reviewers benefit from AI without compromising the integrity of their conclusions. These principles recur across the emerging guidance:
- Human responsibility. The review team remains accountable for every decision, including those an AI tool assisted; responsibility cannot be delegated to a tool.
- Transparency of tools. The specific tools used, their versions and how they were configured should be reported, so the method can be understood and repeated.
- Validation. The performance of an AI tool on the task at hand should be assessed, and its outputs checked, rather than trusted blindly.
- Clear reporting of role. Exactly which steps the AI participated in — screening, extraction, synthesis — should be stated, so readers know where human and machine judgement met.
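The validation principle can be illustrated with a small calculation. The sketch below assumes a team has a sample of records screened both by human reviewers and by the tool, and compares the two sets of include/exclude decisions, treating the human decisions as the reference. The function and type names are hypothetical, and recall and precision are only one reasonable choice of metrics; the guidance discussed here does not prescribe them.

```python
import math
from typing import NamedTuple


class ScreeningCheck(NamedTuple):
    recall: float         # share of human-included records the tool also included
    precision: float      # share of tool-included records the humans also included
    missed_includes: int  # records humans included but the tool excluded


def screening_check(human: list[bool], tool: list[bool]) -> ScreeningCheck:
    """Compare tool decisions with human decisions on the same sample (True = include)."""
    if len(human) != len(tool) or not human:
        raise ValueError("need two non-empty decision lists of equal length")
    pairs = list(zip(human, tool))
    tp = sum(h and t for h, t in pairs)      # both included
    fn = sum(h and not t for h, t in pairs)  # tool missed a relevant record
    fp = sum(t and not h for h, t in pairs)  # tool included an irrelevant record
    recall = tp / (tp + fn) if tp + fn else math.nan
    precision = tp / (tp + fp) if tp + fp else math.nan
    return ScreeningCheck(recall, precision, fn)


# Made-up sample of eight records, for illustration only.
human_decisions = [True, True, True, False, False, False, False, True]
tool_decisions = [True, True, False, False, True, False, False, True]
result = screening_check(human_decisions, tool_decisions)
print(result)  # ScreeningCheck(recall=0.75, precision=0.75, missed_includes=1)
```

For screening, the missed includes usually matter most, because a relevant study excluded by the tool never reaches the review; reporting that figure alongside the sample size gives readers a basis for judging the automated step.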
What an AI-assisted review should report
Drawing these strands together, an evidence synthesis that used AI should make several things plain: it should name the tools and their versions and describe what each was used for; explain how each was configured and, where relevant, trained or calibrated; describe how its decisions were checked, such as whether a sample of automated screening decisions was verified by human reviewers; and be honest about limitations. None of this requires abandoning AI; it requires treating AI exactly as the method demands every other step be treated — described fully enough to be judged and repeated. A review that conceals its use of automation forfeits the very transparency that distinguishes evidence synthesis from mere opinion.
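One way to keep these reporting elements together is a small structured record that flags anything left blank before a methods statement is drafted. The following is a sketch only: the class, its fields, the tool name "ExampleScreener" and all values are hypothetical and do not come from PRISMA, RAISE or the CASRAI Dictionary.

```python
from dataclasses import dataclass, fields


@dataclass
class AIToolDisclosure:
    """One AI tool's use in a review (hypothetical structure for illustration)."""
    tool: str
    version: str
    steps: list[str]    # review steps the tool took part in, e.g. "screening"
    configuration: str  # settings, thresholds, training or calibration data
    verification: str   # how the tool's decisions were checked by humans
    limitations: str    # known weaknesses relevant to this review

    def missing_fields(self) -> list[str]:
        # Names of fields left empty, so gaps are caught before submission.
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str):
                value = value.strip()
            if not value:
                missing.append(f.name)
        return missing

    def statement(self) -> str:
        # Plain-text sentence block suitable as a starting point for the methods section.
        steps = ", ".join(self.steps)
        return (
            f"{self.tool} (version {self.version}) was used for: {steps}. "
            f"Configuration: {self.configuration}. "
            f"Verification: {self.verification}. "
            f"Limitations: {self.limitations}."
        )


# Hypothetical tool and values, for illustration only.
record = AIToolDisclosure(
    tool="ExampleScreener",
    version="1.2.0",
    steps=["title/abstract screening"],
    configuration="inclusion threshold 0.3, trained on 500 human-labelled records",
    verification="a random 10% sample of exclusions was re-screened by two reviewers",
    limitations="not evaluated on non-English abstracts",
)
assert record.missing_fields() == []
print(record.statement())
```

The generated text is a draft to be edited by the review team, not a substitute for their own account of the method.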
A consistent vocabulary for AI disclosure
For disclosures of this kind to be useful across journals, databases and the systems that index reviews, what is being disclosed has to be described consistently: which tool, used for which task, checked in which way. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that a statement about AI use in a review means the same thing wherever it is recorded. Conducting a systematic review is also substantial, recognisable scholarly work (searching, screening, extracting, appraising, synthesising), so the human contributions behind it can be described in the same shared framework, using the contribution roles of the CRediT taxonomy, itself part of the broader practice of research administration. AI will keep changing how evidence is synthesised; the enduring obligation, to report the method fully and honestly, is what keeps a review trustworthy.