Tag: publisher policy

  • Generative AI in peer review: disclosure, confidentiality and policy

    Much of the discussion about generative AI in scholarly publishing has focused on authors: what they must disclose when they use AI tools to help write a paper. But there is a second point in the publishing process where generative AI raises questions that are arguably sharper and less appreciated — peer review. Reviewers, facing the same time pressures and the same powerful new tools as everyone else, have begun to use generative AI to help them assess manuscripts: to summarise a paper, to draft a review, to check a method. This is understandable, but it collides with something foundational to peer review, namely the confidentiality on which the whole system rests. This article examines how disclosure, confidentiality and policy are taking shape around AI in peer review, drawing on the generative-AI disclosure domain of the CASRAI Dictionary. For the author side of the question, see our guidance on AI disclosure for authors.

    The confidentiality problem

    The most serious issue is also the least obvious to a busy reviewer. A manuscript under review is a confidential document. It contains unpublished work — ideas, data, results — that the authors have shared with the journal in trust, on the understanding that reviewers will keep it private and use it only to evaluate the paper. When a reviewer pastes that manuscript, or substantial parts of it, into a public generative-AI tool to help write their review, they may be doing something they have not fully thought through: uploading confidential, unpublished work to a third-party system outside the journal’s control. Depending on the tool and its terms, that content may be transmitted, stored or even used to train future models. This is a potential confidentiality breach of the most basic kind — the unpublished work of authors who never consented to it being exposed to an external service. It is for precisely this reason that a clear line has emerged in policy.

    Why publishers ban uploading manuscripts to LLMs

    In response, many publishers and editorial bodies have adopted a firm position: reviewers must not upload manuscripts, or parts of them, into generative-AI tools. The reasoning is the confidentiality concern above. A manuscript is not the reviewer’s to share; entering it into an external large language model is a form of sharing it, and the journal cannot guarantee what happens to it once it leaves. Publisher policies on this point have tended to be more categorical than their policies on authors’ use of AI, and the difference is instructive. An author who uses AI to help write their own paper is sharing their own work, which they are entitled to do; a reviewer who feeds someone else’s unpublished manuscript into a chatbot is sharing work that belongs to others and was entrusted to them in confidence. The asymmetry of ownership is what makes the reviewer’s situation different, and why “do not upload the manuscript” has become a common, near-bright-line rule.

    The judgement problem

    Confidentiality is the clearest concern but not the only one. Peer review exists to provide expert, accountable human judgement on the quality, validity and significance of a piece of work. A generative-AI system can produce fluent, plausible-sounding text about a manuscript, but it does not understand the field, cannot vouch for the correctness of a method, and can generate confident assessments that are simply wrong. If a reviewer leans on an AI tool to form — rather than merely to polish — their assessment, the review risks becoming an exercise in plausibility rather than expertise, while the authors and editor believe they are receiving genuine expert scrutiny. The integrity of peer review depends on the judgement being a real reviewer’s, and on that reviewer remaining accountable for it. A review that outsources its substance to a model fails the authors, who are owed expert attention, and the editor, who is relying on it to make a decision.

    The role of disclosure

    Where some use of AI in review is permitted — for instance, modest help with the language of a review that the reviewer has genuinely written and stands behind — disclosure becomes the governing principle, just as it is for authors. The norms taking shape include several expectations:

    • Confidentiality first. No part of a manuscript should be entered into an external AI tool, regardless of any other consideration.
    • Reviewer accountability. The reviewer remains fully responsible for the content and judgements of their review; AI cannot be a reviewer or bear responsibility.
    • Transparency. Where a tool has been used in preparing a review in a permitted way, that use should be disclosed to the editor, so the editor can weigh it.
    • Following the journal’s policy. Because policies differ, reviewers are expected to check and comply with the specific policy of the venue they are reviewing for.

    COPE and the evolving consensus

    Editorial and integrity bodies have been central to shaping this consensus. The Committee on Publication Ethics (COPE) and individual publishers have issued guidance that consistently emphasises the same core points: the manuscript’s confidentiality must be protected, the reviewer’s human accountability cannot be delegated to a tool, and the use of AI should be transparent. This guidance is still evolving as the tools and the practices around them change, but the foundations are settling. The reviewer’s duty of confidentiality and their duty to provide genuine expert judgement are not new; what is new is a class of tools that can quietly undermine both, and the policy response is essentially an effort to reassert those long-standing duties in a changed technical environment.

    A consistent way to describe AI use

    For disclosure to be meaningful across journals and systems, what is being disclosed must be described consistently: what tool was used, for what purpose, and by whom in the process. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that a statement about AI use in authorship or review is understood the same way wherever it is recorded. And because peer review is itself a genuine, increasingly recognised contribution, the work reviewers do can be described within the same framework used for every other contribution: the CRediT taxonomy and its full set of contribution roles. Generative AI will keep changing how scholarly work is produced and assessed; the durable principles of confidentiality, human accountability and honest disclosure are what peer review must protect as it adapts.

  • Detecting AI-generated text in research: the limits of detectors and the policy response

    As text generated by large language models has flooded into every kind of writing, including the scholarly, an obvious-seeming solution has emerged alongside it: if AI can write text, surely software can detect AI-written text and flag it. A market of AI detectors has grown up promising exactly this, and editors, institutions and reviewers, understandably anxious about undisclosed AI use, have been tempted to lean on them. But the comfortable assumption that detection can police the boundary between human and machine writing does not survive contact with how these tools actually perform. The limits of AI-text detection are now well enough understood that responsible integrity policy is moving away from detection and towards disclosure instead. This article examines why, drawing on the generative-AI disclosure domain of the CASRAI Dictionary.

    How AI-text detectors work, and why that is a problem

    AI-text detectors do not have any direct insight into whether a particular passage was generated by a model. They cannot — the text carries no reliable hidden marker. Instead, they estimate the probability that text was machine-generated, typically by looking at statistical features such as how predictable or uniform the writing is, on the reasoning that model output tends to be smoother and less idiosyncratic than human prose. This is inference from surface features, not detection of a fact, and that distinction is the root of the problem. A probabilistic guess based on writing style can be wrong in both directions, and the consequences of being wrong are not symmetric.
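
    To make "inference from surface features" concrete, here is a deliberately crude sketch. It is not how any real detector is implemented; the function names and the scoring formula are assumptions made for illustration. It scores a text purely on how uniform its sentence lengths are, which is one simple stand-in for the "smoothness" such tools look for.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split on '.', '!' or '?' and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def uniformity_score(text: str) -> float:
    """Toy 'machine-likeness' score in [0, 1]; higher means more uniform sentences.

    Hypothetical heuristic: 1 / (1 + coefficient of variation of sentence length).
    It looks only at style and knows nothing about how the text was produced.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # too little text to measure any variation
    mean = statistics.mean(lengths)
    spread = statistics.pstdev(lengths)
    return 1.0 / (1.0 + spread / mean)


# Two passages, both assumed to be written by a person.
formulaic = (
    "The sample was heated to 300 K. "
    "The sample was cooled to 100 K. "
    "The sample was weighed after cooling."
)
varied = (
    "No. We tried again the following morning with a fresh batch of "
    "reagents and a recalibrated balance. It worked."
)

print(uniformity_score(formulaic) > uniformity_score(varied))
```

    The formulaic methods-style passage scores as more "machine-like" than the varied one, although neither involved a model. The score measures style, not origin.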

    The false-positive problem

    The most serious failing is the false positive: human-written text wrongly flagged as AI-generated. This is not a rare edge case but a structural weakness, and certain groups are disproportionately affected. Writing that is more formulaic, more uniform, or simply less idiosyncratic — precisely the characteristics of much technical and scientific prose, and notably of writing by people composing in a language that is not their first — can read as “machine-like” to a detector and be flagged accordingly. The implications are grave: a researcher could be falsely accused of undisclosed AI use, with reputational damage and integrity proceedings, on the basis of a tool that is essentially guessing from style. Studies and practical experience have repeatedly shown that detectors flag genuine human writing as AI-generated, and that non-native English writers bear an unfair share of these errors. To build serious consequences on so unreliable a foundation is itself an integrity failure.

    The false-negative problem and the arms race

    The opposite error is also pervasive. Detectors miss genuinely AI-generated text — the false negative — and they are especially easy to evade. Lightly editing or paraphrasing model output, or running it through tools designed to disguise its origins, can readily defeat a detector. This produces a futile arms race: as detectors improve, evasion techniques adapt, and the detectors fall behind again. The combination is the worst of both worlds: a tool that wrongly accuses the honest while failing to catch the deliberately deceptive offers no sound basis for policy. It punishes the wrong people and misses the right ones.
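
    The way the two errors combine can be shown with simple arithmetic. The sketch below is illustrative only: the function name and every rate passed to it are hypothetical inputs chosen for the example, not measured figures for any real detector.

```python
def flag_outcomes(
    n_texts: int,
    ai_share: float,
    sensitivity: float,
    false_positive_rate: float,
) -> dict[str, float]:
    """Expected outcomes when a detector screens n_texts submissions.

    ai_share:            assumed fraction of texts that are AI-generated
    sensitivity:         assumed fraction of AI texts the detector flags
    false_positive_rate: assumed fraction of human texts wrongly flagged
    """
    ai_texts = n_texts * ai_share
    human_texts = n_texts - ai_texts
    caught = ai_texts * sensitivity
    missed = ai_texts - caught  # false negatives
    wrongly_flagged = human_texts * false_positive_rate  # false positives
    total_flagged = caught + wrongly_flagged
    wrong_share = wrongly_flagged / total_flagged if total_flagged else 0.0
    return {
        "caught": caught,
        "missed": missed,
        "wrongly_flagged": wrongly_flagged,
        "wrong_share_of_flags": wrong_share,
    }


# Hypothetical scenario: 1,000 texts, 10% AI-written, 70% of those caught,
# 5% of human-written texts wrongly flagged.
outcome = flag_outcomes(1000, ai_share=0.10, sensitivity=0.70, false_positive_rate=0.05)
for name, value in outcome.items():
    print(f"{name}: {value:.2f}")
```

    Under these made-up inputs, about 45 human-written texts are flagged while about 30 AI-generated texts go unnoticed, so roughly 39% of all flags point at people who did nothing wrong. Different inputs give different numbers, but whenever most submissions are human-written, even a small false-positive rate produces many false accusations.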

    Why detection is the wrong foundation

    Step back from the technical failings and a deeper point emerges. Detection asks the wrong question. It tries to establish, after the fact and by inference, whether a tool was used — an inherently uncertain enquiry that pits unreliable software against the people it judges, and that treats authors as suspects. It also misframes the issue. The concern with generative AI in research is not that text was produced with a tool’s help per se; it is whether AI was used appropriately, whether the human author takes responsibility for the result, and whether the use was honestly declared. None of those things can be read off a statistical style score. A detector cannot tell you whether AI was used responsibly; at best it makes an unreliable guess about whether it was used at all. Policy built on detection is therefore aimed at the wrong target with an unfit instrument.

    Disclosure as the better answer

    The more defensible foundation is disclosure. Rather than trying to catch undisclosed use through faulty detection, integrity policy increasingly asks authors to declare how they used generative AI — what tool, for what purpose, in what part of the work — and holds them responsible for the result. This approach has several advantages. It treats authors as accountable adults rather than suspects. It targets what actually matters — responsible use and honesty — rather than the mere fact of use. It avoids the injustice of false accusations generated by unreliable software. And it builds a culture of transparency rather than one of surveillance and evasion. Editorial and integrity bodies, including the Committee on Publication Ethics (COPE), have emphasised that AI cannot be an author and cannot bear responsibility, that human authors remain accountable for their work, and that the appropriate response to AI use is transparency about it. The emphasis falls on what authors declare, not on what software claims to detect.

    A consistent vocabulary for disclosure

    If disclosure is to do the work that detection cannot, then what is disclosed has to be described consistently — what tool was used, for what purpose, at what stage — so that a declaration made to one journal is understood the same way by another, and by the institutions and databases that consume it. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that statements about AI use mean the same thing wherever they appear. And because authorship and accountability remain human, the contributions behind a work can be described in the established framework for them — the CRediT taxonomy and its full set of contribution roles, which sits alongside the principles of honest authorship. The instinct to fight AI text with AI detection is understandable, but it rests on tools that cannot bear the weight placed on them. The durable answer is not better detectors but honest disclosure and human accountability.
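
    As a rough illustration of what "described consistently" could mean in practice, the sketch below records the three elements named above (tool, purpose, stage) in one fixed structure. The class name, field names and example values are hypothetical; this is not the CASRAI Dictionary's schema or any journal's actual form.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AIUseDisclosure:
    """One declared use of generative AI (hypothetical structure)."""

    tool: str     # what tool was used
    purpose: str  # for what purpose
    stage: str    # at what stage of the work

    def __post_init__(self) -> None:
        # A disclosure with a blank element is not a usable declaration.
        for name, value in asdict(self).items():
            if not value.strip():
                raise ValueError(f"{name} must not be empty")

    def as_statement(self) -> str:
        """Render the same facts as a plain sentence for a manuscript."""
        return f"{self.tool} was used for {self.purpose} during {self.stage}."


disclosure = AIUseDisclosure(
    tool="ExampleLLM",  # placeholder tool name
    purpose="language editing",
    stage="manuscript drafting",
)
print(disclosure.as_statement())
print(asdict(disclosure))
```

    Because every declaration carries the same named fields, a journal, an institution and a database can each read it without reinterpreting free text.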

  • Disclosing AI-generated images and figures in research

    Most of the debate about generative AI in research has concerned the written word: what authors must declare when they use AI tools to help draft a manuscript. But generative AI does not only produce text. It produces images — and images in a scientific paper occupy a fundamentally different place from prose. A figure is often presented as evidence: a micrograph, a gel, a scan, a chart of results is taken to be a faithful record of something that was observed or measured. When such an image can be conjured by a model that has never observed anything, the most basic assumption of scientific communication — that what you are shown is real — comes under threat. This is why AI-generated images raise concerns that are sharper, and in some respects more dangerous, than those raised by AI-generated text. The question belongs within the generative-AI disclosure domain of the CASRAI Dictionary.

    Why images are different from text

    The crucial difference is the relationship between an image and reality. Text in a paper is understood to be the authors’ account, their argument and interpretation; readers know it is written. But many images in scientific papers are understood to be records of fact — this is what the cells looked like, this is the structure that was resolved, this is the experimental output. The integrity of the entire scientific record depends on that understanding holding true. A generative model can now produce an image that looks exactly like such a record but corresponds to nothing that was ever observed. In the context of a results figure, that is not illustration; it is fabrication, which has always been among the gravest forms of research misconduct. The danger is not hypothetical or cosmetic: a fabricated figure can mislead readers, reviewers and an entire field into believing in findings that do not exist.

    The integrity and manipulation concerns

    Several specific concerns flow from this:

    • Fabrication of results. The starkest risk is the generation of fake data visualisations, fake imaging results or fake experimental outputs presented as genuine, which corrupts the scientific record itself.
    • Undisclosed manipulation. AI tools that retouch, enhance or alter genuine images can cross the long-standing line between acceptable adjustment and impermissible manipulation, especially when done invisibly.
    • Erosion of trust. If readers can no longer assume that scientific images are authentic, the evidential value of figures — central to how science is communicated and checked — is undermined across the board.
    • Detection difficulty. Synthetic images can be very hard to distinguish from real ones, which means the safeguards cannot rely on catching fakes after the fact and must lean heavily on honesty and clear rules.

    These concerns sit on top of a pre-existing problem: image integrity, including inappropriate duplication and manipulation of figures, was already one of the most common sources of research-integrity cases before generative AI arrived. The new tools pour fuel on a fire already burning.

    Why many publishers ban AI-generated figures

    Faced with this, a notable number of publishers and journals have taken a stricter line on images than on text. Where policies on AI-assisted writing typically permit it subject to disclosure, policies on AI-generated images frequently prohibit them in scientific figures outright, particularly where a figure represents data or results. The reasoning follows directly from the difference described above. An AI tool that helps phrase a sentence does not pretend the sentence is an observation; an AI-generated results figure, by its nature, presents as real something that is not. Because the risk is fabrication of evidence rather than mere stylistic assistance, the proportionate response is often a ban rather than disclosure. A common pattern in publisher policy is therefore a near-prohibition on generative AI for figures that depict data, alongside more permissive, disclosure-based treatment of AI used for purely decorative or schematic illustrations that no one would mistake for evidence — and even then, transparency is expected.

    The disclosure requirement

    Where AI-generated or AI-assisted imagery is permitted at all — for example, a clearly labelled conceptual illustration, a graphical abstract, or a schematic — disclosure becomes the governing requirement, as it is for text. The emerging expectations include several elements:

    • Declare any use. Authors should state where and how generative AI was used to create or alter images, so that readers and editors are not misled about what they are looking at.
    • Never pass off generated content as real data. The bright line is that no AI-generated image may be presented as an authentic record of an observation or result.
    • Preserve and provide originals. Authors are increasingly expected to retain unaltered original images and supply them on request, so that genuine figures can be verified.
    • Follow the specific journal’s policy. Because rules differ between venues, authors must check and comply with the policy of the journal they are submitting to.

    COPE and the integrity bodies

    Editorial and integrity organisations have shaped this developing consensus. The Committee on Publication Ethics (COPE) and publishers’ own guidance consistently anchor the discussion in long-standing principles of image integrity — that figures must faithfully represent the work, that manipulation which misleads is misconduct, and that AI does not change these duties but makes them more pressing. The framing is instructive: generative AI has not created a new category of wrongdoing so much as made an old and serious one — fabricating or manipulating the visual evidence in a paper — far easier to commit and harder to detect. The policy response is a reassertion of the principle that scientific images must be honest, adapted to a world in which dishonest images are trivially easy to make. These questions connect closely to the responsibilities discussed in our material on authorship and accountability.

    A consistent vocabulary for disclosure

    For disclosure of AI use in images to be meaningful across journals, repositories and integrity systems, what is disclosed must be described consistently: what tool was used, on which figures, for what purpose, and whether the result is illustrative or evidential. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that a statement about AI use in a paper’s figures is understood the same way wherever it is recorded. And because creating genuine figures and visualisations is a real research contribution, that work can be described using the same framework as any other: the CRediT taxonomy and its full set of contribution roles, with Visualization recognising the creation of legitimate data presentation. Generative AI can make a convincing picture of anything; the durable principle, that the images in a scientific paper must be truthful representations of real work, is exactly what disclosure and these firmer rules exist to defend.
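
    One possible way to capture a per-figure declaration, including the illustrative-versus-evidential distinction, is sketched below. All names here are hypothetical and the check encodes only the bright line described in this article (no AI-generated image presented as a record of data); it does not represent any publisher's actual policy engine.

```python
from dataclasses import dataclass
from enum import Enum


class FigureRole(Enum):
    ILLUSTRATIVE = "illustrative"  # schematic, graphical abstract, concept art
    EVIDENTIAL = "evidential"      # presented as a record of data or results


@dataclass(frozen=True)
class FigureAIDisclosure:
    """Declared AI use for one figure (hypothetical structure)."""

    figure: str         # which figure, e.g. "Figure 2"
    tool: str           # what tool was used
    purpose: str        # for what purpose
    role: FigureRole    # illustrative or evidential
    ai_generated: bool  # True if the image itself came from a generative model


def crosses_bright_line(d: FigureAIDisclosure) -> bool:
    """True if an AI-generated image is being presented as evidence."""
    return d.ai_generated and d.role is FigureRole.EVIDENTIAL


schematic = FigureAIDisclosure(
    "Figure 1", "ExampleImageModel", "conceptual schematic",
    FigureRole.ILLUSTRATIVE, ai_generated=True,
)
micrograph = FigureAIDisclosure(
    "Figure 3", "ExampleImageModel", "micrograph of cells",
    FigureRole.EVIDENTIAL, ai_generated=True,
)

print(crosses_bright_line(schematic), crosses_bright_line(micrograph))
```

    A structured record of this kind does not detect anything; it only makes an honest declaration unambiguous, and makes a prohibited combination easy to spot once declared.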