  • Detecting AI-generated text in research: the limits of detectors and the policy response

    As text generated by large language models has flooded into every kind of writing, including the scholarly, an obvious-seeming solution has emerged alongside it: if AI can write text, surely software can detect AI-written text and flag it. A market of AI detectors has grown up promising exactly this, and editors, institutions and reviewers, understandably anxious about undisclosed AI use, have been tempted to lean on them. But the comfortable assumption that detection can police the boundary between human and machine writing does not survive contact with how these tools actually perform. The limits of AI-text detection are now well enough understood that responsible integrity policy is moving away from detection and towards disclosure instead. This article examines why, drawing on the generative-AI disclosure domain of the CASRAI Dictionary.

    How AI-text detectors work, and why that is a problem

    AI-text detectors do not have any direct insight into whether a particular passage was generated by a model. They cannot — the text carries no reliable hidden marker. Instead, they estimate the probability that text was machine-generated, typically by looking at statistical features such as how predictable or uniform the writing is, on the reasoning that model output tends to be smoother and less idiosyncratic than human prose. This is inference from surface features, not detection of a fact, and that distinction is the root of the problem. A probabilistic guess based on writing style can be wrong in both directions, and the consequences of being wrong are not symmetric.
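
    To make "inference from surface features" concrete, here is a deliberately crude sketch in Python. It is not how any particular commercial detector works; the function name `uniformity_score` and the heuristic itself (how evenly sentence lengths are distributed) are assumptions chosen purely for illustration. Real tools use richer statistics, but the shape of the reasoning is the same: a number computed from style, not a fact about origin.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split on sentence-ending punctuation and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def uniformity_score(text: str) -> float:
    """Toy 'machine-likeness' score in (0, 1]; higher means more uniform sentences.

    Hypothetical heuristic for illustration only, not a real detector's method.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        raise ValueError("need at least two sentences to measure variation")
    mean = statistics.mean(lengths)
    spread = statistics.pstdev(lengths)
    variation = spread / mean  # coefficient of variation of sentence length
    return 1.0 / (1.0 + variation)


# Both passages below were written by a person for this example.
formulaic = (
    "The sample was heated to ninety degrees. "
    "The mixture was stirred for ten minutes. "
    "The solution was cooled to room temperature. "
    "The residue was washed with cold water."
)
varied = (
    "We failed. After three weeks of increasingly desperate attempts to coax "
    "the reaction into behaving, the flask cracked. Nobody was surprised."
)

print(uniformity_score(formulaic), uniformity_score(varied))
```

    The methods-style passage scores higher than the anecdotal one even though a person wrote both, simply because its sentences are all the same length. The score measures style and nothing else.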

    The false-positive problem

    The most serious failing is the false positive: human-written text wrongly flagged as AI-generated. This is not a rare edge case but a structural weakness, and certain groups are disproportionately affected. Writing that is more formulaic, more uniform, or simply less idiosyncratic — precisely the characteristics of much technical and scientific prose, and notably of writing by people composing in a language that is not their first — can read as “machine-like” to a detector and be flagged accordingly. The implications are grave: a researcher could be falsely accused of undisclosed AI use, with reputational damage and integrity proceedings, on the basis of a tool that is essentially guessing from style. Studies and practical experience have repeatedly shown that detectors flag genuine human writing as AI-generated, and that non-native English writers bear an unfair share of these errors. To build serious consequences on so unreliable a foundation is itself an integrity failure.

    The false-negative problem and the arms race

    The opposite error is also pervasive. Detectors miss genuinely AI-generated text (the false negative), and they are especially easy to evade. Lightly editing or paraphrasing model output, or running it through tools designed to disguise its origins, can readily defeat a detector. This produces a futile arms race: as detectors improve, evasion techniques adapt, and the detectors fall behind again. Taken together, false positives and false negatives give the worst of both worlds: a tool that wrongly accuses the honest while failing to catch the deliberately deceptive offers no sound basis for policy. It punishes the wrong people and misses the right ones.
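
    The way the two error types compound can be sketched with simple arithmetic. Every number below is hypothetical and chosen only to show the mechanics; none is a measured error rate for any real detector, and the function name `flag_breakdown` is likewise just an illustrative label.

```python
def flag_breakdown(n_docs: int, ai_share: float,
                   false_positive_rate: float,
                   false_negative_rate: float) -> dict[str, float]:
    """Count correct flags, false accusations and misses for a batch of documents."""
    ai_docs = n_docs * ai_share
    human_docs = n_docs - ai_docs
    correct_flags = ai_docs * (1 - false_negative_rate)   # AI text that is caught
    false_accusations = human_docs * false_positive_rate  # human text wrongly flagged
    missed = ai_docs * false_negative_rate                 # AI text that slips through
    return {
        "correct_flags": correct_flags,
        "false_accusations": false_accusations,
        "missed": missed,
        # Share of all flags that actually point at AI-generated text.
        "share_of_flags_correct": correct_flags / (correct_flags + false_accusations),
    }


# Hypothetical scenario: 10,000 submissions, 10% AI-generated,
# 5% false-positive rate, 30% false-negative rate.
result = flag_breakdown(10_000, 0.10, 0.05, 0.30)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

    Under these assumed rates, roughly 450 human-written submissions are flagged alongside about 700 genuinely AI-generated ones, while around 300 AI-generated submissions pass unnoticed. Nearly two in five flags would land on an honest author.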

    Why detection is the wrong foundation

    Step back from the technical failings and a deeper point emerges. Detection asks the wrong question. It tries to establish, after the fact and by inference, whether a tool was used — an inherently uncertain enquiry that pits unreliable software against the people it judges, and that treats authors as suspects. It also misframes the issue. The concern with generative AI in research is not that text was produced with a tool’s help per se; it is whether AI was used appropriately, whether the human author takes responsibility for the result, and whether the use was honestly declared. None of those things can be read off a statistical style score. A detector cannot tell you whether AI was used responsibly; at best it makes an unreliable guess about whether it was used at all. Policy built on detection is therefore aimed at the wrong target with an unfit instrument.

    Disclosure as the better answer

    The more defensible foundation is disclosure. Rather than trying to catch undisclosed use through faulty detection, integrity policy increasingly asks authors to declare how they used generative AI — what tool, for what purpose, in what part of the work — and holds them responsible for the result. This approach has several advantages. It treats authors as accountable adults rather than suspects. It targets what actually matters — responsible use and honesty — rather than the mere fact of use. It avoids the injustice of false accusations generated by unreliable software. And it builds a culture of transparency rather than one of surveillance and evasion. Editorial and integrity bodies, including the Committee on Publication Ethics (COPE), have emphasised that AI cannot be an author and cannot bear responsibility, that human authors remain accountable for their work, and that the appropriate response to AI use is transparency about it. The emphasis falls on what authors declare, not on what software claims to detect.
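
    A declaration of this kind (what tool, for what purpose, in what part of the work) can be captured as a small structured record. The sketch below is one possible shape, in Python. The class name `AIUseDisclosure`, its fields, and the `Purpose` categories are all hypothetical; they are not taken from COPE guidance, the CASRAI Dictionary, or any journal's submission system.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Purpose(str, Enum):
    """Hypothetical purpose categories; not an official vocabulary."""
    LANGUAGE_EDITING = "language-editing"
    DRAFTING = "drafting"
    CODE_ASSISTANCE = "code-assistance"
    LITERATURE_SUMMARY = "literature-summary"


@dataclass(frozen=True)
class AIUseDisclosure:
    """One declared use of a generative-AI tool (illustrative fields only)."""
    tool: str                # which tool was used
    purpose: Purpose         # what it was used for
    section: str             # which part of the work it touched
    responsible_author: str  # the human who answers for the result

    def to_json(self) -> str:
        """Serialise the declaration so another system can read it unchanged."""
        return json.dumps(
            {
                "tool": self.tool,
                "purpose": self.purpose.value,
                "section": self.section,
                "responsible_author": self.responsible_author,
            },
            sort_keys=True,
        )


declaration = AIUseDisclosure(
    tool="ExampleLLM",  # placeholder name, not a real product
    purpose=Purpose.LANGUAGE_EDITING,
    section="Methods",
    responsible_author="A. Author",
)
print(declaration.to_json())
```

    Drawing the purpose from a fixed set of values rather than free text is a design choice: the same declaration then reads the same way to whoever receives it, and an unrecognised value is rejected rather than silently accepted.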

    A consistent vocabulary for disclosure

    If disclosure is to do the work that detection cannot, then what is disclosed has to be described consistently — what tool was used, for what purpose, at what stage — so that a declaration made to one journal is understood the same way by another, and by the institutions and databases that consume it. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that statements about AI use mean the same thing wherever they appear. And because authorship and accountability remain human, the contributions behind a work can be described in the established framework for them — the CRediT taxonomy and its full set of contribution roles, which sits alongside the principles of honest authorship. The instinct to fight AI text with AI detection is understandable, but it rests on tools that cannot bear the weight placed on them. The durable answer is not better detectors but honest disclosure and human accountability.