Category: Perspectives

Opinion, argument, and field-shaping commentary on research-administration standards.

  • Tortured Phrases: Detecting Paper Mills in 2026

    The scholarly-publishing integrity ecosystem ended 2025 with the highest retraction rate ever recorded and the clearest evidence yet that industrial-scale fraud is structurally embedded in the literature. The numbers are sobering: Retraction Watch’s database crossed 60,000 entries in 2025; Hindawi/Wiley alone retracted over 11,000 papers across 2023-2024 following paper-mill detection; the Problematic Paper Screener now flags new manuscripts at a rate that strains journals’ capacity to investigate. This post maps the current threats, the detection tooling that has matured, and the United2Act coordination work that is beginning to produce a coherent industry response.

    Paper mills: the supply side

    A paper mill is a commercial operation that fabricates manuscripts and sells authorship slots on them. The mills emerged at significant scale around 2010-2012, driven by promotion-and-tenure incentives in jurisdictions where publication count is a hard quantitative requirement (early-career clinical researchers in some countries face explicit per-promotion-step publication quotas). The mills industrialised what individual fabrication had done for decades.

    The 2022-2024 Hindawi crisis made the systemic nature of the problem visible: Wiley’s acquired open-access portfolio was infiltrated at scale, leading to 11,000+ retractions and the closure of several journals. The Hindawi pattern was: mill-generated manuscripts submitted to special-issue calls in low-rigour journals, peer-reviewed by mill-affiliated reviewers in coordinated networks, published, and used for career advancement. The breakdown was multifactorial: high-volume special-issue calls without sufficient editorial oversight; reviewer networks whose coordination the journals could not detect; and a financial incentive structure that rewarded throughput.

    The 2024-2025 response was substantial. Wiley shut down the Hindawi brand, retracted at scale, and rebuilt its peer-review controls. Other publishers running similar special-issue programmes audited and tightened their own. The United2Act initiative on paper mills, launched in 2023 with COPE and STM coordinating, produced industry-wide commitments to detection cooperation, transparent retraction practices, and improved reviewer verification.

    Tortured phrases: a detection lever

    The tortured phrases concept, coined by Guillaume Cabanac and Cyril Labbé in 2021, was a methodological breakthrough. A tortured phrase is a clumsy paraphrase of a standard technical term, typically introduced when automatic word substitution is used to evade plagiarism detection. Examples include “counterfeit consciousness” for “artificial intelligence,” “haphazard backwoods” for “random forests,” and “fake neural organization” for “artificial neural network.” Once recognised, tortured phrases are a reliable signal of mill involvement, because no human author working in their field would write “haphazard backwoods” when they meant random forests.

    Cabanac and Labbé’s Problematic Paper Screener (PPS) operationalises tortured-phrase detection at scale. The PPS continuously scans the published literature against a curated dictionary of tortured phrases, flagging papers that contain them. By 2026 the PPS has flagged over 14,000 papers; many have been retracted, more are under investigation, and a substantial subset will likely remain in the literature without action because the journals are unresponsive or defunct.

    The PPS is open infrastructure (the dictionary is public, the methodology is published, the flagged papers are listed). It has been criticised for false positives (some flagged papers turn out to have innocent explanations, e.g., automated translation from a non-English original) but the precision is high enough that an editor receiving a PPS flag should treat it as a serious signal warranting investigation.
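
    To make the dictionary-scan idea concrete, here is a minimal sketch. It is not the PPS implementation: the dictionary holds only the three examples quoted above, and the function name `find_tortured_phrases` is ours.

```python
import re

# Toy dictionary built only from the examples quoted in this post; the real
# PPS dictionary is far larger and curated by its maintainers.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "haphazard backwoods": "random forests",
    "fake neural organization": "artificial neural network",
}


def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected term) pairs found in the text."""
    # Lowercase and collapse whitespace so capitals and line breaks
    # do not hide a match.
    normalised = re.sub(r"\s+", " ", text.lower())
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalised):
            hits.append((phrase, expected))
    return hits


abstract = "We apply Haphazard\nBackwoods and counterfeit consciousness to triage."
flags = find_tortured_phrases(abstract)
for phrase, expected in flags:
    print(f"flag: '{phrase}' (expected '{expected}')")
```

    A hit is a prompt for investigation, not a verdict: as noted above, machine translation can produce the same phrases innocently.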

    Image manipulation

    The other major detection front is image manipulation, particularly in life-science papers where Western blots, microscopy images, and gel-electrophoresis images are routinely fabricated by duplication, splicing, or AI generation. Elisabeth Bik’s catalogue of image-duplication cases has been the canonical reference for over a decade. The 2022-2024 development was the deployment of automated image-similarity tools (Imagetwin, Proofig) by major publishers; by 2025 most large publishers run automated image screening on every submission.
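
    For readers unfamiliar with image-similarity screening, the sketch below shows one generic technique, an average hash over a grayscale grid. We do not know how Imagetwin or Proofig work internally, so treat this purely as an illustration; the pixel grids are made up, and a real pipeline would first downscale actual images.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack a grayscale grid into bits: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Made-up 4x4 "blot" with one bright vertical band.
band_a = [[10, 200, 10, 10], [10, 220, 10, 10], [10, 210, 10, 10], [10, 10, 10, 10]]
# The same band after a uniform brightness change.
band_b = [[value + 20 for value in row] for row in band_a]
# A genuinely different pattern.
band_c = [[10, 10, 10, 200], [10, 10, 10, 220], [200, 10, 10, 10], [10, 10, 10, 10]]

same = hamming_distance(average_hash(band_a), average_hash(band_b))
different = hamming_distance(average_hash(band_a), average_hash(band_c))
print(same, different)
```

    A small distance suggests a possible duplicate that a human should inspect. This kind of comparison needs a source image to match against, so it cannot help with a freshly generated one.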

    The 2025 escalation is AI-generated images. A diffusion-model-generated Western blot is more difficult to detect than a duplicated one because there is no source to find. The detection community has begun work on AI-generated-image detection, but the arms race is ongoing and no tool has settled the question. The current best practice is to require raw data deposition (the original blot scan, the unprocessed microscopy stack) alongside the published image, with image-manipulation tools running on both. Several Cell Press and EMBO journals now require this for all life-science submissions.

    Citation cartels

    Citation cartels are coordinated networks of authors who systematically cite each other to inflate their citation counts and journal impact factors. The classic cartel pattern is journal-level: a journal’s editorial board reciprocally cites other journals’ editorial boards, all benefiting from the inflated cross-citation. The author-level pattern is similar: a network of researchers in adjacent specialties cites each other across many papers.

    Detection is statistical: cartels show citation patterns that are sharply non-random in the citation graph. The 2023-2024 work by Albers Mohrman and others operationalised the detection at the journal-citation-network level; Clarivate has begun excluding cartel-implicated journals from the JCR. The author-level cartels are harder to act against, but the existence of the signal is becoming part of the institutional-integrity toolkit.
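
    A toy illustration of the statistical idea follows. It is not the published method referred to above: the citation counts are invented, and the reciprocity threshold is an arbitrary assumption. It flags journal pairs in which each sends an unusually large share of its outgoing citations to the other.

```python
from itertools import combinations

# Invented counts: citations[a][b] = citations from journal a to journal b.
citations = {
    "J1": {"J2": 80, "J3": 10, "J4": 10},
    "J2": {"J1": 70, "J3": 20, "J4": 10},
    "J3": {"J1": 5, "J2": 5, "J4": 40},
    "J4": {"J1": 10, "J2": 10, "J3": 20},
}


def outgoing_share(source: str, target: str) -> float:
    """Fraction of the source journal's outgoing citations that go to the target."""
    total = sum(citations[source].values())
    return citations[source].get(target, 0) / total if total else 0.0


def reciprocal_pairs(threshold: float) -> list[tuple[str, str]]:
    """Pairs where each journal sends at least `threshold` of its citations to the other."""
    return [
        (a, b)
        for a, b in combinations(sorted(citations), 2)
        if outgoing_share(a, b) >= threshold and outgoing_share(b, a) >= threshold
    ]


suspects = reciprocal_pairs(0.6)
print(suspects)
```

    Here J3 cites J4 heavily but J4 does not reciprocate at the same level, so only the J1/J2 pair is flagged. A small specialist field can look like this for innocent reasons, which is why such a signal is a starting point for review, not a finding.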

    The retraction infrastructure

    Retraction has historically been slow, opaque, and inconsistently practised. The 2024 NISO recommended practice on communicating retractions, removals, and expressions of concern (CREC, NISO RP-45-2024) and the 2024 Crossref retraction-metadata revisions have begun to change this. A retracted paper now carries structured machine-readable metadata about the retraction reason, the implicated parties, and the relationship to other papers; downstream services (PubMed, Google Scholar, Scopus, citation databases) consume the metadata and surface retraction notices alongside the paper.

    The remaining gap is the unretracted-but-suspect paper. A paper flagged by the Problematic Paper Screener but never investigated by the journal sits in the literature unmarked. The 2024 COPE-led discussion of expressions of concern as an interim status (the paper is under investigation but not yet retracted) is one direction. A more radical proposal, now being piloted by several preprint servers and one or two journals, is to surface the PPS flag directly on the article landing page even before the journal acts, with a clear distinction between “flagged by automated screener” and “retracted by publisher.”

    The United2Act response

    United2Act, launched in 2023 with COPE and STM coordinating, brought publishers, researchers, integrity offices, and regulators together to address paper mills. The 2024 United2Act communique committed signatories to: cooperate on detection (sharing reviewer-misconduct signals across publishers); standardise retraction practices; improve reviewer verification; coordinate with institutions on consequences for authors of fabricated papers.

    The 2025 work has been operational: the COPE/STM joint paper-mill database (publishers can submit suspect manuscript signatures and the database flags coincidences); reviewer-verification protocols (ORCID iD plus institutional email plus referee history); coordination with national integrity offices in jurisdictions where paper-mill commissioning is concentrated.
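
    The coincidence-flagging idea can be sketched as follows. We have no knowledge of how the joint database actually computes signatures; the scheme here (a SHA-256 hash of the normalised title and abstract) and the names `manuscript_signature` and `SignatureRegistry` are assumptions for illustration only.

```python
import hashlib
import re
from collections import defaultdict


def manuscript_signature(title: str, abstract: str) -> str:
    """Hash of the normalised title and abstract (hypothetical signature scheme)."""
    text = f"{title} {abstract}".lower()
    # Keep only letters and digits so punctuation and spacing changes
    # do not alter the hash.
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SignatureRegistry:
    """In-memory stand-in for a shared signature database."""

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = defaultdict(set)

    def submit(self, publisher: str, signature: str) -> set[str]:
        """Record a signature; return the other publishers that already reported it."""
        others = self._seen[signature] - {publisher}
        self._seen[signature].add(publisher)
        return others


registry = SignatureRegistry()
sig_a = manuscript_signature("Deep Learning for Triage", "We propose a model.")
sig_b = manuscript_signature("deep learning for triage:", "We propose   a model")
first = registry.submit("Publisher A", sig_a)
second = registry.submit("Publisher B", sig_b)
print(first, second)
```

    Exchanging hashes instead of text lets publishers compare submissions without sharing confidential manuscripts. An exact hash misses paraphrased duplicates, so a production system would need some fuzzier form of matching.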

    The honest assessment is that United2Act has bought the industry better coordination but has not solved the structural incentive problem. As long as researchers face quantitative publication requirements for promotion, the demand for fabricated authorship slots will exist. The longer-term fix is on the responsible-assessment side (see our responsible-assessment domain); the integrity-side work is harm reduction.

    COPE flowcharts: the per-case operational layer

    The COPE flowcharts, maintained and updated by the Committee on Publication Ethics, are the operational toolkit for editors handling suspected misconduct. Among many other topics, the flowcharts cover plagiarism in a submitted manuscript, plagiarism in a published article, redundant publication, fabricated data, undisclosed conflict of interest, undisclosed AI use, image manipulation, authorship disputes, paper-mill suspicion, and citation manipulation.

    An editor confronted with a suspect submission in 2026 should pull the relevant COPE flowchart, follow the documented procedure, and document the decision trail. The flowcharts are not a substitute for editorial judgement, but they are an audit-defensible baseline. The 2024-2025 COPE updates added flowcharts specifically for AI-assisted fabrication, paper-mill suspicion based on tortured-phrase detection, and image-manipulation findings from automated tools.

    What to do at the institutional level

    For an institutional research-integrity office in 2026, the practical priorities are: (1) monitor your own institution’s authors against the PPS and the Retraction Watch database; (2) integrate retraction-metadata feeds into your CRIS so you can detect when your authors’ papers are retracted elsewhere; (3) participate in United2Act or its national-level analogues; (4) commit publicly to following COPE flowcharts and document decisions; (5) work with your promotion-and-tenure committees to remove the pure-count incentives that fuel the demand side. The research-integrity domain at CASRAI maintains the institutional-integrity playbooks.
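
    Priorities (1) and (2) come down to matching your own output list against a retraction feed. A minimal sketch, assuming a feed of records with `doi` and `reason` fields (hypothetical field names, not those of any specific service) and made-up DOIs:

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes (DOIs are case-insensitive)."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def retracted_outputs(institution_dois: list[str], retraction_feed: list[dict]) -> list[tuple[str, str]]:
    """Return (DOI as held locally, retraction reason) for each match in the feed."""
    retracted = {normalise_doi(record["doi"]): record["reason"] for record in retraction_feed}
    return [
        (doi, retracted[normalise_doi(doi)])
        for doi in institution_dois
        if normalise_doi(doi) in retracted
    ]


# Made-up records for illustration.
feed = [{"doi": "10.5555/EXAMPLE.001", "reason": "paper mill"}]
cris_dois = ["https://doi.org/10.5555/example.001", "10.5555/example.002"]
matches = retracted_outputs(cris_dois, feed)
print(matches)
```

    Normalising before comparison matters because the same DOI is often stored as a bare identifier in one system and as a resolver URL in another.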

    References

    Cabanac, Labbé, Magazinov, Tortured phrases: A dubious writing style emerging in science (2021 preprint and follow-up papers).
    Bik et al., The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications (mBio, 2016).
    Else and Van Noorden, The fight against fake-paper factories that churn out sham science (Nature, 2021).
    COPE, Paper mills – research, action plans, and resources (2023, updated 2024).
    United2Act, Joint Communique on Paper Mills (2023).

  • Why the next CRediT version should include ‘AI assistance’ as a role

    The 14 roles of CRediT were designed in 2013-2014 with a model of contribution that did not include large language models or generative AI systems. A decade on, the taxonomy is robust and widely adopted, but the AI question is hard to ignore. This post makes the case — tentatively, and with attention to the counter-arguments — that the next CRediT revision should add a 15th role explicitly covering AI assistance. We are publishing it here to invite community pushback before any formal proposal goes to the CRediT stewardship group.

    Why this question is not solved by disclosure alone

    The current consensus around generative AI in scholarly authorship rests on two pillars: AI cannot be a co-author (the ICMJE 2023 position), and AI use must be disclosed in a structured declaration. CASRAI agrees with both. They do not, however, resolve the question of how AI assistance shows up in CRediT.

    A worked example. Suppose a paper has four authors. Author A wrote the first draft with substantial assistance from a large language model, which she prompted, edited, fact-checked, and revised. Author B ran the formal analysis using an AI-assisted statistical-discovery tool that proposed model specifications. Author C generated several of the figures using a GenAI visualisation tool. Author D supervised. Each used AI; each used it differently; each took human responsibility for the output. How does the CRediT statement represent this?

    Under current CRediT, AI use is invisible. Author A gets Writing – original draft (lead). Author B gets Formal analysis (lead). Author C gets Visualization (lead). Author D gets Supervision. The AI assistance shows up only in the publisher-mandated AI disclosure, which is a free-text field in the methods or acknowledgements. The structured contributorship record has no place for the granular fact that AI was a tool in how each of those roles was discharged.

    The proposed 15th role

    The draft scope we are testing is this:

    AI assistance. The use of artificial-intelligence systems, including generative AI, machine-learning models, and automated analytical tools, in the production of the work. Includes prompt engineering, model selection, validation of AI output, and human verification of AI-generated content. Does not include use of AI as a routine tool (e.g., grammar checkers, citation-formatting tools) below a disclosure threshold defined by the publisher.

    The role would carry the standard degree-of-contribution qualifier. A human author whose primary contribution was prompting and verifying an AI system would be marked Lead for AI assistance; a co-author who occasionally checked AI outputs would be Supporting. The role would not be a substitute for the existing roles — the human who used AI for the first draft still gets Writing – original draft — but it would add the structured fact that AI was involved.
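
    As a rough sketch of what this would look like in data, the worked example can be encoded as role assignments with and without the proposed role. The record shape and the degrees assigned to the AI assistance entries are our assumptions, not part of CRediT or of any publisher schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleAssignment:
    contributor: str
    role: str
    degree: Optional[str] = None  # e.g. "Lead" or "Supporting"


# The worked example under current CRediT: AI use is not visible anywhere.
current = [
    RoleAssignment("Author A", "Writing – original draft", "Lead"),
    RoleAssignment("Author B", "Formal analysis", "Lead"),
    RoleAssignment("Author C", "Visualization", "Lead"),
    RoleAssignment("Author D", "Supervision"),
]

AI_ROLE = "AI assistance"  # proposed role, not part of CRediT today

# Existing roles are kept; the proposed role is added alongside them.
# The degrees below are illustrative guesses for the worked example.
proposed = current + [
    RoleAssignment("Author A", AI_ROLE, "Lead"),
    RoleAssignment("Author B", AI_ROLE, "Supporting"),
    RoleAssignment("Author C", AI_ROLE, "Supporting"),
]


def contributors_with(role: str, records: list[RoleAssignment]) -> list[str]:
    """Names of contributors holding the given role, sorted alphabetically."""
    return sorted({record.contributor for record in records if record.role == role})


print(contributors_with(AI_ROLE, current))
print(contributors_with(AI_ROLE, proposed))
```

    The same query returns nothing on the current statement and three names on the proposed one. Note that the record still does not say which task each author used AI for; that has to be read off the author's other roles.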

    The arguments for

    First, structured disclosure is more useful than prose disclosure. A free-text AI declaration cannot be queried, cross-referenced, or aggregated. A CRediT-style structured role can. Integrity offices investigating a fabrication can query for papers with AI assistance roles; funders tracking AI use in grant outputs can roll up the data; bibliometric studies can analyse patterns. None of this is possible with the current free-text disclosure.

    Second, granularity matters for accountability. Knowing that a paper used AI is less useful than knowing which contributor used AI for which task. The CRediT role assignment makes the accountability specific. If a fabricated reference appears in the introduction, the question of who is responsible for verifying it has a structured answer.

    Third, the boundary is becoming a fiction. Modern statistical workflows include AI components (autoML, AI-assisted exploratory analysis); modern writing workflows include AI components (Copilot for prose, Claude for editing); modern visualisation workflows include AI components. The pretence that these are separable from the role they support is increasingly hard to maintain. If AI is being used to discharge a role, the role assignment should say so.

    The arguments against

    Three serious counter-arguments deserve engagement.

    First, the scope-creep concern. CRediT has held to 14 roles deliberately. Each addition raises the cognitive load on authors filling out the statement, increases the integration burden on publishers, and risks the taxonomy becoming unusable through over-specification. The argument from Liz Allen and the original CRediT designers has been that the taxonomy gains its value from being small enough to use.

    Second, the boundary problem. What counts as AI assistance? A grammar checker is plausibly AI; a citation formatter increasingly is; a search engine ranking results by relevance certainly is. If every modern research tool counts as AI, the role becomes meaningless. A workable scope requires a non-trivial threshold (the draft language above gestures at “below a disclosure threshold defined by the publisher”), and that threshold is hard to define without ending up with either everything or nothing.

    Third, the disclosure-versus-contribution distinction. CRediT is a contributorship taxonomy. AI is not a contributor — that is the settled position. Adding an AI role to CRediT risks blurring this. The alternative is to keep AI in a separate disclosure form, structurally similar to a competing-interests declaration or a funding statement, rather than in the contributorship statement.

    A possible middle path

    The middle path is to keep CRediT at 14 roles and to define a parallel AI assistance declaration with comparable structure: a controlled vocabulary of AI-use types, a per-contributor breakdown linked to ORCID iDs, a model-and-version field, and a verification statement. This would sit alongside CRediT in publisher submission systems and JATS XML, rather than inside it.
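
    A first-pass data model for such a declaration might look like the sketch below. Everything about the shape is an assumption made for discussion: the vocabulary values, the field names, and the placeholder model name are ours. The one non-hypothetical piece is the ORCID iD check character, which follows the ISO 7064 MOD 11-2 algorithm that ORCID documents.

```python
import re
from dataclasses import dataclass
from enum import Enum


class AIUseType(Enum):
    # Hypothetical controlled vocabulary; no agreed list exists yet.
    DRAFTING = "drafting"
    ANALYSIS = "analysis"
    VISUALISATION = "visualisation"


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the ORCID iD check character (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", chars):
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


@dataclass(frozen=True)
class AIUseEntry:
    orcid: str            # contributor the entry is attributed to
    use_type: AIUseType   # controlled-vocabulary value, not free text
    model: str
    model_version: str
    human_verified: bool  # the verification statement

    def __post_init__(self) -> None:
        if not orcid_checksum_ok(self.orcid):
            raise ValueError(f"invalid ORCID iD: {self.orcid}")


# 0000-0002-1825-0097 is the example iD used in ORCID's own documentation.
entry = AIUseEntry(
    orcid="0000-0002-1825-0097",
    use_type=AIUseType.DRAFTING,
    model="example-llm",  # placeholder, not a real product
    model_version="1.0",
    human_verified=True,
)
print(entry.use_type.value, entry.human_verified)
```

    Rejecting malformed iDs at entry time keeps the per-contributor link trustworthy, which is the main thing a free-text disclosure cannot offer.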

    This is closer to where the current publisher disclosure forms are heading, and it preserves the conceptual clarity that CRediT roles describe what humans did, while a separate declaration describes what AI tools were used. We are increasingly inclined to recommend this path, with the caveat that the disclosure must be structured to the same standard as CRediT — not free-text, with controlled vocabularies, deposited to Crossref, and surfaced on ORCID.

    What the CRediT stewardship group should do next

    Three concrete steps. First, run a structured community consultation through 2026 on whether to add AI assistance as a 15th CRediT role, with the alternative being a parallel structured declaration. The CRediT governance page outlines the consultation process. Second, in parallel, draft the data model for a parallel AI assistance declaration so that the comparison is concrete and not abstract. Third, coordinate with NISO on whether either option requires a revision to Z39.104.

    The decision is not urgent in the sense that the integrity system is failing today; the existing disclosure forms work, badly. It is urgent in the sense that every year of delay produces another year of unstructured AI-use data that cannot be aggregated or analysed, which makes the eventual transition harder.