large language models – CASRAI Dictionary

Generative AI refers to machine-learning systems that produce new content, such as text, images, audio or code, by modelling the patterns of their training data and sampling from them. Unlike predictive models that output a label or a number, a generative model outputs an artefact. The most prominent examples are large language models (LLMs) for text and diffusion models for images. For research, the rise of these tools has prompted editorial bodies to set clear norms: AI cannot be listed as an author, and its use must be disclosed.

What generative AI is

Modern generative systems are typically foundation models: large models trained on broad data at scale, then adapted to many downstream tasks. Large language models are built on the transformer architecture introduced in 2017, which uses an attention mechanism to weigh relationships between tokens in a sequence and predict the next token. Diffusion models generate images by learning to reverse a gradual noising process, starting from random noise and denoising it step by step into a coherent image. The underlying machinery is the neural network described in our explainer on neural networks and deep learning.
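
To make the attention mechanism concrete, the following is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is an illustration under simplifying assumptions, not how any particular model is implemented: the token vectors are made-up numbers rather than learned embeddings, and the learned query, key and value projections and multiple heads used in real transformers are omitted.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Each key is compared with the query; the resulting weights say how
    much each token contributes to the output, which is a weighted
    average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Three toy token vectors (illustrative values only, an assumption of this sketch).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The third token attends over the whole sequence, itself included.
weights, output = attention(tokens[2], tokens, tokens)
print(weights)
print(output)
```

In this toy case the third token places most weight on itself because its vector has the largest dot product with the query. In a language model, a stack of such layers feeds a final step that scores every candidate next token.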

How generative AI differs from predictive ML

The distinction is one of output. Predictive (discriminative) machine learning answers questions about given inputs: is this email spam, what is this house worth, which category does this image belong to? Generative AI instead produces content that did not exist before. A useful framing is that predictive models estimate a label or value given an input, whereas generative models estimate the distribution of the data itself and sample new examples from it. The foundations of the predictive paradigm are covered in our guide to machine learning concepts and methods.

Aspect	Predictive ML	Generative AI
Typical output	Label, score or value	New text, image, audio or code
Goal	Predict a target for an input	Produce novel content
Examples	Spam filter, price regression	LLMs, diffusion image models
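
The contrast in the table can be sketched on a toy corpus. The example below is a deliberately tiny illustration, not a realistic system: the four sentences, their labels and all function names are assumptions made up for this sketch. The predictive half counts words per label and returns a label for an input; the generative half estimates which word tends to follow which (a bigram model) and samples a new sentence from that estimate.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data, invented for this illustration.
corpus = [
    "cheap offer now",
    "spam offer now",
    "meeting notes attached",
    "project meeting today",
]
labels = ["spam", "spam", "ham", "ham"]

# Predictive: estimate a label given an input.
def train_classifier(texts, text_labels):
    counts = defaultdict(Counter)
    for text, label in zip(texts, text_labels):
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    # Score each label by how often the input's words appeared under it.
    words = text.split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

# Generative: estimate the distribution of the data and sample from it.
def train_bigrams(texts):
    follows = defaultdict(Counter)
    for text in texts:
        words = ["<s>"] + text.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def sample(follows, rng, max_words=10):
    word, out = "<s>", []
    for _ in range(max_words):
        choices = follows[word]
        # Draw the next word in proportion to how often it followed `word`.
        word = rng.choices(list(choices), weights=list(choices.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

classifier = train_classifier(corpus, labels)
bigrams = train_bigrams(corpus)

print(predict(classifier, "cheap offer"))   # a label
print(sample(bigrams, random.Random(0)))    # a newly sampled sentence
```

The classifier can only ever return one of the labels it was trained on, while the sampler can emit word sequences that never appeared in the corpus, which is the sense in which its output is new.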

Emerging research-disclosure norms

As researchers began using generative tools to draft, edit and analyse, journals and editorial bodies responded with guidance. Two positions are now widely shared across the scholarly publishing ecosystem.

AI cannot be an author. The International Committee of Medical Journal Editors (ICMJE) and the Committee on Publication Ethics (COPE) hold that authorship entails responsibility and accountability that a non-human tool cannot bear, including approving the final version and being answerable for the integrity of the work. A generative model therefore cannot meet authorship criteria and must not be listed as an author or co-author.

Use must be disclosed. Where generative AI has been used in producing a manuscript, authors are expected to disclose how it was used, typically in the methods or acknowledgements, so that reviewers and readers can assess it. Authors remain fully responsible for the accuracy and integrity of everything in the submission, including any AI-assisted content. These norms are tracked across our GenAI disclosure coverage, and they extend to confidential contexts such as peer review, as set out in our policy on generative AI in peer review, disclosure and confidentiality.

Documenting generative-AI use in the research record

Good disclosure is specific. Stating which tool was used, for what purpose (for example, language editing versus drafting analysis) and what human verification followed makes the record auditable. This dovetails with structured documentation practices such as model cards and datasheets, discussed in our piece on AI model documentation, and with the controlled vocabulary maintained in the casrai.org research dictionary.
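
One way to keep such a disclosure specific is to capture it as a small structured record before writing the statement. The sketch below is a hypothetical illustration only: the class name, field names and wording are assumptions made for this example, not a schema defined by ICMJE, COPE or any journal, and "ExampleLLM" is a placeholder rather than a real product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AIUseDisclosure:
    """Hypothetical record of one use of a generative-AI tool."""
    tool: str                 # which tool was used
    purpose: str              # what it was used for
    human_verification: str   # what checking the authors did afterwards
    location: str = "acknowledgements"  # where the statement will appear

    def __post_init__(self):
        # A disclosure with a blank field is not auditable, so reject it.
        for name in ("tool", "purpose", "human_verification", "location"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def statement(self) -> str:
        """Render the record as a plain-language disclosure sentence."""
        return (
            f"{self.tool} was used for {self.purpose}. "
            f"{self.human_verification} "
            "The authors remain responsible for the accuracy and "
            "integrity of the content."
        )

disclosure = AIUseDisclosure(
    tool="ExampleLLM",  # placeholder tool name
    purpose="language editing of the introduction",
    human_verification="All edited text was reviewed and approved by the authors.",
)
print(disclosure.statement())
```

Keeping the three elements as separate fields makes it harder to omit one of them, and the same record can be reused for the manuscript and for internal project documentation.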

Frequently asked questions

Can generative AI be listed as an author on a paper?

No. ICMJE and COPE positions hold that authorship requires accountability for the work that a non-human tool cannot bear. Generative AI cannot be an author or co-author, and its use should instead be disclosed.

How is generative AI different from predictive machine learning?

Predictive ML outputs a label, score or value for a given input, while generative AI produces new content such as text or images. Generative models learn the distribution of the data and sample from it.

Where should authors disclose generative-AI use?

Typically in the methods or acknowledgements, stating which tool was used and for what purpose. Authors remain fully responsible for the accuracy and integrity of all AI-assisted content.

What is a foundation model?

A foundation model is a large model trained on broad data at scale and then adapted to many downstream tasks. Large language models and diffusion image models are common examples.


What Is Generative AI and Research Disclosure Norms?