Dictionary domain · Track A
Generative AI use and disclosure
Vocabulary for human–AI collaboration on research outputs and the disclosure required.
For implementers
Operational deployment checklist for Generative AI use and disclosure: prerequisites, five deploy steps, integration notes for Pure, Symplectic Elements, Worktribe, DSpace, and more, plus the pitfalls that recur in the field.
Terms in this domain
33 terms
Model versioning
The practice of identifying a specific revision of an AI model by name, version number, release date, or content hash, sufficient to uniquely distinguish it from earlier or later revisions that may behave differently on the same input.
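As a minimal sketch of the content-hash approach, the record below combines a name, version number, release date, and SHA-256 digest. The `ModelVersion` class, the `hash_weights` helper, and the model name are hypothetical, and the byte strings stand in for real serialized weights.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record identifying one revision of a model."""
    name: str
    version: str
    release_date: str  # ISO 8601 date string
    content_hash: str  # SHA-256 of the serialized weights, hex-encoded


def hash_weights(weights: bytes) -> str:
    """Return the SHA-256 hex digest of the serialized model weights."""
    return hashlib.sha256(weights).hexdigest()


# Placeholder weights: two revisions that differ by a single byte
# but share the same name, version number, and release date.
rev_a = ModelVersion("example-model", "1.0", "2024-01-15", hash_weights(b"\x00\x01\x02"))
rev_b = ModelVersion("example-model", "1.0", "2024-01-15", hash_weights(b"\x00\x01\x03"))

# The content hash distinguishes revisions that the other fields cannot.
same_revision = rev_a == rev_b
```

A content hash is the only one of the four identifiers that changes automatically whenever the weights change, which is why it is often recorded alongside the human-readable version.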
Inference
The process of generating outputs from a trained AI model in response to inputs at runtime, distinct from training (which updates model parameters); for LLMs, inference is the production of completions from prompts.
Fine-tuning
The process of further training a pre-trained foundation model on a smaller, task-specific or domain-specific dataset, updating some or all parameters, to specialise its behaviour while retaining general capability.
Retrieval-augmented generation (RAG)
An AI architecture in which an LLM is augmented at inference time with documents retrieved from an external corpus (often via vector similarity search), so that the model's outputs are grounded in retrieved evidence rather than relying solely on parametric knowledge.
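The retrieval step can be sketched as follows, assuming a toy bag-of-words vector in place of a learned embedding and stopping at prompt construction rather than calling a model. The function names and the three-document corpus are illustrative only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, corpus: list[str], k: int = 1) -> str:
    """Place the retrieved documents in the prompt so the answer can be grounded in them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Illustrative corpus (hypothetical documents).
corpus = [
    "ORCID is a persistent identifier for researchers",
    "DSpace is an open source repository platform",
    "CRediT is a taxonomy of contributor roles",
]
prompt = build_prompt("what is a repository platform", corpus)
```

The resulting `prompt` would then be passed to an LLM; a production system would typically replace `embed` with a vector model and `retrieve` with a vector index.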
AI in literature search
The use of AI-powered tools (Elicit, Consensus, Scite, Undermind, Semantic Scholar's AI features) to discover, screen, summarise, or synthesise scholarly literature, distinguished from traditional keyword search by the use of embeddings, LLM summarisation, or generative answering over a literature corpus.
AI in qualitative coding
The use of LLMs or other AI tools to assign codes, themes, or categories to qualitative data (interview transcripts, open-ended survey responses, field notes), either as the sole coder, a second coder for reliability, or a first-pass triage to be human-reviewed.
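When the AI acts as a second coder for reliability, agreement with the human coder is commonly summarised with Cohen's kappa. The sketch below is one way to compute it; the code labels and the two coding sequences are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa: agreement between two coders, corrected for chance agreement."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six interview excerpts.
human = ["barrier", "barrier", "enabler", "enabler", "barrier", "enabler"]
model = ["barrier", "barrier", "enabler", "barrier", "barrier", "enabler"]
kappa = cohens_kappa(human, model)
```

Here the coders agree on five of six excerpts, but kappa is lower than the raw agreement rate because half of that agreement would be expected by chance.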
AI image generation
The use of a generative model (diffusion models, GANs, autoregressive image models) to produce images from text prompts, image prompts, or other conditioning inputs, distinct from AI-assisted image editing of an existing real image.
AI summarisation
The use of an AI system to produce a shorter version of a text that preserves its key information, including abstracts, lay summaries, executive summaries, and literature digests, whether extractive (sentence selection) or abstractive (paraphrasing).
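The extractive variant can be illustrated with a simple frequency-based sentence selector. This is a sketch of one classic heuristic, not how any particular AI tool works; abstractive summarisation requires a generative model and is not shown.

```python
import re
from collections import Counter


def extractive_summary(text: str, n_sentences: int = 1) -> str:
    """Select the n highest-scoring sentences and keep them in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average document-wide frequency of the words in the sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)


# Invented example text.
text = (
    "AI tools can summarise papers. "
    "Summaries of papers by AI tools can omit key caveats. "
    "The weather was fine."
)
summary = extractive_summary(text, n_sentences=1)
```

Because every output sentence is copied verbatim from the input, an extractive summary cannot introduce new claims, although it can still omit important ones.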
AI translation
The use of an AI system (rule-based, statistical, or neural) to convert text from one natural language to another, including the use of LLMs and dedicated neural machine translation systems to translate scholarly works or portions of them.
AI in editorial decisions
The use of AI tools by journal editors to triage, screen, or make decisions on submitted manuscripts, including desk-rejection screening, reviewer matching, plagiarism flagging, or assessment of fit — distinguished by whether the AI advises a human editor or substitutes for one.
AI in peer review
The use of generative AI tools by peer reviewers to assist in evaluating manuscripts (including summarisation, language editing of the review, or generation of review text), a practice that many publishers restrict or prohibit because of confidentiality and originality concerns.
Generative-AI disclosure statement
A dedicated section in a manuscript — typically headed 'Use of Generative AI' or similar — that consolidates all disclosures of AI tool use across the work, including tools used, versions, sections affected, and the human authors' verification process.
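One way to assemble such a section from structured records is sketched below. The `ToolUse` fields mirror the elements named in the definition; the class, function, tool name, and wording of the output are hypothetical and would need to follow the target journal's own requirements.

```python
from dataclasses import dataclass


@dataclass
class ToolUse:
    """Hypothetical record of one AI tool's use in a manuscript."""
    tool: str
    version: str
    scope: str
    sections: list[str]


def disclosure_statement(uses: list[ToolUse], verification: str) -> str:
    """Consolidate all recorded tool uses into a single disclosure section."""
    lines = ["Use of Generative AI"]
    for use in uses:
        lines.append(
            f"{use.tool} ({use.version}) was used for {use.scope} in: "
            f"{', '.join(use.sections)}."
        )
    lines.append(f"Verification: {verification}")
    return "\n".join(lines)


# Placeholder tool name and version, for illustration only.
uses = [
    ToolUse("ExampleLLM", "v1.2", "language editing", ["Introduction", "Discussion"]),
    ToolUse("ExampleLLM", "v1.2", "code generation", ["Methods"]),
]
statement = disclosure_statement(uses, "All AI-edited text and code were reviewed by the authors.")
```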
AI co-authorship rejection (ICMJE 2023)
The 2023 update to ICMJE's Recommendations stating explicitly that chatbots and generative AI systems cannot be listed as authors because they cannot meet the ICMJE authorship criteria, in particular the requirements to be accountable for the work and to approve the version to be published.
Author responsibility (for AI use)
The principle that human authors retain full responsibility for the accuracy, integrity, originality, and ethical sourcing of all content in a scholarly work, and for its freedom from plagiarism, regardless of which portions were drafted, suggested, or generated by an AI tool.
Acknowledgement (vs authorship for AI)
The convention, codified by ICMJE and COPE (2023), that AI tool use must be disclosed in the methods or acknowledgements section of a scholarly work rather than via the author byline or CRediT contributor list, because AI cannot satisfy authorship's accountability requirements.
Training data provenance
Documentation of the sources, collection methods, licensing, consent basis, time range, and processing steps applied to the data used to train an AI model, sufficient to assess fitness-for-purpose, legal compliance, and potential bias.
Data leakage (training)
The contamination of an AI model's training corpus with data that should have remained held-out — including evaluation benchmarks, test sets, or proprietary content — such that the model's apparent performance overstates its true generalisation ability or it can reproduce content it should not have seen.
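A common first check for benchmark contamination is n-gram overlap between evaluation items and the training corpus. The sketch below assumes whitespace tokenisation and a 5-gram window; both are illustrative choices, and the example texts are invented.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a text (empty if the text is shorter than n)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(train_docs: list[str], test_items: list[str], n: int = 5) -> float:
    """Fraction of test items sharing at least one n-gram with the training corpus."""
    if not test_items:
        return 0.0
    train_grams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in test_items if ngrams(item, n) & train_grams)
    return flagged / len(test_items)


# Invented training document and evaluation items.
train = ["the quick brown fox jumps over the lazy dog near the river"]
test_items = [
    "quick brown fox jumps over the fence",
    "an entirely different benchmark question about chemistry",
]
rate = contamination_rate(train, test_items)
```

Exact n-gram matching only catches verbatim overlap; paraphrased or translated copies of a benchmark would pass this check undetected.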
AI fairness
A property of an AI system whereby its outputs satisfy a defined criterion of equitable treatment across specified groups — common criteria include demographic parity, equalised odds, equal opportunity, and calibration parity — recognising that these criteria are often mutually incompatible.
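The tension between criteria can be seen in a small worked example. The sketch below computes group gaps in overall positive rate (demographic parity), true-positive rate (equal opportunity), and false-positive rate (which, together with the true-positive rate, makes up equalised odds). The labels, predictions, and group names are invented.

```python
def positive_rate(y_pred: list[int], mask: list[bool]) -> float:
    """Share of positive predictions among the items selected by the mask."""
    selected = [p for p, keep in zip(y_pred, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0


def fairness_gaps(y_true: list[int], y_pred: list[int], group: list[str], a: str, b: str) -> dict:
    """Differences (group a minus group b) for three common fairness criteria."""
    def rate(g: str, label=None) -> float:
        mask = [grp == g and (label is None or t == label)
                for t, grp in zip(y_true, group)]
        return positive_rate(y_pred, mask)

    return {
        "demographic_parity": rate(a) - rate(b),       # all items
        "true_positive_rate": rate(a, 1) - rate(b, 1),  # items with true label 1
        "false_positive_rate": rate(a, 0) - rate(b, 0), # items with true label 0
    }


# Invented data: the two groups have different base rates of the true label.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
group = ["x", "x", "x", "x", "y", "y", "y", "y"]
gaps = fairness_gaps(y_true, y_pred, group, "x", "y")
```

In this data the demographic-parity and true-positive-rate gaps are zero while the false-positive-rate gap is not, so demographic parity and equal opportunity hold but equalised odds does not.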
AI bias
Systematic skew in an AI system's outputs that produces unjustified differential treatment, accuracy, or representation across groups, tasks, or contexts, arising from training-data composition, model architecture, objective function, or deployment context.
Detection tool (AI-generated)
A software system that estimates the probability that a given piece of content (text, image) was produced by a generative AI system, typically by analysing statistical features of the output without access to provenance metadata.
AI provenance
The chain of evidence — metadata, cryptographic signatures, watermarks, or attestations — that documents whether a piece of content was produced or modified by an AI system, by which system, when, and under what prompt or input.
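A provenance attestation can be sketched as a signed metadata record. The example below uses an HMAC from the Python standard library as a simplified stand-in; this is an assumption made for brevity, since an HMAC needs a shared secret key, whereas a publicly verifiable scheme would use public-key signatures. The record fields, system name, and key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Sign a provenance record: HMAC-SHA256 over its canonical JSON form."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Check that the record has not been altered since it was signed."""
    return hmac.compare_digest(sign_record(record, key), signature)


# Hypothetical record: what was produced, by which system, when, from what prompt.
content = b"Generated figure caption text"
record = {
    "content_sha256": hashlib.sha256(content).hexdigest(),
    "system": "example-model 1.0",
    "generated_at": "2024-01-15T10:00:00Z",
    "prompt_sha256": hashlib.sha256(b"Write a caption for figure 2").hexdigest(),
}
key = b"demo-key-not-for-real-use"  # placeholder secret
signature = sign_record(record, key)

tampered = dict(record, system="another-model 2.0")
is_valid = verify_record(record, signature, key)
is_tampered_valid = verify_record(tampered, signature, key)
```

Hashing the prompt rather than storing it lets the record attest to the input without exposing confidential prompt text.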
Watermarking (AI output)
The embedding of a statistical, cryptographic, or visible signal into AI-generated content at the time of generation, allowing later identification of that content as machine-produced — distinct from post-hoc detection which infers AI origin from features of the output alone.
Synthetic image
An image produced by a generative model (e.g., diffusion model, GAN) rather than captured from a physical scene or instrument, including images used as illustrations, in figures, or as training data.
Synthetic data
Data generated by a model or algorithm rather than collected from real-world observations or experiments, designed to mimic the statistical structure of real data for purposes such as augmentation, privacy-preservation, or model training.
Hallucination
An output from a generative AI system that is presented confidently and fluently but is factually incorrect, fabricated, or unsupported by the input data or any verifiable source — including invented citations, non-existent authors, false statistics, and incorrect quotations.
System prompt
A prompt provided to an LLM at the start of a session — typically not visible to the end user — that sets persona, constraints, output format, safety rules, and tool-use permissions for all subsequent user interactions in that session.
Prompt engineering
The practice of designing, refining, and structuring input text given to a generative AI system in order to elicit specific desired outputs, including techniques such as role assignment, few-shot examples, chain-of-thought scaffolding, and output-format specification.
Generative AI
Artificial intelligence systems whose primary output is novel content (text, images, audio, video, code, or structured data) produced by sampling from a learned distribution, as distinct from discriminative AI systems whose output is a classification, score, or decision over existing inputs.
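"Sampling from a learned distribution" can be illustrated with a toy next-token step. In the sketch below the scores (logits) are invented rather than learned, and the temperature parameter is an assumption borrowed from common LLM practice; the point is only that the output is drawn at random from a probability distribution rather than computed as a single fixed decision.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; temperature must be greater than zero."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(vocab: list[str], logits: list[float], rng: random.Random,
           temperature: float = 1.0) -> str:
    """Draw one token at random according to the softmax probabilities."""
    return rng.choices(vocab, weights=softmax(logits, temperature), k=1)[0]


# Invented vocabulary and scores standing in for a model's learned distribution.
vocab = ["results", "methods", "banana"]
logits = [2.0, 1.0, 0.0]
rng = random.Random(0)  # seeded so repeated runs draw the same sequence
token = sample(vocab, logits, rng)
```

Lower temperatures concentrate probability on the highest-scoring token, which makes outputs more repeatable; higher temperatures spread it out, which makes them more varied.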
Large language model (LLM)
A neural-network model trained on large text corpora using self-supervised next-token prediction (or an analogous objective), with parameter counts typically in the billions, capable of generating coherent text and performing a broad range of natural-language tasks without task-specific training.
AI tool disclosure
A statement within a scholarly work that identifies which generative AI tools were used, the version, the scope of use (e.g., language editing, code generation, figure creation), and which sections were affected, sufficient for a reader to assess the AI's role.
AI as author
The disallowed practice of listing a generative AI system (e.g., ChatGPT, Claude) in the author byline or contributor list of a scholarly work, on the rationale that AI cannot meet authorship criteria requiring accountability, agreement, and the capacity to take public responsibility.
AI-generated content
Text, images, code, or other artefacts produced substantially by a generative AI system in response to a prompt, where the AI is the proximate source of the content rather than a tool refining human-authored material.
AI-assisted writing
The use of a generative AI tool by a human author to draft, edit, paraphrase, summarise, or stylistically revise text where the human retains final editorial control and authorship responsibility.