v2026.1 · 714 entries · CC-BY 4.0

Dictionary domain · Track C

AI and ML research outputs

Model cards, system cards, datasheets, benchmarks, evaluation suites.

For implementers

Operational deployment checklist for AI and ML research outputs: prerequisites, five deployment steps, integration notes for Pure, Symplectic Elements, Worktribe, DSpace, and more, plus the pitfalls that recur in the field.


Terms in this domain

43 terms

Dictionary term · Stable

Synthetic benchmark

A benchmark whose evaluation items are wholly or partially generated by another model or procedural method, rather than collected from natural human-produced sources, used to probe specific capabilities or to scale evaluation cheaply.
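As an illustration of the procedural route, the sketch below generates arithmetic items from a seeded random generator, so every item's reference answer is known by construction and the set can be regenerated or scaled cheaply. The item format and the name `make_arithmetic_items` are hypothetical; model-generated benchmarks work differently and usually need filtering or validation of the generated items.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    prompt: str
    answer: str


def make_arithmetic_items(n: int, seed: int = 0) -> list[Item]:
    """Procedurally generate n addition/multiplication items (hypothetical format)."""
    rng = random.Random(seed)  # local RNG: same seed -> identical benchmark
    items = []
    for _ in range(n):
        a, b = rng.randint(10, 999), rng.randint(10, 999)
        op = rng.choice(["+", "*"])
        # The reference answer is computed, not annotated by a human.
        result = a + b if op == "+" else a * b
        items.append(Item(prompt=f"What is {a} {op} {b}?", answer=str(result)))
    return items


items = make_arithmetic_items(3, seed=42)
for item in items:
    print(item.prompt, "->", item.answer)
```

Because the generator is deterministic for a given seed, the seed itself becomes part of the benchmark's version metadata.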

AI and ML research outputs · Data & methods

Dictionary term · Stable

RLHF (Reinforcement Learning from Human Feedback)

A training methodology in which a language model is fine-tuned using a reward signal derived from human preferences over pairs (or larger sets) of candidate model outputs, typically by first training a reward model and then optimising the policy against it via PPO or a related algorithm.
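The reward-model step is commonly trained with a pairwise, Bradley-Terry style objective: for a human-preferred output and a rejected one, minimise −log σ(r_chosen − r_rejected). The minimal sketch below computes that loss from scalar scores in plain Python; the scores stand in for a reward model's outputs and are hypothetical, and the subsequent PPO policy-optimisation step is not shown.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)); branch on the sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) preference pairs."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical reward-model scores for three human preference comparisons.
pairs = [(2.0, 0.5), (0.1, 0.3), (1.2, 1.2)]
print(round(mean_loss(pairs), 4))
```

The loss falls as the reward model scores the preferred output further above the rejected one, which is the signal the policy is later optimised against.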

AI and ML research outputs · Data & methods

Dictionary term · Stable

Constitutional AI (concept)

A training methodology in which a model is trained to align its outputs with a written set of principles ('a constitution'), with the model itself used to critique and revise candidate responses against those principles in place of direct human feedback at scale.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Prompt injection

An attack on a language-model-based system in which adversarial instructions, embedded in untrusted input (a document, web page, tool output, image), cause the model to act in ways that diverge from its developer's or user's intent.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Jailbreak (LLM)

A prompt or interaction pattern that causes a language model to bypass its safety training and produce outputs the model was tuned to refuse, such as harmful instructions, restricted content, or violations of provider policy.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Red-teaming

The practice of deliberately testing an AI system adversarially, with skilled testers attempting to elicit failures, unsafe outputs, or policy violations, in order to discover weaknesses before deployment.

AI and ML research outputs · Data & methods

Dictionary term · Stable

AI safety case

A structured, evidence-based argument that an AI system is acceptably safe to deploy in a defined context, modelled on safety cases from established engineering disciplines (nuclear, aviation, medical devices).

AI and ML research outputs · Data & methods

Dictionary term · Stable

AI evaluation card

A structured documentation artefact specifically describing an evaluation of an AI system, separate from the model card, including the evaluation methodology, datasets, metrics, results, and known limitations of the evaluation itself.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Reproducible AI experiment

An AI experiment for which sufficient artefacts and metadata are released (data, code, seed, environment, hyperparameters, training procedure) that an independent investigator can re-run it and obtain numerically equivalent or statistically indistinguishable results.
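A minimal sketch of the seed-and-environment part of such a release, assuming a toy stochastic "experiment" in place of real training: the run records its seed, Python version, platform, and hyperparameters alongside the result, and re-running from the recorded metadata reproduces the number exactly. The function and field names are hypothetical; real releases also need pinned library versions and data snapshots, and GPU nondeterminism can still prevent bit-for-bit equality, which is why "statistically indistinguishable" is the fallback criterion.

```python
import json
import platform
import random
import sys


def run_experiment(seed: int, lr: float, steps: int) -> float:
    """Toy stand-in for training: a noisy random walk scaled by lr."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    value = 0.0
    for _ in range(steps):
        value += lr * rng.gauss(0.0, 1.0)
    return value


def run_and_record(seed: int, lr: float, steps: int) -> dict:
    """Run once and capture the metadata an independent re-run needs."""
    return {
        "seed": seed,
        "hyperparameters": {"lr": lr, "steps": steps},
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
        "result": run_experiment(seed, lr, steps),
    }


record = run_and_record(seed=1234, lr=0.01, steps=100)
print(json.dumps(record, indent=2))

# An independent re-run driven only by the recorded metadata:
hp = record["hyperparameters"]
rerun = run_experiment(record["seed"], hp["lr"], hp["steps"])
assert rerun == record["result"]
```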

AI and ML research outputs · Data & methods

Dictionary term · Stable

Open-source model (criteria)

A model meeting the criteria of the Open Source Initiative's Open Source AI Definition: sufficiently detailed information about the training data, the complete code used to train and run the model, and the model weights, each made available under terms that grant the OSI's freedoms to use, study, modify, and share.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Open weights model

A model whose trained parameter values are publicly released and downloadable, typically under a named licence. It is distinct from an open-source model, although open weights models are often described as 'open' even when their training data and code are not released.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model weight licence

The licence terms governing the use, modification, and redistribution of a model's trained weights, which may differ from the licence on the training code and the licence on the training data.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model checkpoint

A saved snapshot of a model's parameters (and optionally optimiser state) at a specific point in training, identified by a step number or version tag and serialised to a file format such as safetensors or .pt.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model evaluation suite

A defined collection of benchmarks, tasks, and metrics, with standardised prompting and decoding rules, used to characterise a model's capabilities and behaviour across a range of dimensions.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model fine-tune lineage

The specific portion of model lineage that records the sequence of fine-tuning operations applied to a base model: dataset, method (SFT, DPO, RLHF, LoRA), hyperparameters, and resulting checkpoint identifier.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model lineage

The chain of provenance for a model recording its base model, the fine-tuning datasets and procedures applied, and any further derivatives, such that any deployed model can be traced back to its constituent training operations.
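One way to make lineage traceable is to store each fine-tuning operation as a record pointing at its parent checkpoint, using the fields listed under model fine-tune lineage (dataset, method, hyperparameters, resulting checkpoint identifier), and then walk the parent links back to the base model. The sketch below assumes an in-memory registry and hypothetical checkpoint identifiers; it is not a standard lineage schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FineTuneStep:
    checkpoint_id: str          # checkpoint this step produced
    parent_id: Optional[str]    # checkpoint it started from; None for a base model
    dataset: str = ""
    method: str = ""            # e.g. "SFT", "DPO", "RLHF", "LoRA"
    hyperparameters: dict = field(default_factory=dict)


def trace_lineage(registry: dict[str, FineTuneStep], checkpoint_id: str) -> list[FineTuneStep]:
    """Return the chain from the base model to checkpoint_id, oldest first."""
    chain = []
    seen = set()
    current: Optional[str] = checkpoint_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        step = registry[current]  # a KeyError signals a gap in the recorded lineage
        chain.append(step)
        current = step.parent_id
    return list(reversed(chain))


registry = {
    "base-7b": FineTuneStep("base-7b", None),
    "sft-v1": FineTuneStep("sft-v1", "base-7b", "instructions-v1", "SFT", {"lr": 2e-5}),
    "dpo-v1": FineTuneStep("dpo-v1", "sft-v1", "prefs-v1", "DPO", {"beta": 0.1}),
}
for step in trace_lineage(registry, "dpo-v1"):
    print(step.checkpoint_id, step.method or "(base)", step.dataset)
```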

AI and ML research outputs · Data & methods

Dictionary term · Stable

Inference carbon footprint

The greenhouse-gas emissions associated with serving inference requests from a deployed model, typically expressed per-request (e.g., gCO2e per query) or in aggregate (kgCO2e per month).
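Both forms are usually obtained by multiplying energy use by the carbon intensity of the electricity supply. The sketch below assumes a measured average energy per query and a grid intensity in gCO2e/kWh; the example figures are placeholders, not measurements of any real system, and fuller accounting may add data-centre overhead (PUE) or embodied hardware emissions.

```python
def grams_co2e_per_query(energy_wh_per_query: float, grid_g_per_kwh: float) -> float:
    """gCO2e per query = energy per query (kWh) x grid carbon intensity (gCO2e/kWh)."""
    return (energy_wh_per_query / 1000.0) * grid_g_per_kwh


def kg_co2e_per_month(queries_per_month: int, energy_wh_per_query: float,
                      grid_g_per_kwh: float) -> float:
    """Aggregate monthly emissions in kgCO2e."""
    grams = queries_per_month * grams_co2e_per_query(energy_wh_per_query, grid_g_per_kwh)
    return grams / 1000.0


# Placeholder inputs: 0.3 Wh per query, a 400 gCO2e/kWh grid, 10 million queries a month.
per_query = grams_co2e_per_query(0.3, 400.0)
monthly = kg_co2e_per_month(10_000_000, 0.3, 400.0)
print(f"{per_query:.3f} gCO2e per query, {monthly:,.0f} kgCO2e per month")
```

With these placeholder inputs the estimate is 0.12 gCO2e per query and 1,200 kgCO2e per month.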

AI and ML research outputs · Data & methods

Dictionary term · Stable

Training carbon footprint

The total greenhouse-gas emissions, expressed in kilograms or tonnes of CO2-equivalent, attributable to training a machine-learning model, estimated from energy consumption and the carbon intensity of the electricity supply.
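A common back-of-the-envelope form of this estimate, assuming energy is derived from the number of accelerators $N_{\text{acc}}$, run time $t$, and average power draw $\bar{P}$ per accelerator, with an optional data-centre power usage effectiveness (PUE) multiplier that the definition above does not itself specify, and $I$ the grid carbon intensity:

```latex
\begin{aligned}
E_{\text{train}}\,[\text{kWh}] &= N_{\text{acc}} \times t\,[\text{h}] \times \bar{P}\,[\text{kW}] \times \mathrm{PUE} \\
\mathrm{CO_2e}_{\text{train}}\,[\text{kg}] &= E_{\text{train}}\,[\text{kWh}] \times I\,[\text{kgCO}_2\text{e/kWh}]
\end{aligned}
```

With illustrative inputs only (512 accelerators, 720 h, 0.4 kW, PUE 1.1, I = 0.4 kgCO2e/kWh), this gives 162,201.6 kWh and about 64,881 kgCO2e, or roughly 65 tonnes.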

AI and ML research outputs · Data & methods

Dictionary term · Stable

Compute (FLOPs estimate)

The total number of floating-point operations consumed by training a model, conventionally reported as a single number (e.g., 3.0 x 10^25 FLOPs) and used as a regulatory and scientific proxy for training-run scale.
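For dense transformer models, a widely used approximation from the scaling-laws literature (not specified in this entry) puts training compute at roughly 6 FLOPs per parameter per training token, C ≈ 6ND. A sketch, with hypothetical parameter and token counts:

```python
def training_flops_dense(n_params: float, n_tokens: float) -> float:
    """Approximate training compute C ~= 6 * N * D for a dense transformer.

    The factor 6 counts roughly 2 FLOPs per parameter per token for the
    forward pass and 4 for the backward pass; attention-specific terms
    are ignored, so this is an estimate, not an exact count.
    """
    return 6.0 * n_params * n_tokens


# Hypothetical run: 7 x 10^9 parameters trained on 2 x 10^12 tokens.
c = training_flops_dense(7e9, 2e12)
print(f"{c:.1e} FLOPs")
```

This gives about 8.4 x 10^22 FLOPs. For mixture-of-experts models the active, rather than total, parameter count is usually substituted for N.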

AI and ML research outputs · Data & methods

Dictionary term · Stable

Training data composition

The mixture of data sources, by domain, language, modality, and provenance, used to train a model, including the proportions and any filtering or deduplication applied.
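Composition is often reported as each source's share of training tokens after filtering. The sketch below computes those shares after an exact-duplicate filter; the source labels, documents, and whitespace "tokenisation" are hypothetical simplifications, and production pipelines typically use near-duplicate detection and a real tokeniser.

```python
import hashlib
from collections import Counter


def composition(docs: list[tuple[str, str]]) -> dict[str, float]:
    """Return each source's share of tokens after exact-duplicate removal.

    docs: (source_label, text) pairs.
    """
    seen_hashes = set()
    tokens_by_source: Counter = Counter()
    for source, text in docs:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:  # drop verbatim duplicates across all sources
            continue
        seen_hashes.add(digest)
        tokens_by_source[source] += len(text.split())  # crude whitespace tokens
    total = sum(tokens_by_source.values())
    if total == 0:
        return {}
    return {src: n / total for src, n in tokens_by_source.items()}


docs = [
    ("web", "the cat sat on the mat"),
    ("web", "the cat sat on the mat"),  # duplicate, removed by the filter
    ("code", "def f ( x ) : return x"),
    ("books", "it was a bright cold day in april"),
]
for source, share in sorted(composition(docs).items()):
    print(f"{source:6s} {share:.1%}")
```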

AI and ML research outputs · Data & methods

Dictionary term · Stable

Parameter count

The total number of learnable scalar parameters (weights and biases) in a machine-learning model, conventionally reported in abbreviated form (e.g., 7B = 7 x 10^9 parameters) and disclosed as a basic model metadata field.
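As a worked illustration, a fully connected layer contributes an n_in x n_out weight matrix plus n_out biases, and the model total is the sum over layers. The sketch below counts a hypothetical small multilayer perceptron and formats the result in the conventional shorthand; transformer counts follow the same principle with more layer types.

```python
def dense_layer_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights (n_in x n_out) plus one bias per output unit."""
    return n_in * n_out + (n_out if bias else 0)


def mlp_params(widths: list[int]) -> int:
    """Total learnable parameters of an MLP with the given layer widths."""
    return sum(dense_layer_params(a, b) for a, b in zip(widths, widths[1:]))


def humanise(n: int) -> str:
    """Format a count in the conventional B/M/K shorthand."""
    for unit, scale in (("B", 10**9), ("M", 10**6), ("K", 10**3)):
        if n >= scale:
            return f"{n / scale:.1f}{unit}"
    return str(n)


# Hypothetical MLP: 784 inputs, two hidden layers of 256 units, 10 outputs.
total = mlp_params([784, 256, 256, 10])
print(total, humanise(total))
```

This network has 269,322 parameters (about 269.3K).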

AI and ML research outputs · Data & methods

Dictionary term · Stable

Mixture-of-experts (MoE)

A neural-network architecture in which a learned router directs each input (or token) to a small subset of specialist sub-networks ('experts'), so that the model has a large total parameter count but uses only a fraction per forward pass.
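The routing step can be sketched as top-k gating: the router scores every expert for a token, keeps the k highest, renormalises their softmax weights, and combines only those experts' outputs. In the sketch below the experts are scalar functions and the router logits are supplied directly, both hypothetical simplifications of the learned router and feed-forward experts used in practice.

```python
import math
from typing import Callable


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise over just those."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [(i, exps[i] / z) for i in top]


def moe_forward(x: float, logits: list[float],
                experts: list[Callable[[float], float]], k: int = 2) -> float:
    """Only the k selected experts are evaluated; the rest cost no compute."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))


experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_logits = [0.1, 2.0, 1.0, -1.0]  # hypothetical router scores for one token
print(top_k_route(router_logits, k=2))
print(moe_forward(3.0, router_logits, experts, k=2))
```

Here only experts 1 and 2 run for this token, which is how total parameter count can grow without a matching increase in per-token compute.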

AI and ML research outputs · Data & methods

Dictionary term · Stable

Frontier model

A foundation model whose capabilities meet or exceed the most advanced publicly known systems at the time of training, often defined operationally by training-compute thresholds or by performance on canonical benchmarks.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Foundation model

A large machine-learning model trained on broad data at scale and adaptable to a wide range of downstream tasks through fine-tuning, prompting, or retrieval augmentation.

AI and ML research outputs · Data & methods

Dictionary term · Stable

MMLU benchmark

The Massive Multitask Language Understanding benchmark, a 57-subject multiple-choice test covering elementary, high-school, college, and professional knowledge, designed to probe broad-coverage language-model knowledge.

AI and ML research outputs · Data & methods

Dictionary term · Stable

HELM benchmark

The Holistic Evaluation of Language Models benchmark, a multi-metric framework evaluating language models for accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency across a standardised set of scenarios.

AI and ML research outputs · Data & methods

Dictionary term · Stable

BIG-bench

The Beyond the Imitation Game benchmark, a community-contributed collection of more than 200 tasks designed to probe capabilities of large language models that may be missed by narrower benchmarks.

AI and ML research outputs · Data & methods

Dictionary term · Stable

MLCommons benchmark

A benchmark published by the MLCommons consortium for measuring AI system performance under standardised workloads, datasets, and submission rules, with the principal suites being MLPerf Training, MLPerf Inference, and MLPerf HPC.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Hugging Face Hub (concept)

A web-based platform and ecosystem for sharing machine-learning models, datasets, and demonstration applications ('Spaces'), with conventions for model cards, dataset cards, and versioned repositories.

AI and ML research outputs · Data & methods

Dictionary term · Stable

NIST AI RMF (Risk Management Framework)

The US National Institute of Standards and Technology's voluntary framework for managing risks associated with AI systems across the AI lifecycle, structured around the functions Govern, Map, Measure, and Manage.

AI and ML research outputs · Data & methods

Dictionary term · Stable

ISO/IEC 42001 (AI management system)

An international standard, published in 2023, specifying requirements for establishing, implementing, maintaining, and continually improving an AI Management System within an organisation, structured analogously to ISO 9001 (quality) and ISO/IEC 27001 (information security).

AI and ML research outputs · Data & methods

Dictionary term · Stable

AI conformance assessment

A formal evaluation, conducted by the AI system provider or a notified third-party body, demonstrating that an AI system meets the applicable regulatory or standard-based requirements before being placed on the market.

AI and ML research outputs · Data & methods

Dictionary term · Stable

AI assurance

The process of measuring, evaluating, and communicating the trustworthiness of AI systems through evidence-based mechanisms such as audits, certifications, impact assessments, and conformity declarations.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Trustworthy AI

AI systems exhibiting properties (lawful, ethical, technically robust) that warrant the trust of users, affected parties, and society, as articulated in the EU High-Level Expert Group's framework and adopted in subsequent regulation.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Responsible AI

An umbrella term covering the design, development, deployment, and governance practices intended to ensure AI systems are ethical, fair, transparent, accountable, robust, secure, and respectful of privacy.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Use card

A documentation artefact recording an intended deployment context for a model or system, including the user population, the decisions the model informs, the supervision regime, and out-of-scope uses.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Algorithm card

A documentation artefact describing the algorithmic method or family (e.g., a particular gradient-boosting estimator, a clustering algorithm) independent of any particular trained instance, including inductive biases, assumptions, complexity, and intended use cases.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Bias audit (model)

An audit specifically focused on disparate model performance across demographic, geographic, or contextual sub-groups, including testing for direct, proxy, and intersectional disparities.
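At its simplest, the disparity-testing part of such an audit compares a performance metric across sub-groups, including intersections of attributes. The sketch below computes per-group accuracy and the largest gap for single attributes and for one intersection; the records and attribute names are hypothetical, and a real audit would add confidence intervals, minimum group sizes, further metrics, and proxy-variable analysis.

```python
from collections import defaultdict


def accuracy_by_group(records: list[dict], attrs: tuple[str, ...]) -> dict[tuple, float]:
    """Accuracy for each combination of values of the given attributes."""
    hits: dict[tuple, int] = defaultdict(int)
    counts: dict[tuple, int] = defaultdict(int)
    for r in records:
        key = tuple(r[a] for a in attrs)
        counts[key] += 1
        hits[key] += int(r["prediction"] == r["label"])
    return {k: hits[k] / counts[k] for k in counts}


def max_gap(acc: dict[tuple, float]) -> float:
    """Largest accuracy difference between any two groups."""
    return max(acc.values()) - min(acc.values())


records = [
    {"sex": "F", "region": "north", "label": 1, "prediction": 1},
    {"sex": "F", "region": "south", "label": 0, "prediction": 1},
    {"sex": "M", "region": "north", "label": 1, "prediction": 1},
    {"sex": "M", "region": "south", "label": 0, "prediction": 0},
    {"sex": "F", "region": "south", "label": 1, "prediction": 0},
    {"sex": "M", "region": "north", "label": 0, "prediction": 0},
]
for attrs in [("sex",), ("region",), ("sex", "region")]:
    acc = accuracy_by_group(records, attrs)
    print(attrs, acc, f"max gap = {max_gap(acc):.2f}")
```

In this toy data the intersectional gap (1.00) is larger than either single-attribute gap (0.67), which is why intersectional slices are checked separately.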

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model audit

A structured assessment of a machine-learning model by an independent party against pre-specified criteria covering performance, robustness, fairness, security, privacy, and conformance with stated policy.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Data statement (NLP)

A standardised description of an NLP dataset covering curation rationale, language variety, speaker and annotator demographics, speech situation, text characteristics, and recording quality.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Datasheet for datasets

A structured document accompanying a machine-learning dataset that records its motivation, composition, collection process, pre-processing, intended uses, distribution, and maintenance, modelled on electronic-component datasheets.

AI and ML research outputs · Data & methods

Dictionary term · Stable

System card

A documentation artefact describing an AI-enabled system in its production configuration, including the constituent models, the pre- and post-processing pipeline, safety filters, monitoring, and operational guardrails.

AI and ML research outputs · Data & methods

Dictionary term · Stable

Model card

A short, structured document accompanying a machine-learning model that records its intended use, training data, evaluation methodology, performance characteristics across population sub-groups, and known limitations.

AI and ML research outputs · Data & methods

Referenced across the research world

  • University of Cambridge
  • Columbia University
  • University of Edinburgh
  • Harvard University
  • University of Oxford
  • Princeton University
  • Stanford School of Medicine
  • University College London
  • ORCID
  • Crossref
