v2026.1714 entries · CC-BY 4.0
CASRAI
Dictionary term · Track C · Stable · v2026.2

HELM benchmark

The Holistic Evaluation of Language Models benchmark, a multi-metric framework evaluating language models across accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency on a fixed set of scenarios.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    A model report including the HELM scenario-metric matrix as appendix evidence.

  • Is an instance

    A research lab using HELM scenarios for internal model comparison.

Counter-examples

Looks similar, but isn't

  • Not an instance

    A single MMLU accuracy score.

  • Not an instance

    A latency-only benchmark.

Editorial commentary

HELM (Liang et al., 2023) emphasises holistic evaluation: a single model is scored across many metrics on many scenarios, with the resulting matrix surfaced as the principal output. This contrasts with single-metric leaderboards and aligns with the multi-property framing of trustworthy AI.
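The scenario-metric matrix can be pictured with a minimal Python sketch. The class and method names, the scenario names, and the scores are hypothetical placeholders, not HELM's own code or results. Only the seven metric categories come from the definition above. The `column` projection shows what a single-metric leaderboard keeps and what it discards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The seven metric categories listed in the HELM definition above.
METRICS = (
    "accuracy", "calibration", "robustness", "fairness",
    "bias", "toxicity", "efficiency",
)


@dataclass
class ScenarioMetricMatrix:
    """Hypothetical container: one model's scores, rows = scenarios, columns = metrics.

    A cell holds None until a measurement is recorded for it.
    """
    model: str
    cells: Dict[str, Dict[str, Optional[float]]] = field(default_factory=dict)

    def record(self, scenario: str, metric: str, value: float) -> None:
        """Store one measurement, creating the scenario row on first use."""
        if metric not in METRICS:
            raise ValueError(f"unknown metric: {metric!r}")
        row = self.cells.setdefault(scenario, {m: None for m in METRICS})
        row[metric] = value

    def column(self, metric: str) -> Dict[str, Optional[float]]:
        """Project onto one metric: all that a single-metric leaderboard keeps."""
        return {scenario: row[metric] for scenario, row in self.cells.items()}

    def coverage(self) -> float:
        """Fraction of scenario-metric cells that hold a measurement."""
        total = len(self.cells) * len(METRICS)
        filled = sum(v is not None for row in self.cells.values() for v in row.values())
        return filled / total if total else 0.0


# Placeholder values for illustration only; these are not HELM results.
matrix = ScenarioMetricMatrix(model="model-x")
matrix.record("scenario_a", "accuracy", 0.5)
matrix.record("scenario_a", "calibration", 0.1)
matrix.record("scenario_b", "accuracy", 0.25)
print(matrix.column("accuracy"))  # {'scenario_a': 0.5, 'scenario_b': 0.25}
print(matrix.coverage())
```

A HELM-style report keeps every column of the matrix, including the empty cells that `coverage` makes visible. Collapsing the matrix to `column("accuracy")` reduces it to the kind of single-metric counter-example listed earlier.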

References

  • Liang et al., 'Holistic Evaluation of Language Models' (Transactions on Machine Learning Research, 2023).

Also known as

Holistic Evaluation of Language Models

Machine-readable encodings

Use in your systems

JATS XML <role> element
```xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="HELM benchmark"
      vocab-term-identifier="https://casrai.org/dictionary/term/helm-benchmark" />
```
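The element above could be generated with a minimal sketch that uses Python's standard-library `xml.etree.ElementTree`. The helper name `jats_role` and the URL layout (dictionary base, then `term/`, then a slug) are assumptions read off this entry's identifiers, not a published CASRAI interface. The `vocab` value is copied as shown.

```python
import xml.etree.ElementTree as ET

# Assumed from the identifiers shown in this entry; not a documented CASRAI API.
CASRAI_DICTIONARY = "https://casrai.org/dictionary/"


def jats_role(term: str, slug: str) -> str:
    """Serialise a JATS <role> element carrying a CASRAI vocabulary reference.

    The slug is passed in explicitly; no slug-derivation rule is assumed.
    """
    role = ET.Element("role", {
        "vocab": "credit",
        "vocab-identifier": CASRAI_DICTIONARY,
        "vocab-term": term,
        "vocab-term-identifier": f"{CASRAI_DICTIONARY}term/{slug}",
    })
    # encoding="unicode" makes tostring return str instead of bytes
    return ET.tostring(role, encoding="unicode")


print(jats_role("HELM benchmark", "helm-benchmark"))
```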
Schema.org DefinedTerm (JSON-LD)
```json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "HELM benchmark",
  "identifier": "https://casrai.org/dictionary/term/helm-benchmark",
  "description": "The Holistic Evaluation of Language Models benchmark, a multi-metric framework evaluating language models across accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency on a fixed set of scenarios.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
  "url": "https://casrai.org/dictionary/term/helm-benchmark",
  "alternateName": [
    "Holistic Evaluation of Language Models"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
```
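A record like this could be produced programmatically with Python's `json` module. In this sketch, `defined_term`, `CASRAI_BASE`, and the slug and domain URL layout are hypothetical names inferred from the identifiers in this entry. The alias goes under `alternateName`, schema.org's property for plain-text alternative names, because `sameAs` is defined for URLs of pages identifying the same thing.

```python
import json

# Assumed URL layout, read off the identifiers in this entry.
CASRAI_BASE = "https://casrai.org/dictionary"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"


def defined_term(name, slug, description, domain, aliases=()):
    """Build a schema.org DefinedTerm record as a plain dict (hypothetical helper)."""
    url = f"{CASRAI_BASE}/term/{slug}"
    record = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "identifier": url,
        "description": description,
        "inDefinedTermSet": f"{CASRAI_BASE}/domain/{domain}/",
        "url": url,
        "license": CC_BY_4,
    }
    if aliases:
        # alternateName takes plain-text names; sameAs expects URLs.
        record["alternateName"] = list(aliases)
    return record


term = defined_term(
    name="HELM benchmark",
    slug="helm-benchmark",
    description=(
        "The Holistic Evaluation of Language Models benchmark, a multi-metric "
        "framework evaluating language models across accuracy, calibration, "
        "robustness, fairness, bias, toxicity, and efficiency on a fixed set "
        "of scenarios."
    ),
    domain="ai-and-ml-research-outputs",
    aliases=["Holistic Evaluation of Language Models"],
)

# JSON-LD is commonly embedded in an HTML page inside a script element.
snippet = '<script type="application/ld+json">\n' + json.dumps(term, indent=2) + "\n</script>"
print(snippet)
```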
