v2026.1714 entries · CC-BY 4.0
CASRAI
Dictionary term · Track C · Stable · v2026.2

MMLU benchmark

The Massive Multitask Language Understanding benchmark, a 57-subject multiple-choice test covering elementary, high-school, college, and professional knowledge, designed to probe broad-coverage language-model knowledge.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    An LLM technical report headline including MMLU 5-shot accuracy.

  • Is an instance

    A leaderboard ranking open-weight models by MMLU score.
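The "5-shot accuracy" in the first example can be illustrated with a minimal sketch: each test question is preceded by a few solved examples, the model returns one answer letter, and accuracy is the fraction of letters that match the answer key. The function names (`format_item`, `build_prompt`, `accuracy`) and the exact prompt layout are assumptions made for illustration, not the benchmark's official harness.

```python
from typing import Optional, Sequence, Tuple

CHOICES = "ABCD"  # MMLU items have four answer options

# (question, options, answer letter); a hypothetical container for solved examples
Shot = Tuple[str, Sequence[str], str]


def format_item(question: str, options: Sequence[str], answer: Optional[str] = None) -> str:
    """Render one question; include the answer letter only for solved examples."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(CHOICES, options)]
    lines.append("Answer:" + (f" {answer}" if answer else ""))
    return "\n".join(lines)


def build_prompt(shots: Sequence[Shot], question: str, options: Sequence[str]) -> str:
    """Concatenate k solved examples (k = 5 for '5-shot') and the open test question."""
    blocks = [format_item(q, o, a) for q, o, a in shots]
    blocks.append(format_item(question, options))
    return "\n\n".join(blocks)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted answer letters that match the answer key."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Reported scores can differ between harnesses depending on prompt wording and on whether accuracy is averaged over all questions or over the 57 subjects, so this sketch shows only the general shape of the calculation.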

Counter-examples

Looks similar, but isn't

  • Not an instance

    BIG-bench (a separate, collaboratively built task suite with its own task formats and methodology).

  • Not an instance

    HumanEval (code-only benchmark).

Editorial commentary

MMLU (Hendrycks et al., 2021) became the dominant headline benchmark for general LLM capability from 2022 through 2024. Its limitations (a multiple-choice format, contamination risk from a publicly available test set, and shrinking headroom as top models approach the score ceiling) drove the development of successors such as MMLU-Pro and GPQA. MMLU remains widely reported so that new results can be compared with earlier ones.

References

  • Hendrycks et al., 'Measuring Massive Multitask Language Understanding' (ICLR 2021).

Also known as

Massive Multitask Language Understanding

Machine-readable encodings

Use in your systems

JATS XML <role> element
xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="MMLU benchmark"
      vocab-term-identifier="https://casrai.org/dictionary/term/mmlu-benchmark" />
Schema.org DefinedTerm (JSON-LD)
json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "MMLU benchmark",
  "identifier": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "description": "The Massive Multitask Language Understanding benchmark, a 57-subject multiple-choice test covering elementary, high-school, college, and professional knowledge, designed to probe broad-coverage language-model knowledge.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
  "url": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "alternateName": [
    "Massive Multitask Language Understanding"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
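A consuming system might read the JSON-LD record along these lines. This is a minimal sketch using only Python's standard `json` module; the helper name `summarize_term` and the choice of fields to extract are assumptions for illustration, and the embedded record is a trimmed copy of the one above with the description omitted.

```python
import json

# Trimmed copy of the DefinedTerm record (description omitted for brevity).
RECORD = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "MMLU benchmark",
  "identifier": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
"""


def summarize_term(raw: str) -> dict:
    """Parse a JSON-LD string and return the fields a catalogue would typically store."""
    data = json.loads(raw)
    if data.get("@type") != "DefinedTerm":
        raise ValueError("expected a schema.org DefinedTerm record")
    return {
        "name": data["name"],
        "identifier": data["identifier"],
        "term_set": data.get("inDefinedTermSet"),
        "license": data.get("license"),
    }


term = summarize_term(RECORD)
print(term["name"], term["identifier"])
```

Treating the record as plain JSON keeps the example dependency-free; a system that needs full JSON-LD semantics (context expansion, for instance) would use a dedicated JSON-LD library instead.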