The Massive Multitask Language Understanding benchmark, a 57-subject multiple-choice test covering elementary, high-school, college, and professional knowledge, designed to probe broad-coverage language-model knowledge.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

Is an instance
An LLM technical report that headlines 5-shot MMLU accuracy.
Is an instance
A leaderboard ranking open-weight models by MMLU score.
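
To make "MMLU 5-shot accuracy" concrete, here is a minimal Python sketch of how a few-shot multiple-choice prompt can be assembled and how accuracy is computed from predicted letters. The prompt template is one commonly used convention and is an assumption here, since evaluation harnesses differ in template and answer extraction. The `Item` class, the helper names, and the two sample questions are made up for illustration and are not taken from the MMLU dataset. A real 5-shot run would pass five exemplars in `shots`; one is used below for brevity.

```python
from dataclasses import dataclass

CHOICE_LABELS = ["A", "B", "C", "D"]  # MMLU questions have four options


@dataclass
class Item:
    """Hypothetical container for one multiple-choice question."""
    question: str
    choices: list[str]  # four option strings, in label order
    answer: str         # gold label, one of "A".."D"


def format_item(item: Item, include_answer: bool) -> str:
    """Render one question; few-shot exemplars include the gold answer."""
    lines = [item.question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, item.choices)]
    lines.append(f"Answer: {item.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(subject: str, shots: list[Item], target: Item) -> str:
    """Header, then solved exemplars, then the unsolved target question."""
    header = f"The following are multiple choice questions (with answers) about {subject}."
    blocks = [header]
    blocks += [format_item(shot, include_answer=True) for shot in shots]
    blocks.append(format_item(target, include_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: list[str], items: list[Item]) -> float:
    """Fraction of items whose predicted letter matches the gold label."""
    correct = sum(pred == item.answer for pred, item in zip(predictions, items))
    return correct / len(items)


# Made-up items for illustration only (not from the MMLU dataset).
shots = [Item("What is 2 + 2?", ["3", "4", "5", "6"], "B")]
target = Item("What is 3 + 3?", ["5", "6", "7", "8"], "B")

prompt = build_prompt("elementary mathematics", shots, target)
print(prompt)
```

The model's next token after the final "Answer:" is read as its predicted letter, and `accuracy` is then taken over all test questions. Whether a report averages over questions or over subjects is a harness choice that this sketch does not settle.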

Counter-examples

Looks similar, but isn't

Not an instance
BIG-bench (a separate, community-contributed benchmark of more than 200 tasks, not limited to four-option multiple choice).
Not an instance
HumanEval (code-only benchmark).

Editorial commentary

MMLU (Hendrycks et al., 2021) was the dominant headline benchmark for general LLM capability from 2022 through 2024. Its limitations (a multiple-choice format, contamination risk from a publicly available test set, and shrinking headroom as top scores approached the ceiling) drove the development of successors such as MMLU-Pro and GPQA. MMLU is still widely reported so that new results remain comparable with earlier ones.

References

Hendrycks et al., 'Measuring Massive Multitask Language Understanding' (ICLR 2021).

Also known as

Massive Multitask Language Understanding

Machine-readable encodings

Use in your systems

JATS XML <role> element

xml

<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="MMLU benchmark"
      vocab-term-identifier="https://casrai.org/dictionary/term/mmlu-benchmark" />
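
As a rough illustration of consuming this element, the following Python sketch reads the term label and its identifier with the standard library's `xml.etree.ElementTree`. It assumes the element is available as a standalone string; in a full JATS article it would be nested inside other elements and located with a search instead. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The <role> element from this entry, held as a standalone string.
JATS_ROLE = """<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="MMLU benchmark"
      vocab-term-identifier="https://casrai.org/dictionary/term/mmlu-benchmark" />"""

role = ET.fromstring(JATS_ROLE)

# Attribute names contain hyphens, so they are read with .get().
term = role.get("vocab-term")
term_uri = role.get("vocab-term-identifier")
vocab_uri = role.get("vocab-identifier")

print(f"{term} -> {term_uri}")
```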

Schema.org DefinedTerm (JSON-LD)

json

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "@id": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "name": "MMLU benchmark",
  "identifier": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "description": "The Massive Multitask Language Understanding benchmark, a 57-subject multiple-choice test covering elementary, high-school, college, and professional knowledge, designed to probe broad-coverage language-model knowledge.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-ml-research-outputs#set",
  "url": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "alternateName": "Massive Multitask Language Understanding",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "publisher": {
    "@id": "https://casrai.org/#organization"
  },
  "dateModified": "2026-05-21T02:22:51",
  "inLanguage": "en"
}
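
A system that ingests this record might start with something like the Python sketch below. It treats the record as plain JSON via the standard `json` module and does no JSON-LD processing such as context expansion. The `summarize_term` helper is hypothetical, and the embedded record is a trimmed copy that keeps only the fields the helper reads.

```python
import json

# Trimmed copy of the record; only the fields used below are kept.
RECORD = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "@id": "https://casrai.org/dictionary/term/mmlu-benchmark",
  "name": "MMLU benchmark",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-ml-research-outputs#set"
}
"""


def summarize_term(doc: dict) -> dict:
    """Return the identifying fields of a DefinedTerm, or raise ValueError."""
    if doc.get("@type") != "DefinedTerm":
        raise ValueError(f"expected a DefinedTerm, got {doc.get('@type')!r}")
    return {
        "id": doc["@id"],
        "name": doc["name"],
        "term_set": doc.get("inDefinedTermSet"),  # optional field
    }


summary = summarize_term(json.loads(RECORD))
print(summary["name"], summary["id"])
```

Checking `@type` before reading other fields keeps the helper from silently accepting a different kind of record.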
