Examples
Worked examples
- Is an instance
A model technical report that includes average accuracy across the 23 BIG-bench Hard tasks.
- Is an instance
A new benchmark paper that uses BIG-bench results as a baseline measure of LLM capability.
Counter-examples
Looks similar, but isn't
- Not an instance
MMLU (a different benchmark).
- Not an instance
A single dataset like SQuAD.
Editorial commentary
BIG-bench (Srivastava et al., 2023) emphasises task diversity: arithmetic, logic, multilingual translation, theory of mind, common-sense reasoning, code, and intentionally adversarial probes. The 'BIG-bench Hard' (BBH) subset collects 23 tasks on which earlier language models did not outperform the average human rater, leaving substantial headroom; it has been widely reused as a comparison set in later model evaluations.
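Reports that quote a single BIG-bench Hard figure are summarising many per-task scores. The sketch below shows one plausible way to do that, assuming the headline number is an unweighted (macro) mean in which every task counts equally; individual reports may aggregate differently. The function name, task names, and scores are hypothetical and are not taken from any published result.

```python
from statistics import mean


def macro_average_accuracy(per_task_accuracy: dict[str, float]) -> float:
    """Return the unweighted mean of per-task accuracies.

    Assumption: every task counts equally, regardless of how many
    examples it contains.
    """
    if not per_task_accuracy:
        raise ValueError("no task scores supplied")
    return mean(list(per_task_accuracy.values()))


# Hypothetical scores for three tasks; a real BBH report would list 23 entries.
scores = {"task_a": 0.50, "task_b": 0.75, "task_c": 1.00}
average = macro_average_accuracy(scores)
```

A macro mean keeps a large task from dominating the headline number, which is one reason it is a natural choice for a suite of tasks with very different sizes.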
References
- Srivastava et al., 'Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models' (Transactions on Machine Learning Research, 2023).
Also known as
Beyond the Imitation Game Benchmark · BBH (subset)
Machine-readable encodings
Use in your systems
<role vocab="credit"
vocab-identifier="https://casrai.org/dictionary/"
vocab-term="BIG-bench"
vocab-term-identifier="https://casrai.org/dictionary/term/big-bench" />
{
"@context": "https://schema.org",
"@type": "DefinedTerm",
"name": "BIG-bench",
"identifier": "https://casrai.org/dictionary/term/big-bench",
"description": "The Beyond the Imitation Game benchmark, a community-contributed collection of more than 200 tasks designed to probe capabilities of large language models that may be missed by narrower benchmarks.",
"inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
"url": "https://casrai.org/dictionary/term/big-bench",
"sameAs": [
"Beyond the Imitation Game Benchmark",
"BBH (subset)"
],
"license": "https://creativecommons.org/licenses/by/4.0/"
}
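As a rough illustration of consuming the JSON-LD record above, the following sketch parses it with Python's standard `json` module and builds a label-to-identifier lookup. The record is abbreviated to the fields actually used, and the helper name `load_defined_term` and the `label_to_id` mapping are hypothetical choices for this example, not part of any published API.

```python
import json

# The JSON-LD record from this entry, abbreviated to the fields used below.
record_text = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "BIG-bench",
  "identifier": "https://casrai.org/dictionary/term/big-bench",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/"
}
"""


def load_defined_term(text: str) -> dict:
    """Parse a JSON-LD string and check that it describes a DefinedTerm."""
    record = json.loads(text)
    if record.get("@type") != "DefinedTerm":
        raise ValueError("record is not a schema.org DefinedTerm")
    return record


term = load_defined_term(record_text)

# Map the human-readable label to its stable identifier, e.g. for tagging
# research outputs in a local catalogue.
label_to_id = {term["name"]: term["identifier"]}
```

Storing the identifier rather than the label keeps local records stable if the display name of the term is later revised.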







