Examples
Worked examples
- Is an instance
An evaluation card for a code-completion benchmark documenting the held-out test set, prompting template, and decoding configuration.
- Is an instance
An evaluation card for a clinical-reasoning probe describing rater calibration.
Counter-examples
Looks similar, but isn't
- Not an instance
A section labelled 'Evaluation' inside a model card, with no separate documentation of the evaluation itself.
- Not an instance
A leaderboard table without supporting methodology.
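The line between the instances and the counter-examples above can be sketched as a completeness check. The Python below is a minimal sketch assuming a hypothetical `EvaluationCard` class. Its fields follow the components named in this entry's definition (methodology, datasets, metrics, results, known limitations). It is not a CASRAI or community schema, and the system name and score are placeholders, not real results.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EvaluationCard:
    """Hypothetical minimal evaluation card; field names are illustrative only."""
    evaluated_system: str
    methodology: str                  # e.g. prompting template, decoding configuration
    datasets: List[str]               # e.g. the held-out test set
    metrics: List[str]
    results: Dict[str, float] = field(default_factory=dict)
    known_limitations: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of documented sections that are empty."""
        required = {
            "methodology": self.methodology,
            "datasets": self.datasets,
            "metrics": self.metrics,
            "results": self.results,
            "known_limitations": self.known_limitations,
        }
        return [name for name, value in required.items() if not value]


# Resembles the code-completion instance: methodology and test set are documented.
card = EvaluationCard(
    evaluated_system="example-code-model",        # placeholder name
    methodology="zero-shot prompt template; greedy decoding",
    datasets=["held-out test split"],
    metrics=["pass@1"],
    results={"pass@1": 0.42},                     # placeholder value, not a real result
    known_limitations=["possible test/pretraining data overlap"],
)

# Resembles the leaderboard counter-example: a score with no supporting methodology.
leaderboard_row = EvaluationCard(
    evaluated_system="example-code-model",
    methodology="",
    datasets=[],
    metrics=["pass@1"],
    results={"pass@1": 0.42},                     # placeholder value
)

print(card.missing_sections())             # []
print(leaderboard_row.missing_sections())  # ['methodology', 'datasets', 'known_limitations']
```

A check like this only flags missing sections. Whether the documented methodology is adequate remains an editorial judgement.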
Editorial commentary
Evaluation cards are a relatively recent addition (from about 2023) to the family of documentation artefacts. They reflect the recognition that an evaluation (a dataset plus a protocol plus an analysis approach) can itself become a reusable artefact that deserves its own documentation. NIST's Generative AI Profile (NIST AI 600-1) and the Stanford CRFM evaluation reports exemplify the genre.
References
- Bommasani et al., The Foundation Model Transparency Index, Stanford CRFM (2023).
- NIST AI 600-1, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (2024).
Also known as
eval card
Machine-readable encodings
Use in your systems
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="AI evaluation card"
      vocab-term-identifier="https://casrai.org/dictionary/term/ai-evaluation-card" />

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "AI evaluation card",
  "identifier": "https://casrai.org/dictionary/term/ai-evaluation-card",
  "description": "A structured documentation artefact specifically describing an evaluation of an AI system, separate from the model card, including the evaluation methodology, datasets, metrics, results, and known limitations of the evaluation itself.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
  "url": "https://casrai.org/dictionary/term/ai-evaluation-card",
  "alternateName": [
    "eval card"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
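To use the record in a web page, one common pattern is to embed it in a `<script type="application/ld+json">` element. The Python below is a minimal sketch under that assumption. The helper names `to_script_tag` and `alternate_names` are hypothetical, and the record is abridged (no `description`). It stores the synonym under schema.org's `alternateName`, because `sameAs` is intended for URLs that identify the item rather than for alternative labels.

```python
import json

# Abridged copy of the DefinedTerm record; "alternateName" carries the synonym.
term = {
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    "name": "AI evaluation card",
    "alternateName": ["eval card"],
    "identifier": "https://casrai.org/dictionary/term/ai-evaluation-card",
    "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
    "url": "https://casrai.org/dictionary/term/ai-evaluation-card",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def to_script_tag(record: dict) -> str:
    """Serialise a JSON-LD record into an HTML <script> element (hypothetical helper)."""
    body = json.dumps(record, indent=2, ensure_ascii=False)
    # Escape "</" so the payload cannot close the <script> element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


def alternate_names(record: dict) -> list:
    """Return synonyms whether alternateName holds a single string or a list."""
    value = record.get("alternateName", [])
    return [value] if isinstance(value, str) else list(value)


html = to_script_tag(term)
# Round trip: strip the opening and closing tag lines, then parse the JSON body.
parsed = json.loads(html.split("\n", 1)[1].rsplit("\n", 1)[0])
print(alternate_names(parsed))  # ['eval card']
```

Consumers that read the record should accept `alternateName` as either a string or a list, since schema.org permits both.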