Examples
Worked examples
- Is an instance: A model trained with a 75-item constitution covering helpfulness, harmlessness, and honesty.
- Is an instance: A research replication of Constitutional AI on a smaller open-weights base.
Counter-examples
Looks similar, but isn't
- Not an instance: Pure RLHF with human-only preference data.
- Not an instance: Rule-based output filtering after generation (a different mitigation layer).
Editorial commentary
Bai et al. (2022) at Anthropic introduced Constitutional AI as a complement to RLHF. The constitution is a list of natural-language principles, often dozens of items. In a supervised phase, the model critiques its own draft responses against those principles and revises them, and is then fine-tuned on the revisions; in a reinforcement-learning phase, AI-generated preference labels stand in for human ones (RLAIF). The approach reduces dependence on continuous large-scale human labelling, particularly for harmlessness. This entry covers the methodological concept rather than a specific product implementation.
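The self-critique step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `generate` is a hypothetical stand-in for any function that sends a prompt to a language model and returns text, the prompt wording is invented for illustration, and the loop walks through every principle in order as a simplification. The sketch covers only the critique-and-revise pass, not the later fine-tuning on the revised outputs.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt in, text out.
Generate = Callable[[str], str]


def critique_and_revise(generate: Generate, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    # Initial draft produced directly from the user prompt.
    response = generate(prompt)
    for principle in principles:
        # Ask the model to critique its own draft against one principle.
        critique = generate(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response against the principle."
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = generate(
            f"Principle: {principle}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response
```

Under these assumptions, each principle costs two model calls (one critique, one revision) on top of the initial draft, and the returned text is what would be collected as a revised training example.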
References
- Bai et al., 'Constitutional AI: Harmlessness from AI Feedback' (arXiv 2022).
Also known as
CAI · RLAIF (related concept)
Machine-readable encodings
Use in your systems
<role vocab="credit"
vocab-identifier="https://casrai.org/dictionary/"
vocab-term="Constitutional AI (concept)"
vocab-term-identifier="https://casrai.org/dictionary/term/constitutional-ai-concept" />
{
"@context": "https://schema.org",
"@type": "DefinedTerm",
"name": "Constitutional AI (concept)",
"identifier": "https://casrai.org/dictionary/term/constitutional-ai-concept",
"description": "A training methodology in which a model is trained to align its outputs with a written set of principles ('a constitution'), with the model itself used to critique and revise candidate responses against those principles in place of direct human feedback at scale.",
"inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
"url": "https://casrai.org/dictionary/term/constitutional-ai-concept",
"alternateName": [
"CAI",
"RLAIF (related concept)"
],
"license": "https://creativecommons.org/licenses/by/4.0/"
}
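As a rough illustration of consuming the JSON record above, the following Python sketch parses a DefinedTerm and pulls out its name and identifier. The helper name `read_defined_term` is hypothetical, and the example string is a shortened copy of the record that keeps only the fields the helper reads.

```python
import json


def read_defined_term(jsonld_text: str) -> dict:
    """Parse a JSON-LD string and return the term's name and identifier."""
    record = json.loads(jsonld_text)
    # Reject anything that is not marked as a schema.org DefinedTerm.
    if record.get("@type") != "DefinedTerm":
        raise ValueError("not a schema.org DefinedTerm record")
    return {"name": record["name"], "identifier": record["identifier"]}


# Shortened copy of the record, limited to the fields used above.
example = (
    '{"@context": "https://schema.org", "@type": "DefinedTerm", '
    '"name": "Constitutional AI (concept)", '
    '"identifier": "https://casrai.org/dictionary/term/constitutional-ai-concept"}'
)
term = read_defined_term(example)
```

This uses only the standard-library `json` module and treats the record as plain JSON; a full JSON-LD processor would also resolve the `@context`.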







