A training methodology in which a model is trained to align its outputs with a written set of principles ('a constitution'), with the model itself used to critique and revise candidate responses against those principles in place of direct human feedback at scale.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

Is an instance
A model trained with a 75-item constitution covering helpfulness, harmlessness, and honesty.
Is an instance
A research replication of Constitutional AI on a smaller open-weights base.

Counter-examples

Looks similar, but isn't

Not an instance
Pure RLHF with human-only preference data.
Not an instance
Rule-based output filtering after generation (a different mitigation layer).

Editorial commentary

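Bai et al. (2022) at Anthropic introduced Constitutional AI as a complement to RLHF. The constitution is a list of natural-language principles, often running to dozens of items. Training proceeds in two stages. In a supervised stage, the model critiques its own responses against principles drawn from the constitution, revises them, and is fine-tuned on the revised responses. In a reinforcement-learning stage, the model judges which of two responses better satisfies a principle, and these AI-generated preference labels supply the reward signal (hence the related term RLAIF, reinforcement learning from AI feedback). The approach reduces dependence on large-scale human preference labelling. In the original work this applied to harmlessness, while human feedback was still used for helpfulness. This entry covers the methodological concept rather than a specific product implementation.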
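The supervised self-critique stage can be sketched in Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation. `generate` stands for any text-completion model, and `stub_generate` is a placeholder that lets the code run. The `Principle` type, the prompt templates, and the example principles are all hypothetical. Each round samples one principle, asks the model to critique its latest response against that principle, and then asks for a revision.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed interface: any text-completion model, wrapped as prompt -> completion.
Generate = Callable[[str], str]


@dataclass(frozen=True)
class Principle:
    """One constitution item, phrased as a critique request and a revision request."""
    critique_request: str
    revision_request: str


def critique_and_revise(generate: Generate, prompt: str,
                        constitution: List[Principle], rounds: int = 2,
                        seed: int = 0) -> List[str]:
    """Return the initial response followed by one revision per round."""
    rng = random.Random(seed)
    transcript = f"Human: {prompt}\n\nAssistant:"
    response = generate(transcript)
    history = [response]
    for _ in range(rounds):
        principle = rng.choice(constitution)  # sample one principle per round
        context = (f"{transcript} {response}\n\n"
                   f"Critique request: {principle.critique_request}\n")
        critique = generate(context + "Critique:")  # model critiques its own output
        response = generate(                          # model revises using that critique
            f"{context}Critique: {critique}\n\n"
            f"Revision request: {principle.revision_request}\nRevision:")
        history.append(response)
    return history


def build_sft_pairs(generate: Generate, prompts: List[str],
                    constitution: List[Principle]) -> List[Tuple[str, str]]:
    """Pair each prompt with its final revision, for supervised fine-tuning."""
    return [(p, critique_and_revise(generate, p, constitution)[-1]) for p in prompts]


def stub_generate(prompt: str) -> str:
    """Hypothetical stand-in model so the sketch runs; replace with a real model call."""
    if prompt.endswith("Critique:"):
        return "stub critique"
    if prompt.endswith("Revision:"):
        return "revised response"
    return "draft response"


if __name__ == "__main__":
    # Illustrative principles only; not quoted from any published constitution.
    constitution = [
        Principle("Identify ways the response is harmful or unethical.",
                  "Rewrite the response to remove harmful or unethical content."),
        Principle("Identify ways the response is evasive or unhelpful.",
                  "Rewrite the response to be more helpful while staying harmless."),
    ]
    print(critique_and_revise(stub_generate, "How do I pick a lock?", constitution))
    # ['draft response', 'revised response', 'revised response']
```

In this sketch, the pairs returned by `build_sft_pairs` would become fine-tuning data. The reinforcement-learning stage is not shown. In that stage, the model's own comparisons between pairs of responses supply the preference labels.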

References

Bai, Y., et al. (2022). 'Constitutional AI: Harmlessness from AI Feedback.' arXiv:2212.08073.

Also known as

CAI · RLAIF (related concept)

Machine-readable encodings

Use in your systems

JATS XML <role> element

xml

<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Constitutional AI (concept)"
      vocab-term-identifier="https://casrai.org/dictionary/term/constitutional-ai-concept" />
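A system that ingests JATS can read these vocabulary attributes with Python's standard-library `xml.etree.ElementTree`. The following is a sketch, assuming the element appears as shown. The function name `role_vocab_terms` is hypothetical, and the choice to skip `<role>` elements without a `vocab-term-identifier` (free-text roles) is an illustrative policy. Because `iter()` also visits the root, the function handles both a bare `<role>` and one nested inside larger markup.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# The four JATS vocabulary attributes carried by the <role> element above.
VOCAB_ATTRS = ("vocab", "vocab-identifier", "vocab-term", "vocab-term-identifier")


def role_vocab_terms(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Collect vocabulary attributes from every <role> element in a fragment."""
    root = ET.fromstring(xml_text)
    terms = []
    for role in root.iter("role"):  # iter() also visits the root element itself
        if role.get("vocab-term-identifier"):  # skip free-text roles
            # Missing attributes come back as None.
            terms.append({attr: role.get(attr) for attr in VOCAB_ATTRS})
    return terms


if __name__ == "__main__":
    fragment = """<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Constitutional AI (concept)"
      vocab-term-identifier="https://casrai.org/dictionary/term/constitutional-ai-concept" />"""
    for term in role_vocab_terms(fragment):
        print(term["vocab-term"], "->", term["vocab-term-identifier"])
    # Constitutional AI (concept) -> https://casrai.org/dictionary/term/constitutional-ai-concept
```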

Schema.org DefinedTerm (JSON-LD)

json

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "@id": "https://casrai.org/dictionary/term/constitutional-ai-concept",
  "name": "Constitutional AI (concept)",
  "identifier": "https://casrai.org/dictionary/term/constitutional-ai-concept",
  "description": "A training methodology in which a model is trained to align its outputs with a written set of principles ('a constitution'), with the model itself used to critique and revise candidate responses against those principles in place of direct human feedback at scale.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-ml-research-outputs#set",
  "url": "https://casrai.org/dictionary/term/constitutional-ai-concept",
  "alternateName": [
    "CAI",
    "RLAIF (related concept)"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "publisher": {
    "@id": "https://casrai.org/#organization"
  },
  "dateModified": "2026-05-21T02:22:51",
  "inLanguage": "en"
}
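A consuming system might load a record like this and check it before use. The sketch below uses only Python's standard library. The function name `load_defined_term`, the fields it validates, and the shape of the returned dictionary are illustrative assumptions. It accepts only the plain-string `@context` form used above and does not implement full JSON-LD processing.

```python
import json
from typing import Dict
from urllib.parse import urlparse

# Assumption: only the plain-string schema.org context is accepted.
SCHEMA_CONTEXTS = {"https://schema.org", "https://schema.org/",
                   "http://schema.org", "http://schema.org/"}


def _is_absolute_url(value: object) -> bool:
    """True for http(s) URLs that include a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def load_defined_term(text: str) -> Dict[str, str]:
    """Parse a JSON-LD record and return the fields a glossary lookup needs."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    context = data.get("@context")
    if not isinstance(context, str) or context not in SCHEMA_CONTEXTS:
        raise ValueError("unsupported or missing @context")
    if data.get("@type") != "DefinedTerm":
        raise ValueError("not a schema.org DefinedTerm record")
    if "@id" not in data or "name" not in data:
        raise ValueError("record needs both '@id' and 'name'")
    for key in ("@id", "url", "inDefinedTermSet"):
        if key in data and not _is_absolute_url(data[key]):
            raise ValueError(f"{key!r} must be an absolute URL")
    return {"id": data["@id"], "name": data["name"],
            "description": data.get("description", "")}


if __name__ == "__main__":
    record = json.dumps({
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "@id": "https://casrai.org/dictionary/term/constitutional-ai-concept",
        "name": "Constitutional AI (concept)",
        "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-ml-research-outputs#set",
    })
    print(load_defined_term(record)["name"])  # Constitutional AI (concept)
```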
