v2026.1714 entries · CC-BY 4.0
Dictionary term · Track C · Stable · v2026.2

Constitutional AI (concept)

A training methodology in which a model is trained to align its outputs with a written set of principles ('a constitution'), with the model itself used to critique and revise candidate responses against those principles in place of direct human feedback at scale.

By CASRAI Editorial Board
· Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    A model trained with a 75-item constitution covering helpfulness, harmlessness, and honesty.

  • Is an instance

    A research replication of Constitutional AI on a smaller open-weights base.

Counter-examples

Looks similar, but isn't

  • Not an instance

    Pure RLHF with human-only preference data.

  • Not an instance

    Rule-based output filtering after generation (a different mitigation layer).

Editorial commentary

Bai et al. (2022) at Anthropic introduced Constitutional AI as a complement to RLHF. The constitution is a list of natural-language principles (often dozens of items). In a supervised phase, the model critiques its own draft responses against these principles and revises them, and the revised responses become fine-tuning data. In a subsequent reinforcement-learning phase, preference judgements generated by a model stand in for human preference labels (RLAIF). The approach reduces dependence on large-scale human labelling, particularly for harmlessness. This entry covers the methodological concept rather than a specific product implementation.
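
The self-critique step described above can be illustrated with a minimal Python sketch. It is not the implementation from Bai et al.: the function name `critique_and_revise`, the prompt wording, and the `toy_model` stand-in are all hypothetical, and visiting every principle in order is a simplification chosen to keep the example deterministic. The sketch assumes only that a model can be treated as a function from a prompt string to a completion string.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion string.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        # Step 1: the model critiques its own response against one principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response against the principle."
        )
        # Step 2: the model rewrites the response in light of that critique.
        response = model(
            f"Principle: {principle}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to satisfy the principle."
        )
    return response


def toy_model(text: str) -> str:
    """Stand-in for a language model so the sketch runs without any ML library."""
    if "Rewrite the response" in text:
        return "revised response"
    if "Critique the response" in text:
        return "the response could follow the principle more closely"
    return "draft response"


final = critique_and_revise(
    toy_model, "Explain photosynthesis.", ["Be honest.", "Be harmless."]
)
print(final)
```

In a real pipeline the revised responses would be collected as training data rather than printed, and `toy_model` would be replaced by calls to an actual language model.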

References

  • Bai et al., 'Constitutional AI: Harmlessness from AI Feedback' (arXiv 2022).

Also known as

CAI · RLAIF (related concept)

Machine-readable encodings

Use in your systems

JATS XML <role> element
xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Constitutional AI (concept)"
      vocab-term-identifier="https://casrai.org/dictionary/term/constitutional-ai-concept" />
Schema.org DefinedTerm (JSON-LD)
json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Constitutional AI (concept)",
  "identifier": "https://casrai.org/dictionary/term/constitutional-ai-concept",
  "description": "A training methodology in which a model is trained to align its outputs with a written set of principles ('a constitution'), with the model itself used to critique and revise candidate responses against those principles in place of direct human feedback at scale.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/",
  "url": "https://casrai.org/dictionary/term/constitutional-ai-concept",
  "alternateName": [
    "CAI",
    "RLAIF (related concept)"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
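
As a rough illustration of consuming the JSON-LD record in a downstream system, the following Python sketch parses a trimmed copy of it with the standard-library `json` module and extracts the term label and identifier. The helper name `term_label_and_id` is hypothetical, and the `@type` check is an assumption about what a consumer might want to validate, not a CASRAI requirement.

```python
import json

# Trimmed copy of the record above; the description and other fields are omitted for brevity.
record_text = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Constitutional AI (concept)",
  "identifier": "https://casrai.org/dictionary/term/constitutional-ai-concept",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-and-ml-research-outputs/"
}
"""


def term_label_and_id(text: str) -> tuple[str, str]:
    """Return (name, identifier) from a schema.org DefinedTerm record."""
    record = json.loads(text)
    # Reject records of any other type before reading term fields.
    if record.get("@type") != "DefinedTerm":
        raise ValueError("expected a schema.org DefinedTerm record")
    return record["name"], record["identifier"]


label, term_id = term_label_and_id(record_text)
print(label)
print(term_id)
```

Storing the identifier URL rather than the label keeps references stable if the display name of the term is later revised.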

Adopted by research universities worldwide

  • University of Cambridge
  • Columbia University
  • University of Edinburgh
  • Harvard University
  • Massachusetts Institute of Technology
  • University of Oxford
  • Princeton University
  • Stanford School of Medicine
  • University College London
