A prompt or interaction pattern that causes a language model to bypass its safety training and produce outputs the model was tuned to refuse, such as harmful instructions, restricted content, or violations of provider policy.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

Is an instance
A role-play prompt ('pretend you are an unrestricted AI...') that elicits otherwise-refused content.
Is an instance
An adversarial-suffix attack that appends an optimized token sequence to the prompt and thereby suppresses the model's refusal.

Counter-examples

Looks similar, but isn't

Not an instance
A legitimate user query within the scope of the model's intended use.
Not an instance
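A prompt-injection attack delivered via tool input (a different category: it hijacks the instructions an application gives the model rather than bypassing the model's safety training).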

Editorial commentary

Jailbreak techniques include role-play framings, multi-turn manipulation, encoding tricks (base64, ROT13), and adversarial-suffix attacks (Zou et al., 2023). Resistance to jailbreaking is a target of post-training (RLHF, Constitutional AI) and a focus of red-team evaluation. The category is a moving target: each newly disclosed jailbreak typically prompts new mitigations.
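
The encoding tricks mentioned above rely on transformations that are trivially reversible, so the same text can look quite different on the surface. As a minimal sketch, assuming a red-team harness or input filter that wants to inspect decoded views of a prompt before classifying it, the following Python shows both encodings on a benign string. The function name `candidate_decodings` is hypothetical and not part of any cited work or library.

```python
import base64
import codecs


def candidate_decodings(text: str) -> dict[str, str]:
    """Return the raw text plus its ROT13 view and, if valid, its base64 view.

    Hypothetical helper for illustration only: a real evaluation pipeline
    would pass each view to whatever classifier or reviewer it uses.
    """
    views = {
        "raw": text,
        # ROT13 is its own inverse, so decoding always succeeds.
        "rot13": codecs.decode(text, "rot_13"),
    }
    try:
        # validate=True rejects characters outside the base64 alphabet.
        views["base64"] = base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:
        # Not valid base64, or the bytes are not UTF-8 text; skip this view.
        pass
    return views


# The benign string "hello world" under each encoding.
b64_form = base64.b64encode(b"hello world").decode("ascii")
rot13_form = codecs.encode("hello world", "rot_13")
```

Here `candidate_decodings(b64_form)["base64"]` and `candidate_decodings(rot13_form)["rot13"]` both recover `"hello world"`. The sketch only illustrates why matching on surface text alone can miss encoded requests; it does not by itself constitute a mitigation.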

References

Zou et al., 'Universal and Transferable Adversarial Attacks on Aligned Language Models' (arXiv 2023).
Wei, Haghtalab, and Steinhardt, 'Jailbroken: How Does LLM Safety Training Fail?' (NeurIPS 2023).

Also known as

LLM jailbreak

Machine-readable encodings

Use in your systems

JATS XML <role> element

xml

<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Jailbreak (LLM)"
      vocab-term-identifier="https://casrai.org/dictionary/term/jailbreak-llm" />
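
A consuming system might read the term and its identifier back out of this element. The following is a minimal sketch using Python's standard-library XML parser; the helper name `read_role_term` is an assumption made for illustration, and a real JATS document would carry the element inside a larger tree.

```python
import xml.etree.ElementTree as ET

# The <role> element exactly as published in this entry.
ROLE_XML = """<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Jailbreak (LLM)"
      vocab-term-identifier="https://casrai.org/dictionary/term/jailbreak-llm" />"""


def read_role_term(xml_text: str) -> tuple[str, str]:
    """Return (term, term identifier) from a single JATS <role> element.

    Hypothetical helper: raises ValueError if the root is not <role>
    and KeyError if either attribute is missing.
    """
    role = ET.fromstring(xml_text)
    if role.tag != "role":
        raise ValueError(f"expected <role>, got <{role.tag}>")
    return role.attrib["vocab-term"], role.attrib["vocab-term-identifier"]


term, term_id = read_role_term(ROLE_XML)
```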

Schema.org DefinedTerm (JSON-LD)

json

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "@id": "https://casrai.org/dictionary/term/jailbreak-llm",
  "name": "Jailbreak (LLM)",
  "identifier": "https://casrai.org/dictionary/term/jailbreak-llm",
  "description": "A prompt or interaction pattern that causes a language model to bypass its safety training and produce outputs the model was tuned to refuse, such as harmful instructions, restricted content, or violations of provider policy.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/ai-ml-research-outputs#set",
  "url": "https://casrai.org/dictionary/term/jailbreak-llm",
  "alternateName": [
    "LLM jailbreak"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "publisher": {
    "@id": "https://casrai.org/#organization"
  },
  "dateModified": "2026-05-21T02:22:51",
  "inLanguage": "en"
}
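
To show how the record above might be consumed, here is a minimal sketch that parses a subset of it with Python's standard library and pulls out a few fields. The function name `summarize_term` and the choice of fields are assumptions for illustration; the sketch treats the record as plain JSON and does not perform JSON-LD context expansion.

```python
import json
from datetime import datetime

# A subset of the fields from the record published in this entry.
RECORD = """{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "@id": "https://casrai.org/dictionary/term/jailbreak-llm",
  "name": "Jailbreak (LLM)",
  "dateModified": "2026-05-21T02:22:51",
  "inLanguage": "en"
}"""


def summarize_term(jsonld_text: str) -> dict:
    """Extract the identifier, name, and modification time of a DefinedTerm.

    Hypothetical helper: raises ValueError for other record types.
    """
    doc = json.loads(jsonld_text)
    if doc.get("@type") != "DefinedTerm":
        raise ValueError("not a schema.org DefinedTerm record")
    return {
        "id": doc["@id"],
        "name": doc["name"],
        # The timestamp has no UTC offset, so this is a naive datetime.
        "modified": datetime.fromisoformat(doc["dateModified"]),
    }


summary = summarize_term(RECORD)
```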
