Jailbreak

Jailbreaking represents a category of prompt injection where an attacker overrides the original instructions of the LLM, deviating it from its intended behavior and established guidelines.

Definition

Jailbreaking, a type of Prompt Injection refers to the engineering of prompts to exploit model biases and generate outputs that may not align with their intended behavior, original purpose or established guidelines.

By carefully crafting inputs that exploit system vulnerabilities, the LLM can eventually respond without its usual restrictions or moderation. There have been some notable examples, such as the "DAN" or "multi-shot jailbreaking", where the AI systems responded without their usual constraints.

Key Concerns:

  1. Brand Reputation: Preventing damage to the organization's public image due to undesired AI behavior.
  2. Decreased Performance: Ensuring the GenAI application functions as designed, without unexpected deviations.
  3. Unsafe Customer Experience: Protecting users from potentially harmful or inappropriate interactions with the AI system.

How Prompt Security Helps

To mitigate these risks, Prompt Security diligently monitors and analyzes each prompt and response. This continuous scrutiny is designed to detect any attempts of jailbreaking, ensuring that the homegrown GenAI applications remain aligned with their intended operational parameters and exhibit behavior that is safe, reliable, and consistent with organizational standards.

If you want to test the resilience of your GenAI apps against a variety of risks and vulnerabilities, including Jailbreaking, try out the Prompt Fuzzer. It's available to everyone on GitHub.

Time to see for yourself

Learn why companies rely on Prompt Security to protect both their own GenAI applications as well as their employees' Shadow AI usage.

Prompt Security Dashboard