Indirect Prompt Injection

Indirect Prompt Injection occurs when a LLM processes input from external sources that are under the control of an attacker.

Definition

Indirect Prompt Injection occurs when an LLM processes input from external sources that are under the control of an attacker, such as certain websites or tools. In such cases, the attacker can embed a hidden prompt in the external content, effectively hijacking the conversation's context. This results in the destabilization of the LLM's output, potentially allowing the attacker to manipulate the user or interact with other systems accessible by the LLM. Notably, these indirect prompt injections do not need to be visible or readable by humans, as long as they can be parsed by the LLM. A typical example is a ChatGPT web plugin that could unknowingly process a malicious prompt from an attacker's website, often designed to be inconspicuous to human observers (white font, for instance).

Key Concerns:

  1. Unauthorized data exfiltration: Extracting sensitive data without permission.
  2. Remote code execution: Running malicious code through the LLM.
  3. DDoS (Distributed Denial of Service): Overloading the system to disrupt services.
  4. Social engineering: Manipulating the LLM to behave differently than planned.

How Prompt Security Helps

To combat this, Prompt Security employs a sophisticated AI engine that detects and blocks adversarial prompt injection attempts in real-time, while ensuring minimal latency overhead. In the event of an attempted attack, besides blocking, the platform immediately sends an alert with full logging and visibility.

If you want to test the resilience of your GenAI apps against a variety of risks and vulnerabilities, including Indirect Prompt Injection, try out the Prompt Fuzzer. It's available to everyone on GitHub.

Time to see for yourself

Learn why companies rely on Prompt Security to protect both their own GenAI applications as well as their employees' Shadow AI usage.

Prompt Security Dashboard