Back to Blog

The Agentic AI Attack Surface: Where Risk Lives Beyond the Prompt

David Abutbul
May 5, 2026
Technical analysis of agentic AI security boundaries covering content ingestion, context translation, tool execution, and behavioral constraints in AI runtimes.
On this Page

Once an AI system can browse, execute code, and act on external content, the security problem stops being limited to the prompt. The interesting questions shift to the rest of the pipeline: what gets pulled in from the outside world, what gets turned into model context, what the model is allowed to execute, and where those actions are contained.

OpenAI’s Atlas browser exposes that full chain more directly than a standard chat interface.

What a Production Agentic AI System Looks Like

Atlas is OpenAI's browser-based agent: it can read pages, run code, and act on what it finds. Untrusted content enters through the browser, gets translated into model context, and can lead to tool use inside a constrained sandbox. 

Those boundaries are where page data starts influencing model behavior, where model output starts turning into action, and where guardrails are needed the most.

How to Map the Attack Surface of an Agentic Runtime

To locate where those boundaries sit in practice, we mapped Atlas from two directions: outside-in through the browser layer, and inside-out through the Python sandbox.

Outside-In: Process & UI Inspection

We began by analyzing the browser's communication patterns. By launching Atlas with specific positional arguments, we bypassed the standard UI wrapper to inspect the underlying processes and internal developer tools.

# Launching Atlas with DevTools enabled for inspection

/Applications/ChatGPT\\ Atlas.app/Contents/MacOS/ChatGPT\\ Atlas --auto-open-devtools-for-tabs

By monitoring syslog traces, we identified the critical interplay between the Chromium-based browser process and Aura, the internal orchestration layer managing the AI's state.

# Capturing high-level orchestration traces (Source: Debugging Atlas)

log show --style syslog --last 10m --predicate 'process == "ChatGPT Atlas" || process == "Aura"'

Inside-Out: Probing the Python Sandbox

Once we confirmed a stateful Jupyter environment was running on the backend, we used Python as a "beachhead" to map the internal filesystem and evaluate the container's isolation.

import os

# Recursively listing reachable paths to map the sandbox

def list_files(startpath):

    for root, dirs, files in os.walk(startpath):

        level = root.replace(startpath, '').count(os.sep)

        indent = ' ' * 4 * (level)

        print(f'{indent}{os.path.basename(root)}/')

# Mapping the hidden configurations

list_files('/home/sandbox')

The "PyPI-less" Tooling Ecosystem

Buried at /home/sandbox/.openai_internal/caas_jupyter_tools is a proprietary library stack with no presence on PyPI. The CaaS library is what moves data between the backend sandbox and the frontend UI.

With no public documentation and no external auditability, it is itself a trust boundary. You cannot audit what you cannot inspect. In any agentic deployment, internal tooling that operates between layers deserves the same scrutiny as the model itself.

The tooling stack is Atlas-specific. The infrastructure pattern it represents is not.

The Four Security Boundaries in an Agentic Runtime

Mapping Atlas makes the pipeline concrete. Four boundaries show up:

  1. Content Ingestion (The Browser Engine): The browser is the edge. It ingests untrusted HTML, scripts, and media. The boundary here is between the raw web and the orchestrator parsing that data.

  2. Context Translation (The <browser__document> Bridge): Raw web data isn't fed directly to the model. It is sanitized and structured into an internal XML-like envelope. This is a critical trust boundary: if a payload escapes this envelope, it becomes context.

<browser>

  <browser__document>

    <browser__document__url><https://example.com/article></browser__document__url>

    <browser__document__title>Example Article</browser__document__title>

    <browser__document__content>

      Extracted and sanitized text content...

    </browser__document__content>

  </browser__document>

</browser>

  1. Tool Execution (The ACE Python Sandbox): The LLM generates code to be executed. The boundary here relies on strict OS-level container isolation.

  2. Behavioral Constraints (Orchestration Layer): The rules dictating what the agent is allowed to do on behalf of the user, enforced outside the model's weights.

These four boundaries appear in Atlas, and they appear in any system built on the same architecture. The product name changes; the boundaries don't.

Profiling the Sandbox: Permitted and Blocked Zones

We mapped the operational permissions within the Atlas Python runtime to see these boundaries in action.

Permitted Operations: Operations permitted without triggering security filters, allowing for basic environmental profiling (e.g., verifying sys.version, benchmarking performance, auditing installed pip packages like defusedxml==0.7.1, and accessing the session-persistent workspace at /mnt/data/).

Blocked Operations: Operations are strictly restricted by the security model, resulting in system-level blocking.

  • Sensitive Paths: Access to /etc, /proc, /root, /var is denied to prevent host metadata and kernel info leaks.
  • Network Access: Commands like requests.get() fail. A strict network air-gap is enforced at the container level.
  • OS Execution: os.system('ls /') and raw bash subprocesses are blocked. Restricted syscalls prevent environment escapes.
  • Exploit Mitigation: External entities in XML and privilege escalation vectors are mitigated via hardened libraries.

These restrictions reflect OpenAI's choices about what to harden. They don't indicate what other implementations harden, or whether these boundaries hold under adversarial conditions. Only that they exist.

Agentic AI Threat Patterns

The same pattern shows up anywhere a model is allowed to ingest outside content and do something with it.

  • Context Poisoning & Vision Verification: Hidden HTML, off-screen elements, and other presentation tricks can change what the model treats as relevant context if the system ingests them blindly. Rendering the page and checking it against visible content is one approach to this, but it assumes the verification layer itself is reliable and that the attack surface is limited to what's visually detectable. Content that passes visual checks can still carry payloads in how it's structured.
  • Prompt Injection via Content: The real boundary is not the page itself but the translation layer that turns page content into model-readable context. If untrusted content crosses that boundary in the wrong form, the model stops seeing it as data and starts treating it as instructions. Structuring that translation into an envelope like <browser__document> creates a defined boundary, but that boundary needs to be continuously monitored. The envelope only works if nothing can write to it from the wrong side.
  • Agent Behavior & Runtime Guardrails: Once the model can act, the question is no longer just what it says, but what it is allowed to do. Hardcoded controls like localhost blocking and captcha restrictions address specific known abuse vectors, but they represent a fixed ruleset against an evolving problem. In agentic systems the control plane sits outside the model, which means the security of the system depends entirely on how well that control plane is defined, monitored, and updated.

How an AI Agent Processes a Request

To visualize how these boundaries interact, we reconstructed the end-to-end Chat Turn Lifecycle within Atlas.

Securing Agentic AI: The Attack Surface Beyond the Prompt

Atlas is one implementation. The pattern is broader.

Once a system can ingest external content, turn that content into model context, and use tools to act on the result, the security problem spreads across the full chain. The weak points are predictable: content ingestion, context translation, tool invocation, and execution boundaries.

That is the part worth inspecting. Not because Atlas is uniquely important, but because this is what agentic runtimes look like when you strip away the product layer. The prompt is only one input. The runtime around it is where the more interesting security decisions are being made.

Share this post