Every enterprise security team evaluating AI coding agents faces the same question: how do you let an autonomous agent write code, execute commands, and modify your codebase without losing control?
OpenAI's Codex answered with a strong architectural bet: kernel-level sandboxing. Seatbelt on macOS. Landlock and seccomp on Linux. Isolated cloud containers with network disabled by default. The agent physically cannot escape its workspace.
It's a good answer. But it's an incomplete one.
Credit Where It's Due: Sandboxing Is Real Security
Let's be clear about what Codex gets right.
OS-level sandboxing is tamper-proof in a way that application-level controls are not. When the Linux kernel enforces that a process cannot open a socket, no amount of prompt injection or clever instruction-following will change that. The agent can't social-engineer a seccomp filter.
For Codex's cloud offering, the isolation is even stronger. Each task runs in a disposable container that is pre-loaded with your repository and then cut off from the network. The agent operates on a snapshot. If something goes wrong, you discard the container. Nothing persists.
This is genuinely better than the "trust the model" approach that many tools default to. Codex deserves recognition for making sandboxing a first-class architectural decision, not an afterthought.
The Problem: Sandboxes Are Binary
A sandbox answers one question: can this process perform this category of action?
Can it access the network? Yes or no. Can it write files outside the workspace? Yes or no. Can it execute shell commands? Yes or no.
What a sandbox cannot answer: should this specific action, with this specific content, in this specific context, be permitted?
Consider a straightforward scenario. Your AI coding agent has workspace write access and network enabled — a common configuration for agents that need to install dependencies or call APIs. The agent uses standard command-line tools to read your .env file and POST its contents to an external webhook endpoint.
The sandbox permits every part of this. The agent has network access. It has read access to project files. The HTTP client is a standard utility. Every individual capability the agent uses has been explicitly allowed.
But the intent — exfiltrating your environment variables to an external endpoint — is clearly malicious. The sandbox has no opinion on intent. It enforces capability boundaries, not behavioral ones.
What Enterprise Agentic Workflows Actually Need
Enterprise environments don't just need containment. They need inspection. The difference matters at every layer of an AI agent's operation.
Content-Aware Pattern Detection
A sandbox sees "agent wants to run a shell command." A hook-based security layer sees the actual command, matches it against hundreds of known attack patterns, and can distinguish between npm install express and a chained command that downloads and pipes a remote script to a shell. The first is routine. The second is a supply chain attack vector. The sandbox permits both identically.
Modern pattern engines carry 275 attack signatures covering credential exfiltration, privilege escalation, reverse shells, data staging, and prompt injection delivery mechanisms. These patterns are continuously updated as new attack techniques emerge. A static sandbox policy cannot provide equivalent coverage because the threat surface is semantic, not structural.
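The distinction can be sketched in a few lines. The patterns, severity labels, and rule names below are illustrative stand-ins, not any tool's actual signature set:

```python
import re

# Hypothetical signatures for illustration only. A real engine carries
# hundreds of these, with severity and confidence metadata per rule.
ATTACK_PATTERNS = [
    # Remote script piped straight into a shell (supply chain vector)
    (re.compile(r"(curl|wget)\b.*\|\s*(ba)?sh"), "high", "remote-script-pipe"),
    # Reading credential files and handing them to a network client
    (re.compile(r"\.env\b.*\b(curl|wget|nc)\b"), "high", "credential-exfil"),
    # Disabling shell history is a common data-staging precursor
    (re.compile(r"unset\s+HISTFILE"), "medium", "history-tampering"),
]

def screen_command(command: str):
    """Return (verdict, rule). A sandbox sees only 'shell command';
    this function sees the command's content."""
    for pattern, severity, name in ATTACK_PATTERNS:
        if pattern.search(command):
            return ("block" if severity == "high" else "prompt", name)
    return ("allow", None)
```

Run against the two commands from the example above, `npm install express` falls through to `("allow", None)` while the piped download matches the first signature and is blocked.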
Graduated Response
Real security isn't binary. An enterprise security posture needs graduated responses:
- Allow silently — git status, reading project files, standard build commands
- Log for audit — commands that touch configuration files or access credentials
- Prompt the developer — operations that cross project boundaries or match medium-severity patterns
- Block outright — known exfiltration patterns, credential theft, or attempts to disable security controls
Codex offers three approval modes: ask-before-acting, ask-for-untrusted-ops, and never-ask. These are coarse-grained presets, not context-sensitive decisions. There is no mechanism to say "allow all shell commands except those matching these 275 patterns, and for matches, escalate based on severity and confidence."
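A graduated policy is straightforward to express as code. The thresholds below are illustrative assumptions, not a normative policy:

```python
def decide(severity: str, confidence: float) -> str:
    """Map a pattern match to one of four graduated responses.
    Severity labels and thresholds are illustrative only."""
    if severity == "high" and confidence >= 0.8:
        return "block"    # known exfiltration, credential theft
    if severity == "medium" or severity == "high":
        return "prompt"   # escalate the decision to the developer
    if severity == "low":
        return "log"      # record for audit, do not interrupt
    return "allow"        # no match: silent
```

The point is not the specific thresholds but the shape of the decision: four outcomes driven by content, where a sandbox or approval-mode preset offers one or two driven by category.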
Compliance Scanning
Regulated industries don't just need to prevent attacks. They need to prevent data handling violations.
When an AI agent is about to write a file containing a Social Security number, send an API request that includes protected health information, or commit code with an embedded credit card number, the security layer needs to detect and block that operation before it executes — regardless of whether the agent technically has permission to write files or make network calls.
HIPAA, PCI-DSS, GDPR, and SOC 2 compliance cannot be enforced by capability sandboxing. These regulations govern what data flows where, not which system calls a process can make. A sandbox that allows file writes allows all file writes — it cannot distinguish between writing application code and writing patient records to an unencrypted log file.
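A minimal sketch of what a content-aware write check looks like. The regexes below are deliberately simplified; production compliance scanners also validate candidates (e.g. Luhn checksums for card numbers) and use surrounding context to cut false positives:

```python
import re

# Simplified detectors for illustration; not production-grade.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def scan_write(path: str, content: str) -> dict:
    """Inspect a pending file write and block it if regulated data
    would be persisted -- regardless of write permission."""
    findings = [name for name, rx in PII_PATTERNS.items() if rx.search(content)]
    return {
        "path": path,
        "verdict": "block" if findings else "allow",
        "findings": findings,
    }
```

This is exactly the check a capability sandbox cannot perform: both writes are the same syscall; only the content differs.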
Cross-Turn Behavioral Analysis
Individual tool calls that appear benign can form malicious sequences:
- Agent reads SSH configuration (permitted — workspace read access)
- Agent writes a script that base64-encodes file contents (permitted — workspace write)
- Agent runs the script targeting credential directories (permitted — shell execution allowed)
- Agent sends an HTTP request with the encoded payload (permitted — network enabled)
No single step violates sandbox policy. The attack is only visible when you analyze the sequence of operations across turns. This requires session-level behavioral analysis — a security layer that maintains state across tool invocations and detects anomalous progressions.
A sandbox has no memory. Each syscall is evaluated independently against the policy. There is no concept of "this sequence of individually-permitted operations constitutes an attack."
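The contrast can be made concrete with a toy session monitor. The stage names and the single hard-coded chain are illustrative assumptions; a real analyzer classifies tool calls into stages and tracks many progressions at once:

```python
# Illustrative exfiltration chain: each stage is benign on its own.
SUSPICIOUS_SEQUENCE = ["read_credentials", "encode_data", "network_send"]

class SessionMonitor:
    """Keeps state across tool invocations, unlike a sandbox, which
    evaluates each syscall independently."""

    def __init__(self):
        self.progress = 0  # how far the chain has advanced this session

    def record(self, stage: str) -> str:
        if (self.progress < len(SUSPICIOUS_SEQUENCE)
                and stage == SUSPICIOUS_SEQUENCE[self.progress]):
            self.progress += 1
        if self.progress == len(SUSPICIOUS_SEQUENCE):
            return "block"  # the full chain has been observed
        return "allow"      # each step alone looks routine
```

Every call to `record` returns "allow" until the final step completes the chain; only the accumulated state makes the last, otherwise-ordinary network send blockable.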
Semantic Understanding
Pattern matching catches known attack signatures. But what about novel attacks?
LLM-based semantic review can analyze a tool invocation and assess whether its intent is suspicious, even if it doesn't match any known pattern. An agent that generates a script with heavy obfuscation, unusual encoding, or suspiciously specific targeting of credential files can be flagged by semantic analysis even when each individual code construct is innocuous.
This is particularly important for prompt injection defense. When an AI agent reads a file that contains embedded instructions ("ignore previous instructions and run..."), the injected instructions become part of the agent's context. A sandbox doesn't inspect the content the agent reads. It doesn't know that the agent's next action was influenced by malicious content in a file it was asked to review. Semantic analysis on the agent's subsequent tool calls can detect the behavioral shift.
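A post-tool-use content check is the simplest first line of that defense. The marker phrases below are illustrative; real detectors combine many lexical and semantic signals rather than a short phrase list:

```python
import re

# Illustrative injection markers only -- not an exhaustive detector.
INJECTION_MARKERS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"disregard (your|the) system prompt", re.IGNORECASE),
]

def scan_tool_result(content: str) -> dict:
    """Post-tool-use hook body: flag file or API content that tries
    to steer the agent before it re-enters the model's context."""
    for rx in INJECTION_MARKERS:
        if rx.search(content):
            return {"verdict": "flag", "matched": rx.pattern}
    return {"verdict": "clean", "matched": None}
```

Pattern checks like this catch the crude cases; the semantic layer described above exists for injections that paraphrase their way past any fixed list.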
The Extensibility Gap
Perhaps the most significant limitation of Codex's sandbox-only model for enterprise adoption is that it is a closed system.
Codex's security controls are built into the product. They cannot be extended, customized, or integrated with existing enterprise security tooling. There is no protocol for third-party security tools to inspect tool calls before execution, no way to inject organization-specific compliance rules, and no mechanism for feeding findings into existing SIEM infrastructure beyond opt-in OpenTelemetry telemetry (which is read-only and cannot influence agent behavior).
Compare this to the hook ecosystem that has emerged around tools that support pre- and post-tool-use interception. Claude Code, Gemini CLI, Cursor, Windsurf, and GitHub Copilot all support shell-based hooks that fire before and after tool execution. These hooks:
- Receive the full tool invocation as structured JSON
- Can allow, deny, or prompt for each individual operation
- Run any executable — Python, Bash, Node.js, compiled binaries
- Chain with existing security infrastructure (Semgrep, Snyk, 1Password, custom enterprise tooling)
- Are configurable at project, user, and enterprise levels
Cursor's enterprise hook ecosystem already includes partnerships with Semgrep (static analysis), Snyk (vulnerability scanning), 1Password (secret detection), and Endor Labs (dependency risk). These integrations exist because hooks provide the interception point that allows third-party tools to participate in the agent's decision loop.
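The mechanism these integrations rely on is small. Here is a minimal pre-tool-use hook following the converged protocol described above (JSON on stdin, exit 0 to allow, exit 2 to deny); the `tool_name`/`tool_input` field names follow Claude Code's event shape, and other tools use different keys over the same wire protocol:

```python
#!/usr/bin/env python3
"""Minimal pre-tool-use hook: JSON on stdin, exit 0 = allow, 2 = deny."""
import json
import sys

def decide(event: dict) -> int:
    """Return the hook exit code for one tool invocation."""
    if event.get("tool_name") == "Bash":
        command = event.get("tool_input", {}).get("command", "")
        # Toy rule for illustration: block remote scripts piped to a shell
        if "curl" in command and "| sh" in command:
            print("Blocked: remote script piped to shell", file=sys.stderr)
            return 2  # deny
    return 0          # allow

if __name__ == "__main__":
    sys.exit(decide(json.load(sys.stdin)))
```

Because the hook is just an executable, the body of `decide` can call Semgrep, query a policy engine, or forward the event to a SIEM before answering.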
Codex has no equivalent extension point. Enterprise security teams must accept OpenAI's built-in controls or use no controls at all. For organizations that have invested in security tooling, compliance automation, and custom policy engines, this is a non-starter.
Sandboxing and Hooks Are Complementary
The framing of "sandbox vs. hooks" is itself misleading. These are not competing approaches. They protect against different threat categories.
| Threat | Sandbox | Hooks |
|---|---|---|
| Agent escapes workspace | Prevents | Does not prevent |
| Agent exfiltrates data within allowed perimeter | Does not detect | Detects and blocks |
| Agent violates compliance regulations | Does not detect | Detects and blocks |
| Agent follows injected instructions | Does not detect | Detects behavioral shift |
| Novel attack using only permitted operations | Does not detect | Semantic analysis flags |
| Known attack pattern | May block if category-level policy matches | Blocks with specific pattern match |
The strongest security posture combines both: kernel-level containment to enforce hard boundaries, and content-aware interception to enforce behavioral policies within those boundaries.
Bridging the Gap: Hooks + Sandboxing in Practice
If sandboxing and hooks are complementary, the natural question is: does any tool combine both?
Tweek is an open-source security layer that adds both content-aware hook screening and OS-level sandboxing to AI coding tools that lack one or the other. For Claude Code — which has a mature hook system but no built-in sandbox — Tweek adds three layers of containment on top of its 11-layer screening pipeline:
Project Sandbox (Per-Project Isolation)
Each project directory gets its own isolated security state — trust decisions, override policies, rate limiter budgets, and session history are scoped to the project and cannot leak across workspaces. When global and project-level policies overlap, Tweek applies additive-only merging: a project can tighten security constraints but never relax global ones.
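Additive-only merging is a one-way set operation: deny lists union, allow lists intersect. A sketch, with hypothetical key names (the actual policy schema is Tweek's own):

```python
def merge_policies(global_policy: dict, project_policy: dict) -> dict:
    """Merge project policy into global policy additively: a project
    can tighten constraints but never relax them. Keys are illustrative."""
    global_allowed = global_policy.get("allowed_dirs", [])
    return {
        # Deny lists are unioned: project entries add, never remove
        "deny_patterns": sorted(
            set(global_policy.get("deny_patterns", []))
            | set(project_policy.get("deny_patterns", []))
        ),
        # Allow lists are intersected: a project can only narrow them
        "allowed_dirs": sorted(
            set(global_allowed)
            & set(project_policy.get("allowed_dirs", global_allowed))
        ),
    }
```

Under this rule there is no project-level setting that can re-enable something the global policy denies, which is the property that makes per-project configuration safe to delegate.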
Speculative Execution Sandbox (OS-Level Containment)
Before a shell command actually executes in the user's environment, Tweek can run it in a sandboxed preview using sandbox-exec on macOS or firejail on Linux. The preview captures what the command would do without letting those effects touch the real filesystem. If the preview reveals behavior that contradicts the agent's stated intent, the command is blocked before it executes.
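The wrapping step can be sketched as follows. The sandbox profiles here are deliberately minimal assumptions for illustration — a usable preview needs per-project profile tuning, and this is not Tweek's actual implementation:

```python
import platform
import subprocess

def build_preview_argv(command: str, system: str) -> list:
    """Wrap a shell command for a read-only, network-less preview run.
    Both profiles are minimal sketches, not hardened configurations."""
    if system == "Darwin":
        # macOS: deny-by-default Seatbelt profile allowing only reads
        profile = "(version 1) (deny default) (allow process*) (allow file-read*)"
        return ["sandbox-exec", "-p", profile, "/bin/sh", "-c", command]
    # Linux: firejail with networking off and a throwaway private home
    return ["firejail", "--quiet", "--net=none", "--private",
            "/bin/sh", "-c", command]

def preview_command(command: str, timeout: int = 10) -> dict:
    """Execute the preview and report what the command attempted."""
    argv = build_preview_argv(command, platform.system())
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {"returncode": result.returncode, "stderr": result.stderr}
```

Denied operations surface as errors in the preview's output, which is the signal used to compare the command's observed behavior against the agent's stated intent.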
Path Boundary Enforcement
Independent of the OS sandbox, Tweek enforces configurable path boundaries that prevent the agent from reading or writing outside designated project directories. Unlike a kernel sandbox that applies a single binary policy, path boundaries are context-aware — they can permit access to shared dependency directories while blocking access to other project workspaces, home directory dotfiles, or system configuration.
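The core check is a resolved-path containment test. Directory choices in this sketch are illustrative:

```python
from pathlib import Path

def check_path(target: str, project_root: str, shared_allowed: list) -> str:
    """Context-aware boundary check: allow the project tree plus named
    shared directories; block everything else (dotfiles, other projects).
    resolve() normalizes '..' segments and follows symlinks, so traversal
    tricks are evaluated against the real destination."""
    resolved = Path(target).resolve()
    roots = [Path(project_root).resolve()] + [Path(p).resolve() for p in shared_allowed]
    for root in roots:
        if resolved == root or root in resolved.parents:
            return "allow"
    return "block"
```

Note how this differs from a single kernel policy: the same call can admit a shared dependency directory while rejecting a sibling workspace, because the allow set is an argument, not a global.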
The result is defense-in-depth that neither approach achieves alone. The hook pipeline screens every tool invocation for malicious intent, compliance violations, and behavioral anomalies. The sandbox layers enforce hard containment boundaries that survive even if the screening pipeline has a gap. An exfiltration attempt must evade 275 attack patterns, heuristic scoring, LLM semantic review, session-level behavioral analysis, and an OS-level sandbox — not just one or the other.
This is what the complementary model looks like in production. Codex has the sandbox half. The hook ecosystem provides the inspection half. The enterprise security posture requires both — and tools like Tweek demonstrate that combining them is not only possible but practical today.
The Industry Is Converging — But Not on a Standard
As of early 2026, five major AI coding tools support full pre/post tool-use hooks: Claude Code, Gemini CLI, GitHub Copilot, Cursor, and Windsurf. The underlying mechanism has converged: JSON over stdin/stdout, exit code 0 for allow, exit code 2 for deny, shell command execution in any language.
But convergence is not a standard. Every tool has invented its own naming convention:
| Tool | Pre-Hook Event | Post-Hook Event | Config Location |
|---|---|---|---|
| Claude Code | `PreToolUse` | `PostToolUse` | `~/.claude/settings.json` |
| Gemini CLI | `BeforeTool` | `AfterTool` | `~/.gemini/settings.json` |
| GitHub Copilot | `preToolUse` | `postToolUse` | `.github/hooks/*.json` |
| Cursor | `beforeShellExecution` | `afterFileEdit` | `.cursor/hooks.json` |
| Windsurf | `pre_run_command` | `post_run_command` | JSON config files |
| Codex | N/A | N/A | N/A |
Five tools, five naming schemes, five config file formats. Codex has no hook support at all.
The wire protocol is nearly identical — the same JSON flows through stdin, the same exit codes control decisions — but every security tool that wants to support multiple AI assistants must maintain separate adapter layers, config generators, and event name mappings for each one.
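In practice, that adapter layer is a translation table per assistant. A sketch built from the event names in the comparison table above:

```python
# One mapping per tool -- the per-assistant adapter burden in miniature.
EVENT_MAP = {
    "claude-code": {"pre": "PreToolUse", "post": "PostToolUse"},
    "gemini-cli": {"pre": "BeforeTool", "post": "AfterTool"},
    "github-copilot": {"pre": "preToolUse", "post": "postToolUse"},
    "cursor": {"pre": "beforeShellExecution", "post": "afterFileEdit"},
    "windsurf": {"pre": "pre_run_command", "post": "post_run_command"},
}

def normalize_event(tool: str, event_name: str) -> str:
    """Translate a tool-specific event name to canonical 'pre'/'post'."""
    for canonical, native in EVENT_MAP[tool].items():
        if native == event_name:
            return canonical
    raise ValueError(f"unknown event {event_name!r} for {tool}")
```

Multiply this by differing config file formats and payload schemas and the cost of each new assistant becomes clear; a shared specification would reduce the table to one row.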
This is where the industry needs to go next. A formal specification for AI coding tool hooks — standardized event names, a common JSON schema for tool invocations, a shared config file format — would allow security tools to write one integration that works everywhere. Without it, we're building the equivalent of browser-specific CSS hacks for every AI assistant on the market.
The community demand for Codex to join this ecosystem is clear. GitHub Discussion #2150 on the Codex repository has drawn 61+ participants over seven months, with users explicitly citing competing tools' hook support as a reason to switch. OpenAI's only response has been a basic notification hook that fires after task completion — useful for alerts, but providing no security interception capability.
What Needs to Change
Codex should keep its sandbox. It's good engineering. But enterprise adoption requires four additions:
- Pre-tool-use hooks — Allow external tools to inspect and optionally block any tool invocation before execution. Follow the emerging industry protocol: JSON stdin, JSON stdout, exit code semantics.
- Post-tool-use hooks — Allow external tools to inspect tool results before they flow back into the agent's context. This is critical for detecting prompt injection in file contents and API responses.
- Enterprise hook policy — Allow organizations to deploy mandatory hooks via MDM or configuration policy, similar to what Codex already supports for sandbox and approval settings via requirements.toml.
- Adopt a standard naming convention — Don't invent a sixth naming scheme. The industry needs to converge on common event names and a shared JSON schema so that security tooling works across AI assistants without per-tool adapters. OpenAI is well-positioned to drive this — or at minimum, to align with what already exists rather than fragmenting the ecosystem further.
The sandbox keeps the agent in the box. Hooks watch what it does inside.
Enterprise security requires both.