Secure code execution for AI agents: isolation, ephemerality, and network control
Secure code execution for AI agents means running model-generated commands, scripts, and file operations inside a disposable, hardware-isolated environment — one microVM per task, with no host credentials and default-deny network egress — so that even a hostile or hallucinated command can do nothing worse than waste a sandbox you were going to throw away. The reason agents need this at all is simple: a useful agent doesn't just talk, it acts, and the moment it can run a shell command or write a file, you are executing code you did not write and cannot audit in advance. This post ties together the three properties that make that safe — isolation, ephemerality, and network control — and explains why a microVM is the boundary that holds when the author is an LLM.
Why AI agents need to execute code at all
A language model on its own is a text predictor. It becomes an agent when you give it tools — and the most powerful, general tool is a computer. Let an agent run a shell, edit files, install packages, and call APIs, and it can do real work: analyze a dataset, reproduce a bug, scrape and transform data, build and deploy an app. Every capable coding or research agent in production today is, underneath, a loop that generates a command, runs it, reads the output, and decides what to do next.
That execution step is where the value is and where the danger is. The narrower alternatives — letting the model only call a fixed set of typed functions — cap what the agent can do to whatever you anticipated in advance. The whole point of giving an agent a real shell and filesystem is that you don't have to anticipate. The cost of that generality is that you are now running arbitrary, runtime-generated code on infrastructure you care about.
The trust problem: model-generated code and prompt injection
With a human engineer, you have a reviewer and someone accountable. With an autonomous agent, three things change at once, and they compound.
- The code is generated, not written — there is no author who understood it, and the model can confidently produce a command that deletes the wrong directory or leaks a file it was never meant to touch.
- The inputs may be adversarial — prompt injection means a web page, a file, or a tool result the agent reads can carry instructions that hijack the loop. The agent ingests untrusted text and then writes code based on it. Treat any command derived from external content as attacker-controlled.
- The loop runs unattended — there is no human in the path to catch the bad command before it executes, and the agent may run thousands of commands across a long task.
The isolation spectrum: from in-process eval to microVM
"Sandbox" is an overloaded word. The options form a spectrum from no boundary to a hardware boundary, and for agent tool execution most of the spectrum is disqualified. Walking it top to bottom:
- In-process eval / exec — running the agent's output in your own process (Python's eval/exec, a subprocess on the host). There is no boundary at all: the code runs with your process's permissions, secrets, and network. Never do this with model-generated input.
- Language-level sandbox — restricting interpreter builtins, a JS VM context, RestrictedPython. These are repeatedly and reliably escaped by determined code; the interpreter was not designed as a security boundary. Useful only as one defense-in-depth layer, never as the only one.
- Container (Docker, plain runc) — a real isolation improvement: namespaces, cgroups, a private filesystem and network. But a container shares the host's Linux kernel, and that shared syscall surface is the entire attack surface. A kernel bug or a container-escape exploit reaches the host and every neighbor on it. Fine for your own trusted code; not sufficient as the sole boundary for untrusted, multi-tenant, model-generated code.
- Hardened container runtime (gVisor, Kata) — narrows or replaces the shared-kernel surface, gVisor by intercepting syscalls in userspace, Kata by wrapping the container in a lightweight VM. A genuine step up, and a reasonable answer for many workloads.
- microVM (Firecracker) — each workload boots its own guest kernel inside CPU hardware virtualization (KVM). An escape has to break the hypervisor boundary itself — a deliberately tiny, heavily scrutinized surface — rather than the full Linux syscall interface. This is the boundary purpose-built for running untrusted code at scale.
The rule of thumb is the one I keep coming back to: containers isolate your code from your code; microVMs isolate your code from someone else's. When the someone-else is an autonomous agent acting on adversarial inputs, you want the hardware boundary. For the full container-versus-microVM comparison and where gVisor and Kata fit, see the companion posts on Firecracker vs Docker and on running untrusted code safely.
Why the microVM wins for agents specifically
The historical objection to VMs was startup cost — tens of seconds to boot a full machine makes a fresh-VM-per-task pattern absurd. Firecracker, the open-source VMM that AWS built for Lambda and Fargate, removed that objection: a stripped-down device model and a single guest kernel boot in milliseconds. PandaStack pushes it further with snapshot-restore — there is no warm pool of idle VMs; every create restores a baked snapshot on demand, with a p50 of 179ms to a live, isolated microVM. A same-host fork of an existing environment lands around 400ms.
That speed is what makes the secure pattern economically viable. If a clean, hardware-isolated environment costs you 179 milliseconds, you can afford to give every task — or every tenant, or every risky command — its very own VM and destroy it afterward. The expensive-isolation tradeoff that pushed everyone toward shared-kernel containers simply doesn't apply. Under the hood, memory uses copy-on-write (MAP_PRIVATE) and the rootfs uses reflink clones, so thousands of these microVMs stay dense rather than each carrying a full machine's overhead.
The ephemeral-per-task pattern
Isolation answers "can this code reach the host?" Ephemerality answers "can this run affect the next one?" An ephemeral sandbox is created fresh for a single task and destroyed when the task ends. Nothing persists: no leftover files, no lingering processes, no secrets cached on disk, no state for the next caller to stumble into or for a compromised task to leave behind.
This is the single most important operational discipline for agent platforms. Reusing one long-lived sandbox across tasks or tenants reintroduces exactly the cross-contamination that isolation was supposed to prevent. The pattern is one environment per task (or per tenant), always with a TTL so an abandoned or runaway VM is reaped automatically even if your code forgets to clean up. With sub-second creation, a fresh VM per task is not a luxury — it's the default.
Network and secret hardening
In practice the most common real-world leak from an agent sandbox is not an exotic kernel exploit — it's a perfectly isolated VM that still had ambient network access and an environment variable it shouldn't have. Isolation contains the blast radius of an escape; network and secret hygiene prevents the far more likely quiet exfiltration. Treat them as non-negotiable:
- Default-deny egress — block outbound network by default and allow only the specific destinations the task needs. An agent that can phone home can exfiltrate anything it reads.
- No host credentials in the guest — never inject your cloud keys, database passwords, or long-lived tokens. Pass only what the task requires, scoped and short-lived.
- Block the metadata endpoint — cloud instance metadata (169.254.169.254) is a classic credential-theft target; it must be unreachable from inside the sandbox.
- Resource limits — cap CPU and memory so one task (or a crypto-miner the agent was tricked into running) can't starve the fleet.
- Capture output through the platform API — read results over the sandbox API rather than mounting host paths into the guest, which would punch a hole straight back to your filesystem.
A working example: run an agent-produced command safely
Here is the whole pattern in a few lines with the PandaStack Python SDK. Create a throwaway, hardware-isolated microVM, run whatever the agent generated inside it, read the result over the API, and let the VM be destroyed on exit. The agent can rm -rf its own filesystem, spike CPU, or try to reach the network — the blast radius is one disposable VM.
from pandastack import Sandbox
# Whatever the agent decided to run this turn.
agent_command = "python3 -c 'import statistics; print(statistics.mean([2,4,6,8]))'"
# One hardware-isolated microVM per task (~179ms to create), auto-killed on exit.
with Sandbox.create(
template="code-interpreter", # python + node scientific stack
ttl_seconds=300, # reaped automatically if abandoned
metadata={"task": "agent-tool-call"},
) as sb:
result = sb.exec(agent_command, timeout_seconds=30)
print(result.stdout, result.exit_code)
# Context manager destroys the VM here — no state survives to the next task.The SDK reads PANDASTACK_TOKEN from the environment and talks to https://api.pandastack.ai by default; the same flow is available in the TypeScript SDK and the CLI. For an agent that needs to keep working state across several turns within one task, you can snapshot or fork the sandbox instead of recreating it — see the snapshots and forks documentation — but the boundary between tasks should still be a fresh environment.
When you don't need a microVM
Honesty matters more than maximalism. Not every agent action needs a microVM, and reaching for one indiscriminately adds latency and cost you don't need.
- If your agent only ever calls a fixed, typed set of functions you wrote — and never executes free-form code, shell, or files — the function boundary itself is your sandbox; you don't need a VM per call.
- If you are running only your own trusted, reviewed code with no untrusted input anywhere in the loop, a container is a reasonable boundary and simpler to operate.
- Pure read-only retrieval or API orchestration with no code execution doesn't need an execution sandbox at all.
The moment any of those assumptions breaks — the agent can write and run code, or it ingests untrusted content that influences what it runs, or you're multi-tenant — you are back to needing isolation, ephemerality, and network control together. For agent tool execution that's the common case, which is why the microVM-per-task pattern has become the default for serious agent platforms. The model will keep getting better at writing code; your job is to make sure that the one time it writes something terrible, the only casualty is a sandbox you were about to delete anyway.
Frequently asked questions
What is the safest way to run AI-generated code?
Run it inside a fresh, hardware-isolated microVM that is created for a single task and destroyed afterward, with no host credentials and default-deny network egress. Never eval/exec model output in your own process, and don't rely on a language-level sandbox alone, since those are routinely escaped. Platforms like PandaStack create such a microVM in about 179ms, which makes a disposable VM per task practical rather than expensive.
Why isn't a Docker container enough for an AI agent sandbox?
A container shares the host's Linux kernel, so its entire attack surface is the kernel's syscall interface — a kernel bug or container-escape exploit reaches the host and every neighboring container. That's an acceptable risk for your own trusted code but not for untrusted, model-generated code in a multi-tenant agent. A microVM boots its own guest kernel inside hardware virtualization, so an escape would have to break the much smaller hypervisor boundary instead.
How does prompt injection affect code execution security?
Prompt injection means untrusted text an agent reads — a web page, a file, a tool result — can carry instructions that hijack the agent's loop and cause it to generate hostile commands. Because the agent ingests untrusted input and then writes code based on it, any command derived from external content must be treated as attacker-controlled. The defense is environmental: run every command in a disposable, isolated sandbox so a hijacked command can only damage a throwaway VM.
Why should each agent task run in its own ephemeral sandbox?
An ephemeral, per-task sandbox guarantees that nothing persists between runs: no leftover files, processes, or secrets can leak forward to the next caller, and a runaway or compromised task is discarded along with its environment. Reusing one long-lived sandbox across tasks or tenants reintroduces the cross-contamination that isolation is meant to prevent. With sub-second microVM creation, a fresh VM per task is cheap enough to be the default rather than an optimization.
179ms p50 cold start. Fork, snapshot, and scale to zero.