How to run untrusted (and AI-generated) code safely

Ajay Kumar · June 13, 2026 · 10 min read

If you're building anything where code arrives at runtime (an AI agent that writes and runs scripts, a code interpreter, a CI system that builds arbitrary repos, a playground that executes user submissions), you have the same problem: you must run code you can't audit in advance, on infrastructure you care about. The cost of getting this wrong ranges from a crashed process to a compromised host to data leaked from another tenant. This guide covers the threats that actually matter and a pattern that holds up.

What untrusted code can actually do

Be concrete about the failure modes, because the mitigation depends on them:

Destroy local state — rm -rf, fill the disk, exhaust memory or CPU.
Reach the network — exfiltrate secrets, call internal services, mine crypto, attack third parties from your IP.
Read secrets — environment variables, mounted credentials, cloud metadata endpoints (169.254.169.254).
Escape its boundary — break out of the container/sandbox to the host, then to other tenants.
Persist — leave something behind in a reused environment that affects the next run.

The most common real-world leak isn't an exotic kernel exploit. It's a sandbox with ambient network access reading a cloud metadata endpoint, or an environment variable that shouldn't have been there. Isolation is necessary but not sufficient: the environment inside the boundary has to be locked down too.
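To make the environment-variable risk concrete, here is a minimal sketch of building a sandbox environment from an allowlist instead of inheriting the host's. The names `ALLOWED_ENV`, `SECRET_HINTS`, `scrubbed_env`, and `secret_like_names` are hypothetical and chosen for illustration; the hint list is a heuristic, not a complete secret detector.

```python
# Hypothetical allowlist: only the variables the task genuinely needs.
ALLOWED_ENV = {"PATH", "LANG", "HOME", "TZ"}

# Name fragments that often indicate a credential (heuristic, not exhaustive).
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")


def scrubbed_env(env, allowed=ALLOWED_ENV):
    """Return a copy of env containing only allowlisted variable names."""
    return {name: value for name, value in env.items() if name in allowed}


def secret_like_names(env):
    """List variable names that look like credentials, for a pre-flight audit."""
    return sorted(
        name for name in env
        if any(hint in name.upper() for hint in SECRET_HINTS)
    )
```

Calling `scrubbed_env(os.environ)` before launching a sandbox passes through only the allowlisted names, and `secret_like_names(os.environ)` is a quick way to see what would otherwise have been inherited. An allowlist is the safer default because a denylist misses any secret whose name you did not anticipate.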

Approaches that don't hold up

Running it in your process (eval, exec, a subprocess on the host) — no boundary at all. Never do this with untrusted input.
A language-level sandbox (restricting Python builtins, a JS VM context) — repeatedly broken; determined code escapes the interpreter. Fine for defense in depth, never as the only layer.
A plain container — better, but it shares the host kernel. Acceptable for trusted code; for untrusted multi-tenant code, a kernel bug or escape compromises the host. AWS, Google, and others moved untrusted workloads off plain containers for exactly this reason.
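As an illustration of why a language-level sandbox is only one layer, the sketch below removes every builtin from a Python `eval` and then shows that the class hierarchy is still reachable through ordinary attribute access. It is a deliberately benign demonstration and stops at listing classes; the variable names are illustrative.

```python
# "Sandbox" attempt: evaluate expressions with every builtin removed.
restricted_globals = {"__builtins__": {}}

# Calling a builtin by name is blocked, as intended.
try:
    eval("open('/etc/hostname')", restricted_globals)
    direct_call_blocked = False
except NameError:
    direct_call_blocked = True

# But no builtins are needed to walk from a tuple literal up to `object`
# and back down to every class currently loaded in the interpreter.
expression = "().__class__.__base__.__subclasses__()"
reachable = eval(expression, restricted_globals)
reachable_names = {cls.__name__ for cls in reachable}

print(f"direct builtin call blocked: {direct_call_blocked}")
print(f"{len(reachable)} classes still reachable with builtins removed")
```

Once arbitrary loaded classes are reachable, the restriction on names no longer bounds what the code can do, which is why this kind of filter is useful as defense in depth but not as the boundary itself.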

The pattern that holds: an isolated, ephemeral microVM

The durable answer combines three properties: a hardware isolation boundary (a microVM, not a shared kernel), an ephemeral environment (fresh per task, destroyed after — so nothing persists and nothing leaks forward), and a controlled network (default-deny egress, no host credentials, no metadata endpoint). A microVM platform gives you the first two by construction and lets you enforce the third per sandbox.

Here's the shape of it with PandaStack — create a throwaway microVM, run the untrusted code in it, read the result, and let it be destroyed:

from pandastack import Sandbox

# Each task gets its own hardware-isolated microVM (~179ms to create).
with Sandbox.create(template="code-interpreter", ttl_seconds=300) as sb:
    # Whatever the agent/user produced — run it inside the VM, never on the host.
    result = sb.exec("python3 -c 'print(sum(range(100)))'", timeout_seconds=30)
    print(result.stdout, result.exit_code)
# Context manager kills the VM on exit — no leftover state for the next run.

The agent can rm -rf its own filesystem, spike CPU, or try to phone home — and the blast radius is one disposable VM that you were going to throw away anyway. Your host and other users are behind a hardware boundary the code never touches.

Hardening checklist

One environment per task or per tenant — never share a sandbox across untrusted callers.
Always set a TTL so an abandoned or runaway VM is reaped automatically.
Don't inject real secrets into the sandbox; pass only what the task needs, scoped and short-lived.
Treat network egress as default-deny and allow only what the task requires.
Set CPU/memory limits so one task can't starve the fleet.
Capture output through the platform API, not by mounting host paths into the guest.
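The default-deny egress item can be sketched as a small policy check using Python's standard `ipaddress` module. This is only the decision logic: real enforcement belongs in the platform's network layer or a firewall, not in application code. `ALLOWED_NETWORKS` and `egress_allowed` are hypothetical names, and the allowlisted range is a documentation placeholder.

```python
import ipaddress

# Hypothetical per-task allowlist: the only networks this task may reach.
# 203.0.113.0/24 is a reserved documentation range used here as a placeholder.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def egress_allowed(destination, allowed=ALLOWED_NETWORKS):
    """Default-deny: permit a destination only if it is explicitly allowlisted."""
    address = ipaddress.ip_address(destination)
    # Refuse link-local (cloud metadata lives at 169.254.169.254) and loopback
    # even if a broad allowlist entry would otherwise match.
    if address.is_link_local or address.is_loopback:
        return False
    return any(address in network for network in allowed)
```

With this policy, `egress_allowed("203.0.113.10")` is permitted, while `egress_allowed("169.254.169.254")` and `egress_allowed("10.0.0.5")` are refused. Checking link-local and loopback before the allowlist means an overly broad entry such as `0.0.0.0/0` still cannot expose the metadata endpoint.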

Why this matters even more for AI agents

With a human writing code, you at least have a person to hold accountable. With an autonomous agent, the code is generated, the inputs may be adversarial (prompt injection), and the loop runs unattended. The only safe assumption is that any individual command might be hostile or simply wrong — so each runs in a disposable, isolated VM, and the worst case is a wasted sandbox. That assumption, made cheap by sub-second microVM creation, is what lets you give an agent a real shell and filesystem without giving it your infrastructure.
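The "assume any command might be hostile" rule can be sketched as a loop that gives every agent command its own sandbox and always tears it down. `FakeSandbox` below is a stand-in that executes nothing, so the example runs anywhere; it is not the PandaStack API. In practice the factory would return a real microVM handle, and all names here are hypothetical.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class CommandResult:
    stdout: str
    exit_code: int


class FakeSandbox:
    """Stand-in for a microVM handle; it records commands instead of running them."""

    created = 0  # how many sandboxes have been created so far

    def __init__(self):
        FakeSandbox.created += 1
        self.alive = True

    def exec(self, command):
        # A real sandbox would execute the command inside the VM.
        return CommandResult(stdout=f"ran: {command}", exit_code=0)

    def destroy(self):
        self.alive = False


@contextmanager
def fresh_sandbox(factory=FakeSandbox):
    """Create a sandbox for one command and always destroy it afterwards."""
    sandbox = factory()
    try:
        yield sandbox
    finally:
        sandbox.destroy()


def run_agent_commands(commands, factory=FakeSandbox):
    """Run each command in its own disposable sandbox; never reuse one."""
    results = []
    for command in commands:
        with fresh_sandbox(factory) as sandbox:
            results.append(sandbox.exec(command))
    return results
```

The `try`/`finally` in `fresh_sandbox` ensures teardown even when a command raises. Whether the unit of disposal is a single command or a whole task is a design choice: per-command isolation is the strictest, while per-task isolation keeps shell and filesystem state across steps of one job.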

Frequently asked questions

Is a Docker container enough to run untrusted code?

For untrusted or AI-generated code, a plain container is not enough on its own: containers share the host kernel, so a kernel bug or container escape can reach the host and other tenants. Use a hardware-isolated microVM (e.g. Firecracker) or a sandboxed container runtime such as gVisor (a user-space kernel) or Kata Containers (lightweight VMs), and pair it with an ephemeral environment and locked-down network egress.

How do I safely run code an LLM generated?

Run it inside a fresh, hardware-isolated microVM that is destroyed after the task, with no host credentials and default-deny network egress. Never eval/exec it in your own process or rely solely on a language-level sandbox. Platforms like PandaStack create such a microVM in ~179ms so a disposable VM per task is practical.

What's the difference between a language sandbox and a microVM sandbox?

A language sandbox restricts what code can do within an interpreter (e.g. limiting Python builtins) and is repeatedly bypassed by determined code — useful only as one layer. A microVM sandbox runs the code in a separate virtual machine with its own kernel and a hardware boundary, so an escape would have to break the hypervisor itself.

What is an ephemeral sandbox and why does it matter?

An ephemeral sandbox is created fresh for a single task and destroyed afterward. It matters because nothing persists between runs: no leftover files, processes, or secrets can leak forward to the next caller, and a runaway or compromised task is discarded with its environment.

Written by Ajay Kumar, Founder, PandaStack.