How to Sandbox Untrusted & AI-Generated Code

Ajay Kumar·June 17, 2026

Sooner or later every backend has to run code it did not write and cannot fully trust. A playground executes user submissions. A CI system builds arbitrary repositories. A code interpreter runs whatever a notebook cell contains. And increasingly, an AI agent writes a shell command at runtime and you execute it. In all of these you are running code on infrastructure you care about, with no chance to review it first. The question is not whether to sandbox it — it's which boundary actually holds when the code is hostile, and that depends entirely on what you are defending against. This post is the map: the trust problem, the threat models that decide everything, the isolation options ranked honestly, and the case for why a microVM is the right default for arbitrary untrusted code. It links down to the deeper pieces rather than repeating them.

The trust problem: code you can't review, on infra you can't afford to lose

"Untrusted" is not a vibe — it's a precise property: code whose behavior you cannot bound in advance because you did not write it, cannot read it before it runs, or cannot trust whoever did. Three sources of it dominate, and they're converging onto the same infrastructure:

User-submitted code — coding challenge graders, online IDEs, data-science notebooks, plugin systems. The author is a stranger and some fraction of them are actively hostile.
CI and build systems — running a pull request, a fork, or a third-party dependency's build script means executing arbitrary code with whatever access the build environment has. Supply-chain attacks live here.
LLM-generated code — an agent that can run a shell is, by construction, executing code authored by a model in response to inputs you don't control. This is the fastest-growing category and the one with the worst review story: there is no human author who understood the command, and the input that produced it may itself be adversarial.

What unifies them is the absence of a trustworthy author plus the presence of real stakes. If the code escapes its boundary it reaches your host; on a multi-tenant platform it reaches every other customer's data. The security property you want is therefore not about the code at all — it's about where the code runs. You cannot make a model perfectly trustworthy or vet every submission, so you make the environment expendable: the worst thing a hostile run can do is destroy a box you were going to throw away.

Reframe it as a budget. Every isolation boundary costs something — latency, density, operational complexity, hardware. The job isn't "maximum isolation," it's matching the strength of the boundary to the threat you actually face. Over-isolating a trusted internal batch job wastes money; under-isolating a public multi-tenant code runner is how you end up in an incident report.

Start with the threat model, not the tool

Before choosing a sandbox, answer two questions. They determine everything downstream, and skipping them is how teams end up with either a false sense of security or a wildly over-engineered stack.

What are you defending — and against whom?

Host integrity — can the code break out and run as root on the machine? This is the canonical container-escape concern and the baseline for any untrusted workload.
Tenant isolation — on a shared platform, can tenant A read or corrupt tenant B's data or interfere with their jobs? An escape that reaches the host usually reaches every neighbor too, so this is mostly a consequence of the first.
Data exfiltration — even with no escape at all, can the code phone home with secrets it found in the environment? This is the most common real-world leak and it's a network/secrets problem, not an isolation one.
Resource abuse — can one run exhaust CPU, memory, disk, or PIDs and starve everything else (a fork bomb, a crypto-miner, a memory balloon)?
Guest confidentiality — the inverted threat: can the host operator read the guest's memory? This matters only when you don't trust the infrastructure provider itself, and it changes the whole design (more on this below).

What's your isolation budget?

Stronger boundaries cost more — in startup latency, in achievable density per host, in operational surface, and sometimes in hardware, since KVM-based isolation needs bare-metal hosts or instances that expose nested virtualization. A boundary you can't afford to run per-task isn't a boundary you'll actually use; people quietly reuse one long-lived sandbox across callers and reintroduce exactly the cross-contamination the isolation was for. So the real question is: how strong a boundary can you stand up cheaply enough to use it the way the threat model demands — ideally one fresh environment per task? Historically that tension is what pushed everyone toward shared-kernel containers; the rest of this post is largely about why that tradeoff has changed.

The isolation options, at a glance

The mainstream options form a ladder of increasing (and in one case, different) isolation, with a language-level sandbox sitting off to the side as its own category. This is the short tour; the canonical, rung-by-rung treatment — including the precise mechanics, the overhead tradeoffs, and the terminology traps an expert will catch — lives in the dedicated isolation-hierarchy piece (/blog/code-isolation-hierarchy). Here's the shape of it:

Bare process — no boundary. Running an interpreter's eval/exec or a subprocess on your host gives untrusted code your process's permissions, secrets, and network. Never do this with input you don't control.
Container (Docker, runc) — a real, useful boundary assembled from kernel features: namespaces (what a process can see), cgroups (what it can consume), capabilities (which privileged operations it keeps), seccomp-bpf (which syscalls it may call), and optionally an LSM like AppArmor or SELinux. But all of those are enforced by the host kernel, which is shared with every other container. That's the structural ceiling — covered next.
gVisor (runsc) — a user-space kernel written in Go that intercepts the application's syscalls and reimplements them itself, so the workload mostly talks to gVisor instead of the host kernel directly. It shrinks the host kernel attack surface dramatically, at a syscall-interposition performance cost, and sits squarely between containers and microVMs. Note: even in its KVM platform mode it is not a hardware VM — it keeps a process model.
Kata Containers — runs each container or pod inside a lightweight VM with its own guest kernel, giving microVM-class isolation behind a container-like (OCI) UX. Kata is the runtime/orchestration layer, not the hypervisor; it sits on top of a VMM (QEMU, Cloud Hypervisor, or Firecracker).
microVM (Firecracker, Cloud Hypervisor) — each guest boots its own kernel inside CPU hardware virtualization (Intel VT-x / AMD-V via KVM). An escape has to break the hypervisor boundary, not the shared Linux syscall surface. This is the default we'll argue for.
Confidential VM (AMD SEV-SNP, Intel TDX) — a VM whose memory is encrypted by the CPU and whose launch state can be attested in hardware, so that even the host/hypervisor can't read guest plaintext. This answers a different question (protect the guest from the host) and is the top of the ladder for that threat model.

Off the ladder entirely: WebAssembly (WASM/WASI) is a language-level, capability-based sandbox. A module runs in its own linear memory and gets no ambient host access — no files, no network — unless the host explicitly grants a capability. It's excellent for fine-grained untrusted plugins, but it is not a drop-in for "run an arbitrary Linux process or a Python script with native dependencies," so it solves a different problem than the rest of this list.

One honest caveat on the whole ladder: it is not a single monotonic security scale. Confidential VMs change the threat model rather than simply adding more isolation, and WASM is a different category of sandbox altogether. Read it as "increasing or different guarantees," not "each rung is strictly safer than the one below."
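To make the bottom rung concrete, here is a minimal sketch of why a bare subprocess is no boundary: a child process inherits the parent's environment unless you say otherwise. The variable name DEMO_API_TOKEN and its value are made up for the demonstration, and the "untrusted" snippet is a harmless stand-in.

```python
import os
import subprocess
import sys

# Hypothetical secret, set only for this demonstration.
os.environ["DEMO_API_TOKEN"] = "s3cr3t-value"

# Stand-in for untrusted code: it just reads whatever environment it was given.
untrusted_snippet = (
    "import os; print(os.environ.get('DEMO_API_TOKEN', 'not visible'))"
)


def run_child(env=None):
    """Run the snippet in a child interpreter and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", untrusted_snippet],
        env=env,  # None means "inherit the parent's environment"
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


inherited = run_child()        # the child can read the parent's secret
scrubbed = run_child(env={})   # an empty environment leaves nothing to find

print("default subprocess sees:", inherited)
print("scrubbed subprocess sees:", scrubbed)
```

Scrubbing the environment only removes one leak. The child still runs as your user, with your filesystem and your network, which is why the rest of the ladder exists.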

Why a container isn't the boundary (for arbitrary untrusted code)

Containers are a strong isolation mechanism and a weak security boundary, and the distinction is the whole ballgame. Every container on a host shares one Linux kernel. The namespaces, cgroups, capabilities, seccomp filters, and LSM profiles that isolate a container are all features of that same kernel — which means the kernel is simultaneously the thing running the untrusted code and the thing being protected from it. A bug reachable through the syscall interface can therefore defeat the very mechanisms meant to contain the process. This isn't a vendor disclaimer; it's the consensus of the container-security literature (NIST SP 800-190, the Application Container Security Guide, among others).

Escapes come in three flavors. A kernel privilege-escalation bug reachable via a syscall from inside the container — success means host compromise. A bug in the container runtime itself — the well-known runc escape of 2019 (CVE-2019-5736), since fixed, let a malicious container overwrite the host runc binary via a leaked /proc/self/exe file descriptor. And by far the most common in practice: dangerous misconfiguration — a --privileged container, a mounted docker.sock, host bind mounts, or a leaked CAP_SYS_ADMIN. Those last ones aren't bugs, they're by-design behavior that's trivially exploitable when misused.
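The misconfiguration class is the easiest to check for mechanically. The sketch below scans one entry of `docker inspect` JSON output for the patterns named above. It assumes the `HostConfig.Privileged`, `HostConfig.CapAdd`, and `HostConfig.Binds` fields as Docker reports them; the function name and the sample data are illustrative, and a real audit would cover more than these three checks.

```python
def find_risky_settings(inspect_entry):
    """Return human-readable findings for one `docker inspect` entry (a dict)."""
    host_config = inspect_entry.get("HostConfig") or {}
    findings = []

    if host_config.get("Privileged"):
        findings.append("privileged container")

    # CapAdd may be null; capabilities appear with or without the CAP_ prefix.
    for cap in host_config.get("CapAdd") or []:
        if cap.upper().removeprefix("CAP_") == "SYS_ADMIN":
            findings.append("CAP_SYS_ADMIN added")

    # Binds entries look like "source:destination[:mode]".
    for bind in host_config.get("Binds") or []:
        source = bind.split(":", 1)[0]
        if source.endswith("docker.sock"):
            findings.append(f"docker.sock mounted from {source}")
        elif source.startswith("/"):
            findings.append(f"host bind mount: {source}")
        # Anything else is a named volume, which this sketch does not flag.

    return findings


# Made-up sample shaped like one element of `docker inspect` output.
sample = {
    "HostConfig": {
        "Privileged": False,
        "CapAdd": ["CAP_SYS_ADMIN"],
        "Binds": [
            "/var/run/docker.sock:/var/run/docker.sock",
            "cache:/cache",
            "/srv/data:/data:ro",
        ],
    }
}

for finding in find_risky_settings(sample):
    print(finding)
```

A clean report from a check like this says nothing about the first two flavors of escape; it only rules out the self-inflicted ones.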

The most common misconception we see: "we locked it down with seccomp and dropped all capabilities, so it's safe." Those measures genuinely reduce risk — they shrink the reachable syscall surface and stop many real attacks. But the kernel is still shared, and a vulnerability in any syscall you left allowed is still exploitable. Seccomp is defense-in-depth, not a hardware boundary. "Reduces risk" and "is a security boundary equivalent to a hypervisor" are different claims; don't let the first quietly stand in for the second.
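For reference, this is roughly what "locked down" looks like in practice. The sketch builds (but does not run) a `docker run` command line using standard Docker flags; the helper name, the limits, and the image are illustrative choices, not a recommended baseline.

```python
import shlex


def hardened_docker_argv(image, command, memory="256m", pids_limit=128):
    """Build a `docker run` argv with common hardening flags applied."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                     # drop every Linux capability
        "--security-opt=no-new-privileges",   # block setuid-style escalation
        "--network=none",                     # no network interfaces except loopback
        "--read-only",                        # read-only root filesystem
        f"--pids-limit={pids_limit}",         # contain fork bombs
        f"--memory={memory}",                 # cap memory use
        "--cpus=1",                           # cap CPU use
        "--user=65534:65534",                 # run as an unprivileged UID/GID
        image,
        *command,
    ]


argv = hardened_docker_argv("python:3.12-slim", ["python3", "-c", "print(1 + 1)"])
print(shlex.join(argv))
```

Every flag here narrows what the process can do, and Docker's default seccomp profile still applies on top. None of them changes which kernel services the syscalls that remain allowed, which is the point of the paragraph above.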

None of this means containers are useless — for your own trusted code, or behind another boundary, they're the right tool. It means a plain container is not sufficient as the sole boundary for arbitrary untrusted, multi-tenant, or model-generated code. The deep dive on the escape mechanics and the shared-kernel attack-surface argument is its own post (/blog/why-docker-is-not-a-sandbox); the practical feature-by-feature comparison, including where gVisor and Kata land, is in /blog/firecracker-vs-docker.

The microVM default — and why it earns it

For arbitrary untrusted or AI-generated code, the right default boundary is a microVM: each workload gets its own guest kernel, isolated by hardware virtualization. The argument is about attack surface, not faith. Instead of sharing the host's full Linux syscall ABI — well over 300 syscalls, the entire interface a container can probe — a microVM exposes the host to a deliberately tiny surface: the VMM plus the KVM ioctl interface plus a minimal virtio device model. To reach the host, untrusted code first has to compromise its own guest kernel and then break the hypervisor boundary, rather than finding one reachable kernel bug.

Firecracker, the open-source VMM AWS built for Lambda and Fargate, is the sharpest version of this. It's written in Rust (a memory-safe language, which removes a major class of bug — though not logic bugs, panics, or unsafe FFI). It ships a minimal device model — virtio-net, virtio-block, vsock, a serial console, and little else — instead of a full emulated hardware platform, so the surface a malicious guest can poke at is small. And it is launched through a jailer that sets up a chroot, namespaces, and cgroups and drops privileges, while Firecracker itself installs tight per-thread seccomp filters, so even a VMM compromise faces a narrow, argument-constrained syscall surface as a second line of defense.
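The small device model shows up directly in how little there is to configure. The sketch below writes out a VM definition in the shape of Firecracker's JSON config file as I understand it (`boot-source`, `drives`, `machine-config`, `network-interfaces`); check the field names against the Firecracker version you run. The kernel path, rootfs path, and tap device name are placeholders.

```python
import json

# All paths and device names below are placeholders for illustration.
vm_config = {
    "boot-source": {
        "kernel_image_path": "/srv/images/vmlinux",
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    },
    "drives": [
        {
            "drive_id": "rootfs",
            "path_on_host": "/srv/images/rootfs.ext4",
            "is_root_device": True,
            "is_read_only": False,
        }
    ],
    "machine-config": {
        "vcpu_count": 1,
        "mem_size_mib": 256,
    },
    "network-interfaces": [
        {
            "iface_id": "eth0",
            "host_dev_name": "tap0",
        }
    ],
}

config_text = json.dumps(vm_config, indent=2)
print(config_text)
```

A file like this would typically be passed with `firecracker --api-sock <socket> --config-file <file>`, and in production the process is started through the jailer rather than directly. Note what is absent: no BIOS, no PCI bus, no USB, no display. Each device a guest cannot see is emulation code it cannot attack.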

Stronger does not mean absolute, and a security audience should hold us to that. KVM has had real guest-to-host escape CVEs — Google runs a dedicated KVM bug bounty (kvmCTF) paying up to $250,000 for a full escape, which is itself evidence both that the surface is small enough to target and that it's not zero. The virtio device model is the realistic in-VMM attack target, and microarchitectural side channels (Spectre-class, MDS, and newer guest-to-host variants) cross the VM boundary in principle because CPU state is shared at the hardware level. The honest claim is: a microVM is a meaningfully smaller, more-audited, hardware-enforced boundary than a shared kernel — not an unbreakable one.

The historical objection to VMs was startup cost: a fresh-VM-per-task pattern is absurd if a VM takes tens of seconds to boot. Firecracker collapsed that to milliseconds, and PandaStack pushes it further with snapshot-restore — there's no warm pool of idle VMs; every create restores a baked snapshot on demand, with a p50 of 179ms (about 203ms p99) to a live, isolated microVM. The first spawn of a brand-new template cold-boots in roughly 3 seconds and then bakes a snapshot, so every subsequent create takes the fast path. That speed is what makes the secure pattern affordable: if a clean hardware-isolated environment costs you ~179ms, you can give every task, tenant, or risky command its own VM and destroy it after. For how that boot path works, see /blog/what-is-a-microvm and the lifecycle docs (/docs/concepts/sandbox-lifecycle).
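A quick back-of-the-envelope check of what those numbers mean per task. The latency figures are the ones quoted above; treating the p50 as if it were the mean is a simplification, and the task durations and function names are arbitrary examples.

```python
CREATE_P50_S = 0.179   # snapshot-restore create, p50 (from the text)
CREATE_P99_S = 0.203   # snapshot-restore create, p99 (from the text)
COLD_BOOT_S = 3.0      # first spawn of a new template, roughly (from the text)


def overhead_fraction(create_s, task_s):
    """Share of total wall-clock time spent creating the VM."""
    return create_s / (create_s + task_s)


def amortized_create_s(n_creates):
    """Average create time when the first create cold-boots and the rest restore."""
    return (COLD_BOOT_S + (n_creates - 1) * CREATE_P50_S) / n_creates


for task_s in (1.0, 5.0, 30.0):
    share = overhead_fraction(CREATE_P50_S, task_s)
    print(f"{task_s:>5.1f}s task: {share:.1%} of wall-clock time is VM creation")

print(f"average create over 1000 runs: {amortized_create_s(1000) * 1000:.0f} ms")
```

For a 5-second task the create step is roughly 3.5% of the total, and the one-time cold boot disappears into the average after a few hundred creates. That is the sense in which a fresh VM per task stops being a luxury.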

Ephemerality and network isolation: the force-multipliers

The isolation boundary answers "can this code reach the host?" Two more properties decide how bad a contained run can actually be, and in practice they prevent more incidents than the boundary itself.

Ephemerality answers "can this run affect the next one?" An ephemeral sandbox is created fresh for a single task and destroyed when it ends — no leftover files, no lingering processes, no cached secrets, no state for a compromised run to leave behind for the next caller. This is the single most important operational discipline for an untrusted-code platform, and it's only practical because creation is sub-second. The pattern is one environment per task (or per tenant), always with a TTL so an abandoned or runaway VM is reaped automatically. When you do need to keep working state within a single task across several steps, snapshot or fork the sandbox rather than reusing one across trust boundaries — PandaStack's same-host fork lands around 400ms via copy-on-write (guest memory MAP_PRIVATE, rootfs XFS reflink); see /blog/snapshot-and-fork-explained and /docs/concepts/snapshots-and-forks.
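The TTL half of that discipline is simple enough to sketch. The registry below is a hypothetical in-memory helper, not part of any SDK: it records a hard deadline per sandbox and hands expired IDs to whatever destroy function the platform provides.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SandboxRegistry:
    """Tracks live sandbox IDs and the deadline after which each is reaped."""

    deadlines: dict = field(default_factory=dict)

    def register(self, sandbox_id, ttl_seconds, now=None):
        """Record a sandbox with a hard deadline of now + ttl_seconds."""
        now = time.monotonic() if now is None else now
        self.deadlines[sandbox_id] = now + ttl_seconds

    def release(self, sandbox_id):
        """Forget a sandbox that was destroyed normally at the end of its task."""
        self.deadlines.pop(sandbox_id, None)

    def reap(self, destroy, now=None):
        """Destroy every sandbox past its deadline and return the reaped IDs."""
        now = time.monotonic() if now is None else now
        expired = [sid for sid, deadline in self.deadlines.items() if deadline <= now]
        for sid in expired:
            destroy(sid)
            del self.deadlines[sid]
        return expired


registry = SandboxRegistry()
registry.register("task-1", ttl_seconds=300, now=0.0)
registry.register("task-2", ttl_seconds=10, now=0.0)

# Eleven seconds later only the short-lived sandbox is past its deadline.
reaped = registry.reap(destroy=lambda sid: print("destroying", sid), now=11.0)
print("reaped:", reaped)
```

The deadline is set at creation and never extended by activity, so a runaway or abandoned task cannot keep its own environment alive.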

Network and secret hygiene prevents the quiet exfiltration that a perfect isolation boundary does nothing about. The most common real leak isn't an exotic kernel exploit — it's a perfectly isolated VM that still had ambient network access and an environment variable it shouldn't have. Treat these as non-negotiable: default-deny egress with an allowlist for only what the task needs; no host credentials injected into the guest; the cloud metadata endpoint (169.254.169.254) unreachable from inside; CPU/memory limits so one run can't starve the fleet; and output captured through the platform API rather than by mounting host paths back into the guest. PandaStack's NATID networking gives each sandbox its own Linux network namespace, veth pair, and tap device (16,384 per-sandbox /30 subnets per agent), so egress isolation is per-sandbox by construction — see /docs/concepts/networking-natid.
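Two pieces of that paragraph reduce to a few lines with Python's `ipaddress` module: a default-deny egress check that always refuses the link-local range (which contains the metadata endpoint), and the subnet arithmetic behind the 16,384 figure. The allowlist contents and the 10.200.0.0/16 pool are assumptions for illustration; the text gives only the subnet count, not the address range actually used.

```python
import ipaddress

# 169.254.0.0/16 is link-local and contains the metadata endpoint 169.254.169.254.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def egress_allowed(dest_ip, allowlist):
    """Default-deny: permit a destination only if an allowlisted CIDR contains it."""
    addr = ipaddress.ip_address(dest_ip)
    if addr in LINK_LOCAL:
        return False  # never reachable, even if the allowlist is too broad
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)


allowlist = ["93.184.216.0/24"]  # hypothetical: the one service this task needs
print(egress_allowed("93.184.216.34", allowlist))    # inside the allowlist
print(egress_allowed("8.8.8.8", allowlist))          # not listed, so denied
print(egress_allowed("169.254.169.254", ["0.0.0.0/0"]))  # metadata endpoint

# Subnet arithmetic: a /16 pool (assumed) carved into per-sandbox /30 blocks.
pool = ipaddress.ip_network("10.200.0.0/16")
subnets_30 = list(pool.subnets(new_prefix=30))
print(len(subnets_30), "subnets of size /30")
print("usable addresses in the first one:", [str(h) for h in subnets_30[0].hosts()])
```

A /30 holds exactly two usable addresses, one for each end of a point-to-point link, and a /16 divides into 2^14 = 16,384 of them, which matches the per-agent figure quoted above.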

Putting it together

The whole pattern is small: create a throwaway, hardware-isolated microVM, run whatever the user or agent produced inside it, read the result over the API, and let the VM be destroyed on exit. The code can rm -rf its own filesystem, spike CPU, or try to phone home — the blast radius is one disposable VM you were going to delete anyway.

```python
from pandastack import Sandbox

# Whatever arrived at runtime — a user submission, a CI step, an agent command.
untrusted = "python3 -c 'print(sum(range(100)))'"

# One hardware-isolated microVM per task (~179ms to create), auto-killed on exit.
with Sandbox.create(
    template="code-interpreter",
    ttl_seconds=300,            # reaped automatically if abandoned
) as sb:
    result = sb.exec(untrusted, timeout_seconds=30)
    print(result.stdout, result.exit_code)
# Context manager destroys the VM here — no state survives to the next run.
```

The SDK reads PANDASTACK_API_KEY (keys are prefixed pds_) from the environment, with a configurable base URL; the same flow exists in the TypeScript SDK (@pandastack/sdk) and the pandastack CLI. PandaStack's core is Apache-2.0 and self-hostable on your own Linux KVM hosts (/dev/kvm) — you run the control-plane API and a per-host agent, and the sandboxes execute on your infrastructure, not someone else's. The same microVM substrate also backs managed PostgreSQL, git-driven app hosting, serverless functions, and durable volumes. For the agent-specific version of this pattern — prompt injection, per-task disposal, locked-down egress — see /blog/secure-code-execution-for-ai-agents; for the hands-on threat-by-threat walkthrough, see /blog/run-untrusted-code-safely.

When a lighter boundary is acceptable

Maximalism is a failure mode too. Reaching for a microVM where you don't need one adds latency and operational weight for no security gain. A lighter boundary is genuinely fine when the threat model is smaller:

You only ever run your own trusted, reviewed code with no untrusted input anywhere in the loop — a container is a reasonable boundary and simpler to operate.
Your agent calls only a fixed, typed set of functions you wrote and never executes free-form code, shell, or files — the function boundary is your sandbox; you don't need a VM per call.
The untrusted surface is fine-grained, language-level plugins with no need for native processes or arbitrary syscalls — a WASM/WASI capability sandbox can be a better fit than a full Linux VM.
You need defense-in-depth on a shared-kernel platform but can't move to full VMs — gVisor or Kata are real steps up and the right intermediate answer for many workloads.

The moment any of those assumptions breaks — the code is arbitrary, the input is adversarial, you're multi-tenant, or you simply can't bound what runs — you're back to wanting a hardware boundary plus ephemerality plus network control, together. For arbitrary untrusted and AI-generated code, that combination is the default, and a microVM is what makes it both safe and cheap enough to actually use. Pick the boundary by the threat, run the code somewhere you can afford to lose, and the worst day is a deleted sandbox instead of an incident.
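The section's decision rules can be written down as a small function. This is a deliberately coarse sketch: the field names, the order of the checks, and the returned labels are my own simplification of the cases listed above, not a complete policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    runs_untrusted_code: bool      # arbitrary user, CI, or model-generated code
    multi_tenant: bool             # shares hosts with other customers
    needs_native_processes: bool   # needs a real Linux process or native deps
    can_run_vms: bool              # hosts expose KVM
    fixed_typed_functions_only: bool = False  # agent calls only functions you wrote


def choose_boundary(w):
    """Map a workload description to the lightest boundary the text allows."""
    if w.fixed_typed_functions_only:
        return "function boundary"
    if not w.runs_untrusted_code and not w.multi_tenant:
        return "container"
    if not w.needs_native_processes:
        return "wasm"
    if not w.can_run_vms:
        return "gvisor-or-kata"
    return "microvm"


code_runner = Workload(
    runs_untrusted_code=True,
    multi_tenant=True,
    needs_native_processes=True,
    can_run_vms=True,
)
print(choose_boundary(code_runner))
```

Whatever boundary comes out, ephemerality and network control still apply on top of it; the function only picks the isolation layer.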

Frequently asked questions

How do you run untrusted code safely?

Run it inside a hardware-isolated microVM that is created fresh for a single task and destroyed afterward, with default-deny network egress and no host credentials in the environment. The microVM gives each workload its own guest kernel so an escape must break the hypervisor boundary rather than the shared host kernel; the ephemerality stops one run from affecting the next; the network controls stop quiet data exfiltration. Don't rely on eval/exec in your own process or on in-process language restrictions (a filtered eval, a restricted interpreter mode) alone — both are routinely defeated. Sub-second microVM creation (PandaStack's is ~179ms p50) makes a disposable VM per task practical rather than a luxury.

Is a Docker container enough to sandbox untrusted code?

Not on its own for arbitrary untrusted, multi-tenant, or AI-generated code. A container shares the host's Linux kernel, and the namespaces, cgroups, capabilities, and seccomp filters that isolate it are all enforced by that same shared kernel — so a kernel bug reachable via a syscall, a runtime bug, or a dangerous misconfiguration (privileged container, mounted docker.sock, host mounts) can reach the host and every neighbor. Containers are the right tool for your own trusted code, and seccomp/capabilities meaningfully reduce risk, but they are not a hardware boundary equivalent to a VM. For untrusted code use a microVM, or a secure-container layer like gVisor or Kata as an intermediate step.

What is the difference between a container and a microVM for isolation?

A container is a process on the host that's restricted by kernel features (namespaces, cgroups, capabilities, seccomp); all containers share one host kernel, so the entire host syscall ABI is the attack surface. A microVM (such as Firecracker) boots its own guest kernel inside CPU hardware virtualization via KVM, so the host is exposed only to a small, audited surface — the VMM, the KVM ioctl interface, and a minimal virtio device model. The microVM is a meaningfully stronger, hardware-enforced boundary against escape, at the cost of running a separate kernel per workload; Firecracker keeps that cost low with a minimal device model and millisecond boots.

How should I sandbox AI-generated code specifically?

Treat every command an agent emits as potentially hostile or simply wrong, because the code is model-generated, the inputs may be adversarial (prompt injection), and the loop usually runs unattended with no human to catch a bad command. Run each task in its own ephemeral, hardware-isolated microVM with default-deny egress, no host secrets, and a TTL, so a hijacked or hallucinated command can only damage a throwaway environment. Keep working state within a single task by snapshotting or forking the sandbox rather than reusing one across tasks or tenants. This is the same pattern as for any untrusted code, applied with the assumption that the author is an LLM acting on inputs you don't control.

Written by Ajay Kumar, Founder, PandaStack.