How to Build a Sandboxed AI Coding Agent (2026)

Ajay Kumar · June 24, 2026 · 10 min read

An AI coding agent is, mechanically, a loop: ask the model what to do, run what it says, feed the result back, repeat until the task is done or the budget runs out. The 'run what it says' step is the entire ballgame. Everything else is prompt engineering and plumbing; that one line is where your agent either does useful work or executes a model-generated shell command on the same kernel as your production database. This guide is the how-to for building that loop properly in 2026: the control flow, the isolation boundary that makes it safe, the code to wire it up, and the forking trick that turns a single agent into a search over solutions.

We'll build the loop conceptually, then implement it against a Firecracker-microVM sandbox (PandaStack, our open-source project; full disclosure up top so you can weight everything accordingly). The isolation principles are vendor-neutral and apply whether you use us, E2B, Modal, or roll your own; the comparison section later is honest about where other tools fit. The code is real, the ugly parts are real, and the latency numbers I quote are only ever PandaStack's own measured figures.

The shape of the loop

Strip away the framework branding — LangGraph, the OpenAI Agents SDK, CrewAI, your own 200-line orchestrator — and a coding agent is the same four-beat cycle. The model proposes an action (usually a tool call: 'run this code'). You execute it somewhere. You observe the result (stdout, stderr, exit code, files written). You return that observation to the model and let it decide the next move. It's a read-eval-print loop where the 'eval' is a language model and the 'print' goes back into the prompt.

Plan — the model emits a tool call describing code to run (or decides it's finished).
Execute — you run that code in an environment the model does not control.
Observe — you capture stdout, stderr, exit code, and any artifacts.
Feed back — you append the observation to the conversation and loop, until a stop condition (task done, max steps, budget exhausted).

The Execute beat is the one with teeth. The model writes code with the breezy confidence of someone who has never been paged at 4am, and it is wrong often enough that you must assume every action is hostile-by-accident. Not malicious, just untrusted. A model that hallucinates a library will, with equal cheer, hallucinate a `find / -delete` to 'clean up disk space.' Your job is to make that boring instead of catastrophic.

The single most common mistake: running the agent's code in the same process, container, or host as your application 'just for the prototype.' Prototypes ship. The blast radius of a confidently-wrong model is whatever you give it access to — so give it a disposable computer, not a shell on yours.
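
To make the four-beat cycle concrete before any real sandbox is involved, here is a toy sketch of the control flow. Everything in it is a stand-in: `Action`, `run_loop`, `scripted_model`, and `fake_executor` are hypothetical names invented for illustration, the "model" is a hard-coded script, and the "executor" runs nothing at all. It only shows where the stop conditions sit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    code: Optional[str] = None    # code the model wants to run
    final: Optional[str] = None   # set when the model decides it is finished


def run_loop(propose: Callable[[list], Action],
             execute: Callable[[str], dict],
             task: str,
             max_steps: int = 8) -> str:
    """Plan -> execute -> observe -> feed back, under a hard step budget."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = propose(history)                  # plan
        if action.final is not None:
            return action.final                    # stop condition: task done
        observation = execute(action.code)         # execute (isolated, in real use)
        # observe + feed back: record what was asked and what happened
        history.append({"role": "assistant", "content": action.code})
        history.append({"role": "tool", "content": str(observation)})
    return "stopped: step budget exhausted"        # stop condition: budget


# Stand-ins so the sketch runs without a model or a sandbox.
def scripted_model(history: list) -> Action:
    tool_turns = [m for m in history if m["role"] == "tool"]
    if not tool_turns:
        return Action(code="print(6 * 7)")
    return Action(final=f"done after {len(tool_turns)} step(s)")


def fake_executor(code: str) -> dict:
    # Does NOT run anything; a real executor would ship `code` to a sandbox.
    return {"exit_code": 0, "stdout": "42\n", "stderr": ""}


result = run_loop(scripted_model, fake_executor, "compute 6 * 7")
print(result)
```

The point of passing `execute` in as a parameter is that the loop never decides where code runs; swapping the fake for a sandbox-backed function changes the safety story without touching the control flow.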

Why the isolation boundary is the product

When your agent runs code the model wrote, you are running untrusted code, by definition. The model is not an authority you can trust with your filesystem any more than you'd trust a stranger's pull request to run unreviewed as root. So the question 'where does the code execute?' is not an implementation detail — it is the security model of your entire product. There are three honest answers, in increasing order of how well you'll sleep.

Containers (namespaces + cgroups + seccomp) — fast and cheap, but every container shares the host's one kernel. A container is a polite suggestion to the kernel about what a process should see; a kernel-level escape ignores the suggestion and you've lost the host and every neighbour on it. Fine for code you wrote, risky for code a model wrote (see /blog/why-docker-is-not-a-sandbox).
User-space kernel (gVisor) — a second kernel, in user space, intercepts the guest's syscalls so they never hit the host kernel directly. A genuine step up from a plain container, with workload-dependent compatibility and performance trade-offs.
Hardware-virtualized microVMs (Firecracker, Kata) — each sandbox boots its own guest kernel, isolated by KVM. Guest code never touches the host kernel; the only exposed surface is a tiny, heavily-audited virtual machine monitor. This is the right default for arbitrary agent-written code — it's the same model AWS Lambda uses to run everyone's functions on shared hardware (/blog/firecracker-vs-docker, /blog/what-is-a-microvm).

The shared kernel is the whole story. With a container, one kernel bug or container escape compromises the host and every tenant on it. With a microVM, that same class of bug is contained to a single disposable VM that you were going to throw away in thirty seconds anyway. The honest caveat: 'microVM' is not 'immune' — VMMs have had bugs, KVM has had bugs. The accurate claim is 'dramatically smaller, better-audited attack surface than the full Linux syscall interface a container shares,' and you still layer seccomp, a privilege-dropping jailer, and per-sandbox egress controls on top (/blog/secure-code-execution-for-ai-agents). The deeper ranking lives in /blog/code-isolation-hierarchy.

Rule of thumb: if a single bad line of model-generated code can reach anything you'd be sad to lose, your isolation boundary is in the wrong place. The fix is almost never 'a better prompt' — it's moving execution behind a real boundary.

Step 1: give the agent a disposable computer

The primitive you want is 'create a fresh, isolated machine, run something on it, throw it away.' With a Firecracker-backed sandbox that create() is cheap enough to do per-task (or per-step) instead of nursing a long-lived box — PandaStack restores a baked snapshot on every create at 179ms p50 (~203ms p99), no warm pool, so a fresh computer per task is a latency rounding error rather than a budget line. Here's the smallest useful thing: spin one up, run an untrusted snippet, read the result, destroy it.

from pandastack import Sandbox

# The model wrote this. We do not trust it. That's the point.
model_code = """
import sys, platform
print('hello from', platform.node())
print('python', sys.version.split()[0])
# A model might also try `import os; os.system('rm -rf /')` here.
# In a microVM, that deletes a guest we're about to throw away. Shrug.
"""

# create() restores a baked Firecracker snapshot (~179ms p50), not a cold boot.
with Sandbox.create(template="code-interpreter", ttl_seconds=120) as sbx:
    sbx.filesystem.write("/workspace/cell.py", model_code)
    result = sbx.exec("python3 /workspace/cell.py", timeout_seconds=30)

    print("exit:", result.exit_code)
    print(result.stdout)
# Context manager exits -> the whole VM (kernel and all) is destroyed.

Note what the `with` block buys you: the sandbox is guaranteed dead when the block exits, success or exception. Had the snippet actually included that `rm -rf`, it would have run against a guest kernel and a copy-on-write rootfs that existed for the lifetime of one `with` block and shared nothing with your host. That is the entire safety story in one indentation level.

Step 2: wire it into the agent loop

Now make the sandbox a tool the model can call. Keep a persistent sandbox alive for the duration of a task so state (installed packages, written files, a warm interpreter) carries across steps, and expose a single `run_code` function the agent calls each turn. The pattern below is deliberately framework-agnostic — it's the executor you'd plug into whatever orchestration layer you prefer.

from pandastack import Sandbox

class AgentSandbox:
    """One disposable computer that lives for the duration of a task."""

    def __init__(self, template: str = "code-interpreter"):
        # persistent=True: survives across steps; we kill it ourselves at the end.
        self.sbx = Sandbox.create(template=template, persistent=True, ttl_seconds=900)
        self._step = 0

    def run_code(self, code: str) -> dict:
        """The tool the model calls. Return value is serialized back into the prompt."""
        self._step += 1
        path = f"/workspace/step_{self._step}.py"
        self.sbx.filesystem.write(path, code)
        r = self.sbx.exec(f"python3 {path}", timeout_seconds=60)
        return {
            "exit_code": r.exit_code,
            "stdout": r.stdout[-4000:],   # don't blow the context window on logs
            "stderr": r.stderr[-2000:],
        }

    def close(self):
        self.sbx.kill()


def agent_loop(client, task: str, max_steps: int = 12):
    env = AgentSandbox()
    messages = [{"role": "user", "content": task}]
    try:
        for _ in range(max_steps):
            reply = client.next_action(messages)        # your model call
            if reply.is_final:
                return reply.text
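            # Record the model's own action so the next turn can see what it asked for.
            messages.append({"role": "assistant", "content": reply.code})
            # The model asked to run code. We run it somewhere it can't hurt us.
            obs = env.run_code(reply.code)
            messages.append({"role": "tool", "content": obs})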
        return "hit max_steps without finishing"   # budgets exist for a reason
    finally:
        env.close()   # the computer always gets thrown away

Three things in there matter more than they look. The `try/finally` guarantees the sandbox dies even if your model client throws — orphaned sandboxes are how you discover billing. Truncating stdout/stderr stops a chatty `pip install` from eating your context window (and your token bill). And `max_steps` is the difference between 'the agent gave up gracefully' and 'the agent rediscovered the halting problem at $0.03 per turn.'
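
One refinement on the truncation point: slicing only the tail (as `run_code` does above) keeps the final error but drops the start of the output, which is sometimes where the first failure shows up. A minimal sketch of a head-plus-tail variant follows. `truncate_middle` is a hypothetical helper, not part of any SDK, and the 1:3 head/tail split is an arbitrary assumption to tune for your own logs.

```python
def truncate_middle(text: str, limit: int = 4000) -> str:
    """Keep the first quarter and the last three quarters of `limit` characters.

    The returned string is slightly longer than `limit` because of the
    marker that records how much was dropped.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(text) <= limit:
        return text
    head = limit // 4
    tail = limit - head                      # always >= 1 when limit >= 1
    dropped = len(text) - limit
    marker = f"\n...[truncated {dropped} chars]...\n"
    return text[:head] + marker + text[-tail:]


log = "a" * 100 + "b" * 100
short = truncate_middle(log, limit=40)
print(short)
```

Telling the model how many characters were dropped is a small courtesy: it can then decide to re-run with less verbose flags instead of guessing what the gap contained.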

Always set a TTL and a step budget. An agent in a retry loop with no ceiling is a denial-of-wallet attack you wrote yourself. The TTL is your backstop for when the process crashes before `finally` runs; the step budget is your backstop for when the model is simply having a bad day.
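
The step budget generalizes naturally to tokens and wall-clock time. The sketch below is one way to keep all three ceilings in a single object on the orchestrator side. `Budget` and its field names are hypothetical, the default limits are placeholders, and the sandbox TTL still has to be set separately because this object dies with your process.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    max_steps: int = 12
    max_tokens: int = 200_000
    max_seconds: float = 600.0
    steps: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int) -> None:
        """Record one completed model turn and the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens

    def exhausted(self) -> Optional[str]:
        """Return the name of the first ceiling hit, or None if there is headroom."""
        if self.steps >= self.max_steps:
            return "step budget"
        if self.tokens >= self.max_tokens:
            return "token budget"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time budget"
        return None


budget = Budget(max_steps=2, max_tokens=1_000)
budget.charge(400)
first_check = budget.exhausted()    # still has headroom
budget.charge(400)
second_check = budget.exhausted()   # second step reaches max_steps
print(first_check, second_check)
```

In a loop you would call `exhausted()` at the top of each iteration and return its reason string to the caller, so a stopped run says which ceiling it hit.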

Step 3: the fork trick — turn one agent into a search

Here's where the microVM model earns its keep in a way containers can't easily match. Agents are unreliable per-attempt but much better in aggregate: generate five candidate fixes, run each, keep the one whose tests pass. The naive way is to spin up five environments and re-run setup five times. The fast way is to warm one environment — dependencies installed, dataset loaded, repo cloned — then fork it N times via copy-on-write and explore each branch in parallel.

A fork clones a running sandbox with copy-on-write: guest memory is shared through MAP_PRIVATE (the kernel copies a page only when a child writes it), and the rootfs is cloned with an XFS reflink — an O(metadata) operation where the data stays shared until something diverges. A same-host fork lands in roughly 400–750ms; cross-host runs 1.2–3.5s (download plus restore). The setup work you did once is inherited by every branch for free.
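
The MAP_PRIVATE behaviour is easy to see in isolation. The sketch below is not PandaStack code: it uses Python's standard `mmap` module on a temporary file (Unix-only, since the `flags` and `prot` arguments are not available on Windows) to show that a write through a private mapping is visible to the writer but never reaches the backing file. `private_write_demo` is a hypothetical name.

```python
import mmap
import os
import tempfile


def private_write_demo() -> tuple:
    """Write through a MAP_PRIVATE mapping.

    Returns (bytes seen through the mapping, bytes still in the file).
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"shared state " * 4)
        path = f.name
    try:
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), 0,
                           flags=mmap.MAP_PRIVATE,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
                m[:6] = b"FORKED"             # triggers a private copy of the page
                seen_by_mapping = bytes(m[:13])
        with open(path, "rb") as f:
            on_disk = f.read(13)              # the backing file is untouched
        return seen_by_mapping, on_disk
    finally:
        os.unlink(path)


seen, disk = private_write_demo()
print(seen, disk)
```

A forked guest's memory works on the same principle at a larger scale: children read the shared pages for free and pay only for the pages they dirty. The sandbox-level version, using the fork API, follows.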

from pandastack import Sandbox
import concurrent.futures as cf

# 1) Warm ONE environment: clone, install, load the dataset. Pay setup once.
base = Sandbox.create(template="code-interpreter", persistent=True, ttl_seconds=1800)
base.exec("git clone --depth 1 https://example.com/repo /workspace/repo")
base.exec("cd /workspace/repo && pip install -r requirements.txt", timeout_seconds=300)

# generate_n_fixes is a placeholder for your own model call; it returns a list of patch strings.
candidate_patches = generate_n_fixes(n=5)   # 5 different model attempts

def try_patch(patch: str) -> dict:
    # 2) Fork the warm env (roughly 400-750ms same-host). Each child inherits the install.
    child = base.fork()
    try:
        child.filesystem.write("/workspace/repo/fix.patch", patch)
        applied = child.exec("cd /workspace/repo && git apply fix.patch")
        if applied.exit_code != 0:
            # A patch that doesn't apply must not be scored against the unpatched tests.
            return {"patch": patch, "passed": False, "out": applied.stderr[-1000:]}
        r = child.exec("cd /workspace/repo && pytest -q", timeout_seconds=120)
        return {"patch": patch, "passed": r.exit_code == 0, "out": r.stdout[-1000:]}
    finally:
        child.kill()   # branches are disposable; the trunk lives on

# 3) Explore all five branches in parallel. Keep the winners.
try:
    with cf.ThreadPoolExecutor(max_workers=5) as pool:
        results = list(pool.map(try_patch, candidate_patches))
finally:
    base.kill()   # the trunk is thrown away too, even if a branch raised
winners = [r["patch"] for r in results if r["passed"]]
print(f"{len(winners)}/{len(candidate_patches)} candidate fixes passed the test suite")

This is tree-of-thought made physical: each thought gets its own real, isolated machine that branched from a shared warm state, and the dead ends are discarded for the cost of a `kill()`. It's the pattern behind agent rollouts, 'try five fixes and keep the green one,' and speculative exploration — and it's only cheap because forking is copy-on-write, not a full re-provision. The conceptual walkthrough is /blog/snapshot-and-fork-explained; the internals are in /docs/internals/fork-cow.
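
The "unreliable per-attempt, better in aggregate" claim has simple arithmetic behind it. If each attempt passes with probability p, and attempts are assumed independent (a generous assumption, since candidates from the same model tend to share blind spots), the chance that at least one of N passes is 1 - (1 - p)^N. A minimal sketch, with `best_of_n_success` as a hypothetical helper name:

```python
def best_of_n_success(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# A 30%-per-attempt agent, at a few different fan-outs.
for n in (1, 3, 5, 10):
    print(n, round(best_of_n_success(0.30, n), 3))
```

Under those assumptions a 30% agent with five branches lands a passing fix about 83% of the time. Correlated failures will pull the real number down, which is one more reason to measure on your own tasks rather than trust the formula.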

Choosing the substrate (honest comparison)

The loop above runs on any sandbox with a create/exec/fork API. Where the tools diverge is the stuff underneath. I'll keep specific numbers to PandaStack and describe everyone else in the qualitative terms their own docs use — verify anything load-bearing against each vendor's current page, because isolation backends get swapped and pricing moves monthly.

Isolation model: PandaStack, E2B, and Vercel Sandbox use Firecracker microVMs (own guest kernel, KVM); Modal uses gVisor (user-space kernel); Northflank offers a choice of Kata, Firecracker, or gVisor per workload; Fly.io Sprites is widely reported to be Firecracker-based. For arbitrary agent code, microVM-class isolation clears the bar; gVisor is a meaningful middle; a plain shared-kernel container generally does not.
Self-host: PandaStack has an Apache-2.0 core and runs on your own KVM hosts; E2B is Apache-2.0 and self-hostable; Daytona is AGPL-3.0; Modal, Vercel Sandbox, and Fly Sprites are hosted-only; Northflank is BYOC (a proprietary control plane in your cloud, which is not the same as open-source).
Forking / CoW state: first-class and cheap on the Firecracker designs (PandaStack exposes snapshot + fork as primitives, roughly 400–750ms same-host). If best-of-N or tree-search is central to your agent, weight this heavily; it's the capability that's painful to bolt on after the fact.
Cold-start: everyone advertises 'fast.' The only number worth trusting is the one you measure on your template, in your region. PandaStack's design choice (snapshot-restore on every create, no warm pool) is what produces its 179ms p50, but treat every headline figure, including ours, as a hypothesis to benchmark.
Breadth: a focused sandbox primitive (cleaner to swap) vs. a platform that also brings managed databases, app hosting, and functions on one substrate (one bill, more coupling). Decide which side of that line your product is on before the feature list seduces you.
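
Since the advice on cold-start is "measure it yourself," here is a small vendor-neutral harness for doing that. It times any zero-argument callable and reports nearest-rank percentiles. `percentile` and `measure_ms` are hypothetical helper names, and the nearest-rank method is one of several reasonable percentile definitions, chosen here because it always returns a latency that was actually observed.

```python
import math
import time
from typing import Callable, List


def percentile(samples: List[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def measure_ms(op: Callable[[], object], runs: int = 50) -> dict:
    """Time `op` `runs` times and summarize the latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        op()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {
        "runs": runs,
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


# Stand-in workload so the sketch runs anywhere; swap in your own callable.
stats = measure_ms(lambda: sum(range(1000)), runs=20)
print(stats)
```

To benchmark a sandbox you would pass a callable that creates one and kills it, run it from the region your orchestrator lives in, and use enough runs that p99 means something (with 20 samples, p99 is just the maximum).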

The full buyer's-guide version of this — six decision criteria, every vendor, and an honest 'pick the other one when…' — is /blog/best-code-execution-sandboxes. The hosted-vs-self-host fork is /blog/e2b-alternatives. If you already started on OpenAI's hosted Code Interpreter and are outgrowing its Python-only, hosted-only box, /blog/openai-code-interpreter-alternative covers the migration.

The production checklist (the boring stuff that saves you)

The loop is the fun part. These are the parts that decide whether your agent survives contact with real users and real model output:

TTL on every sandbox — the backstop for crashed orchestrators that never reached your cleanup code. Orphaned VMs are silent until the invoice.
Step and token budgets — cap the loop. An unbounded agent is a creative way to spend money on the word 'hmm.'
Truncate observations — cap stdout/stderr you feed back, or one verbose build log poisons the context window and degrades the model.
Egress controls — a sandbox that can reach your internal network is a sandbox that can exfiltrate from it. Default-deny outbound; allowlist what the task needs.
Per-step timeouts — the model will eventually write `while True: pass` or `time.sleep(99999)`. Bound every exec.
Treat exit codes, not vibes, as truth — let the agent observe real failures (non-zero exits, stderr) so it can self-correct, instead of hiding them.
Disposable by default, persistent by exception — reach for a long-lived sandbox only when a task genuinely needs state across steps; otherwise create per task and let it die.
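
As an illustration of the egress item, the decision logic behind "default-deny outbound, allowlist what the task needs" can be sketched as a pure function. This is only the policy check. Real enforcement belongs at the network layer (a firewall or egress proxy outside the guest), and `egress_allowed` and `RULES` are hypothetical names with example hosts, not a recommended allowlist.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, int]   # (host pattern, port)


def egress_allowed(host: str, port: int, allowlist: List[Rule]) -> bool:
    """Default-deny: a connection is allowed only if some rule matches it."""
    host = host.lower().rstrip(".")
    return any(
        fnmatchcase(host, pattern) and port == allowed_port
        for pattern, allowed_port in allowlist
    )


RULES: List[Rule] = [
    ("pypi.org", 443),
    ("files.pythonhosted.org", 443),
    ("*.github.com", 443),
]

checks = {
    "pypi over https": egress_allowed("pypi.org", 443, RULES),
    "pypi over http": egress_allowed("pypi.org", 80, RULES),
    "github subdomain": egress_allowed("API.GitHub.com", 443, RULES),
    "internal database": egress_allowed("10.0.0.5", 5432, RULES),
    "lookalike domain": egress_allowed("github.com.evil.example", 443, RULES),
}
print(checks)
```

Note that an empty allowlist denies everything, which is the behaviour you want a misconfigured sandbox to fall back to.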

The bottom line

Building a sandboxed AI coding agent comes down to three moves: model the work as a plan → execute → observe → feed-back loop; put the execute step behind a real isolation boundary (a microVM, not a shared-kernel container, because the code is untrusted the moment a model wrote it); and exploit copy-on-write forking to turn a single agent into a parallel search when you need best-of-N. Get those right and the agent's worst day, a confidently-wrong `rm -rf` at 3am, is a destroyed disposable guest and a non-zero exit code the model learns from, not an incident. PandaStack's bet is to make that boundary cheap enough to use everywhere: Apache-2.0 Firecracker microVMs you can self-host, snapshot-restore on every create (179ms p50), and first-class CoW forking (roughly 400–750ms same-host). Build the loop, benchmark the substrate on your own workload, and never let the model's confidence become your blast radius.

Frequently asked questions

Why does an AI coding agent need a sandbox at all?

Because the moment a language model writes code, that code is untrusted by definition — the model is not an authority you can trust with your filesystem or network. Models hallucinate libraries, misread instructions, and will execute a destructive command with the same confidence as a correct one. A sandbox moves the 'execute' step of the agent loop behind an isolation boundary so a confidently-wrong action destroys a disposable environment instead of your host, your data, or a neighbouring tenant. Running agent code in your own process or a shared-kernel container 'just for the prototype' is the most common way teams turn a model mistake into an incident.

What isolation should I use for an AI agent — containers or microVMs?

For arbitrary, model-generated code, hardware-virtualized microVMs (Firecracker or Kata) are the right default. A container shares the host's single kernel across all tenants, so a kernel bug or container escape compromises the host and every neighbour — acceptable for code you wrote, risky for code a model wrote. A microVM boots its own guest kernel isolated by KVM, so guest code never touches the host kernel and the exposed attack surface is a tiny, well-audited VMM. gVisor (a user-space kernel) is a meaningful middle ground that shrinks the host-kernel surface without a full VM. None of these is 'unbreakable' — you still layer seccomp, a jailer, and egress controls on top — but a shared host kernel is not a boundary to bet untrusted code against.

How do I structure the agent execution loop?

As a four-beat cycle: the model proposes an action (a 'run this code' tool call), you execute it in an isolated sandbox, you observe the result (stdout, stderr, exit code, files), and you feed that observation back into the conversation — repeating until the task is done or a budget is hit. Expose the sandbox as a single run_code tool the model calls each turn, keep a persistent sandbox for the duration of a task so state carries across steps, and wrap cleanup in a try/finally so the sandbox is always destroyed. Critically, enforce a step budget, per-exec timeouts, and a TTL — an unbounded loop is a denial-of-wallet attack you wrote yourself.

What is best-of-N forking and why does it matter for agents?

Agents are unreliable per-attempt but much stronger in aggregate: generate several candidate solutions, run each, and keep the ones that pass. Best-of-N forking makes this cheap. Instead of provisioning N environments and re-running setup in each, you warm one environment (clone the repo, install dependencies, load the dataset) and then fork it N times via copy-on-write — guest memory shared through MAP_PRIVATE and the rootfs cloned with a reflink, so a fork is an O(metadata) operation rather than a full re-provision. Each branch inherits the warm state for free, you explore them in parallel, and you discard the dead ends for the cost of a kill(). On PandaStack a same-host fork is roughly 400–750ms, which is what makes tree-search-style agent rollouts practical.

How fast does creating a sandbox per task actually need to be?

Fast enough that you can afford a fresh, disposable computer per task (or per step) instead of nursing a long-lived one. The create() call sits inside the agent loop and you may hit it dozens of times per trajectory, so its latency directly shapes how the product feels. PandaStack restores a baked Firecracker snapshot on every create at about 179ms p50 (~203ms p99) with no warm pool, which makes per-task creation a rounding error rather than a budget line. Other providers advertise fast startup too — but cold-start is the easiest metric to mis-measure across vendors (warm pool vs. true cold boot, snapshot resume vs. full boot, your region vs. theirs), so benchmark it on your own template and region rather than trusting any headline number, including ours.

Written by Ajay Kumar, Founder, PandaStack.