
How to Give Your AI Agent a Sandbox (With Code)

Ajay Kumar · 9 min read

To give an AI agent a sandbox, you expose a single tool — call it run_code or run_shell — whose implementation creates an isolated environment, runs the model's code inside it, and returns the output. The model proposes a command, your code runs it in the sandbox, and you feed the result back into the conversation. The only design decision that actually matters is the boundary: that environment should be a microVM with its own kernel, not a subprocess on your host, because the code your agent runs was written by a language model and never reviewed by a human.

This guide walks through the whole thing with runnable Python: the agent loop, the tool schema, the sandbox-backed tool function, the choice between one sandbox per session versus per task, and how to clean up so you do not leak VMs. The pattern is provider-neutral — it works the same with the OpenAI and Anthropic SDKs — so I will show the tool-schema idea once and keep the execution code identical.

The agent loop: model proposes, you run, you feed back

Every code-executing agent is the same four-step loop. Strip away the SDK ceremony and it looks like this:

  1. Send the user's message plus your tool definitions to the model.
  2. The model replies with a tool call — e.g. run_code with a code argument — instead of (or alongside) text.
  3. You execute that call in the sandbox and capture stdout, stderr, and the exit code.
  4. You append the result to the conversation as a tool-result message and call the model again. Repeat until the model returns a final answer with no more tool calls.
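The four steps above can be sketched without any SDK at all. This is a minimal illustration, not any provider's API: the reply shape (a dict with either "tool_call" or "text"), the names run_agent_loop, fake_model and fake_tool, and the max_steps cap are all assumptions I made up for the example. The stubs stand in for the real model call and the real sandbox.

```python
from typing import Callable


# Hypothetical, provider-neutral shapes: a reply is a dict holding either
# a "tool_call" ({"name": ..., "args": {...}}) or a final "text".
def run_agent_loop(
    call_model: Callable[[list], dict],
    run_tool: Callable[[str, dict], str],
    user_message: str,
    max_steps: int = 10,
) -> str:
    messages = [{"role": "user", "content": user_message}]  # step 1
    for _ in range(max_steps):
        reply = call_model(messages)  # step 2
        messages.append({"role": "assistant", "content": reply})
        call = reply.get("tool_call")
        if call is None:  # no tool requested: this is the final answer
            return reply["text"]
        output = run_tool(call["name"], call["args"])  # step 3
        messages.append({"role": "tool", "content": output})  # step 4
    raise RuntimeError(f"no final answer after {max_steps} steps")


# Stub model: asks for one computation, then answers with the tool output.
def fake_model(messages: list) -> dict:
    last = messages[-1]
    if last["role"] == "tool":
        return {"text": f"The answer is {last['content']}."}
    return {"tool_call": {"name": "run_code", "args": {"code": "print(6 * 7)"}}}


# Stub tool: stands in for the sandbox and returns a canned result.
def fake_tool(name: str, args: dict) -> str:
    return "42"


print(run_agent_loop(fake_model, fake_tool, "What is 6 * 7?"))
```

The step cap is a design choice rather than part of the pattern: it keeps a model that never stops calling tools from looping forever.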

The model never touches your machine. It only emits structured JSON describing what it wants to run; you are the one who decides where that runs. That seam is the entire security story, and it is why the sandbox lives behind the tool, not in the prompt.

The model is an untrusted code generator. Treat every tool call the way you would treat a POST body from the public internet: assume it is hostile, run it somewhere disposable, and never let its output decide what you execute next without the same isolation.

Defining the run_code tool (the schema)

A tool definition is just a name, a description the model reads to decide when to call it, and a JSON Schema for the arguments. Here is the shape for the Anthropic Messages API, where this exact dict is what you pass as a tool. OpenAI's Chat Completions API takes the same JSON Schema nested one level deeper under a "function" key, where it is named "parameters" instead of "input_schema".

RUN_CODE_TOOL = {
    "name": "run_code",
    "description": (
        "Execute Python code in an isolated sandbox and return its output. "
        "Use this whenever you need to compute, transform data, or test code. "
        "Do not rely on network access. Each call starts from a clean "
        "environment, so state does not persist between calls."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "The Python source to run. Print results you want back.",
            }
        },
        "required": ["code"],
    },
}

Two things in the definition earn their keep. First, it tells the model to print what it wants back: sandboxes return stdout, so a model that computes a value but never prints it gets an empty result and keeps retrying in confusion. Second, naming the boundary ("isolated sandbox") nudges the model to actually use the tool for risky work instead of trying to reason through it in its head.
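If you target OpenAI's Chat Completions API, the conversion is mechanical. The sketch below shows one way to do it; to_openai_tool is a helper name I made up, and example_tool is a trimmed stand-in for the definition above so the snippet runs on its own.

```python
def to_openai_tool(anthropic_tool: dict) -> dict:
    """Wrap an Anthropic-style tool dict in the Chat Completions shape."""
    return {
        "type": "function",
        "function": {
            "name": anthropic_tool["name"],
            "description": anthropic_tool["description"],
            # Same JSON Schema, but under "parameters" instead of "input_schema".
            "parameters": anthropic_tool["input_schema"],
        },
    }


# Trimmed stand-in for RUN_CODE_TOOL, included only to keep this runnable.
example_tool = {
    "name": "run_code",
    "description": "Execute Python code in an isolated sandbox.",
    "input_schema": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"],
    },
}

openai_tool = to_openai_tool(example_tool)
print(openai_tool["function"]["parameters"]["required"])  # -> ['code']
```

Keeping one canonical definition and converting at the edge means the description the model reads cannot drift between providers.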

The tool function: create a sandbox and exec

Now the implementation. The PandaStack Python SDK reads your PANDASTACK_TOKEN from the environment and talks to https://api.pandastack.ai by default. Sandbox.create restores a baked snapshot, so there is no warm pool to manage. run_code executes source inside the microVM and returns a result with stdout, stderr, and exit_code; exec does the same for a raw shell command.

from pandastack import Sandbox

# pip install pandastack ; export PANDASTACK_TOKEN=pds_...

def run_code(code: str) -> str:
    """Execute model-written Python in a fresh microVM, return its output."""
    # ttl_seconds is a backstop: even if cleanup is skipped, the VM self-destructs.
    with Sandbox.create(template="code-interpreter", ttl_seconds=600) as sb:
        result = sb.run_code(code, language="python")
        out = result.stdout
        if result.exit_code != 0:
            out += f"\n[exit {result.exit_code}] {result.stderr}"
        return out.strip() or "(no output — did you forget to print?)"

if __name__ == "__main__":
    print(run_code("print(sum(range(101)))"))  # -> 5050

The code-interpreter template ships a Python and Node scientific stack, so common imports work out of the box. If you want raw shell instead of Python, swap to sb.exec(cmd, timeout_seconds=30) and rename the tool run_shell — the model writes a command string and you run it verbatim. Everything else in the loop is unchanged.
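For the shell variant, a rough sketch might look like the following. The RUN_SHELL_TOOL schema, the "cmd" argument name, and the choice to pass the sandbox handle in as sb are my assumptions for illustration. The sb.exec(cmd, timeout_seconds=30) call and the stdout, stderr, and exit_code fields follow the description in this post; check them against the SDK reference before relying on them.

```python
RUN_SHELL_TOOL = {
    "name": "run_shell",
    "description": (
        "Run a shell command in an isolated sandbox and return its output."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "cmd": {"type": "string", "description": "The command line to run."}
        },
        "required": ["cmd"],
    },
}


def run_shell(sb, cmd: str) -> str:
    """Run a model-written command in an existing sandbox handle `sb`."""
    # Assumed from the text above: exec takes timeout_seconds and returns
    # an object with stdout, stderr, and exit_code.
    result = sb.exec(cmd, timeout_seconds=30)
    out = result.stdout
    if result.exit_code != 0:
        out += f"\n[exit {result.exit_code}] {result.stderr}"
    return out.strip() or "(no output)"
```

Taking sb as a parameter keeps the function usable with either lifecycle: create the sandbox inside a with block per call, or create it once and reuse it.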

The with statement matters: PandaStack's Sandbox is a context manager that kills the VM on exit (unless you created it persistent). Combined with ttl_seconds, you have two independent guarantees that the VM goes away — the context manager handles the happy path, and the TTL is the backstop for crashes, timeouts, or a process that gets killed mid-call.

Wiring the tool into the agent loop

Here is the full loop with the Anthropic SDK. The dispatch logic — look at the requested tool, run it, append the result — is the part you would write identically against OpenAI's tool-calls; only the message structure differs.

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"

def agent(user_request: str) -> str:
    messages = [{"role": "user", "content": user_request}]
    while True:
        resp = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            tools=[RUN_CODE_TOOL],
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason != "tool_use":
            # No more tools requested — return the final text.
            return "".join(b.text for b in resp.content if b.type == "text")

        # Run every tool the model asked for, collect results.
        results = []
        for block in resp.content:
            if block.type == "tool_use" and block.name == "run_code":
                output = run_code(block.input["code"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})

if __name__ == "__main__":
    print(agent("What is the 30th Fibonacci number? Compute it, don't guess."))

That is a complete, working code-execution agent. The model asks to run code, run_code spins up a microVM and returns the output, you hand the output back, and the model uses it to answer. With the OpenAI SDK you would read resp.choices[0].message.tool_calls instead of content blocks and append a role: "tool" message keyed by tool_call_id — same loop, different field names.
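To make the OpenAI side concrete, here is a sketch of just the dispatch step for Chat Completions. dispatch_tool_calls is a hypothetical helper, and fake_call imitates one entry of resp.choices[0].message.tool_calls so the snippet runs without a network call. The one real difference to watch for is that arguments arrive as a JSON-encoded string rather than a dict.

```python
import json
from types import SimpleNamespace
from typing import Callable


def dispatch_tool_calls(tool_calls, run_code: Callable[[str], str]) -> list[dict]:
    """Turn Chat Completions tool calls into role="tool" reply messages."""
    replies = []
    for call in tool_calls or []:
        # Arguments are a JSON string here, so decode before use.
        args = json.loads(call.function.arguments)
        if call.function.name == "run_code":
            output = run_code(args["code"])
        else:
            output = f"unknown tool: {call.function.name}"
        replies.append(
            {"role": "tool", "tool_call_id": call.id, "content": output}
        )
    return replies


# Stand-in for one entry of resp.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="run_code", arguments=json.dumps({"code": "print(1 + 1)"})
    ),
)
print(dispatch_tool_calls([fake_call], run_code=lambda code: "2"))
```

Answering unknown tool names with an error string, instead of skipping them, keeps every tool call paired with a result.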

One sandbox per session, or one per task?

The example above creates a fresh sandbox on every tool call (per-task). That is the most isolated option and it is cheap — restores run at about 179ms p50 — but each call starts from a clean slate, so a package the model installed in step one is gone by step two. For many agents that is exactly what you want. For others it is a problem.

  • Per-task (create + kill inside the tool): maximum isolation, no state bleed between calls, zero lifecycle management. Best for independent computations, untrusted multi-tenant workloads, or 'evaluate this expression' tools.
  • Per-session (create once, reuse across the loop): the model can pip install, write a file in step one and read it in step two, or start a server and curl it later. Best for coding agents, data-analysis sessions, and anything stateful. You pay one create instead of N, but you own teardown.

For the per-session model, create the sandbox before the loop and pass it into the tool. Reach for forks if you want a session's state as a starting point for parallel branches — a same-host fork copies memory and disk copy-on-write in roughly 400ms, so you can snapshot a configured environment and fan out cheaply. See the snapshots and forks documentation for that pattern.

# Per-session: one VM for the whole conversation.
sb = Sandbox.create(template="code-interpreter", ttl_seconds=1800)
try:
    def run_code(code: str) -> str:
        r = sb.run_code(code, language="python")
        return (r.stdout + (r.stderr if r.exit_code else "")).strip()
    # ... run the agent loop here, all calls share `sb` ...
finally:
    sb.kill()  # always tear down; ttl_seconds is the backstop
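One way to package the per-session pattern is a small context manager that yields the tool function and owns the teardown. This is a sketch under assumptions: sandbox_session is a name I made up, and create_sandbox is any zero-argument callable returning a handle with the run_code and kill methods used above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def sandbox_session(
    create_sandbox: Callable[[], object],
) -> Iterator[Callable[[str], str]]:
    """Yield a run_code function bound to one sandbox; always kill it after."""
    sb = create_sandbox()

    def run_code(code: str) -> str:
        r = sb.run_code(code, language="python")
        return (r.stdout + (r.stderr if r.exit_code else "")).strip()

    try:
        yield run_code
    finally:
        sb.kill()  # runs even if the agent loop raises


# Usage sketch (needs the SDK and a token, so it is left as a comment):
# factory = lambda: Sandbox.create(template="code-interpreter", ttl_seconds=1800)
# with sandbox_session(factory) as run_code:
#     ...  # run the agent loop; every tool call shares one VM
```

Passing a factory instead of a live handle means the sandbox is created and destroyed inside the same scope, which makes leaks harder to write by accident.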

Cleanup: TTL plus a context manager

Leaked sandboxes are the most common way this goes wrong in production — an agent crashes mid-loop and leaves a VM running. Defend in depth with two mechanisms that do not depend on each other:

  • ttl_seconds on create: the platform reaps the VM after the TTL no matter what your process does. This is your safety net for crashes, OOM kills, and forgotten handles. Set it to a generous bound on how long a single session could legitimately run.
  • A context manager (with Sandbox.create(...) as sb) or an explicit try/finally that calls sb.kill(): this is the fast path that frees the VM the moment you are done, instead of waiting out the TTL.

If you create a sandbox with persistent=True, the context manager will NOT kill it on exit and the idle reaper leaves it alone. That flag is for managed databases and long-lived app hosts, not for per-request agent tools. For agent code execution, leave persistent off and always set a ttl_seconds backstop.

Why a microVM, not a subprocess or container

It is tempting to make run_code a subprocess.run on your own host. Do not. The code came from a language model: it can be wrong, and it can be adversarially steered by anything in the model's context — a malicious file the agent read, a prompt-injected web page, a poisoned tool result. A subprocess shares your kernel, your filesystem, your environment variables, and your network. One os.environ dump or one open('/etc/...') and your secrets are in the model's context.
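The environment leak is easy to reproduce. The snippet below is a small demonstration using only the standard library; DEMO_SECRET and its value are made up, and model_code stands in for whatever the model might emit.

```python
import os
import subprocess
import sys

# Fake secret for the demo; a real host holds API keys and tokens here.
os.environ["DEMO_SECRET"] = "hunter2"

# Pretend this string came from the model.
model_code = "import os; print(os.environ.get('DEMO_SECRET'))"

proc = subprocess.run(
    [sys.executable, "-c", model_code],
    capture_output=True,
    text=True,
    check=True,
)
leaked = proc.stdout.strip()
print(leaked)  # -> hunter2 (the child inherited the parent's environment)
```

Passing env={} to subprocess.run would close this one hole, but the child would still share the host's kernel, filesystem, and network, which is the larger problem.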

Containers narrow the surface but still share the host kernel, so isolation rests on the container runtime being perfectly configured and the kernel being free of escape bugs. A Firecracker microVM boots its own guest kernel with hardware-virtualization isolation, plus its own network namespace. The honest trade-off: a microVM has marginally higher create latency than spawning a process and you are calling a remote API instead of forking locally. For code you wrote yourself, that overhead is not worth it — use a subprocess. For code an LLM wrote and you are about to run unattended, the microVM boundary is the whole point, and snapshot-restore keeps the create cost in the low hundreds of milliseconds.

Where to take it next

You now have the core: a tool schema, a sandbox-backed tool function, the agent loop, a session strategy, and cleanup. From here, the natural extensions are giving the model a run_shell tool for arbitrary commands, exposing filesystem.read and filesystem.write so it can work with files you upload, and using hibernate/wake to park idle sessions cheaply and auto-wake them on the next request. The agent template also ships coding-agent CLIs if you want the model driving a full toolchain rather than single snippets. See the sandboxes and SDK reference documentation for the complete surface.

Frequently asked questions

How do I let an AI agent run code safely?

Give the model a tool it can call instead of running anything on your own machine, and back that tool with an isolated sandbox. With PandaStack, your run_code tool creates a Firecracker microVM, executes the model's code or command inside it, and returns stdout/stderr. The microVM has its own guest kernel and network namespace, so even if the model writes a fork bomb or an rm -rf, the blast radius is one disposable VM, not your host.

Should I use one sandbox per agent session or one per task?

Use one sandbox per session when the agent needs state to carry across steps: installed packages, intermediate files, a long-running process. Use one sandbox per task (created and killed inside the tool call) when each call is independent and you want maximum isolation between actions. The session-scoped model is faster because you skip a create per call, but a PandaStack create restores from a baked snapshot in about 179ms (p50), so even per-task creation is cheap.

Why use a microVM sandbox instead of a Docker container or subprocess for agent code execution?

A subprocess shares your host kernel and filesystem — model-generated code can read your environment variables, escape the working directory, or exhaust host resources. Containers share the host kernel too, so a kernel exploit or misconfiguration breaks isolation. A Firecracker microVM boots its own guest kernel with hardware-level isolation, which is the right boundary for running code an LLM wrote and you never reviewed.

How does the agent tool-use loop for code execution actually work?

The loop has four steps: (1) you send the user's request plus your tool definitions to the model; (2) the model responds with a tool call like run_code with arguments; (3) you execute that call in the sandbox and capture the result; (4) you append the result to the conversation as a tool-result message and call the model again. You repeat until the model stops requesting tools and returns a final answer. This is identical in shape for both the OpenAI and Anthropic SDKs — only the message field names differ.

Written by Ajay Kumar, Founder, PandaStack.