
Run Untrusted MCP Servers in Isolated MicroVMs

Ajay Kumar · 8 min read

The Model Context Protocol made it trivial to plug new tools into an AI agent: a GitHub server, a Postgres server, a web-scraper server, a filesystem server. The catalogs now list thousands of them, most written by strangers. And here's the part nobody puts on the install button: an MCP server is a program. When your agent loads it, that program runs on your machine (or your backend) with whatever access the process has: your filesystem, your network, your environment variables. You wouldn't `npm install` a random person's package and run it with full access to your account. But wiring up an untrusted MCP server is exactly that, with a friendlier name.

I'm Ajay, and I built PandaStack. This post is about the trust gap in the MCP ecosystem and a concrete fix: run each untrusted or third-party MCP server inside its own Firecracker microVM, so a malicious or compromised server is confined to a throwaway VM instead of sitting in your agent's process with the keys to everything.

The trust problem with the MCP ecosystem

MCP is a protocol, not a sandbox. It standardizes how an agent discovers and calls tools; it says nothing about what those tools are allowed to do to your system. A local (stdio) MCP server is a subprocess your host spawns — it inherits your working directory, your file permissions, and any secrets in the environment. A remote MCP server is an HTTP endpoint someone else operates, and you're trusting both their code and their operational security. In both cases the code path is: model decides to call a tool → an opaque third-party program executes → results come back. You did not write that program, you cannot review it before every call, and it updates out from under you.

The failure modes aren't hypothetical. A tool-description field can carry a prompt injection that hijacks your agent into calling a different tool. A server can quietly read every file in the directory it was launched from and POST it somewhere. A dependency in its supply chain can be compromised between the version you audited and the version that auto-updated. A 'helpful' server can shell out. The more servers you connect — and agents are connecting dozens — the larger the combined attack surface, all of it running with the same blast radius as your own code.

An MCP server's permissions are your process's permissions. If your agent backend can read ~/.aws/credentials, so can every MCP server you've loaded into it. The protocol does not add a boundary — you have to.
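
To make the inheritance concrete, here is a minimal sketch using only the Python standard library. The child process stands in for a stdio MCP server, and `FAKE_API_KEY` is a made-up variable name used purely for illustration. By default the child sees the parent's environment; passing a scrubbed `env` hides that one variable, but does nothing about files or the network.

```python
import os
import subprocess
import sys

# Stand-in for a secret in the agent backend's environment (hypothetical name).
os.environ["FAKE_API_KEY"] = "sk-not-a-real-key"

# Stand-in for a stdio MCP server: a child process that reports what it can see.
CHILD = "import os; print(os.environ.get('FAKE_API_KEY', 'not visible'))"


def run_child(env=None):
    """Run the child and return its output; env=None means inherit everything."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Default behaviour: the child gets a copy of the parent's environment.
inherited = run_child()

# Scrubbing removes one variable; the child still shares files and network.
scrubbed_env = {k: v for k, v in os.environ.items() if k != "FAKE_API_KEY"}
scrubbed = run_child(env=scrubbed_env)

print("inherited env:", inherited)
print("scrubbed env:", scrubbed)
```

Scrubbing the environment is hygiene, not isolation: the child still runs as the same user on the same kernel, which is why the rest of this post moves the server out of the process tree entirely.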

Why a container isn't enough

The instinct is to drop the server in a Docker container and call it isolated. A container helps with packaging and resource limits, but it is not a security boundary against code you don't trust: every container on a host shares one kernel. A container escape or a kernel vulnerability turns 'isolated tool' back into 'code running on your host.' That's an acceptable risk for first-party services you wrote. It's the wrong bet for an arbitrary MCP server pulled from a public catalog, because the threat model is exactly 'this code is actively trying to get out.'

A Firecracker microVM is a different category. Each one boots its own guest kernel under hardware virtualization (KVM) — the same VMM AWS Lambda and Fargate use to isolate untrusted tenant code. The MCP server inside it can only touch the outside world through a tiny set of emulated devices. To escape, it would have to break the hypervisor, a far smaller and more heavily audited surface than the full Linux syscall interface a container sees. The blast radius of a malicious server collapses to one disposable VM with its own memory, filesystem, and network namespace.

The microVM-per-MCP-server model

The shape that works: one untrusted MCP server, one microVM. Your agent never imports the server's code into its own process — it talks to the server over the protocol while the server runs inside a sandbox. If the server is malicious, it owns a VM that holds nothing but its own session and gets destroyed when the session ends. Pick the boundary that matches your trust model:

  • Per-server — one VM per distinct MCP server. Good when servers are independently untrusted but a single user owns the session: the web-scraper server can't read what the github server saw.
  • Per-tenant — one VM (or one set) per end user. Essential for multi-tenant products: user A's MCP traffic never shares a kernel or filesystem with user B's.
  • Per-session, disposable — create the VM when the session starts, kill it when it ends. No state survives, so a compromise can't persist or wait for the next user.
  • Per-call, for the truly hostile — fork a clean baked VM for a single tool call and throw it away. Highest isolation, made affordable by copy-on-write forks.
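
One way to think about these options is as a choice of key: two tool calls share a VM only if they map to the same key. The sketch below is an illustration of that idea, not PandaStack's API. The names `Boundary`, `CallContext`, and `vm_key` are hypothetical, and the exact fields folded into each key are one reasonable reading of the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Boundary(Enum):
    PER_SERVER = "per-server"
    PER_TENANT = "per-tenant"
    PER_SESSION = "per-session"
    PER_CALL = "per-call"


@dataclass(frozen=True)
class CallContext:
    tenant_id: str
    server_name: str
    session_id: str
    call_id: str


def vm_key(boundary: Boundary, ctx: CallContext) -> tuple:
    """Return the key identifying which VM serves this call.

    Two calls land in the same VM only when their keys are equal.
    """
    if boundary is Boundary.PER_SERVER:
        # Assumes a single user owns the session, as in the list above.
        return (ctx.server_name,)
    if boundary is Boundary.PER_TENANT:
        return (ctx.tenant_id,)
    if boundary is Boundary.PER_SESSION:
        return (ctx.tenant_id, ctx.server_name, ctx.session_id)
    # PER_CALL: every call gets a fresh key, so nothing is ever shared.
    return (ctx.tenant_id, ctx.server_name, ctx.session_id, ctx.call_id)


github = CallContext("user-a", "github", "s1", "c1")
scraper = CallContext("user-a", "web-scraper", "s1", "c2")
other_user = CallContext("user-b", "github", "s9", "c1")

# Per-server: the scraper and the github server never share a VM.
print(vm_key(Boundary.PER_SERVER, github) != vm_key(Boundary.PER_SERVER, scraper))
# Per-tenant: user A and user B never share a VM.
print(vm_key(Boundary.PER_TENANT, github) != vm_key(Boundary.PER_TENANT, other_user))
```

The finer the key, the more VMs you create and the less any one of them can see.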

The historical objection to VMs was startup cost — nobody wants to wait seconds to boot a kernel for one tool call. PandaStack removes that: every create restores a baked snapshot on demand instead of cold-booting, so a sandbox comes up in 179ms p50 (about 203ms at p99), with the restore step itself near 49ms. The first cold boot of a brand-new template is around 3 seconds, but after that it's snapshot-restore. That's what makes per-session — even per-call — VM isolation practical instead of a thought experiment.

Don't forget egress: the network is the exfil path

Filesystem isolation is the obvious half. The half people skip is the network. A malicious MCP server's whole goal is usually to get data out — read your repo, your DB rows, your env, then phone home. If the sandbox has open outbound internet, you've isolated the filesystem and left the exfiltration door open. Treat egress as part of the boundary, not an afterthought.

Each PandaStack sandbox runs in its own network namespace behind a per-VM NATID slot (an agent pre-allocates 16,384 /30 subnets), so its traffic is isolated and attributable rather than mixed into a shared bridge. From there you control egress at the network layer: deny outbound by default and allow only the hosts a given server legitimately needs — the GitHub API for a github server, nothing for a pure-compute server. The rule of thumb: a tool that has no business reaching the internet shouldn't be able to, and you enforce that below the code, not by trusting the code.

Two boundaries, not one: the microVM contains what the server can read and run; egress policy contains what it can send out. A server that can read everything but can't talk to anyone is a much smaller problem than one that can do both.
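
The deny-by-default decision itself is simple enough to sketch. The code below models only the policy logic; real enforcement belongs at the network layer, outside the guest, and this is not PandaStack's configuration format. The server names and the `EGRESS_ALLOWLIST` table are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical per-server allowlists. An empty set means no outbound access.
EGRESS_ALLOWLIST = {
    "github": {"api.github.com"},
    "pure-compute": set(),
}


def egress_allowed(server_name: str, url: str) -> bool:
    """Deny by default: allow only exact host matches on the server's list."""
    host = (urlsplit(url).hostname or "").lower()
    # Unknown servers fall through to an empty allowlist, so they get nothing.
    return host in EGRESS_ALLOWLIST.get(server_name, set())


print(egress_allowed("github", "https://api.github.com/repos/octocat/hello"))
print(egress_allowed("github", "https://api.github.com.evil.example/upload"))
print(egress_allowed("pure-compute", "https://api.github.com/"))
```

Matching on the exact hostname rather than a substring matters: a lookalike such as `api.github.com.evil.example` should not pass just because it contains an allowed name.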

Spinning up a microVM to host an MCP server

Here's the concrete pattern with the Python SDK. Create a disposable sandbox, write the untrusted server's code (or install it) inside the guest, and launch it there. Set PANDASTACK_API_KEY in your environment and the SDK picks it up. The Sandbox context manager kills the VM on exit, so a compromised server doesn't outlive its session.

from pandastack import Sandbox

# A throwaway weather MCP server we pulled from a public catalog and do NOT trust.
server_code = '''
from mcp.server.fastmcp import FastMCP

# In the official MCP Python SDK, host and port are FastMCP constructor
# settings; run() only selects the transport.
mcp = FastMCP("untrusted-weather", host="0.0.0.0", port=8000)

@mcp.tool()
def get_forecast(city: str) -> str:
    # Whatever this does, it runs inside a microVM, not our agent process.
    return f"forecast for {city}: sunny"

if __name__ == "__main__":
    mcp.run(transport="sse")
'''

# One untrusted server, one disposable VM. Killed on block exit.
with Sandbox.create(template="base", ttl_seconds=900) as sbx:
    sbx.filesystem.write("/workspace/server.py", server_code)
    sbx.exec("pip install 'mcp[cli]'", timeout_seconds=120)

    # Launch the server detached inside the guest; logs to a file we can tail.
    sbx.exec(
        "setsid python3 /workspace/server.py "
        "> /var/log/mcp.log 2>&1 < /dev/null &",
        timeout_seconds=10,
    )
    # Your agent now connects to this server's SSE endpoint over the
    # sandbox's preview URL — never importing its code into your process.
    print("untrusted MCP server is running inside its own microVM")

The agent reaches the server over the sandbox's per-VM preview URL (`https://<port>-<id>.<suffix>`), so the protocol traffic flows in and out without the server's code ever touching your agent's process. If the server turns out to be malicious, the damage is scoped to a VM that holds one session's data and disappears when the block exits or the TTL fires.
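
As a small illustration of that URL shape, the helper below assembles the address an agent would point its MCP client at. It is a sketch under assumptions: `preview_url` is a hypothetical helper, the sandbox id and suffix in the example are placeholders rather than real PandaStack values, and the `/sse` path assumes the server keeps the MCP Python SDK's default SSE route.

```python
def preview_url(port: int, sandbox_id: str, suffix: str, path: str = "/sse") -> str:
    """Build https://<port>-<id>.<suffix><path> for a sandbox-hosted server."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if not sandbox_id or not suffix:
        raise ValueError("sandbox_id and suffix must be non-empty")
    return f"https://{port}-{sandbox_id}.{suffix}{path}"


# Placeholder values for illustration only.
url = preview_url(8000, "sbx123", "preview.example")
print(url)
```

The agent only ever holds this URL. The server's code, dependencies, and filesystem stay on the other side of it.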

Fork a clean baked server per call

For a server you consider actively hostile, give it zero continuity between calls. Configure and snapshot a known-good sandbox once, then fork that snapshot for each tool call and destroy the fork after. A same-host fork is roughly 400 to 750ms and shares memory copy-on-write, so every call starts from an identical clean state with nothing carried over from the last one.

from pandastack import Sandbox

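# Created and configured once: install the untrusted server, then snapshot.
# server_code is the string defined in the previous example.
base = Sandbox.create(template="base", persistent=True)
base.exec("pip install 'mcp[cli]'", timeout_seconds=120)
base.filesystem.write("/workspace/server.py", server_code)
# run_once.py (not shown) is assumed to be a one-shot entry point that reads
# /workspace/input.json; write it here too so every fork already contains it.
snap = base.snapshot()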

def handle_tool_call(payload: dict) -> dict:
    # Fork a pristine VM for this single call; no state survives it.
    sbx = snap.fork(ttl_seconds=120)
    try:
        sbx.filesystem.write("/workspace/input.json", payload["json"])
        r = sbx.exec("python3 /workspace/run_once.py", timeout_seconds=30)
        return {"exit_code": r.exit_code, "stdout": r.stdout, "stderr": r.stderr}
    finally:
        sbx.kill()  # the only state this server ever had is now gone

Cross-host forks (when the parent isn't local) land in the 1.2 to 3.5 second range because they pull the snapshot over the network first. That is fine for cold scaling, but for hot per-call isolation you want same-host forks. Either way, the MCP server can't accumulate state, can't poison the next caller, and can't persist a foothold.
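
To get a feel for what per-call isolation costs over a whole session, here is a back-of-the-envelope sketch using the fork ranges quoted in this post. The helper name `per_call_overhead_ms` is hypothetical, and the model is deliberately naive: it multiplies the per-fork range by the number of calls and ignores the tool's own run time and any parallelism.

```python
# Fork latency ranges in milliseconds, taken from the figures in this post.
FORK_MS = {
    "same-host": (400, 750),
    "cross-host": (1200, 3500),
}


def per_call_overhead_ms(calls: int, placement: str = "same-host") -> tuple[int, int]:
    """Return (low, high) total fork overhead in ms for a number of calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    low, high = FORK_MS[placement]
    return calls * low, calls * high


low, high = per_call_overhead_ms(20)
print(f"20 calls, same-host forks: {low / 1000:.1f}s to {high / 1000:.1f}s")

low, high = per_call_overhead_ms(20, "cross-host")
print(f"20 calls, cross-host forks: {low / 1000:.1f}s to {high / 1000:.1f}s")
```

For a 20-call session that works out to roughly 8 to 15 seconds of added latency with same-host forks, against 24 to 70 seconds cross-host, which is the arithmetic behind reserving per-call forks for servers you consider actively hostile.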

When you actually need this (and when you don't)

If every MCP server you connect is first-party code your team wrote and audits, a microVM per server is overkill — run them in-process and move on. Reach for the per-VM model the moment a server is third-party, auto-updating, or handling other people's data: a marketplace of community MCP servers, a multi-tenant agent product where each customer brings their own tools, or any 'just try this server from the catalog' workflow. That's the line PandaStack is built on — every sandbox is its own Firecracker microVM, created in ~179ms via snapshot-restore, so VM-grade isolation for an untrusted MCP server costs you almost nothing in latency and gives you back the thing the protocol left out: a boundary.

Frequently asked questions

Why is running a third-party MCP server dangerous?

An MCP server is a program that runs with your process's permissions — your filesystem, network, and environment variables. If you load it into your agent's process, a malicious or compromised server can read your files, exfiltrate secrets, or be hijacked via a prompt injection in a tool description. The protocol standardizes tool calls but adds no isolation, so the trust boundary is on you.

Isn't a Docker container enough to isolate an MCP server?

A container helps with packaging and resource limits but isn't a security boundary against code that's actively trying to escape: all containers share the host kernel, so a container escape or kernel bug compromises the host. For untrusted third-party MCP servers, use a hardware-isolated Firecracker microVM, which boots its own guest kernel under KVM and contains a malicious server to one disposable VM.

How do I run an untrusted MCP server in a microVM?

Create a disposable sandbox, write or install the server's code inside the guest, and launch it there — your agent talks to it over the protocol without importing its code. With PandaStack you create a sandbox on the base template, write the server with filesystem.write, run it with exec, and connect over the per-VM preview URL. The VM is killed on session exit, so a compromised server doesn't persist.

How do I stop an MCP server from exfiltrating data?

Control egress at the network layer, not by trusting the code. Each PandaStack sandbox runs in its own network namespace behind a per-VM NATID slot, so you can deny outbound by default and allow only the hosts a given server legitimately needs. A server that can read data but can't reach the internet is a far smaller problem than one that can do both.

Doesn't booting a VM per MCP server add too much latency?

Not with snapshot-restore. PandaStack restores a baked snapshot on every create rather than cold-booting, so a sandbox comes up in about 179ms (p50), with the restore step near 49ms. For per-call isolation of a hostile server, same-host copy-on-write forks land around 400 to 750ms. That makes per-session and even per-call microVM isolation practical instead of a thought experiment.

Written by Ajay Kumar, Founder, PandaStack.