
Browser Isolation in MicroVMs: Headless & Remote

Ajay Kumar · 10 min read

A web browser is the largest piece of attack surface most teams run on purpose. It's a JIT compiler, a font renderer, an image decoder, a JavaScript VM, a PDF engine, a video pipeline, and a network stack — all of it executing code that arrives, unreviewed, from whatever URL you pointed it at. When you load a page you are running someone else's program on your machine. Every tab is someone else's code; now imagine the thing deciding which links to click is a language model navigating the open web on inputs you don't control. That is the problem this post is about: why you put a real browser — headless Chrome, Puppeteer, Playwright — inside a microVM, and why the right unit of isolation is one browser per fresh, disposable VM per session.

Two use cases drive almost all of this. The first is Remote Browser Isolation (RBI): render risky web content somewhere far away from the user's laptop and stream back only pixels or a sanitized DOM, so a drive-by exploit lands on an expendable box instead of inside the corporate network. The second is AI agents that browse and automate the web — clicking, filling forms, downloading files, executing whatever JavaScript a page ships — where the navigation target is chosen at runtime by a model and is, by definition, untrusted. Both want the same thing: a hardware-isolated, throwaway environment per browsing session. If you haven't read the general case for this pattern, /blog/how-to-sandbox-untrusted-code is the broader map; this is the browser-shaped version of it.

The threat model: a browser is arbitrary remote code execution by design

Most untrusted-code stories start with "what if the user submits something malicious?" With a browser the answer is: that's not the edge case, that's the normal case. Loading any modern web page means fetching and executing JavaScript, parsing untrusted HTML and CSS, decoding untrusted images and fonts, and often running WASM — across millions of lines of C++ in the rendering engine. The browser is a sandbox itself (Chrome's site-isolation and renderer sandbox are genuinely excellent), but it is also a perennial target: renderer RCE plus a sandbox-escape chain is a recurring class of in-the-wild exploit, which is exactly why browsers ship security patches on a near-weekly cadence.

So treat the browser process the way you'd treat any program running attacker-controlled input. The threats line up cleanly:

  • Host compromise — a renderer exploit chained to a browser sandbox escape runs code on the box hosting the browser. If that box is your application server, you've lost it.
  • Cross-session contamination — on a shared machine, one hijacked browsing session reaching the host reaches every other tenant's session, cookies, and cached credentials. The blast radius is everyone using the same host.
  • Data exfiltration — even with no escape at all, a malicious page (or a prompt-injected agent) can phone home: post secrets it found, beacon out collected data, or abuse the browser's network access to reach internal services.
  • Malicious downloads — the browser will happily write an attacker's payload to disk. A download is just a file the page asked you to save; whether it then executes is about what shares that filesystem.
  • Resource abuse — a coin-miner in a hidden iframe, a memory-balloon page, a fork bomb of headless tabs. One session shouldn't be able to starve the fleet.

The browser's own sandbox is real defense-in-depth, but it sits on the host kernel like any other process. For untrusted navigation at scale — and an agent clicking model-chosen links is the most untrusted navigation there is — you want a second, hardware-enforced boundary underneath it. That's the microVM.

Why a microVM is the right boundary

A microVM gives each browser its own guest kernel inside CPU hardware virtualization (KVM). Instead of the browser process sharing the host's full Linux syscall surface with everything else on the machine, an escape now has to break the browser sandbox, then the guest kernel, then the hypervisor boundary before it touches your host. Firecracker, the open-source VMM AWS built for Lambda and Fargate, keeps that hypervisor surface deliberately tiny: a minimal virtio device model, a jailer that drops privileges and confines the VMM process, per-thread seccomp filters, and a Rust codebase. A container, by contrast, shares one kernel across every browser you run; that makes it a good packaging and resource-control mechanism but a weak security boundary for code this hostile (the full argument lives in /blog/what-is-a-microvm).

The classic objection to "a VM per browser session" was always cost: nobody wants to wait ten seconds for a VM to boot before a page can load. That objection is dead. PandaStack restores a baked microVM snapshot on every create — no warm pool of idle VMs — with a p50 of 179ms (about 203ms p99) to a live, isolated guest. The very first spawn of a fresh template cold-boots in roughly 3 seconds and bakes a snapshot; every create after that takes the fast path. When a clean, hardware-isolated browser environment costs you ~179ms, giving every RBI session and every agent task its own disposable VM stops being a luxury and becomes the obvious default.

Downloads and in-page JavaScript are the two things people forget to contain. A headless browser will execute every script a page ships and will write any file a page hands it — that's not a misconfiguration, it's the browser doing its job. Inside a disposable microVM that's fine: the malware runs and the file lands on a filesystem you're about to delete. The danger is running the browser anywhere that download or that script can touch a filesystem, secret, or network you actually care about. Isolate first; let the page misbehave second.

One browser, one microVM, one disposable session

The clean architecture is a single browser per microVM, with a fresh VM created for each session and destroyed when the session ends. Ephemerality is the whole point: no cookies, no localStorage, no cached credentials, no half-written download, and no resident malware survive into the next session, because there is no next session on that VM — it's gone. This is the same one-environment-per-task discipline that makes any untrusted-code platform safe, applied to browsing. Always attach a TTL so an abandoned or runaway browsing VM reaps itself instead of lingering.
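
The TTL discipline can be sketched as a small piece of bookkeeping on the orchestrator side. This is a minimal illustration, not PandaStack's implementation: `SessionRegistry`, `register`, `release`, and `reap_expired` are hypothetical names, and the `destroy` callback stands in for whatever call actually tears down the VM.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SessionRegistry:
    """Tracks one deadline per browsing VM and reaps the ones past it."""

    destroy: Callable[[str], None]               # hypothetical teardown hook
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    _deadlines: Dict[str, float] = field(default_factory=dict)

    def register(self, vm_id: str, ttl_seconds: float) -> None:
        # A session without a TTL can linger forever, so require one.
        if ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        self._deadlines[vm_id] = self.clock() + ttl_seconds

    def release(self, vm_id: str) -> None:
        # Normal path: the session ended, so destroy the VM right away.
        if self._deadlines.pop(vm_id, None) is not None:
            self.destroy(vm_id)

    def reap_expired(self) -> List[str]:
        # Safety net: destroy every VM whose deadline has passed.
        now = self.clock()
        expired = [vm for vm, deadline in self._deadlines.items() if deadline <= now]
        for vm in expired:
            del self._deadlines[vm]
            self.destroy(vm)
        return expired
```

Calling `reap_expired()` on a timer gives the same guarantee as a server-side TTL from the orchestrator's point of view: an abandoned session is destroyed even if nothing ever calls `release`.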

Per-session network isolation is the other half. A browser's reason for existing is to make network requests, so "contain the exploit" isn't enough — you also have to bound where the page (or the agent) can reach. PandaStack's NATID networking gives every sandbox its own Linux network namespace, veth pair, and tap device, so egress control is per-session by construction: default-deny outbound with an allowlist for exactly the sites a task needs, the cloud metadata endpoint (169.254.169.254) unreachable from inside, and no host credentials injected into the guest. That's how you stop the quiet exfiltration that survives even a perfect isolation boundary. For agents specifically, where the navigation target is adversarial by assumption, /blog/secure-code-execution-for-ai-agents covers the locked-down-egress pattern in depth.
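
As a rough illustration of the default-deny idea, the policy decision can be written as a pure function. This is a sketch of the logic only: `ALLOWED_HOSTS` and `egress_allowed` are hypothetical names, and real enforcement has to live at the network layer (the per-sandbox namespace), because a check inside the guest can be bypassed and a DNS name can resolve to an internal address.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical per-session allowlist: exactly the hosts this task needs.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Default-deny: permit a request only if its host is explicitly allowed."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower().rstrip(".")
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, so treat it as a DNS name
    else:
        # IP literals never match a hostname allowlist. This also covers
        # the cloud metadata endpoint at 169.254.169.254.
        return False
    return host in allowed_hosts
```

Anything not named in the allowlist is refused, including non-HTTP schemes and raw IP addresses, so the metadata endpoint is unreachable by default rather than by a special case.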

The mental model: one browsing session per VM, never two. Isolation is per-VM, so the boundary is only as good as how rarely you reuse a VM across trust domains. Two different users' sessions in the same browser VM share cookies, cache, and a filesystem — that defeats the isolation you paid for. RBI and multi-tenant scraping both mean fresh-per-session.

The fork angle: warm a browser once, fork it N times

Here's where microVMs do something containers can't easily match. Booting a browser and navigating to a logged-in starting state is expensive — Chrome startup, profile load, authentication, the target page rendered and idle. If you need to run that same warm state across many parallel sessions (best-of-N agent attempts on the same page, fan-out scraping from one authenticated entry point, screenshotting a page across hundreds of viewport sizes), you don't want to repeat that setup N times.

Instead: get one sandbox to the warm browser-and-page state, snapshot it, then fork. A fork is a copy-on-write clone: guest memory is shared via MAP_PRIVATE and the rootfs via XFS reflink, so each child diverges only on the pages it actually writes. A same-host fork lands in roughly 400–750ms; a cross-host fork (when you're spreading load across hosts) is about 1.2–3.5s. Every child inherits the live browser process, the loaded page, and the authenticated session, then explores independently. Because each fork is its own microVM, a malicious page in one branch can't touch its siblings. /blog/snapshot-and-fork-explained walks through the copy-on-write mechanics.
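
The copy-on-write behaviour of a private mapping can be seen in miniature with Python's `mmap` module, where `ACCESS_COPY` corresponds to `MAP_PRIVATE` on Unix. This is only a toy model of the memory side: the temporary file stands in for a snapshot's guest-memory file, `fork_view` is a hypothetical helper, and nothing here involves a real VM or the XFS reflink half of the story.

```python
import mmap
import os
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY

# Stand-in for a snapshot's guest-memory file: one page of 'A' bytes.
with tempfile.NamedTemporaryFile(delete=False) as snap:
    snap.write(b"A" * PAGE)
    snapshot_path = snap.name


def fork_view(path: str) -> mmap.mmap:
    """Map the snapshot copy-on-write (ACCESS_COPY is MAP_PRIVATE on Unix)."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


child_a = fork_view(snapshot_path)
child_b = fork_view(snapshot_path)

# Child A writes; the kernel gives A a private copy of just that page.
child_a[0:5] = b"dirty"

child_a_head = bytes(child_a[0:5])  # b"dirty"
child_b_head = bytes(child_b[0:5])  # b"AAAAA": B never sees A's write
with open(snapshot_path, "rb") as f:
    snapshot_head = f.read(5)       # b"AAAAA": the snapshot is untouched

child_a.close()
child_b.close()
os.remove(snapshot_path)
```

Both children start from the same bytes and pay only for the pages they dirty, which is why forking a warm browser is much cheaper than booting it again.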

What it looks like with PandaStack

PandaStack ships a first-party browser template — a microVM baked with headless Chromium and the automation stack, sized at 4 GiB of RAM and 4 vCPUs (browsers are hungry; this is the baked guest size for that template). You create a sandbox on it, drive the browser with exec, and read results back through the filesystem API. The pattern below writes a Playwright script into the guest and runs it; the same shape works for Puppeteer or for driving headless Chrome directly over the DevTools protocol.

from pandastack import Sandbox

# A Playwright script we don't necessarily trust the *target* of — the URL
# might be chosen by a model or come from a user. Run it in a throwaway VM.
script = """
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()  # headless by default
        page = await browser.new_page()
        await page.goto("https://example.com", wait_until="networkidle")
        title = await page.title()
        await page.screenshot(path="/workspace/shot.png", full_page=True)
        print("title:", title)
        await browser.close()

asyncio.run(main())
"""

# One hardware-isolated microVM per browsing session (~179ms to create,
# browser template is 4 GiB / 4 vCPU), auto-reaped on exit.
with Sandbox.create(template="browser", ttl_seconds=300) as sbx:
    sbx.filesystem.write("/workspace/run.py", script)
    result = sbx.exec("python3 /workspace/run.py", timeout_seconds=60)
    print(result.stdout, result.exit_code)

    # Pull the screenshot back out as bytes, but only if the script succeeded;
    # on failure the file may not exist. The page ran in a VM you're about to
    # delete, so any drive-by exploit or download dies with it.
    if result.exit_code == 0:
        png = sbx.filesystem.read("/workspace/shot.png")
        with open("shot.png", "wb") as f:
            f.write(png)
# Context manager destroys the VM here — cookies, cache, downloads all gone.

For Remote Browser Isolation you'd keep the session alive and stream it: the browser renders inside the VM and the user sees pixels (or a sanitized DOM) over a connection, never touching the live page directly. PandaStack's tokenless preview URLs make exposing a port from the guest trivial — anything the browser serves on a port is reachable at `<port>-<id>.<suffix>`, where the sandbox UUID is the credential, for the lifetime of that VM. The SDK reads PANDASTACK_API_KEY from the environment, and the same flow exists in the TypeScript SDK (@pandastack/sdk) and the pandastack CLI. PandaStack's core is Apache-2.0 and self-hostable on your own Linux KVM hosts, so the browsers run on infrastructure you control.
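
To make the `<port>-<id>.<suffix>` shape concrete, here is a small helper that assembles such a hostname. It is a sketch under assumptions: `preview_host` is a hypothetical function, not part of the SDK, the suffix `preview.example.test` in the usage line is a placeholder, and the hyphenated UUID form is assumed rather than confirmed.

```python
import uuid


def preview_host(port: int, sandbox_id: str, suffix: str) -> str:
    """Build a `<port>-<id>.<suffix>` hostname for a port exposed by the guest."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # The sandbox UUID is the credential, so reject anything that is not one.
    canonical = str(uuid.UUID(sandbox_id))
    return f"{port}-{canonical}.{suffix.strip('.')}"


# Placeholder values for illustration only.
host = preview_host(9222, "123e4567-e89b-12d3-a456-426614174000", "preview.example.test")
```

Because the UUID is the only credential, a preview URL deserves the same care as a bearer token: keep it out of logs and share it only with the viewer of that session.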

Where browser-in-a-microVM pays off

The pattern shows up anywhere a browser meets content you didn't write:

  • Remote Browser Isolation (RBI) — render risky links, email attachments, and unknown sites in a disposable VM and stream pixels to the user, so malware never reaches the endpoint or the network behind it.
  • Scraping and crawling at scale — fan out across many isolated browsers, each with its own egress controls, IP context, and clean profile; a hostile target page can't poison the others or the orchestrator.
  • AI agent web tasks — give the model a browser it can drive, where every model-chosen navigation, click, form-fill, and download is contained to a throwaway VM with locked-down egress.
  • Screenshot and PDF rendering services — turn a URL into an image or PDF on demand; you're rendering untrusted pages all day, so each render gets a fresh, expendable browser.
  • Ad-tech and security analysis — detonate suspicious URLs, ad creatives, and phishing pages in instrumented browsers to observe behavior safely, then discard the VM with whatever it picked up.
  • Automated end-to-end testing — spin a clean, identical browser per test run with no leftover state between runs, and fork a warmed-up logged-in session to parallelize a suite.

The common thread is that the browser is doing exactly what browsers do — executing remote code on remote content — and you've simply moved the blast radius. A renderer exploit, a malicious download, a prompt-injected agent that decides to exfiltrate: the worst outcome is a deleted microVM you were going to throw away anyway. For a wider comparison of execution-sandbox options, /blog/best-code-execution-sandboxes lines them up; for browsers specifically, the calculus is straightforward — one giant attack surface, one disposable hardware boundary, one session at a time.

Frequently asked questions

Why run a headless browser inside a microVM instead of a container?

A browser executes arbitrary remote JavaScript and decodes untrusted HTML, images, fonts, and WASM on every page. It's one of the largest attack surfaces you can run, and renderer-plus-sandbox-escape exploits are a recurring in-the-wild class. A container shares the host's Linux kernel with every other container, so a kernel vulnerability reachable through a syscall from a compromised browser can defeat the boundary. A microVM gives each browser its own guest kernel under hardware virtualization (KVM), so an escape must also break the hypervisor: a meaningfully stronger, hardware-enforced boundary. With sub-second microVM creation (PandaStack's is ~179ms p50), a disposable VM per browsing session is practical rather than a luxury.

What is Remote Browser Isolation (RBI) and how do microVMs help?

Remote Browser Isolation renders risky web content away from the user's device — in a remote, expendable environment — and streams back only pixels or a sanitized DOM, so a drive-by exploit or malicious download lands on the remote box instead of the endpoint. MicroVMs make each RBI session a fresh, hardware-isolated, disposable VM: one browser per VM, destroyed when the session ends, with per-session network egress controls. Because creation is sub-second, you can hand every session its own VM and reap it afterward, leaving no cookies, cache, downloads, or resident malware behind.

How should an AI agent browse the web safely?

Treat every navigation as untrusted, because the target URL is chosen by the model at runtime on inputs you don't control, and prompt injection can turn a benign task into an exfiltration attempt. Run the agent's browser inside its own ephemeral microVM with default-deny network egress (allowlist only what the task needs), no host credentials in the guest, the cloud metadata endpoint unreachable, and a TTL. Every model-chosen click, form-fill, download, and script execution is then contained to a throwaway VM. Use a fresh VM per task — or fork a warmed browser session — rather than reusing one across tasks or users.

Can I run Puppeteer or Playwright in a PandaStack sandbox?

Yes. PandaStack ships a first-party browser template — a microVM baked with headless Chromium and the automation stack, sized at 4 GiB of RAM and 4 vCPUs. You create a sandbox on the browser template, write your Puppeteer or Playwright script into the guest with the filesystem API, run it with exec, and read artifacts like screenshots or PDFs back out as bytes. You can also drive headless Chrome directly over the DevTools protocol, or expose a port via a tokenless preview URL to stream the live session.

How do I run many parallel browser sessions from one warm state?

Get one sandbox to the warm state you want — browser launched, page loaded, session authenticated — then snapshot and fork it. A fork is a copy-on-write clone: guest memory is shared via MAP_PRIVATE and the rootfs via XFS reflink, so each child only diverges on what it writes. A same-host fork lands in roughly 400–750ms (cross-host is about 1.2–3.5s). Every child inherits the live browser and loaded page, then explores independently, and because each fork is its own microVM, a hostile page in one branch can't reach its siblings — ideal for best-of-N agent attempts, fan-out scraping, or multi-viewport screenshots.

Written by Ajay Kumar, Founder, PandaStack.