Best Open-Source Sandboxes for Running Untrusted Code

Ajay Kumar · June 17, 2026 · 12 min read

If you're running untrusted code — LLM output, user submissions, CI jobs — and you've decided the answer has to be open-source and self-hostable, the field is smaller and more honest than the hosted-sandbox market. There's no marketing layer between you and the isolation primitive: you're choosing the primitive itself, plus how much platform you're willing to build on top of it. This post is a roundup of the realistic open-source options, characterized by two things that actually matter — the OSS license and the isolation model — rather than ranked into a leaderboard that ignores your workload.

The set splits into two layers. There are isolation building blocks — gVisor, Kata Containers, Firecracker itself, and the lighter OS-level tools nsjail and bubblewrap — which are open-source components you wire into your own system. And there are sandbox platforms built on top of those blocks — PandaStack and microsandbox — which give you an API, a guest agent, networking, and lifecycle so you're not assembling the whole thing from scratch. Knowing which layer you want is the first decision, so we'll start there.

I'm the founder of PandaStack, so read this as a vendor's roundup and weight it accordingly. I keep it honest the only way that works: I cite specifics (latency figures, fork times, the exact license) only for PandaStack, and I describe every other project in general, qualitative terms drawn from its own license and docs rather than inventing internals or quoting figures I can't stand behind. For anything load-bearing to your decision, confirm against each project's own repository and documentation, because open-source projects move fast and this landscape shifts.

First, decide which layer you're buying

An isolation building block is a mechanism: it isolates a process or a VM and stops there. You still have to write the thing that creates sandboxes on demand, injects code, captures output, manages networking, cleans up, and exposes all of it as an API your application calls. That's a real system, and for many teams it's months of work plus ongoing maintenance — but it's also the most flexible path, and if your needs are narrow (run one binary in a jail, nothing more) it can be the simplest.

A platform takes that work off you. It wraps an isolation block in lifecycle management, a guest agent for exec and filesystem operations, per-sandbox networking, and a client SDK. You trade some control for not building plumbing. The catch is that a platform encodes opinions — about boot path, persistence, networking — that a raw building block leaves open. If those opinions match your workload, a platform saves you enormous effort; if they don't, the raw block plus your own glue may fit better. Decide which trade you want before comparing anything, because it changes the whole shortlist.

The isolation model, because that's the whole point

When the code is untrusted, isolation strength is the dimension that matters most, and the open-source options span the full range. It's worth being precise about what each model actually protects, because the word 'sandbox' is applied loosely across all of them. We cover this in depth in /blog/code-isolation-hierarchy; here's the short version, in increasing order of strength.

OS-level process jails (namespaces, cgroups, seccomp, capabilities): the lightest tools live here. They confine a process but every process still calls into the one shared host kernel. A kernel bug reachable from a guest syscall is a potential host escape. Cheap and fast, appropriate for trusted-ish or heavily-constrained code — riskier for arbitrary code you didn't write.
User-space kernel (gVisor): a second kernel implemented in user space intercepts guest syscalls so they never hit the host kernel directly, shrinking the host-kernel attack surface without a full VM. A genuine step up from raw containers, with compatibility and performance trade-offs that depend on the workload.
Hardware-virtualized microVMs (Firecracker, Kata): each sandbox gets its own guest kernel and is isolated by hardware virtualization (KVM). Guest code never touches the host kernel directly; the exposed surface is the much smaller, much better-audited virtual machine monitor. This is the right default for arbitrary untrusted code — see /blog/what-is-a-microvm and /blog/firecracker-vs-docker.

Don't over-read 'microVM' as 'immune.' Hardware virtualization is a much stronger boundary than a shared kernel, and the VMM attack surface is far smaller than the full Linux syscall interface a container shares — but it is not zero. VMMs have had bugs; KVM has had bugs. The honest claim is 'dramatically smaller, better-audited attack surface,' not 'unbreakable.' Defense in depth (a jailer, seccomp, dropped privileges, network egress controls) still matters on top of the VM boundary.

nsjail and bubblewrap: the lightweight OS-level tools

nsjail and bubblewrap are the honest 'lightest tool that could possibly work' answers, and it's worth being clear about what they are so you don't reach for them where they don't fit. Both are open-source process-isolation tools that lean on Linux namespaces, cgroups, seccomp filters, and capability dropping to confine a process. bubblewrap is the unprivileged-container sandbox that underpins Flatpak; nsjail is a configurable process jail used widely for CTF infrastructure and constrained execution of single binaries.

Their strength is exactly their smallness: no VM, negligible overhead, trivial to drop in front of a single command. Their limitation is the shared kernel — they reduce what a confined process can reach, but they don't put a hardware boundary between guest code and the host. For running one well-understood binary under tight constraints, they're often the right and proportionate choice. For running arbitrary, adversarial code — the LLM-writes-and-executes-anything case — a shared-kernel jail is a weaker boundary than a microVM, and you should treat it as such. They're tools, not a platform: there's no API, no lifecycle, no networking model, no guest agent. You build all of that yourself, or you don't need it. We walk through where each tier fits in /blog/how-to-sandbox-untrusted-code.
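To make "trivial to drop in front of a single command" concrete, here is a minimal sketch that assembles a bubblewrap command line in Python. The flags it uses (--unshare-all, --die-with-parent, --new-session, --clearenv, --setenv, --ro-bind, --proc, --dev, --tmpfs, --chdir) are documented bwrap options; the helper name build_bwrap_argv and the list of read-only bind paths are assumptions for illustration and will need adjusting per distribution. The sketch sets no seccomp filter or cgroup limits, so treat it as a starting point rather than a complete policy.

```python
from __future__ import annotations

import shlex
from typing import Sequence

# Host directories exposed read-only inside the jail. This list is an
# assumption: every path must exist on the host, and layouts differ by distro.
READ_ONLY_BINDS = ("/usr", "/bin", "/lib", "/lib64")


def build_bwrap_argv(
    command: Sequence[str],
    ro_binds: Sequence[str] = READ_ONLY_BINDS,
) -> list[str]:
    """Return an argv list that runs `command` under bubblewrap."""
    if not command:
        raise ValueError("command must not be empty")
    argv = [
        "bwrap",
        "--unshare-all",       # new user, pid, net, ipc, uts, cgroup namespaces
        "--die-with-parent",   # kill the jailed process if the launcher exits
        "--new-session",       # detach from the controlling terminal
        "--clearenv",          # start from an empty environment
        "--setenv", "PATH", "/usr/bin:/bin",
    ]
    for path in ro_binds:
        argv += ["--ro-bind", path, path]
    # Fresh /proc, a minimal /dev, and a private scratch /tmp.
    argv += ["--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp", "--chdir", "/tmp"]
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without bwrap.
    print(shlex.join(build_bwrap_argv(["python3", "-c", "print('hello')"])))
```

Passing the resulting list to subprocess.run would execute the command inside the jail on a host with bubblewrap installed. The shared-kernel caveat above still applies: this narrows what the process can reach, it does not add a hardware boundary.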

gVisor: the user-space kernel

gVisor is Google's open-source application kernel: a runtime that intercepts a sandboxed process's syscalls in user space (the Sentry) rather than passing them straight to the host kernel. It plugs into the container ecosystem as an OCI runtime, so you can run it under existing container tooling. It's a building block — an isolation mechanism you adopt, not a hosted product or a full platform — and several hosted sandbox services use it as their backend.

The trade-off is well understood: gVisor narrows the host-kernel attack surface meaningfully versus a plain container, because guest syscalls are mediated by the Sentry instead of hitting the host directly. The cost is compatibility and performance: re-implementing the syscall surface means some workloads run slower or hit unimplemented corners, and the right answer depends on what your code actually does. It sits between OS-level jails and full microVMs on the strength curve: stronger than a namespace jail, a different bet than a hardware-virtualized VM. If you're already container-native and want a sizeable isolation upgrade without standing up KVM hosts, gVisor is a reasonable open-source choice to evaluate against your workload.
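Because gVisor plugs in as an OCI runtime, adopting it under Docker is mostly a matter of selecting the runtime per container. The sketch below builds such a command. It assumes gVisor's runsc binary has already been registered with the Docker daemon under the runtime name "runsc" (the name is whatever you registered), and the helper name and the resource limits are illustrative choices, not recommendations.

```python
from __future__ import annotations

from typing import Sequence


def build_gvisor_run_argv(
    image: str,
    command: Sequence[str],
    runtime: str = "runsc",   # assumption: the name registered in the daemon config
    memory: str = "256m",     # illustrative limit
    pids_limit: int = 64,     # illustrative limit
) -> list[str]:
    """Return a `docker run` argv that executes `command` under gVisor."""
    return [
        "docker", "run", "--rm",
        f"--runtime={runtime}",      # route this container through gVisor
        "--network=none",            # no network unless the job needs one
        f"--memory={memory}",
        f"--pids-limit={pids_limit}",
        "--read-only",               # read-only root filesystem
        image,
        *command,
    ]


if __name__ == "__main__":
    print(" ".join(build_gvisor_run_argv("python:3.12-slim", ["python3", "-c", "print(1)"])))
```

Running the same image with and without the --runtime flag is a quick way to find out whether your workload hits unimplemented syscalls or a performance cost you care about.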

Kata Containers: microVMs with a container interface

Kata Containers is the open-source project that runs each container or pod inside a lightweight VM with its own guest kernel, presenting a standard container/OCI interface on top. It's a building block from the cloud-native ecosystem (the project is hosted by the OpenInfra Foundation) that integrates with Kubernetes through CRI runtimes such as containerd and CRI-O, so you can get microVM-class hardware isolation while keeping the container developer experience and orchestration you already run.

That's its defining strength and its defining constraint at once. If your world is already Kubernetes and you want to harden specific untrusted workloads to a hardware boundary without rewriting how you ship them, Kata is the natural fit — it slots into the runtime layer. If you're not running Kubernetes, you're adopting a substantial orchestration dependency to get the isolation, which may be more machinery than the problem warrants. Like gVisor, it's a mechanism: it gives you the VM boundary, not a sandbox API, a snapshot/fork model, or a guest agent for exec. You bring the platform; Kata brings strong isolation underneath it.
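In Kubernetes terms, "slots into the runtime layer" usually means a RuntimeClass object plus a runtimeClassName field on the pods you want hardened. The sketch below builds both manifests as plain Python dictionaries and prints them as JSON, which kubectl accepts. The RuntimeClass and Pod fields are standard Kubernetes API; the handler name "kata" is an assumption, since the actual handler depends on how Kata was installed on your nodes.

```python
from __future__ import annotations

import json
from typing import Sequence


def runtime_class(name: str = "kata", handler: str = "kata") -> dict:
    """RuntimeClass mapping a cluster-visible name to a node runtime handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # assumption: must match the handler configured on the node
    }


def untrusted_pod(
    pod_name: str,
    image: str,
    command: Sequence[str],
    runtime_class_name: str = "kata",
) -> dict:
    """Pod manifest that asks for the VM-backed runtime for one workload."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "restartPolicy": "Never",
            "automountServiceAccountToken": False,  # no cluster credentials in the guest
            "containers": [
                {"name": "job", "image": image, "command": list(command)},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(runtime_class(), indent=2))
    print(json.dumps(untrusted_pod("untrusted-job", "python:3.12-slim", ["python3", "-V"]), indent=2))
```

Only pods that set runtimeClassName get the VM boundary, which is what lets you harden specific workloads while the rest of the cluster keeps its default runtime.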

Firecracker itself: the VMM under everything

Firecracker is the open-source virtual machine monitor (written in Rust) that a large share of this market is built on, including PandaStack and several hosted sandbox providers. It boots minimal microVMs with a deliberately tiny device model (essentially virtio network, block, and vsock devices plus a serial console) and runs under a jailer that drops privileges, so the surface exposed to guest code is small and the host is well protected compared to sharing the full Linux syscall interface. That minimalism is the whole design thesis, and it's why it's become the default substrate for untrusted-code execution at scale.

But Firecracker is a VMM, not a sandbox platform. On its own it gives you a microVM and the snapshot primitives; it does not give you on-demand create from an API, a guest agent, per-sandbox networking, copy-on-write forking as a product feature, or a fleet model across hosts. Building a usable sandbox service directly on Firecracker means writing all of that yourself — it's exactly the work the platforms in this list have already done. If you want maximum control and have the engineering appetite, raw Firecracker is the most flexible foundation there is. If you'd rather not rebuild the plumbing, that's the argument for a platform on top of it. We cover running it locally in /blog/run-firecracker-on-mac.
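To show how low-level the raw interface is, here is a minimal sketch of the request sequence that boots one microVM through Firecracker's HTTP API, which is served on a Unix socket. The endpoints and body fields (/machine-config, /boot-source, /drives/{id}, /actions with InstanceStart) come from Firecracker's API; the socket path, kernel and rootfs paths, sizes, and helper names are assumptions for illustration. Everything a platform adds (networking, a guest agent, cleanup, an API of your own) sits outside this sequence.

```python
from __future__ import annotations

import http.client
import json
import socket


def boot_plan(kernel_path: str, rootfs_path: str,
              vcpus: int = 1, mem_mib: int = 256) -> list[tuple[str, str, dict]]:
    """Ordered (method, path, body) requests that configure and start a microVM."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that talks to a Unix domain socket."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self._socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self._socket_path)


def apply_plan(socket_path: str, plan: list[tuple[str, str, dict]]) -> None:
    """Send each request in order and stop at the first non-success status."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            response = conn.getresponse()
            payload = response.read()
            if response.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {response.status} {payload!r}")
        finally:
            conn.close()


if __name__ == "__main__":
    # Hypothetical paths; printing the plan needs no running Firecracker process.
    for step in boot_plan("/srv/vm/vmlinux", "/srv/vm/rootfs.ext4"):
        print(step)
```

Calling apply_plan against the socket of a running firecracker process would boot the guest; the plan alone is enough to see that lifecycle, networking, and output capture are left entirely to the caller.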

microsandbox: a self-hosted microVM platform

microsandbox is an open-source, self-hosted sandbox platform aimed squarely at running agent and untrusted code in microVMs. It uses libkrun for hardware-isolated microVMs, can run OCI container images inside them, and ships with MCP integration so it slots into agent toolchains. Unlike the raw building blocks above, it's a platform — the point is that you self-host the whole thing and get sandbox lifecycle rather than wiring up a VMM yourself.

It's a genuinely aligned option if your priority is microVM isolation, self-hosting, and running existing container images as the unit of execution, with an agent-friendly interface out of the box. Evaluate it on the axes that distinguish platforms: the boot/create path and its latency, whether it offers snapshot and fork semantics, the networking model, and how broad the surrounding feature set is. Those are the same axes that separate PandaStack from a building block, and they're where two microVM platforms can diverge sharply even though both deliver hardware isolation — so test the specific behavior your workload depends on rather than stopping at 'both are microVMs.'

PandaStack: an OSS Firecracker core plus a full platform

PandaStack is our project, so here's where I'm allowed to be specific. The core is open-source under Apache-2.0 and is built to self-host on your own Linux KVM hosts — anything with /dev/kvm. You run the control-plane API and a per-host agent; sandboxes execute entirely on your infrastructure. (There's a hosted offering too, but self-host is a first-class path: the same binaries, the same agent, the SDK base URL configurable so identical code points at either.) Every sandbox is a Firecracker microVM with its own guest kernel (5.10, Ubuntu 24.04 guest), isolated by KVM — not a shared-kernel container.

The differentiator versus the raw building blocks above is that PandaStack is the platform — the lifecycle, networking, snapshot, and fork work already done on top of the OSS Firecracker core. Concretely, on that substrate:

Snapshot-restore on every create, no warm pool: there's no fleet of idle VMs waiting. Every create restores a baked Firecracker snapshot — a booted kernel, a running guest agent, an open network stack — so 'start' is really 'restore memory and resume.' That lands at 179ms p50 (p99 ~203ms). Only the first-ever spawn of a brand-new template does a real cold boot (~3s) and bakes the snapshot; every create after is on the fast restore path. Details in /docs/internals/snapshot-restore.
First-class copy-on-write forking: a fork clones a running sandbox via CoW — guest memory shared through MAP_PRIVATE (pages copied only on write), rootfs cloned with an XFS reflink (O(metadata), data shared until written). A same-host fork completes in about 400ms; cross-host (download plus restore) runs 1.2–3.5s. Warm one environment to a known state, then fork it N times to branch in parallel without re-running setup. See /docs/concepts/snapshots-and-forks and /docs/internals/fork-cow.
Per-sandbox network isolation (NATID): each sandbox gets its own Linux network namespace, veth pair, and tap device, drawn from 16,384 pre-allocated /30 subnets per agent — so egress is isolated per sandbox, not shared across a host. See /docs/concepts/networking-natid.
Optional UFFD memory streaming: an agent can page guest memory on demand from object storage (HTTP Range GET, 4 MiB chunks, zero-page elision, a prefetch trace, and a shared per-host chunk cache) so it boots without downloading the whole memory image first. Documented in /docs/internals/streaming-restore.
A platform around the sandbox: managed PostgreSQL 16, git-driven app hosting with scale-to-zero, serverless functions with cron, and durable volumes — all on one microVM substrate, with Python (pandastack), TypeScript (@pandastack/sdk), and CLI clients.
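The copy-on-write behavior behind the forking item above rests on a general kernel primitive, and it can be demonstrated in a few lines. The sketch below maps one file twice with MAP_PRIVATE, writes through one mapping, and shows that neither the other mapping nor the backing file changes. It illustrates the primitive only, not PandaStack's implementation; the temporary file stands in for a memory image, and the function names are illustrative. It requires a Unix-like system, because the flags and prot arguments to mmap are not available on Windows.

```python
import mmap
import tempfile


def private_mapping(fileno: int) -> mmap.mmap:
    """Map the whole file copy-on-write: writes stay private to this mapping."""
    return mmap.mmap(
        fileno,
        0,  # length 0 maps the entire file
        flags=mmap.MAP_PRIVATE,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def demo() -> tuple:
    """Write through one private mapping and report what each view sees."""
    with tempfile.TemporaryFile() as image:
        # Two pages of 'A' stand in for a shared base memory image.
        image.write(b"A" * (2 * mmap.PAGESIZE))
        image.flush()

        fork_a = private_mapping(image.fileno())
        fork_b = private_mapping(image.fileno())
        try:
            # The kernel copies only the touched page for fork_a.
            fork_a[0:4] = b"BBBB"
            image.seek(0)
            on_disk = image.read(4)
            return bytes(fork_a[0:4]), bytes(fork_b[0:4]), on_disk
        finally:
            fork_a.close()
            fork_b.close()


if __name__ == "__main__":
    seen_by_a, seen_by_b, on_disk = demo()
    print(seen_by_a, seen_by_b, on_disk)
```

The writer sees its own change while the second mapping and the file keep the original bytes, which is why many forks can share one base image and pay only for the pages each one modifies.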

The honest counterweight is the same one that applies to every self-hosted option here: self-hosting is real operational weight. You're running KVM hosts, an agent fleet, networking, and snapshot storage. If you don't have an infra team or the appetite for one, a hosted-only provider is genuinely less work — and that's a legitimate reason not to self-host anything, ours included. For the broader hosted-vs-self-host landscape, see /blog/e2b-alternatives and /blog/self-hosted-code-execution-sandbox.

How to choose among them

Being an honest broker means saying plainly when something other than PandaStack is the right call. Map your situation to the option, not the other way around:

Pick nsjail or bubblewrap when you need to confine a single, well-understood binary with minimal overhead and no platform — and you accept the shared-kernel boundary for that specific, constrained case.
Pick gVisor when you're container-native, want a real isolation upgrade over plain containers without standing up KVM hosts, and your workload tolerates a re-implemented syscall surface.
Pick Kata Containers when you already run Kubernetes and want to harden specific workloads to a hardware boundary while keeping your existing orchestration and container interface.
Pick raw Firecracker when you want maximum control over the substrate and have the engineering capacity to build the lifecycle, networking, and fork plumbing yourself.
Pick microsandbox when your priority is a self-hosted microVM platform that runs OCI images with an agent-friendly interface, and its boot/networking/feature model fits your workload.
Pick PandaStack when you want an open-source (Apache-2.0) Firecracker platform you can self-host with snapshot-restore-on-create, first-class CoW forking, per-sandbox network isolation, and managed services on one substrate — and you want the platform work already done.

Notice the through-line: for arbitrary untrusted code, the building blocks worth shortlisting cluster around hardware-virtualized microVMs (Firecracker, Kata, libkrun-backed platforms) and the user-space-kernel option (gVisor). The OS-level jails are the right tool for narrower, more trusted cases. Where the microVM options diverge is everything above the isolation boundary — boot path, fork semantics, networking model, and how much platform comes with it — which is exactly where you should focus your evaluation, because the isolation strength is roughly the thing they agree on. We use that lens throughout /blog/code-isolation-hierarchy.

Don't pick from this post — or any roundup, including the ones written by the projects themselves — on the strength of a description. Licenses change, isolation backends get swapped, and 'microVM' covers a wide range of real behavior. Verify each candidate's current license in its own repository, confirm its isolation model in its own docs, and run a short spike against your actual workload: create a sandbox, run your real code, exercise the network and persistence behavior you depend on, and measure it in your own environment. An afternoon of hands-on testing settles more than a week of reading comparison pages.
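For the measurement part of that spike, a small timing harness is enough. The sketch below times any callable and reports nearest-rank p50 and p99 in milliseconds. It is deliberately generic: the operation you pass in (for example a function that creates a sandbox, runs your code, and destroys it through whichever client you are evaluating) is yours to supply, and the sleep used in the demo is only a stand-in.

```python
import math
import time
from typing import Callable, Dict, List, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_ms(operation: Callable[[], object], runs: int = 50) -> List[float]:
    """Run `operation` repeatedly and return each wall-clock duration in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Collapse raw samples into the figures comparison pages usually quote."""
    return {
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with your own create/run/destroy call.
    print(summarize(measure_ms(lambda: time.sleep(0.01), runs=20)))
```

Measure from the machine and network position your application will actually run in, and use enough runs that the p99 is not decided by a single sample.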

The bottom line

The open-source set for isolating untrusted code is honest and layered. nsjail and bubblewrap are lightweight OS-level jails for confining a single binary. gVisor is a user-space kernel that upgrades container isolation. Kata Containers brings microVM hardware isolation to the Kubernetes container interface. Firecracker is the minimal Rust VMM under much of the market — maximum control if you'll build the platform yourself. microsandbox and PandaStack are platforms that do that work for you on a microVM core. PandaStack's specific bet is an Apache-2.0 Firecracker core wrapped in a full platform — snapshot-restore on every create, copy-on-write forking, per-sandbox networking, and managed services — that you can run end-to-end on your own hardware. Start from which layer you want (block vs platform) and which isolation strength your code demands, shortlist the two that fit, and prototype against both before you commit.

Frequently asked questions

What is the best open-source E2B alternative for self-hosting?

It depends on whether you want a platform or a building block. If you want a platform you can run end-to-end on your own hardware, PandaStack is built for it: its core is Apache-2.0 licensed and self-hosts on any Linux host with /dev/kvm (control-plane API plus a per-host agent, sandboxes executing on your infrastructure). Every sandbox is a Firecracker microVM, it restores a baked snapshot on every create (179ms p50), and it offers first-class copy-on-write forking (~400ms same-host) plus managed services on one substrate. microsandbox is another self-hosted microVM platform (libkrun-based, runs OCI images). If you'd rather assemble your own system, Firecracker, Kata Containers, and gVisor are the open-source isolation building blocks to build on. Confirm any candidate's license and isolation model in its own repository before committing.

Are nsjail and bubblewrap safe for running untrusted code?

They're safe for the right job, which is narrower than people sometimes assume. Both are open-source OS-level process jails built on Linux namespaces, cgroups, seccomp, and capability dropping — excellent for confining a single, well-understood binary with minimal overhead. But they share the host kernel: a confined process still calls into the one kernel every other process uses, so a kernel bug reachable from a guest syscall is a potential escape. For arbitrary, adversarial code (the LLM-writes-and-runs-anything case), a hardware-virtualized microVM puts a much stronger boundary between guest code and the host than a shared-kernel jail does. Use nsjail or bubblewrap for constrained, trusted-ish workloads; reach for a microVM for genuinely untrusted code.

What's the difference between gVisor, Kata Containers, and Firecracker?

They sit at different points on the isolation curve. gVisor is a user-space application kernel: it intercepts a sandboxed process's syscalls in user space so they don't hit the host kernel directly, narrowing the attack surface without a full VM (with compatibility and performance trade-offs). Kata Containers runs each container or pod inside a lightweight VM with its own guest kernel, presenting a standard container interface and integrating with Kubernetes. Firecracker is a minimal virtual machine monitor (Rust) that boots microVMs with a tiny device model under a privilege-dropping jailer. Kata and Firecracker both provide hardware-virtualized (KVM) microVM isolation, which is a stronger boundary than gVisor's; gVisor is a step up from plain containers but a different bet from a full VM. All three are building blocks, not finished sandbox platforms.

Do open-source sandboxes use Firecracker microVMs?

Many of the strongest ones build on hardware-virtualized microVMs, but not all use Firecracker specifically. PandaStack runs every sandbox as a Firecracker microVM and exposes its Apache-2.0 core for self-hosting. Firecracker itself is the open-source Rust VMM underneath much of the market. Kata Containers delivers microVM isolation through KVM as well, with a container interface. microsandbox uses libkrun for its microVMs. gVisor takes a different approach entirely (a user-space kernel rather than a hardware-virtualized VM), and nsjail and bubblewrap are OS-level jails with no VM at all. So 'open-source sandbox' spans the full isolation range — always confirm a given project's actual isolation backend in its own documentation rather than assuming Firecracker.

Written by Ajay Kumar, Founder, PandaStack.