Best Self-Hosted Code Execution Sandboxes in 2026

Ajay Kumar · 10 min read

Most teams that go looking for a 'self-hosted code execution sandbox' arrive with the SaaS decision already behind them. They've priced out a hosted sandbox API, run the math at their projected volume, hit a data-residency clause, or simply decided they don't want a per-call bill on the hot path of their product — and now they want to run untrusted code on infrastructure they own. That's a different shortlist than the hosted roundup, and it's the one this post covers: the genuinely self-hostable, mostly open-source options for executing code your users or your agents wrote, on your own hardware. As with everything in this market, the trick is being an honest broker — real selection criteria, a fair pass over the field, and specifics only where I can stand behind them.

The field covered here: Firecracker (the raw, DIY VMM — you build the agent and scheduler yourself), PandaStack (our project — an open-source platform built on Firecracker), gVisor/runsc (a secure container runtime), Kata Containers (VM-grade isolation with a container UX), and the lighter jailing/microsandbox approaches. The hosted-only services — E2B's managed tier, Modal, Daytona's cloud — show up only as the contrast that explains why someone wants self-hosted in the first place. The companion hosted roundup is /blog/best-code-execution-sandboxes; this one assumes you've decided to own the substrate and want to know what to put on it.

Disclosure: I'm the founder of PandaStack, so read this as a vendor's roundup and weight it accordingly. I keep it honest the only way that works — I cite specific numbers (latency, fork times, license) only for PandaStack, and I describe every other tool in general, qualitative terms drawn from its own docs rather than inventing internals or quoting figures I can't stand behind. I deliberately don't print competitor latency or dollar pricing, because both are easy to mis-measure and change monthly. For anything load-bearing to your decision, verify against each project's own current docs and repo before you commit.

Why self-host a sandbox at all

Self-hosting is real operational weight, so it's worth being clear about what you're buying with it. There are three honest reasons teams take it on, and if none of them apply to you, a hosted sandbox is genuinely less work.

  • Data residency and control — the code (and whatever it touches: customer data, secrets, internal services) executes on machines you operate, in regions you choose, under your own network and audit controls. For regulated workloads or strict data-residency commitments, this is often non-negotiable, and no amount of hosted convenience substitutes for it.
  • Cost at scale — a hosted sandbox bills per second of CPU, per creation, per GB of egress. At low volume that's a bargain. At high, steady volume — millions of executions, or long-lived environments — owning the hardware can flip the economics, because you're paying for capacity instead of renting each call. Run the math at your real volume; the crossover is a spreadsheet question, not a vibe.
  • No per-call SaaS coupling — when sandbox execution is on the critical path of your product, a third-party API in that path is a dependency, a rate limit, and a pricing-change risk. Self-hosting removes the vendor from the hot loop.
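
The cost crossover in the second bullet can be worked out in a few lines. This is a minimal sketch with placeholder inputs: the per-execution price, the monthly host cost, and the per-host capacity below are made-up illustration values, not quotes from any vendor, and the model ignores the engineering time that self-hosting adds.

```python
import math


def monthly_costs(execs_per_month, hosted_cost_per_exec,
                  host_monthly_cost, execs_per_host_per_month):
    """Return (hosted_cost, self_hosted_cost) for one month of traffic."""
    hosted = execs_per_month * hosted_cost_per_exec
    # Owned capacity is bought in whole hosts, so round up.
    hosts_needed = max(1, math.ceil(execs_per_month / execs_per_host_per_month))
    self_hosted = hosts_needed * host_monthly_cost
    return hosted, self_hosted


def break_even_execs(hosted_cost_per_exec, host_monthly_cost):
    """Executions per month at which one owned host costs the same as hosted."""
    return math.ceil(host_monthly_cost / hosted_cost_per_exec)


# Placeholder inputs (assumptions): replace with your own measured numbers.
PRICE_PER_EXEC = 0.002      # hosted price per execution
HOST_COST = 400.0           # monthly cost of one owned KVM host
HOST_CAPACITY = 2_000_000   # executions one host can serve per month

for volume in (50_000, 500_000, 5_000_000):
    hosted, owned = monthly_costs(volume, PRICE_PER_EXEC, HOST_COST, HOST_CAPACITY)
    cheaper = "hosted" if hosted < owned else "self-hosted"
    print(f"{volume:>9} execs/month: hosted={hosted:.2f} owned={owned:.2f} -> {cheaper}")
```

The useful output is less the answer than the sensitivity: vary the capacity and price inputs across the range you actually believe, and see whether the conclusion flips.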

The counterweight is equally honest: self-hosting means you operate KVM hosts, a scheduler, networking, snapshot storage, a template pipeline, and an upgrade cadence — forever. The SaaS-only providers (E2B's managed tier, Modal, Daytona-cloud) exist precisely because that's a lot of work, and for many teams paying to skip it is the right call. The rest of this post assumes you've weighed that and still want to own the substrate.

The criteria that actually separate them

Every option here will run a Python script and hand you back stdout — that baseline tells you nothing. For a self-hosted deployment specifically, the differences that decide fit live in six places. Work out which one is forcing your hand before you compare anything, because the right answer changes completely depending on which one matters to you.

  • Isolation strength — you're running untrusted code by definition, so this is the criterion that matters most: shared-kernel container, user-space kernel (gVisor), or hardware-virtualized microVM (Firecracker, Kata). The ladder is in /blog/code-isolation-hierarchy.
  • Boot and create latency — how long it takes to get a clean environment ready to run. For agent loops and bursty per-request execution this compounds fast; for long-lived environments it matters less.
  • Density and cost — how many concurrent sandboxes you fit per host, and how cheaply they sit idle. This is where self-hosting either pays off or doesn't, so weight it heavily if cost-at-scale is your reason for being here.
  • Operational burden — the gap between 'an isolation primitive' and 'a platform you can run.' The VMM or runtime is the easy part; networking, scheduling, storage, and an API are the work.
  • Snapshot and fork support — can you freeze a warm environment and clone it cheaply (copy-on-write)? This is the difference between re-running setup every time and forking a hot environment N times.
  • Language and runtime support — most of these are OS-level boundaries that run anything in a Linux process, but the surrounding template/image story decides how painlessly you get the runtimes you need.

Don't over-read 'microVM' as 'immune.' Hardware virtualization is a far stronger boundary than a shared kernel, and a minimal VMM's attack surface is much smaller and better-audited than the full Linux syscall interface a container shares, but it is not zero. VMMs have had bugs; KVM has had bugs. The honest claim is 'dramatically smaller, better-audited attack surface,' not 'unbreakable.' Defense in depth (a privilege-dropping jailer, seccomp, per-sandbox egress controls) still matters on top of the VM boundary. We cover that layering in /blog/secure-code-execution-for-ai-agents.

Firecracker (raw, DIY)

Firecracker is the minimal Rust VMM from AWS that underpins much of this market, including Lambda and Fargate. Each microVM gets its own guest kernel, isolated by KVM, with a deliberately tiny device model (a handful of virtio devices) and a built-in jailer that drops privileges — the smallest, best-audited trusted surface you can practically run untrusted code behind. As a self-hosting foundation it's the strongest possible base layer and also the most work: Firecracker boots a VM from a kernel and rootfs and exposes an API socket, and that's it. Networking (tap devices, namespaces, NAT), snapshotting and restore orchestration, a rootfs/template pipeline, cross-host scheduling, image storage, and the actual execution API are all yours to build and operate. The VMM is the easy 10%; the platform around it is the 90%.

  • Isolation model: hardware-virtualized microVM, own guest kernel per sandbox, KVM-isolated, minimal device surface under a jailer — the strongest practical base layer.
  • Best fit: teams with real systems/infra muscle who want maximum control and minimal trust surface, and accept that they're building the orchestration platform themselves. See /blog/what-is-a-microvm and /blog/firecracker-vs-docker.
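
To make "boots a VM from a kernel and rootfs and exposes an API socket" concrete, here is a rough sketch of the request sequence against Firecracker's HTTP-over-Unix-socket API. The endpoint names and JSON fields follow Firecracker's published API as I understand it, but the socket path, kernel path, and rootfs path are hypothetical, and the helper names (`boot_plan`, `apply_plan`) are mine. Check the field names against the API spec for the Firecracker version you run.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def boot_plan(kernel_path, rootfs_path, vcpus=1, mem_mib=256):
    """Ordered API calls that configure and start one microVM."""
    return [
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel_path,
          "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"}),
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs_path,
          "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def apply_plan(socket_path, plan):
    """Send each request in order; raise on the first non-2xx response."""
    conn = UnixHTTPConnection(socket_path)
    try:
        for method, path, body in plan:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            payload = resp.read()  # drain before reusing the connection
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status} {payload!r}")
    finally:
        conn.close()


# Hypothetical paths; nothing is sent until apply_plan() is called.
plan = boot_plan("/srv/fc/vmlinux", "/srv/fc/rootfs.ext4")
for method, path, body in plan:
    print(method, path, json.dumps(body))
```

Everything the section lists as "yours to build" sits outside this snippet: no tap device is attached, nothing is snapshotted, and nothing inside the guest is listening for code to run.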

PandaStack (open-source platform on Firecracker)

PandaStack (our project) is the platform layer over Firecracker that the raw option leaves you to build — open-source under Apache-2.0 and designed to run on any Linux box with /dev/kvm. Every sandbox is a Firecracker microVM with its own guest kernel (5.10, Ubuntu 24.04 guest), KVM-isolated, under a jailer that exposes only a minimal virtio device model (net, block, vsock). You run the control-plane API plus a per-host agent; sandboxes execute entirely on your infrastructure. There's a hosted offering on the same binaries, so identical SDK code targets either. Where I'm allowed to be specific, because these are our own numbers: boot is snapshot-restore on every create — no warm pool of idle VMs — landing at 179ms p50, roughly 203ms p99, with the restore step itself around 49ms; the only slow path is the first-ever spawn of a brand-new template, which cold-boots in about 3s and bakes the snapshot. Forking is first-class via copy-on-write (same-host 400–750ms, cross-host 1.2–3.5s), so you warm an environment once and fork it N times. Per-sandbox networking comes from 16,384 pre-allocated /30 subnets per agent, and managed Postgres, git-driven app hosting, and serverless functions all run on the same substrate (a database create runs 30–90s). The self-hosted API shape is a few lines of Python:

from pandastack import Sandbox

# Points at YOUR control plane via PANDASTACK_API_KEY + a configurable
# base URL; the same code targets the hosted offering unchanged.
sbx = Sandbox.create(template="code-interpreter", ttl_seconds=600)

# Run untrusted code inside the isolated guest, on your own hardware.
result = sbx.exec("python -c 'print(sum(range(100)))'")
print(result.stdout)      # -> 4950
print(result.exit_code)   # -> 0

sbx.kill()  # tear down now, or let the TTL reap it

# A Firecracker microVM booted (~179ms p50 via snapshot-restore),
# ran the code, and never left your infrastructure.

  • Isolation model: hardware-virtualized Firecracker microVM, own guest kernel per sandbox, KVM-isolated, minimal VMM surface under a jailer.
  • Best fit: teams who want Firecracker-grade isolation as a platform they own end-to-end — snapshot-restore, CoW forking, and managed Postgres/apps/functions on one substrate — without building the orchestration layer from scratch. The wrong pick if you have no infra appetite and a hosted-only service would do.
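
The subnet figure above is easy to sanity-check: 16,384 /30 subnets is exactly what fits in a single /16. The sketch below shows the arithmetic with Python's standard `ipaddress` module. The `10.0.0.0/16` base range and the slot-indexing helper are assumptions for illustration, not a description of how the agent actually allocates addresses.

```python
import ipaddress

# Assumed base range for illustration; any /16 gives the same count.
POOL = ipaddress.ip_network("10.0.0.0/16")

# Each /30 holds 4 addresses, so a /16 splits into 2**(30-16) of them.
total_subnets = 2 ** (30 - POOL.prefixlen)
print(total_subnets)


def subnet_for_slot(pool, slot):
    """Return the /30 at index `slot` without materializing all subnets."""
    total = pool.num_addresses // 4
    if not 0 <= slot < total:
        raise IndexError(f"slot {slot} outside 0..{total - 1}")
    base = int(pool.network_address) + slot * 4
    return ipaddress.ip_network((base, 30))


first = subnet_for_slot(POOL, 0)
last = subnet_for_slot(POOL, total_subnets - 1)
print(first, last)

# A /30 leaves two usable host addresses, enough for a point-to-point
# link such as one host-side and one guest-side interface.
print([str(h) for h in first.hosts()])
```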

gVisor / runsc

gVisor is Google's user-space kernel: runsc is an OCI runtime that drops in under Docker or Kubernetes, intercepting guest syscalls in a userland kernel (Sentry) so they mostly don't reach the host kernel directly. That's a meaningful step up from a plain shared-kernel container with comparatively little new operational surface — if you already run Kubernetes, adopting runsc as a runtime class is a small change rather than a new platform. The trade-offs are workload-dependent: some syscall-heavy or I/O-heavy programs see compatibility gaps or a performance cost, because the userland kernel is re-implementing the interface. It's a different bet from a hardware-virtualized VM — strong isolation without a full guest kernel per sandbox — and a pragmatic middle rung. Verify current syscall compatibility and performance characteristics against gVisor's docs for your specific workload.

  • Isolation model: user-space kernel (Sentry) intercepting guest syscalls; OCI runtime (runsc), no per-sandbox guest kernel — a middle rung above containers, below a full VM.
  • Best fit: teams already on Docker/Kubernetes who want a real isolation upgrade with minimal new operational surface and can tolerate workload-dependent compatibility/perf trade-offs. See /blog/gvisor-vs-firecracker.
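
As a sketch of how small the adoption step can be on Docker, the helper below builds a `docker run` command line that selects the gVisor runtime. It assumes runsc is already installed and registered with the Docker daemon under the runtime name `runsc`; the image name, resource limits, and the `runsc_command` helper are illustrative choices, not recommendations.

```python
import shlex


def runsc_command(image, argv, memory="256m", pids_limit=64):
    """Build a docker argv that runs `argv` in `image` under gVisor."""
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",             # route the container through gVisor
        "--network=none",              # no network unless the job needs it
        f"--memory={memory}",          # cap memory
        f"--pids-limit={pids_limit}",  # cap process count (fork bombs)
        "--read-only",                 # read-only root filesystem
        image, *argv,
    ]


cmd = runsc_command("python:3.12-slim",
                    ["python", "-c", "print(sum(range(100)))"])
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would execute it. The only gVisor-specific piece is the `--runtime` flag, which is the point of the paragraph above: the rest of your container tooling stays as it is.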

Kata Containers

Kata Containers gives you VM-grade isolation behind a container and Kubernetes UX: each pod or container runs inside a lightweight VM with its own guest kernel, but you drive it through familiar OCI/CRI tooling. Crucially for this list, Kata can run on multiple VMMs underneath — Firecracker, Cloud Hypervisor, or QEMU — so you can dial the isolation/feature trade-off (Firecracker for a minimal surface, QEMU for broader device support) while keeping the same orchestration on top. If your platform is already Kubernetes-shaped and you want hardware-virtualized isolation without abandoning that ecosystem, Kata is the natural fit. It is more moving parts than a single VMM, and the runtime-class plumbing is real work to operate, so verify the VMM options and overhead against Kata's docs for your cluster.

  • Isolation model: VM-grade — lightweight VM with its own guest kernel per workload — over a choice of Firecracker, Cloud Hypervisor, or QEMU, driven through container/Kubernetes tooling.
  • Best fit: Kubernetes-native teams who want hardware-virtualized isolation without leaving the OCI/CRI world, and want to choose the underlying VMM. See /blog/kata-vs-firecracker.
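
The "runtime-class plumbing" comes down to two Kubernetes objects: a RuntimeClass that names a handler, and a pod that opts into it. The sketch below builds both as plain dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The handler name `kata` is an assumption; it has to match whatever handler your containerd or CRI-O configuration registers for Kata, and that varies by install method and VMM.

```python
import json


def runtime_class(name, handler):
    """RuntimeClass object mapping a cluster-visible name to a CRI handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandbox_pod(pod_name, image, command, runtime_class_name):
    """Pod that runs one container under the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "restartPolicy": "Never",
            "containers": [
                {"name": "sandbox", "image": image, "command": command},
            ],
        },
    }


# "kata" is a hypothetical handler name; check your node runtime config.
rc = runtime_class("kata", handler="kata")
pod = sandbox_pod("untrusted-job", "python:3.12-slim",
                  ["python", "-c", "print(sum(range(100)))"], "kata")

print(json.dumps(rc, indent=2))
print(json.dumps(pod, indent=2))
```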

microsandbox and lighter jailing approaches

At the lighter end sit two distinct things worth separating. microsandbox is an open-source project that runs code in microVMs via libkrun (a library that embeds a VMM, KVM-backed on Linux), aiming for a self-hostable, fast-booting sandbox you can embed without standing up a full platform — a smaller-footprint take on the same VM-isolation idea. Distinct from that are pure OS-level jailing approaches — seccomp-bpf syscall filters, Linux namespaces, Landlock, bubblewrap, and similar — which harden a process without a guest kernel. Jailing is the lightest and lowest-overhead option and a genuinely useful layer, but on its own it's still shared-kernel: a kernel-level escape is a host compromise, so for arbitrary untrusted code it's best treated as defense-in-depth on top of a VM boundary rather than the whole boundary (we cover the seccomp layer in /blog/seccomp-explained and the jailing pattern in /blog/jailing-llm-generated-code). Verify each project's isolation backend and threat model against its own docs.

  • Isolation model: microsandbox — microVMs via libkrun (KVM-backed), VM-grade but lighter footprint; pure jailing (seccomp/namespaces/Landlock) — process hardening, shared-kernel, best as a layer not the whole boundary.
  • Best fit: teams wanting an embeddable VM sandbox without a full platform (microsandbox), or teams adding syscall/namespace hardening on top of a stronger boundary. See /blog/best-open-source-code-sandboxes.
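
For the pure-jailing side, here is a sketch of what namespace-based hardening looks like with bubblewrap. The helper builds a `bwrap` command line using flags from bubblewrap's documented option set; the specific combination, and the `bwrap_command` name, are my own illustration. Note what it does not do: the jailed process still talks to the host kernel, which is why the section treats this as a layer rather than the boundary.

```python
import shlex


def bwrap_command(argv):
    """Build a bubblewrap argv that runs `argv` in fresh namespaces."""
    return [
        "bwrap",
        "--unshare-all",        # new user, pid, net, ipc, uts, cgroup namespaces
        "--die-with-parent",    # kill the jail if the supervisor exits
        "--new-session",        # detach from the controlling terminal
        "--ro-bind", "/", "/",  # host filesystem visible but read-only
        "--proc", "/proc",      # fresh procfs for the new pid namespace
        "--dev", "/dev",        # minimal /dev
        "--tmpfs", "/tmp",      # private, writable scratch space
        *argv,
    ]


cmd = bwrap_command(["python3", "-c", "print(sum(range(100)))"])
print(shlex.join(cmd))
```

A read-only bind of the whole root is the bluntest possible filesystem policy. A real deployment would bind only the directories the workload needs and add a seccomp filter on top.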

The field, option by option

The short version of each, by isolation model and operational shape, so you can scan and shortlist. The discipline holds throughout: specific numbers only for PandaStack, every other option in general terms with a 'verify against its docs' caveat, and no invented figures.

  • Firecracker (raw). Strength: strongest practical base layer (minimal KVM VMM, jailer, guest kernel per sandbox). Trade-off: you build the entire platform (networking, scheduling, snapshots, API) yourself.
  • PandaStack. Strength: open-source (Apache-2.0) Firecracker platform you run on any /dev/kvm host; snapshot-restore on every create (179ms p50, ~203ms p99, ~49ms restore), CoW forking (400–750ms same-host), 16,384 /30 subnets per agent, plus managed Postgres/apps/functions. Trade-off: it's the platform layer over Firecracker, so the isolation is exactly Firecracker's, and you still operate KVM hosts.
  • gVisor / runsc. Strength: user-space kernel, drops in as an OCI runtime under Docker/Kubernetes with little new surface. Trade-off: not a full VM, with workload-dependent syscall compatibility and performance costs.
  • Kata Containers. Strength: VM-grade isolation with container/Kubernetes UX over a choice of Firecracker/Cloud Hypervisor/QEMU. Trade-off: more moving parts and runtime-class plumbing to operate.
  • microsandbox. Strength: embeddable microVM sandbox via libkrun (KVM-backed), self-hostable without a full platform. Trade-off: smaller ecosystem; verify maturity and feature set against its docs.
  • Pure jailing (seccomp / namespaces / Landlock). Strength: lightest weight, lowest overhead, great as a hardening layer. Trade-off: shared-kernel on its own, so not a sufficient boundary for arbitrary untrusted code.
  • SaaS-only contrast (E2B managed, Modal, Daytona-cloud). Strength: zero substrate to operate. Trade-off: not self-hosted, which is the whole point of this list; they appear here only as the contrast.

How to choose

Start from the criterion forcing your hand, then map your situation to the option rather than the reverse:

  • Choose Firecracker direct when you want maximum control and the minimal trust surface, have systems-engineering muscle to spare, and are happy to build the orchestration platform around the VMM yourself.
  • Choose PandaStack when you want Firecracker-grade isolation as a platform you own end-to-end — snapshot-restore, CoW forking for rollouts, managed Postgres/apps/functions — without building scheduling, networking, and snapshot orchestration from scratch.
  • Choose gVisor when you're already on Docker/Kubernetes, want a real isolation upgrade with minimal new ops, and your workloads tolerate its syscall-compatibility and performance trade-offs.
  • Choose Kata when your platform is Kubernetes-native and you want hardware-virtualized isolation inside the OCI/CRI world, with the freedom to pick the underlying VMM.
  • Choose microsandbox when you want an embeddable, fast-booting VM sandbox without standing up a full platform — and verify its maturity against your needs.
  • Add pure jailing (seccomp/namespaces/Landlock) as a hardening layer on top of any of the above — not as your only boundary for untrusted code.
  • Stay on a SaaS-only service (E2B managed, Modal, Daytona-cloud) if, after weighing it, none of the self-host reasons — data residency, cost at scale, no per-call coupling — actually apply to you. Less work is a legitimate answer.

Don't pick from this post, or from any roundup (including the ones written by the vendors themselves), on the strength of a description. Isolation backends get swapped, licenses change, and 'self-hosted' covers everything from a true open-source deploy to a proprietary control plane in your cloud. Pull every quantitative claim (boot time, overhead, license) live from each project's own repo and docs and date it. Then build a one-day spike against your top two on a single KVM host: stand up the sandbox, run your real workload under realistic concurrency, and measure density, boot latency, and the operational steps you'll repeat forever. A day of hands-on testing settles more than a week of reading comparison pages.
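
For the boot-latency part of that spike, a small harness is enough. The sketch below times repeated calls and reports nearest-rank percentiles. `fake_create_and_kill` is a stand-in that just sleeps; swap in whatever create, exec, and teardown call your candidate exposes. It measures sequential latency only, so a concurrency test needs a separate harness.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile of `samples` for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, runs):
    """Call `fn` `runs` times and return each duration in milliseconds."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def fake_create_and_kill():
    """Placeholder workload; replace with a real sandbox create + teardown."""
    time.sleep(0.001)


samples = time_calls(fake_create_and_kill, runs=50)
print(f"p50={percentile(samples, 50):.2f}ms  p99={percentile(samples, 99):.2f}ms")
```

Report p99 alongside p50, and record the host, the date, and the template used. A median on its own hides the tail that your users will actually notice.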

The bottom line

There is no single best self-hosted code execution sandbox — there's a best one for your reason for self-hosting and your six constraints. The serious options share the foundation that matters most: hardware-virtualized microVM isolation (Firecracker, Kata, microsandbox) is the correct base for arbitrary untrusted code, gVisor is a meaningful middle rung, and pure jailing is a hardening layer rather than a whole boundary. They differ on how much platform you have to build, how dense and cheap they run, whether snapshot and fork are first-class, and how naturally they fit your existing stack. Decide whether your driver is data residency, cost at scale, or removing a SaaS dependency from the hot path, shortlist the two that fit, and prototype against both on real hardware before you commit. PandaStack's bet, for the record, is an Apache-2.0 Firecracker core wrapped in a full platform — snapshot-restore on every create (179ms p50), CoW forking (400–750ms same-host), per-sandbox networking, managed services — that you run end-to-end on your own hardware. If that matches your constraints, benchmark it against the field and keep us honest.

Frequently asked questions

What is the best self-hosted code execution sandbox in 2026?

There's no universal winner — it depends on your reason for self-hosting (data residency, cost at scale, or removing a SaaS dependency) and six constraints: isolation strength, boot latency, density/cost, operational burden, snapshot/fork support, and runtime support. For arbitrary untrusted code, the strongest base layers are hardware-virtualized microVMs: Firecracker (raw, you build the platform), PandaStack (an open-source Apache-2.0 platform built on Firecracker, with snapshot-restore on every create at 179ms p50 and CoW forking at 400–750ms same-host), Kata Containers (VM-grade isolation with a Kubernetes UX over Firecracker/Cloud Hypervisor/QEMU), and microsandbox (microVMs via libkrun). gVisor/runsc is a meaningful middle rung, and pure jailing (seccomp/namespaces) is a hardening layer rather than a whole boundary. Decide which criterion is forcing your hand, then prototype your top two on your own hardware before committing.

Why would I self-host a code sandbox instead of using E2B, Modal, or a hosted service?

Three honest reasons: data residency and control (code executes on machines you operate, in regions you choose, under your own network and audit controls — often non-negotiable for regulated workloads); cost at scale (hosted services bill per-second CPU, per creation, and per GB egress, which is a bargain at low volume but can flip at high steady volume where owning capacity beats renting each call); and removing a per-call SaaS dependency from the hot path of your product. The counterweight is real: self-hosting means you operate KVM hosts, scheduling, networking, snapshot storage, and an upgrade cadence forever. Hosted-only services like E2B's managed tier, Modal, and Daytona-cloud exist precisely because that's a lot of work — if none of the three reasons apply to you, paying to skip the operations is a legitimate choice.

Is Firecracker enough on its own, or do I need a platform on top of it?

Firecracker is the strongest practical base layer — a minimal KVM-backed VMM that boots a microVM with its own guest kernel behind a privilege-dropping jailer — but on its own it only boots a VM from a kernel and rootfs and exposes an API socket. Everything that makes it a usable sandbox service is yours to build: tap-device and namespace networking, snapshot/restore orchestration, a rootfs/template pipeline, cross-host scheduling, image storage, and the execution API. The VMM is roughly the easy 10%; the platform is the other 90%. If you have the systems-engineering muscle and want total control, building directly on Firecracker is the maximal-control path. If you want Firecracker-grade isolation without building that platform, an open-source platform layer like PandaStack provides snapshot-restore, copy-on-write forking, per-sandbox networking, and an API on top of the same VMM.

How does gVisor compare to Firecracker or Kata for self-hosting?

They sit at different points on the isolation ladder. gVisor/runsc is a user-space kernel (Sentry) that intercepts guest syscalls so they mostly don't reach the host kernel directly; it drops in as an OCI runtime under Docker or Kubernetes with comparatively little new operational surface, but it's not a full VM and has workload-dependent syscall-compatibility and performance trade-offs. Firecracker and Kata are hardware-virtualized: each sandbox gets its own guest kernel isolated by KVM, the stronger boundary for arbitrary untrusted code. Kata wraps that VM isolation in a container/Kubernetes UX and can run over Firecracker, Cloud Hypervisor, or QEMU. The practical read: if you're already Kubernetes-native and want a real upgrade with minimal new ops, gVisor is pragmatic; if you want the strongest boundary, choose a microVM (Firecracker direct, Kata, or a platform like PandaStack). Verify each project's current behavior against its own docs.

Can self-hosted sandboxes do snapshots and forking like hosted ones?

Some can, and it's worth checking directly because it's easy to assume wrongly. Snapshot and copy-on-write fork support depends on the platform layer, not just the VMM. PandaStack exposes them as first-class primitives — boot is snapshot-restore on every create (179ms p50, ~49ms for the restore step itself, no warm pool), and forks run 400–750ms same-host or 1.2–3.5s cross-host via copy-on-write (MAP_PRIVATE guest memory plus a reflinked rootfs), so you can warm an environment once and fork it N times. Raw Firecracker supports snapshot/restore at the VMM level, but you orchestrate it yourself; Kata and gVisor have their own snapshot/checkpoint stories that vary by version and VMM. If snapshot-and-fork is core to your workload — agent rollouts, tree-search, 'try N fixes' — confirm each option's exact semantics in its own docs rather than assuming from a feature matrix.
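
The MAP_PRIVATE idea mentioned above can be demonstrated in miniature with a private file mapping: several mappings share one backing file, and each gets its own copy of a page only when it writes to it. The sketch uses Python's `mmap` with `ACCESS_COPY`, which requests a private copy-on-write mapping. It illustrates the memory half of the mechanism only; it is not PandaStack's implementation, the file name is hypothetical, and the reflinked rootfs half is not shown.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a memory snapshot file: four pages of b"A".
    base_path = os.path.join(tmp, "snapshot.mem")
    with open(base_path, "wb") as f:
        f.write(b"A" * PAGE * 4)

    # Two "forks" map the same file privately (copy-on-write).
    with open(base_path, "r+b") as f:
        fork_a = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        fork_b = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

    # Each fork writes to its own view of page 0.
    fork_a[0:6] = b"fork-a"
    fork_b[0:6] = b"fork-b"

    a_view = bytes(fork_a[0:6])
    b_view = bytes(fork_b[0:6])

    # The backing file is untouched by either write.
    with open(base_path, "rb") as f:
        on_disk = f.read(6)

    fork_a.close()
    fork_b.close()

print(a_view, b_view, on_disk)
```

Untouched pages stay shared with the backing file, which is why forking a warm environment can cost far less than copying its whole memory image.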

Written by Ajay Kumar, Founder, PandaStack.