Firecracker vs runc: the OCI runtime, honestly compared

Ajay Kumar · 9 min read

"Firecracker vs runc" is a better-framed comparison than most, because unlike "Firecracker vs Docker" these two really are peers — both can sit under containerd as the low-level runtime that actually brings a workload to life. The catch is that they bring it to life in completely different ways. runc creates a container by carving up the host kernel you already have; Firecracker boots a brand-new guest kernel inside a VM. Same slot in the stack, opposite isolation boundary. This post is the fair head-to-head, and it is fair to runc — runc is a small, sharp, excellent tool, and most of the time it's exactly the right one.

What runc actually does

runc is the reference implementation of the OCI Runtime Spec. It's what Docker and containerd shell out to when the real work of creating a container happens. It is deliberately unglamorous: you hand it a filesystem bundle (an unpacked root filesystem plus a config.json) and it turns that into a running process. No image pulling, no networking orchestration, no daemon — those live a layer up in containerd and Docker. runc's whole job is the moment of container creation.

Concretely, when runc starts a container it does roughly this:

  • call clone()/unshare() with the namespace flags from config.json (PID, mount, network, UTS, IPC, user, cgroup) so the process gets its own private view of the system;
  • write the resource limits into cgroups (CPU shares, memory ceiling, pids max, I/O weight);
  • pivot_root into the bundle's rootfs and set up the mounts;
  • apply the security policy: a seccomp filter that blocks or allows syscalls, plus AppArmor/SELinux labels and dropped capabilities;
  • and finally exec the entrypoint.

From that point runc mostly gets out of the way. The container is just a normal Linux process wearing a costume of namespaces and cgroups, running on the same kernel as everything else on the box.

That minimalism is a feature. runc is small, fast, well-audited, and does exactly what the config tells it to. The flip side is the same sentence read pessimistically: runc does exactly what you tell it, including the parts you didn't mean. A slightly-too-permissive seccomp profile, a capability you forgot to drop, a host path bind-mounted read-write — runc will faithfully honor all of it, because policing your intent isn't its job.
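Since runc will not second-guess the config, one option is to lint it yourself before the bundle is run. The sketch below is a minimal, hypothetical checker for the three mistakes just mentioned. The field names (linux.seccomp.defaultAction, process.capabilities.bounding, mounts[].options) follow the OCI runtime config layout; the function name and the short list of "risky" capabilities are assumptions made for illustration, not an exhaustive policy.

```python
# Capabilities commonly treated as dangerous to leave in a container.
# Illustrative assumption only; a real policy would be longer and site-specific.
RISKY_CAPS = {"CAP_SYS_ADMIN", "CAP_SYS_MODULE", "CAP_SYS_PTRACE", "CAP_NET_ADMIN"}


def audit_oci_config(config: dict) -> list:
    """Return human-readable warnings for a parsed OCI config.json."""
    warnings = []
    linux = config.get("linux", {})

    # 1. Seccomp: missing entirely, or present but allowing everything by default.
    seccomp = linux.get("seccomp")
    if seccomp is None:
        warnings.append("no seccomp profile: no syscall filtering at all")
    elif seccomp.get("defaultAction") == "SCMP_ACT_ALLOW":
        warnings.append("seccomp defaultAction is SCMP_ACT_ALLOW (denylist, not allowlist)")

    # 2. Risky capabilities still present in the bounding set.
    caps = config.get("process", {}).get("capabilities", {})
    for cap in sorted(RISKY_CAPS & set(caps.get("bounding", []))):
        warnings.append(f"capability not dropped: {cap}")

    # 3. Host paths bind-mounted without the read-only option.
    for mount in config.get("mounts", []):
        options = mount.get("options", [])
        if ("bind" in options or "rbind" in options) and "ro" not in options:
            warnings.append(
                f"read-write bind mount: {mount.get('source')} -> {mount.get('destination')}"
            )

    return warnings


# A deliberately sloppy config that trips all three checks.
sloppy = {
    "process": {"capabilities": {"bounding": ["CAP_CHOWN", "CAP_SYS_ADMIN"]}},
    "mounts": [
        {"destination": "/data", "type": "bind", "source": "/srv/data",
         "options": ["rbind", "rw"]},
    ],
    "linux": {},
}

for line in audit_oci_config(sloppy):
    print("WARN:", line)
```

A check like this does not make the boundary any stronger; it only catches the cases where the config says something you did not mean.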

# runc operates on an OCI bundle: a rootfs dir + a config.json.
# Docker/containerd generate this for you; here it is by hand.
$ mkdir -p mycontainer/rootfs
$ docker export $(docker create busybox) | tar -C mycontainer/rootfs -xf -
$ cd mycontainer && runc spec        # writes a default config.json

# config.json declares the isolation; this is the whole boundary (abridged).
# Namespaces, seccomp, and resources all live under the "linux" key:
#   "linux": { "namespaces": [ {"type":"pid"}, {"type":"network"}, {"type":"mount"}, ... ],
#              "seccomp": { "defaultAction": "SCMP_ACT_ERRNO", ... },
#              "resources": { "memory": {"limit": 536870912} } }

$ sudo runc run demo                 # clone() + cgroups + seccomp + exec entrypoint
# ...all of that ran against the HOST kernel. No guest kernel was booted.
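If you want the memory ceiling from the comments above in a bundle you generated by hand, one way is to patch config.json before calling runc run. The following is a minimal sketch: linux.resources.memory.limit and linux.resources.pids.limit are OCI runtime config fields, while the helper names and the pids value of 256 are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def apply_limits(config: dict, memory_bytes: int, pids_max: int) -> dict:
    """Set memory and pids limits on a parsed OCI config, in place, and return it."""
    resources = config.setdefault("linux", {}).setdefault("resources", {})
    resources.setdefault("memory", {})["limit"] = memory_bytes
    resources.setdefault("pids", {})["limit"] = pids_max
    return config


def patch_bundle(bundle_dir: str, memory_bytes: int, pids_max: int) -> None:
    """Rewrite <bundle_dir>/config.json with the limits applied."""
    path = Path(bundle_dir) / "config.json"
    config = json.loads(path.read_text())
    apply_limits(config, memory_bytes, pids_max)
    path.write_text(json.dumps(config, indent=2) + "\n")


# In-memory demonstration on a stripped-down config.
demo = apply_limits({"ociVersion": "1.0.2", "linux": {}}, 512 * 1024 * 1024, 256)
print(json.dumps(demo["linux"]["resources"]))
```

With the bundle from the shell session, patch_bundle("mycontainer", 512 * 1024 * 1024, 256) would write the 536870912-byte limit shown in the comments. runc then turns those values into cgroup settings; it does not check whether they are sensible for the workload.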

The boundary: one shared kernel

Here is the load-bearing fact about runc, and it isn't a knock on runc's quality — it's inherent to the container model. The process inside a runc container talks directly to the host's Linux kernel. Namespaces change what it can see; cgroups change what it can use; seccomp narrows which syscalls it may make. But the kernel servicing those (surviving) syscalls is the same kernel your other tenants and your host itself are running on. The entire Linux syscall interface — hundreds of calls, plus every driver and filesystem and networking path reachable through them — is the attack surface. Seccomp can shrink it, but a real workload needs a broad enough profile that plenty of surface remains.
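To make "seccomp can shrink it" concrete, here is a simplified model of how an OCI-style seccomp profile sorts syscalls into blocked and surviving. It is a sketch under stated assumptions: per-argument conditions are ignored, each syscall is assumed to appear in at most one rule, and the example profile and candidate list are invented for illustration.

```python
ALLOW = "SCMP_ACT_ALLOW"


def allowed_syscalls(seccomp: dict, candidates: set) -> set:
    """Return the candidate syscalls that the profile lets through.

    Simplified model: ignores per-argument conditions and assumes each
    syscall name appears in at most one rule.
    """
    action_for = {}
    for rule in seccomp.get("syscalls", []):
        for name in rule.get("names", []):
            action_for[name] = rule["action"]
    default = seccomp["defaultAction"]
    return {name for name in candidates if action_for.get(name, default) == ALLOW}


# Hypothetical allowlist profile: deny by default, allow a handful of calls.
profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "openat", "mmap", "futex", "io_uring_setup"],
         "action": ALLOW},
    ],
}

attempted = {"read", "write", "mount", "ptrace", "keyctl", "io_uring_setup"}
surviving = allowed_syscalls(profile, attempted)
print("reach the host kernel:", sorted(surviving))
print("blocked by seccomp:   ", sorted(attempted - surviving))
```

Everything in the surviving set is still serviced by the host kernel, which is why the size of the allowlist matters more than the fact that a profile exists.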

The math that matters: with runc, a single Linux kernel privilege-escalation bug reachable through an allowed syscall means a container escape — and escaping the container is escaping onto the host, alongside every other container on it. runc did nothing wrong; the boundary just happens to be the shared kernel, and the shared kernel is a big, evolving thing.

Firecracker's boundary: a kernel per guest

Firecracker is also a runtime you can slot under containerd (that's what firecracker-containerd does, and Kata Containers can use Firecracker as its hypervisor), but it draws the boundary a level lower. Instead of carving up the host kernel with namespaces, it boots a whole new guest kernel inside a hardware-virtualized VM (KVM). The workload makes its syscalls against its own private kernel; those syscalls are never serviced by the host kernel. The only path from guest to host is a deliberately small device model (a handful of emulated virtio devices) plus the KVM interface, and the Firecracker process itself runs inside a jailer that chroots it and drops privileges, with Firecracker's own seccomp filters on top for defense in depth.

So the same class of bug lands very differently. A kernel privilege-escalation bug now compromises the guest kernel — which is that one VM's kernel, isolated from the host and from its neighbors. To reach the host, an attacker has to find and exploit a bug in the VMM's device emulation or in KVM itself: a far smaller, far more heavily audited surface than the full Linux syscall ABI runc leaves exposed. This is precisely why AWS Lambda runs untrusted multi-tenant code on Firecracker rather than on bare containers.

Side by side

  • Isolation boundary. Firecracker: a hardware-virtualized microVM; the guest talks to its own kernel, and only a small virtio + KVM surface faces the host. runc: namespaces + cgroups + seccomp around a process on the shared host kernel.
  • Kernel. Firecracker: a full, real guest kernel per VM. runc: none of its own; every container uses the host kernel.
  • Untrusted / AI-generated code. Firecracker: designed for it (Lambda-grade multi-tenant isolation). runc: fine for trusted first-party code, risky for arbitrary code because a kernel bug is a host escape.
  • Startup. Firecracker: a cold start includes a guest-kernel boot, but snapshot-restore makes per-create cost sub-second in practice (PandaStack ~179ms p50). runc: extremely fast, since there is no OS to boot, just clone() + setup + exec (verify against your own measurements).
  • Overhead / density. Firecracker: a few MB of VMM and device-model overhead per guest, plus the memory the guest kernel itself occupies. runc: near-zero beyond the process itself, so raw container density is higher (verify against your workload).
  • Tooling / ecosystem. Firecracker: needs a VMM path (firecracker-containerd, Kata, or a platform); a real guest kernel means near-total Linux compatibility. runc: the universal default under Docker/containerd/Kubernetes; unmatched ecosystem and operational familiarity.

When runc is genuinely the right call

Most of the containers running in the world are running under runc, and they should be. If the code is trusted first-party code — your own services, your build steps, your internal tooling, a Kubernetes deployment of software you wrote and reviewed — then the shared-kernel risk is a risk you already own, and runc gives you the lightest, fastest, most universally supported way to run it. There's no VM tax, the density is excellent, and every piece of container tooling on earth already speaks its language. Reaching for a microVM here would be paying for a boundary you don't need against a threat you don't have.

The line to watch for is trust. runc's boundary is exactly as strong as the assumption that the code inside it isn't actively hostile to the kernel. Hold that assumption and runc is superb; drop it and the shared kernel becomes the thing standing between an attacker and the whole host.

When you need a VM boundary

The moment the code is untrusted, multi-tenant, or generated at runtime, the calculus flips. AI agents executing model-written commands, per-user code playgrounds, code interpreters, CI runners for arbitrary repos, per-customer databases — in all of these you cannot assume the code is friendly to the kernel, because you didn't write it and often no human even read it before it ran. That's the case for a per-guest kernel: you want a bug in the workload's kernel to stay in the workload's kernel.
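The rule in this section and the previous one is simple enough to write down. The sketch below encodes it as a function; the Workload type, its field names, and the "runc" / "microvm" labels are hypothetical, introduced only to make the decision explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    first_party: bool           # written and reviewed by your own team
    multi_tenant: bool          # runs next to other customers' workloads
    generated_at_runtime: bool  # e.g. model-written commands, user-submitted code


def pick_runtime(w: Workload) -> str:
    """Shared kernel only when every trust assumption holds; otherwise a VM boundary."""
    if w.first_party and not w.multi_tenant and not w.generated_at_runtime:
        return "runc"
    return "microvm"


workloads = [
    Workload("billing-api", first_party=True, multi_tenant=False, generated_at_runtime=False),
    Workload("ci-runner-for-forks", first_party=False, multi_tenant=True, generated_at_runtime=False),
    Workload("agent-shell", first_party=True, multi_tenant=False, generated_at_runtime=True),
]

for w in workloads:
    print(f"{w.name}: {pick_runtime(w)}")
```

Note the asymmetry: a single failed trust assumption is enough to pick the VM boundary, while runc needs all of them to hold.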

This is the line PandaStack is built on. It's an Apache-2.0 open-source platform where every sandbox is its own Firecracker microVM with its own guest kernel under the jailer — created via snapshot-restore so the VM boundary costs you almost nothing at create time. The ergonomics are container-grade even though the boundary is VM-grade:

from pandastack import PandaStack

ps = PandaStack()   # reads PANDASTACK_API_KEY from the environment

# One Firecracker microVM: its own guest kernel under KVM, created via
# snapshot-restore (~179ms p50, ~203ms p99; only the first cold boot is ~3s).
# The untrusted code's syscalls hit the GUEST kernel, never the host's.
sb = ps.sandboxes.create(
    template="base",
    ttl_seconds=300,     # reaped automatically if the agent abandons it
)

result = sb.exec("python3 -c 'print(sum(range(100)))'")
print(result.stdout, result.exit_code)

sb.destroy()   # the whole microVM goes away — nothing leaks to the next task
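In the example above, sb.destroy() is skipped if exec raises, leaving the TTL as the only cleanup. A small context manager closes that gap. This is a sketch rather than part of the SDK: the sandbox() helper is hypothetical, it relies only on the create(), exec(), and destroy() calls shown above, and the stand-in client exists purely so the block runs without an API key.

```python
from contextlib import contextmanager


@contextmanager
def sandbox(client, template="base", ttl_seconds=300):
    """Create a sandbox, yield it, and destroy it even if the body raises."""
    sb = client.sandboxes.create(template=template, ttl_seconds=ttl_seconds)
    try:
        yield sb
    finally:
        sb.destroy()


# --- Stand-in client (hypothetical) so the sketch runs offline ---------------
class _FakeSandbox:
    def __init__(self):
        self.destroyed = False

    def exec(self, command):
        raise RuntimeError(f"simulated failure running: {command}")

    def destroy(self):
        self.destroyed = True


class _FakeSandboxes:
    def __init__(self):
        self.created = []

    def create(self, template, ttl_seconds):
        sb = _FakeSandbox()
        self.created.append(sb)
        return sb


class FakeClient:
    def __init__(self):
        self.sandboxes = _FakeSandboxes()


def run_and_cleanup(client, command):
    """Run one command; the sandbox is destroyed whether or not it fails."""
    try:
        with sandbox(client) as sb:
            sb.exec(command)
        return "ok"
    except RuntimeError:
        return "failed"


client = FakeClient()
print(run_and_cleanup(client, "python3 untrusted.py"))
print("destroyed:", client.sandboxes.created[0].destroyed)
```

With the real client you would pass PandaStack() in place of FakeClient(). The ttl_seconds value stays useful as a backstop for the case where the calling process itself dies before the finally block runs.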

Under the hood the same snapshot-and-fork machinery applies: a same-host copy-on-write fork lands in roughly 400–750ms and a cross-host fork in about 1.2–3.5s, and each agent pre-allocates 16,384 /30 subnets so per-sandbox networking is set up in single-digit milliseconds. You get the runc-shaped developer experience (one call, run a command, tear it down) with a hardware boundary underneath. Because the core is Apache-2.0, you can self-host the whole thing on your own KVM hosts and keep that boundary on infrastructure you control.
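The subnet figure is easy to sanity-check: a /30 holds four addresses (two usable), so 16,384 of them is exactly one /16. The sketch below shows one way such a pool could be built and leased with Python's ipaddress module. The 10.200.0.0/16 base range, the host/guest address assignment, and the function names are assumptions for illustration; the post does not say how PandaStack lays this out.

```python
import ipaddress
from collections import deque


def build_subnet_pool(base_cidr: str) -> deque:
    """Pre-compute every /30 inside the base range so leasing is a constant-time pop."""
    base = ipaddress.ip_network(base_cidr)
    return deque(base.subnets(new_prefix=30))


def lease(pool: deque):
    """Take the next free /30 and return (subnet, host_side_ip, guest_side_ip)."""
    subnet = pool.popleft()
    host_ip, guest_ip = subnet.hosts()  # a /30 has exactly two usable addresses
    return subnet, host_ip, guest_ip


pool = build_subnet_pool("10.200.0.0/16")  # assumed base range
total = len(pool)
subnet, host_ip, guest_ip = lease(pool)

print("subnets available:", total)
print("first lease:", subnet, "host", host_ip, "guest", guest_ip)
```

Doing the carving once up front is what keeps the per-sandbox cost small: creating a sandbox only has to pop an entry and configure one interface pair, not search for a free range.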

It's not runc versus Firecracker as enemies. They're peers in the same slot: runc for trusted code where the shared kernel is a risk you own, Firecracker for untrusted or multi-tenant code where you want the kernel bug to stay inside one VM. Many real stacks run both — runc for the platform's own services, a microVM runtime for whatever arbitrary code the platform is asked to execute.

The honest bottom line

runc is a small, excellent tool that does one thing extremely well: turn an OCI bundle into a running process using namespaces, cgroups, and seccomp on the kernel you already have. For trusted, first-party code that's the perfect amount of machinery, and its ubiquity and near-zero overhead are unbeatable. Firecracker occupies the same runtime slot but replaces the shared-kernel boundary with a per-guest kernel behind hardware virtualization, trading a few MB and a guest boot (made sub-second by snapshot-restore) for the property that a kernel bug is contained to one VM. If the code is yours, use runc. If the code is arbitrary, untrusted, or model-generated, put a VM boundary under it — and verify the container-side startup and density claims against your own measurements, since they depend heavily on your workload.

Frequently asked questions

Is runc a competitor to Firecracker?

They're peers in the same slot, not rivals. runc is the default low-level OCI runtime that Docker and containerd shell out to; it creates a container from an OCI bundle using Linux namespaces, cgroups, and seccomp on the shared host kernel. Firecracker is a VMM that boots a full guest kernel per microVM and can also serve as a containerd runtime (via firecracker-containerd or Kata). Same layer in the stack, radically different isolation boundary.

What does runc actually do when it starts a container?

runc takes an OCI bundle (a root filesystem plus a config.json) and turns it into a running process: it calls clone()/unshare() with the namespace flags from the config, writes resource limits into cgroups, pivot_roots into the bundle's rootfs and sets up mounts, applies the seccomp filter and AppArmor/SELinux labels and drops capabilities, then execs the entrypoint. All of that runs against the host kernel — no guest kernel is booted.

Can runc run untrusted code safely?

runc containers share the host kernel, so the whole Linux syscall interface is exposed and a kernel privilege-escalation bug reachable through an allowed syscall means a container escape onto the host and its neighbors. That's fine for trusted first-party code where you already own the risk, but for untrusted, multi-tenant, or AI-generated code you want a stronger boundary — a hardware-isolated microVM (Firecracker) so a kernel bug stays inside one VM.

Is Firecracker slower than runc?

A cold Firecracker boot includes a guest-kernel boot, so raw startup is slower than runc's clone()+exec, and containers have higher raw density because there's no per-guest kernel memory. In practice platforms snapshot a booted VM and restore per create — PandaStack restores at ~179ms p50 (~203ms p99), with only the first cold boot around 3s — which makes the per-create cost sub-second. Verify the container-side numbers against your own workload; they depend on the image and setup.

When should I choose runc over Firecracker?

Choose runc when the code is trusted first-party code: your own services, build steps, internal tooling, standard Kubernetes deployments. You already own the shared-kernel risk, and runc gives you the lightest, fastest, most universally supported runtime with near-zero overhead. Reach for Firecracker (or a platform built on it) when the code is untrusted, multi-tenant, or generated at runtime, and you need a kernel bug to be contained to a single VM.

Written by Ajay Kumar, Founder, PandaStack.