How PandaStack Creates a MicroVM in Under 200ms
PandaStack creates a fresh Firecracker microVM — own guest kernel, hardware-isolated, ready to take commands — with a median latency of 179ms and a p99 of about 203ms. It is tempting to read that as "Firecracker boots in 179ms." It does not. That number is a snapshot restore, not a kernel boot. The distinction is the entire point of this post, because the two things are mechanically different and conflating them leads people to wildly wrong mental models of what their infrastructure is doing.
There are two separate claims worth keeping apart. First: a cold Firecracker boot really is fast for a virtual machine (single-digit to low-hundreds of milliseconds) because of how Firecracker is built. Second: PandaStack's steady-state create is faster still, because in steady state we don't boot at all. We restore a frozen machine and resume it. This piece walks through both, in order, with the actual pipeline and the actual numbers.
Why a cold Firecracker boot is already fast
Before snapshots enter the picture, it helps to understand why Firecracker's cold boot is quick relative to a conventional VM that takes tens of seconds. A traditional hypervisor emulates a whole PC: a BIOS or UEFI firmware, a PCI bus, legacy interrupt controllers, emulated disk and network controllers, a VGA console. The guest firmware probes all of it, initializes it, then hands off to a bootloader, which loads a general-purpose kernel that re-discovers the same hardware. That discovery-and-init dance is where the seconds go.
Firecracker throws nearly all of it away. The things that make its cold boot fast:
- No BIOS or UEFI firmware. Firecracker loads the guest kernel directly and jumps into it. There is no firmware phase to sit through.
- A minimal virtio device model — net, block, and vsock, plus a serial console. No PCI bus to enumerate, no legacy device emulation, no VGA. There is almost nothing for the guest to probe.
- A stripped, purpose-built guest kernel. PandaStack runs a 5.10 kernel compiled with only the drivers a microVM actually has, so kernel init skips the long tail of hardware it will never see.
- A small, single-purpose VMM written in Rust that runs under a jailer which drops privileges. It is fast to fork+exec and presents a deliberately tiny host attack surface.
Put together, the guest goes from process launch to a running userspace far faster than a legacy VM. That is the boot Firecracker is famous for, and it is real. But even a boot that is "fast for a VM" still pays for an init sequence, a kernel handshake, and userspace startup. To go faster, you have to stop booting.
The trick: restore a frozen machine instead of booting one
A Firecracker snapshot serializes a running microVM at a single instant: the guest's entire physical RAM (vm.mem), and the VMM state (vm.state) — vCPU registers, the interrupt controller, the clock, and every virtio device's configuration. Restoring that snapshot is not a boot. The guest does not run init, does not re-probe devices, does not start systemd. The kernel was already up, its page cache was already warm, processes were already running — and restore brings all of that back exactly as it was frozen.
Mechanically, restore is closer to "map a memory file and resume vCPUs" than to "start a computer." That is why it lands in tens of milliseconds rather than seconds. There is no warm pool of idle VMs sitting around burning RAM in PandaStack — every create restores a baked template snapshot on demand. The model is covered in depth in /blog/snapshot-and-fork-explained and the engineering reference at /docs/internals/snapshot-restore; the one-line version is that a snapshot is a paused running machine, not a disk image you boot from.
A cold boot starts a computer. A snapshot restore resumes one that was already running. PandaStack pays the first cost once per template and the second cost on every create.
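The "map a memory file" half of that idea can be seen with nothing but the standard library. The sketch below is not Firecracker code; it maps an ordinary temporary file (a stand-in for vm.mem) with MAP_PRIVATE, writes through the mapping, and shows that the file on disk is untouched. The function name and file contents are made up for illustration, and it assumes a Linux or other Unix host.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def private_map_demo():
    """Map a file copy-on-write, write through the mapping, and return
    (bytes seen through the mapping, bytes still stored in the file)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"A" * PAGE * 4)  # stand-in for a snapshot memory file
        path = f.name
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            # MAP_PRIVATE: pages are read from the file lazily on first touch,
            # and writes land in private copies that never reach the file.
            m = mmap.mmap(fd, PAGE * 4, flags=mmap.MAP_PRIVATE,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE)
            m[0:4] = b"BBBB"
            seen = bytes(m[0:4])
            m.close()
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            on_disk = f.read(4)
        return seen, on_disk
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(private_map_demo())
```

Because the file is never modified, many mappings of the same file can each diverge independently, which is the property that lets one baked snapshot back many restored guests.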
The first spawn: a real cold boot, then a bake
The snapshot has to come from somewhere. The very first time a template is ever spawned on an agent, there is no snapshot to restore, so PandaStack does a real cold boot — the full Firecracker boot described above, plus the guest's own userspace coming up to a ready state. That takes on the order of 3 seconds. Once the guest is booted and ready, the agent snapshots it: it freezes the running machine's memory and device state to disk and records the network identity it booted with.
From that point on, every create of that template restores the baked snapshot. The ~3s cold boot is a one-time cost that gets amortized across every subsequent create. Re-baking a template (a new kernel, new rootfs contents) invalidates the old snapshot and triggers a fresh cold-boot-and-bake on the next spawn. So the honest description of the system is: it cold-boots rarely and restores constantly, and the 179ms figure describes the constant case.
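The amortization is simple arithmetic, sketched here under stated assumptions: the first create is charged the ~3s cold boot, every later create is charged the 179ms p50 restore, and the time to write the snapshot itself is ignored because the text gives no figure for it. The function name is hypothetical.

```python
COLD_BOOT_MS = 3000.0  # "on the order of 3 seconds" (first spawn of a template)
RESTORE_MS = 179.0     # p50 snapshot-restore create


def amortized_create_ms(n_creates: int) -> float:
    """Average cost per create when the first one cold-boots and the rest restore."""
    if n_creates < 1:
        raise ValueError("n_creates must be at least 1")
    total = COLD_BOOT_MS + (n_creates - 1) * RESTORE_MS
    return total / n_creates


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(amortized_create_ms(n), 1))
```

Under these assumptions the average is already close to 207ms after 100 creates and within 3ms of the restore cost after 1,000.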
The create pipeline, step by step
Restoring the snapshot is only one stage of a create. To get from an API call to a sandbox you can run commands in, the agent has to allocate networking, lay down a writable rootfs, launch the VMM, load the snapshot, resume, and confirm the guest is reachable. Here is the pipeline, with rough per-stage costs on the fast path. Several stages overlap in practice; these are the contributions, not a strict serial sum.
- Allocate a NATID network slot (~1ms). PandaStack keeps a small warm pool of pre-built Linux network namespaces — netns + veth pair + tap + iptables — so create grabs a ready slot instead of doing ip netns add / ip link add cold, which would cost ~100ms. The address space tops out at 16,384 /30 subnets per agent; the warm pool is just the prebuilt depth in front of that, and if it drains a slot is built on demand.
- Configure the tap device in the namespace (~6ms). The guest's baked snapshot expects a specific IP, MAC, and gateway; the agent patches the tap's MAC and routes to match the values frozen at bake time, so the restored guest sees the network identity it remembers.
- Reflink the rootfs (~4ms). The writable disk is a reflink clone of the template's ext4 rootfs image, which is stored as a file on the host's XFS filesystem. The clone is an O(metadata) copy-on-write operation: blocks are shared with the template until the guest writes, so the cost is effectively independent of image size. (dm-snapshot is also supported.)
- Fork+exec Firecracker under the jailer (~25ms). The VMM process starts in its dropped-privilege jail. There is no firmware or device enumeration here — it comes up ready to be handed a snapshot.
- PUT /snapshot/load (~80ms). Firecracker memory-maps vm.mem and loads vm.state. The memory mapping is lazy and copy-on-write (MAP_PRIVATE), so pages fault in only as the guest touches them rather than being eagerly copied up front.
- PATCH /vm with state Resumed (~6ms). The vCPUs are unpaused. The guest is now running, mid-instruction, exactly where the snapshot froze it: no reboot, no init.
- Probe TCP :22 (~40ms). The agent confirms the guest's network stack is live and accepting connections before declaring the sandbox ready, so a create only returns success once the box is actually usable.
- Insert the sandbox row (~6ms, async). Bookkeeping in the agent's database, done off the critical path so it does not gate readiness.
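The load, resume, and probe steps above can be sketched in Python as follows. This is an illustrative sketch, not PandaStack's agent code: upstream Firecracker documents snapshot load as PUT /snapshot/load and resume as PATCH /vm on its API Unix socket, while the socket path, file paths, helper names, and probe timings here are assumptions.

```python
import http.client
import json
import socket
import time


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def snapshot_load_request(state_path: str, mem_path: str):
    """Request that loads VMM state and maps the memory file, without resuming."""
    body = {
        "snapshot_path": state_path,
        "mem_backend": {"backend_type": "File", "backend_path": mem_path},
        "resume_vm": False,
    }
    return "PUT", "/snapshot/load", body


def resume_request():
    """Request that unpauses the vCPUs of a loaded microVM."""
    return "PATCH", "/vm", {"state": "Resumed"}


def call(api_socket: str, method: str, path: str, body: dict) -> int:
    """Send one JSON request to the Firecracker API socket; return the HTTP status."""
    conn = UnixHTTPConnection(api_socket)
    try:
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def wait_for_tcp(host: str, port: int, deadline_s: float = 1.0,
                 interval_s: float = 0.005) -> bool:
    """Retry a TCP connect until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            with socket.create_connection((host, port), timeout=interval_s * 10):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

A caller would issue `call(sock, *snapshot_load_request(...))`, then `call(sock, *resume_request())`, then `wait_for_tcp(guest_ip, 22)`, and report the sandbox ready only if the probe returns True.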
The two stages that dominate are the snapshot load (~80ms) and the readiness probe (~40ms). Everything else — networking, rootfs, VMM launch — is engineered to be small and to overlap, which is how the whole pipeline lands at a 179ms p50 with a tight p99 of ~203ms. The NATID design is the reason the network stage is ~7ms instead of ~100ms; more on that in /docs/concepts/networking-natid.
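The 16,384 figure follows from carving a /16 into /30 blocks: 65,536 addresses divided by 4 per subnet. The sketch below checks that arithmetic with Python's ipaddress module. The base range 10.200.0.0/16 and the function names are assumptions for illustration; the text does not say which range or slot-numbering scheme PandaStack uses.

```python
import ipaddress

BASE = "10.200.0.0/16"  # hypothetical base range, not taken from PandaStack


def slot_count(base: str = BASE) -> int:
    """Number of /30 subnets that fit in the base network."""
    net = ipaddress.ip_network(base)
    return 2 ** (30 - net.prefixlen)


def slot_subnet(index: int, base: str = BASE) -> ipaddress.IPv4Network:
    """Return the /30 subnet for a slot index (each /30 spans 4 addresses)."""
    net = ipaddress.ip_network(base)
    if not 0 <= index < slot_count(base):
        raise IndexError("slot index out of range")
    return ipaddress.ip_network((net.network_address + index * 4, 30))


if __name__ == "__main__":
    print(slot_count())                 # 16384
    print(slot_subnet(1))               # 10.200.0.4/30
    print(list(slot_subnet(1).hosts())) # the two usable addresses in the slot
```

Each /30 leaves exactly two usable host addresses, which is enough for one point-to-point link per sandbox.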
What happens when the memory file isn't local
The pipeline above assumes the snapshot's memory file is already on the host's disk. On a fresh agent, or one that has never served a given template, vm.mem might live in object storage instead. Downloading a multi-gigabyte memory image before you can restore would blow the latency budget, so PandaStack can stream it instead.
This uses userfaultfd, a Linux kernel feature that delivers page-fault events to a userspace handler. The flow: the guest touches a page that isn't resident → the kernel raises a fault → userfaultfd hands that fault to the agent's handler → the handler fetches the corresponding 4 MiB chunk from object storage over an HTTP Range GET → the handler installs it with UFFDIO_COPY. Pages that are known to be all-zero are elided entirely (no fetch, just a zero-fill), a prefetch trace warms the hot set in the background, and a shared per-host chunk cache means the first restore on a host pays the network cost once and later restores read locally. Optional 2 MiB hugepages cut the number of faults further. The deeper mechanics live in /docs/internals/streaming-restore. The point for latency is that an agent can start serving restores without first downloading the whole memory image.
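The fetch step in that flow comes down to mapping a faulting offset to a chunk-aligned byte range. The following is a minimal sketch of that calculation only, assuming the fault is expressed as a byte offset into the memory file and that chunks are aligned to 4 MiB boundaries; the function names are hypothetical and the userfaultfd handling itself is not shown.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks, as described in the text


def chunk_range(fault_offset: int, mem_size: int, chunk_size: int = CHUNK_SIZE):
    """Return the inclusive (start, end) byte range of the chunk holding fault_offset."""
    if not 0 <= fault_offset < mem_size:
        raise ValueError("fault offset outside the memory file")
    start = (fault_offset // chunk_size) * chunk_size
    end = min(start + chunk_size, mem_size) - 1  # the last chunk may be short
    return start, end


def range_header(start: int, end: int) -> dict:
    """HTTP Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    start, end = chunk_range(5 * 1024 * 1024, 2 * 1024 ** 3)
    print(range_header(start, end))
```

Fetching a whole 4 MiB chunk per fault rather than a single 4 KiB page means one round trip covers up to 1,024 neighboring pages.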
Worth being precise about one thing: streaming applies to memory, not the disk. The rootfs always has to be local because copy-on-write cloning works only against local storage: a reflink needs the template file on the same filesystem, and dm-snapshot needs a local block device. Memory is what gets paged in on demand; the rootfs is synced and reflinked locally.
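For a sense of what a reflink clone looks like at the syscall level, here is a hedged Python sketch using the Linux FICLONE ioctl. It is not PandaStack's implementation. The function name is hypothetical, and the fallback to a full copy exists only so the example runs on filesystems without reflink support; a production agent would more likely treat that case as an error.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def clone_rootfs(template_path: str, dest_path: str) -> str:
    """Clone template_path to dest_path, sharing blocks when the filesystem allows.

    Returns "reflink" if the copy-on-write clone succeeded, "copy" otherwise.
    """
    with open(template_path, "rb") as src, open(dest_path, "wb") as dst:
        try:
            # Ask the filesystem to make dst share all of src's extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"
        except OSError:
            pass  # not supported here (e.g. ext4, tmpfs, or a different filesystem)
    shutil.copyfile(template_path, dest_path)  # illustrative fallback only
    return "copy"
```

On XFS with reflink enabled the ioctl returns after updating metadata, which is why the clone cost does not scale with the size of the image.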
Fork: the same primitive, pointed at a running machine
Once you see create as "restore a snapshot," fork falls out naturally. A create restores the generic baked template snapshot. A fork snapshots a specific running sandbox and restores that — the same copy-on-write memory map and reflinked rootfs, just starting from your live machine instead of the clean template. A same-host fork runs in roughly 400ms because the parent's memory is already resident and the rootfs reflinks locally; cross-host is 1.2–3.5s because the artifacts have to move over the network first. The full treatment is in /blog/snapshot-and-fork-explained and /docs/internals/fork-cow.
The honest summary
Firecracker's cold boot is fast because it strips the VM down to a minimal device model, no firmware, and a purpose-built kernel — that is a real and useful property, and it is what makes the once-per-template ~3s bake tolerable. But PandaStack's 179ms p50 create is not that boot. It is a snapshot restore: map memory copy-on-write, load device state, resume the vCPUs, confirm the guest is reachable. The speed comes from never paying for the boot twice, wrapped in a create pipeline where networking, rootfs cloning, and VMM launch are each shaved down to single-digit or low-tens of milliseconds.
If you want to see the substrate up close, every sandbox, managed Postgres database, and git-driven app on PandaStack runs on exactly this path — and the core is open source under Apache-2.0, so you can run the control-plane API and per-host agent on your own Linux KVM hosts and watch the create timings yourself. For the conceptual grounding, start with /blog/what-is-a-microvm; for the mechanics, /docs/internals/snapshot-restore.
Frequently asked questions
How does Firecracker achieve sub-100ms boot time?
Two things are happening, and they're worth separating. A cold Firecracker boot is fast for a VM because Firecracker has no BIOS or firmware, a minimal virtio device model (net/block/vsock), and a stripped guest kernel — so there's almost nothing to initialize. But the sub-100ms-class numbers come from snapshot restore, not cold boot: a booted machine is snapshotted once, and every create after that restores that memory and device state and resumes the vCPUs, skipping boot entirely. PandaStack's 179ms p50 create is the restore path, not a kernel boot.
Is PandaStack's 179ms create a Firecracker cold boot?
No. 179ms (p50, ~203ms p99) is a snapshot-restore-and-resume. The actual cold boot (kernel init plus userspace coming up) takes about 3 seconds and happens only the first time a template is spawned, after which the agent bakes a snapshot. Every subsequent create restores that snapshot, which is why the median lands under 200ms.
Why doesn't snapshot-restore latency grow with guest RAM size?
Because the memory file is mapped copy-on-write (MAP_PRIVATE) rather than copied up front. Restoring a 2 GiB guest doesn't read 2 GiB off disk before the guest can run — it maps the snapshot file and the kernel faults pages in lazily, only when the guest first touches them. When the memory file lives in object storage, those pages can be streamed on demand over HTTP Range GETs via userfaultfd, so an agent restores without downloading the whole image first.
What are the slowest stages of the create pipeline?
On the fast path, the two dominant stages are the snapshot load (~80ms, memory-mapping vm.mem and loading device state) and the TCP readiness probe (~40ms, confirming the guest is actually reachable before returning success). Everything else — NATID network slot (~1ms), tap config (~6ms), rootfs reflink (~4ms), Firecracker fork+exec (~25ms), and resume (~6ms) — is small and overlaps, which is how the full create reaches a 179ms median.
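Summing the figures quoted in this answer gives a quick sanity check. The sketch below only adds up the approximate per-stage numbers from the text; the dictionary keys are made-up labels, and because the stages overlap and the figures are rough, the total is a back-of-the-envelope value rather than a measured breakdown of the 179ms median.

```python
# Approximate fast-path contributions in milliseconds, as quoted in the text.
STAGES_MS = {
    "natid_slot": 1,
    "tap_config": 6,
    "rootfs_reflink": 4,
    "firecracker_fork_exec": 25,
    "snapshot_load": 80,
    "resume": 6,
    "tcp_probe": 40,
}

total_ms = sum(STAGES_MS.values())
dominant_ms = STAGES_MS["snapshot_load"] + STAGES_MS["tcp_probe"]
dominant_share = dominant_ms / total_ms

if __name__ == "__main__":
    print(f"listed stages sum to {total_ms} ms")
    print(f"load + probe = {dominant_ms} ms ({dominant_share:.0%} of the listed total)")
```

The listed stages sum to 162ms, of which the snapshot load and readiness probe account for 120ms, roughly three quarters.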