How Firecracker's virtio Devices Work
Most of what makes a VM dangerous to run untrusted code in isn't the CPU or the memory — it's the devices. A guest can't do much harm with arithmetic, but a buggy emulated graphics card, a sound chip nobody uses, or a USB controller written in 2006 is a doorway from inside the VM to the host process driving it. The history of VM escapes is overwhelmingly a history of device-emulation bugs. So the single most consequential design decision a hypervisor makes is: how many devices do I emulate, and how complicated are they? Firecracker's answer is deliberately, almost aggressively, small. This post walks through what a virtio device actually is, how the guest and host hand packets and disk blocks back and forth through shared-memory rings, and why Firecracker's refusal to emulate more than a handful of devices is the foundation of its security story — the same foundation PandaStack runs every sandbox on.
The device model is the attack surface
When a guest kernel boots, it goes looking for hardware: a disk to mount, a network card to bring up, a console to print to, maybe a clock, an interrupt controller, a serial port. On bare metal those are real chips. Inside a VM they're emulated — the hypervisor pretends to be each chip, trapping the guest's reads and writes to device registers and responding the way real silicon would. Every one of those emulated devices is code in the host process, parsing input that comes straight from the guest. And the guest, in a multi-tenant setting, is running code you don't trust.
That's the crux. A device emulator is an attacker-facing parser. The guest controls what it writes to the device's registers and into the buffers the device reads; if the emulator mishandles a malformed descriptor, an out-of-bounds offset, or an unexpected state transition, the bug executes in the host. The richer and more numerous your emulated devices, the more parser code sits on that boundary, and the more chances there are for one of them to be wrong. A hypervisor's host attack surface is, to a first approximation, the sum of its device emulators.
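To make "attacker-facing parser" concrete, here is a minimal Python sketch of the check a device emulator has to run before it touches any guest-supplied buffer: the (address, length) pair comes from memory the guest controls, so it must be validated against guest RAM first. The single flat RAM region starting at address 0 and the names `checked_range` and `GuestMemoryError` are assumptions for illustration, not Firecracker's code (Firecracker is written in Rust, and real VMMs usually map guest RAM as several regions).

```python
class GuestMemoryError(Exception):
    """Raised when a guest-supplied buffer falls outside guest RAM."""


def checked_range(mem_size: int, addr: int, length: int) -> range:
    """Validate a guest-supplied (addr, length) pair before the device touches it.

    Assumes guest RAM is a single flat region covering [0, mem_size).
    Returns the byte range the device may access, or raises GuestMemoryError.
    """
    # Guest addresses and lengths are unsigned on the wire; reject anything
    # that decoded as negative rather than trusting it.
    if addr < 0 or length < 0:
        raise GuestMemoryError("negative address or length")
    # Check without computing addr + length first. In C or Rust with
    # fixed-width integers that sum can wrap around and slip past a naive
    # `addr + length <= mem_size` test; this form cannot overflow.
    if addr > mem_size or length > mem_size - addr:
        raise GuestMemoryError(f"buffer {addr:#x}+{length:#x} is outside guest RAM")
    return range(addr, addr + length)
```

A device would call `checked_range(len(guest_ram), desc_addr, desc_len)` and only then slice guest memory with the returned range. Every malformed-descriptor bug class mentioned above is, at bottom, a place where a check like this was missing or wrong.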
What Firecracker actually emulates
Firecracker emulates a deliberately tiny set of devices, and almost all of them are virtio, a paravirtualized standard (more on that below). The full list is short enough to state in a sentence: virtio-block for disks, virtio-net for networking, virtio-vsock for guest-to-host messaging, a virtio-based entropy device, a serial console over a legacy UART, and a one-button "keyboard" controller whose only job is to receive a CTRL+ALT+DEL so the host can ask the guest to reboot or shut down. (Firecracker also ships an optional virtio-balloon device for reclaiming guest memory, which this post leaves aside.) There is no emulated graphics card, no sound, no USB, no PCI passthrough, no BIOS, no CD-ROM, no SCSI controller, no floppy. Most of the device tree a traditional VMM carries simply does not exist here.
That isn't a stripped-down version of a bigger thing; it's the design. Firecracker targets one job (run a serverless function or a sandbox), and a sandbox needs to read a disk, talk to the network, exchange a few control messages with the host, get some randomness, and print logs to a console. Everything on that list serves that job; everything off it is attack surface Firecracker chose not to carry. Of the two non-virtio devices, the single-button keyboard is the real oddity: it exists only because there's no cleaner in-band way to signal an orderly shutdown to a guest that has no power-management device.
- Firecracker minimal device model — virtio-block, virtio-net, virtio-vsock, virtio-rng (entropy), a serial-console UART, and a one-button keyboard for reboot/shutdown. No graphics, sound, USB, PCI, BIOS, or legacy storage controllers. A handful of small, audited emulators.
- QEMU full device tree — hundreds of emulated devices: multiple NIC models (e1000, rtl8139, virtio), SCSI/IDE/NVMe controllers, VGA/QXL/virtio-gpu graphics, sound cards, USB host controllers, a full PCI bus, ACPI, a BIOS/UEFI firmware, and more. Enormously capable — and an enormously larger surface to audit. (QEMU is general-purpose by design; verify specifics against the QEMU docs.)
QEMU's breadth is a feature for its purpose — it can boot almost any OS and emulate almost any hardware, which is exactly what you want for general-purpose virtualization and development. But breadth and a small trusted boundary are in tension. Firecracker picks the small boundary, and accepts that it can only run the workloads its short device list supports. For untrusted, multi-tenant code, that's the right trade.
What "virtio" means: the guest is in on it
The devices Firecracker emulates are mostly virtio devices, and virtio is worth understanding because it's why the small device model is also a fast one. Emulating a real hardware chip faithfully is slow: the guest's stock driver pokes hardware registers one at a time, and every poke traps out of the guest into the host and back. Faithfully pretending to be an Intel e1000 means servicing many of those register accesses for every batch of packets, and each one is a VM exit: a round trip from guest to host and back.
Virtio throws out the pretense. Instead of emulating a real chip, the host exposes a paravirtualized device — one the guest knows is virtual. The guest loads a virtio driver that's written specifically to cooperate with a hypervisor, and the two sides agree on a far more efficient contract: shared-memory ring buffers, called virtqueues, that both sides can read and write directly, plus a single lightweight notification to say "I've put work in the queue." No register-by-register emulation, no chip to be faithful to. Fewer traps, higher throughput, and — crucially for security — a much simpler emulator, because the device only has to understand the virtio contract rather than mimic real silicon.
The split virtqueue: how a request crosses the boundary
The heart of virtio is the split virtqueue, and it's elegant once you see the three pieces. A virtqueue is a region of memory the guest allocates and the host can access, organized as a descriptor table plus two rings that together let the guest hand work to the host and get results back without either side blocking on the other.
- Descriptor table — an array of descriptors, each pointing at a chunk of guest memory (address + length) and flagged as something the device reads or writes. A single request (say, "write these 4KB to this disk block") is one or more descriptors chained together.
- Available ring — where the guest publishes the indices of descriptor chains it has prepared and wants the device to process. The guest is the producer here; the host is the consumer.
- Used ring — where the host publishes the indices of chains it has finished, along with how many bytes it wrote. Now the roles flip: the host produces, the guest consumes the completions.
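Each of the three parts has a fixed binary layout defined by the virtio 1.x specification. The short Python sketch below spells those layouts out with `struct`; the helper names (`ring_sizes`, `pack_desc`, `unpack_desc`) are made up for illustration, while the field widths, flag values, and size formulas follow the spec's split-virtqueue layout.

```python
import struct

# Split-virtqueue layouts from the virtio 1.x spec; all fields little-endian.
DESC = struct.Struct("<QIHH")     # addr: u64, len: u32, flags: u16, next: u16
USED_ELEM = struct.Struct("<II")  # id: u32 (head index), len: u32 (bytes written)

VIRTQ_DESC_F_NEXT = 1   # the chain continues at `next`
VIRTQ_DESC_F_WRITE = 2  # device writes this buffer (otherwise it only reads it)


def ring_sizes(queue_size: int) -> dict:
    """Bytes occupied by each part of a split virtqueue with `queue_size` entries."""
    return {
        # One 16-byte descriptor per entry.
        "descriptor_table": DESC.size * queue_size,
        # flags u16, idx u16, ring[queue_size] of u16, used_event u16.
        "available_ring": 6 + 2 * queue_size,
        # flags u16, idx u16, ring[queue_size] of 8-byte elems, avail_event u16.
        "used_ring": 6 + USED_ELEM.size * queue_size,
    }


def pack_desc(addr: int, length: int, flags: int = 0, next_idx: int = 0) -> bytes:
    """Encode one descriptor exactly as it sits in the shared descriptor table."""
    return DESC.pack(addr, length, flags, next_idx)


def unpack_desc(raw: bytes) -> tuple:
    """Decode a 16-byte descriptor into (addr, len, flags, next)."""
    return DESC.unpack(raw)
```

For a 256-entry queue, `ring_sizes(256)` gives 4096 bytes of descriptors, 518 bytes of available ring, and 2054 bytes of used ring. The trailing `used_event` and `avail_event` fields only matter when the optional event-index feature is negotiated, but the spec's size formulas reserve room for them either way.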
The flow for one operation: the guest fills in a descriptor (or a chain of them) describing the buffers, writes the head index into the available ring, and then notifies the device — a single write to a notification register that traps to the host. The host (Firecracker, or one of its device threads) sees the notification, reads the available ring, follows the descriptor chain to find the guest buffers, does the actual work — reads from the backing file, sends the packet on the TAP — and then writes the completed index into the used ring and raises an interrupt to the guest. The guest's driver, on that interrupt, reads the used ring and knows its request is done. Two rings, two producers, one notification each way. That's the whole machine that moves every disk block and network packet in and out of a Firecracker VM.
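That round trip can be modeled in a few dozen lines. This is a toy, in-process Python simulation under stated assumptions: Python lists stand in for the shared-memory rings, a `bytearray` stands in for guest RAM, the notification and the interrupt are reduced to comments, and the "device" simply upper-cases bytes. The names `SplitQueue`, `guest_submit`, and `device_process` are hypothetical. What it does model faithfully is the ownership split: only the guest advances `avail_idx`, only the host advances `used_idx`, and the host keeps its own private cursor into the available ring.

```python
from dataclasses import dataclass

VIRTQ_DESC_F_NEXT = 1   # chain continues at `next`
VIRTQ_DESC_F_WRITE = 2  # device-writable buffer (otherwise device-readable)


@dataclass
class Desc:
    addr: int
    length: int
    flags: int = 0
    next: int = 0


class SplitQueue:
    """Toy split virtqueue: Python lists stand in for the shared-memory rings."""

    def __init__(self, size: int):
        assert size > 0 and size & (size - 1) == 0, "queue size must be a power of two"
        self.size = size
        self.desc = [Desc(0, 0) for _ in range(size)]  # descriptor table
        self.avail_ring = [0] * size                     # guest -> host: chain heads
        self.avail_idx = 0                               # free-running u16, guest writes it
        self.used_ring = [(0, 0)] * size                 # host -> guest: (head, bytes written)
        self.used_idx = 0                                # free-running u16, host writes it


def guest_submit(q, mem, head, out_addr, payload, in_addr, in_len):
    """Guest driver: post one request as a two-descriptor chain.

    Assumes descriptors `head` and `head + 1` are free.
    """
    mem[out_addr:out_addr + len(payload)] = payload
    q.desc[head] = Desc(out_addr, len(payload), VIRTQ_DESC_F_NEXT, head + 1)
    q.desc[head + 1] = Desc(in_addr, in_len, VIRTQ_DESC_F_WRITE)
    q.avail_ring[q.avail_idx % q.size] = head
    q.avail_idx = (q.avail_idx + 1) & 0xFFFF
    # A real driver would now write to the notify register: one VM exit.


def device_process(q, mem, last_avail):
    """Host device: drain new available entries and publish completions.

    Returns the updated private cursor. Real code would also bounds-check
    every (addr, length) against guest RAM before slicing.
    """
    while last_avail != q.avail_idx:
        head = q.avail_ring[last_avail % q.size]
        if head >= q.size:
            raise ValueError("head index out of range")
        readable, writable = [], []
        idx = head
        for _ in range(q.size):  # bounded walk: a looping chain cannot hang us
            d = q.desc[idx]
            (writable if d.flags & VIRTQ_DESC_F_WRITE else readable).append(d)
            if not d.flags & VIRTQ_DESC_F_NEXT:
                break
            if d.next >= q.size:
                raise ValueError("descriptor index out of range")
            idx = d.next
        else:
            raise ValueError("descriptor chain longer than the queue")
        data = b"".join(bytes(mem[d.addr:d.addr + d.length]) for d in readable).upper()
        written = 0
        for d in writable:
            chunk = data[written:written + d.length]
            mem[d.addr:d.addr + len(chunk)] = chunk
            written += len(chunk)
        q.used_ring[q.used_idx % q.size] = (head, written)
        q.used_idx = (q.used_idx + 1) & 0xFFFF
        last_avail = (last_avail + 1) & 0xFFFF
    # A real device would now inject an interrupt so the guest reads the used ring.
    return last_avail
```

Submitting `b"hello"` at descriptor 0 and running `device_process` leaves `(0, 5)` in the first used-ring slot and `b"HELLO"` in the guest's writable buffer. Note the bounded chain walk: a guest that links a descriptor to itself gets an error instead of wedging the device thread, which is exactly the kind of hostile input the device-model section warns about.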
Each device maps to one ordinary host resource
The other reason Firecracker's devices stay simple is that each one is just a thin adapter onto a plain host primitive. The device doesn't implement storage or a network — it shuttles bytes between a virtqueue and something the host kernel already knows how to do.
- virtio-block → a file on the host. The guest's "disk" is a regular file (an ext4 image, say). A read request becomes a pread on that file at the requested offset; a write request becomes a pwrite. At its core, the block device just translates virtqueue requests into file I/O.
- virtio-net → a TAP device on the host. Frames the guest transmits get written to a host TAP interface; frames arriving on the TAP get handed back through the receive virtqueue. From there the host's normal Linux stack — namespaces, routing, NAT — handles policy. (We walk the full network path in /blog/firecracker-networking-explained.)
- virtio-vsock → a Unix domain socket on the host. The guest opens a vsock connection to a host port; Firecracker bridges that to a Unix socket the host-side agent listens on. It's a clean control channel that needs no IP, no NIC, and no exposure to the guest's network. (More in /blog/vsock-explained.)
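To show how thin the virtio-block translation is, here is a Python sketch of serving one request against a host file. It assumes the descriptor chain has already been walked and validated, so the 16-byte request header and the guest data buffer arrive as plain bytes; the header layout, request types, status codes, and 512-byte sector unit follow the virtio 1.x block-device spec, while `handle_blk_request` is a hypothetical name and treating the file size as the disk capacity is an assumption of this sketch.

```python
import os
import struct

# virtio-blk request header (virtio 1.x): type u32, reserved u32, sector u64, little-endian.
BLK_HDR = struct.Struct("<IIQ")
SECTOR_SIZE = 512  # virtio-blk addresses the disk in 512-byte sectors

VIRTIO_BLK_T_IN = 0      # read: disk -> guest buffer
VIRTIO_BLK_T_OUT = 1     # write: guest buffer -> disk
VIRTIO_BLK_T_FLUSH = 4   # make earlier writes durable

VIRTIO_BLK_S_OK = 0
VIRTIO_BLK_S_IOERR = 1
VIRTIO_BLK_S_UNSUPP = 2


def handle_blk_request(fd: int, header: bytes, data: bytearray) -> int:
    """Serve one virtio-blk request against the host file behind `fd`.

    `header` is the 16-byte header from the chain's first descriptor; `data`
    stands in for the guest data buffer(s). Returns the status byte a device
    writes into the chain's final, device-writable descriptor.
    """
    req_type, _reserved, sector = BLK_HDR.unpack(header)
    offset = sector * SECTOR_SIZE
    try:
        if req_type in (VIRTIO_BLK_T_IN, VIRTIO_BLK_T_OUT):
            # In this sketch the disk's capacity is the backing file's size.
            if offset + len(data) > os.fstat(fd).st_size:
                return VIRTIO_BLK_S_IOERR
            if req_type == VIRTIO_BLK_T_IN:
                chunk = os.pread(fd, len(data), offset)   # read request -> pread
                if len(chunk) != len(data):
                    return VIRTIO_BLK_S_IOERR
                data[:] = chunk
            elif os.pwrite(fd, bytes(data), offset) != len(data):  # write -> pwrite
                return VIRTIO_BLK_S_IOERR
        elif req_type == VIRTIO_BLK_T_FLUSH:
            os.fsync(fd)
        else:
            return VIRTIO_BLK_S_UNSUPP
    except OSError:
        return VIRTIO_BLK_S_IOERR
    return VIRTIO_BLK_S_OK
```

Everything storage-related (caching, durability, the file's own filesystem) is delegated to the host kernel; the device's job is parsing the header, checking bounds, and picking the right syscall.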
This mapping is why the model composes so well with the rest of the isolation story. The block device is a file, so copy-on-write rootfs is just a reflink of that file. The net device is a TAP, so per-sandbox isolation is just which network namespace the TAP lives in. The vsock device is a Unix socket, so guest-to-host communication never touches the network at all. Each device is a small, boring translator, and the interesting behavior — isolation, CoW, NAT — lives in host primitives that Linux has hardened for decades.
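The "copy-on-write rootfs is just a reflink" step comes down to one Linux ioctl, FICLONE, which makes a destination file share the source file's extents until either side is written. A minimal Python sketch, assuming a reflink-capable host filesystem such as XFS or Btrfs; `clone_rootfs` and its full-copy fallback are hypothetical illustrations, not PandaStack's implementation.

```python
import fcntl
import shutil

# FICLONE = _IOW(0x94, 9, int) from <linux/fs.h>; this value is correct for
# x86_64 and aarch64. Older Python versions don't expose it in `fcntl`.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def clone_rootfs(base_image: str, vm_image: str) -> bool:
    """Give a new VM its own rootfs file derived from a shared base image.

    Tries a reflink first: near-instant, and blocks stay shared until one
    side writes them. Falls back to a full byte copy on filesystems that
    can't reflink (ext4, tmpfs, or across filesystems).
    Returns True if the reflink succeeded.
    """
    with open(base_image, "rb") as src, open(vm_image, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            pass  # EOPNOTSUPP, EXDEV, ENOTTY, ...: reflink unavailable here
    shutil.copyfile(base_image, vm_image)
    return False
```

The resulting file is what a drive's `path_on_host` would point at. The fallback keeps the sketch working everywhere, but it loses the point of CoW: a full copy costs time and disk proportional to the image size.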
Configuring devices: the Firecracker API
You wire these devices up through Firecracker's REST-over-Unix-socket API before booting the guest. Drives and network interfaces are configured as arrays — you PUT each one — and the shape of the config makes the file-and-TAP mapping explicit. A drive is literally a path to a host file plus an ID; a network interface is literally the name of a host TAP plus a guest MAC.
```
// Two PUTs to the Firecracker API socket, before InstanceStart.
// Each device is a thin pointer at a host resource.

// PUT /drives/rootfs: virtio-block backed by a host file
{
  "drive_id": "rootfs",
  "path_on_host": "/var/lib/pandastack/vms/demo/rootfs.ext4",
  "is_root_device": true,
  "is_read_only": false
}

// PUT /network-interfaces/eth0: virtio-net backed by a host TAP
{
  "iface_id": "eth0",
  "host_dev_name": "tap0",
  "guest_mac": "06:00:0a:c8:00:02"
}
```

Notice what isn't here: no device model selection, no bus topology, no firmware, no slot assignment. There's nothing to choose because there's almost nothing to configure: a block device is a file path, a NIC is a TAP name. The vsock device is the same shape (a guest CID plus a host Unix socket path). That minimalism in the API is the same minimalism in the emulator: few devices, each described by a couple of fields pointing at a host primitive.
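On the host side, the vsock device's Unix socket speaks a tiny text handshake. Per Firecracker's vsock documentation, a host process that wants to reach a listener inside the guest connects to the configured Unix socket path, sends `CONNECT <port>\n`, and waits for an `OK <host_port>\n` reply before the stream carries application bytes. A minimal Python sketch; the function names are hypothetical, and the handshake details should be checked against the Firecracker docs for your version.

```python
import socket


def _read_line(sock: socket.socket, limit: int = 64) -> bytes:
    """Read one newline-terminated line, one byte at a time, so that no
    application bytes that follow the handshake are consumed."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("socket closed during vsock handshake")
        buf += chunk
        if len(buf) > limit:
            raise ConnectionError("vsock handshake reply too long")
    return bytes(buf)


def connect_to_guest(uds_path: str, guest_port: int, timeout: float = 5.0) -> socket.socket:
    """Open a host-initiated vsock connection to `guest_port` inside the VM."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(uds_path)
        sock.sendall(f"CONNECT {guest_port}\n".encode())
        reply = _read_line(sock)
        if not reply.startswith(b"OK "):
            raise ConnectionError(f"vsock handshake rejected: {reply!r}")
    except BaseException:
        sock.close()
        raise
    return sock
```

After the handshake the returned socket is an ordinary byte stream to the guest process listening on that vsock port; no IP address, route, or NIC is involved anywhere on the path.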
Fewer devices, fewer CVEs
Now the payoff. If the host attack surface is roughly the size of the device model, and the device model is a handful of small emulators, most of them translating a virtqueue into a file, a TAP, or a socket, then the surface a guest can attack is tiny. What's left is concentrated in a few well-trodden code paths that get scrutiny precisely because they're the boundary. There's no graphics emulator to have a heap overflow in, no USB stack to confuse, no SCSI controller with a forgotten state machine. The classes of VM-escape CVEs that have hit general-purpose VMMs over the years largely target devices Firecracker doesn't have.
Firecracker pairs the small device model with a second layer: a seccomp filter that restricts the VMM process itself to a minimal allowlist of host syscalls, so even a hypothetical bug in a device emulator runs into a locked-down process rather than a full-privilege one. Small device model first, then a tight syscall jail around it. (We cover that layer in /blog/seccomp-explained.) The two reinforce each other — fewer parsers to break into, and a smaller blast radius if you do.
Why this matters for multi-tenant untrusted code
Put the pieces together and the case is straightforward. When you run code from many tenants — AI agents executing model-written commands, per-user code interpreters, ephemeral CI for arbitrary repos — the thing you're betting on is the boundary between guest and host. A shared-kernel container bets that boundary on the entire Linux syscall surface. A general-purpose VM shrinks it to the hypervisor, but then re-inflates it with a wide device tree. Firecracker keeps both halves small: hardware-isolated by KVM, and exposed to the guest through only a handful of minimal virtio devices. The narrower that interface, the fewer ways an untrusted guest has to reach the host, and the easier it is to reason about — and audit — what's actually exposed.
This is the substrate PandaStack runs on. Every sandbox, managed database, and hosted app is its own Firecracker microVM with exactly this device model — a virtio-block disk backed by a copy-on-write file, a virtio-net NIC wired to a per-sandbox TAP in its own network namespace, and a vsock channel to the host agent. The minimal device model is what makes it defensible to hand a fresh microVM to untrusted code on every request, and snapshot-restore is what makes it cheap — a create lands at 179ms p50 (around 49ms for the restore itself). Small surface, fast boot, real isolation. PandaStack's core is open source under Apache-2.0, so you can stand up the agent on your own KVM hosts and inspect exactly which devices each guest gets. For the broader argument about why a microVM beats a container for this, start at /blog/firecracker-vs-docker; for the snapshot path, /blog/how-firecracker-boots-fast.
Frequently asked questions
What devices does Firecracker emulate?
A deliberately small set, almost all of them virtio: virtio-block (disk), virtio-net (network), virtio-vsock (guest-to-host messaging), virtio-rng (entropy), a serial console over a legacy UART, and a one-button keyboard controller used only to deliver a CTRL+ALT+DEL for orderly reboot/shutdown, plus an optional virtio-balloon device for reclaiming guest memory. There's no emulated graphics, sound, USB, PCI bus, BIOS, or legacy storage controller. That short list is the entire device-emulation surface a guest can interact with in the VMM, which is the point: fewer device emulators means fewer places a bug can let a guest reach the host.
What is a virtqueue and how does virtio work?
Virtio is a paravirtualized device standard: instead of emulating a real hardware chip register-by-register (slow, lots of traps), the host exposes a device the guest knows is virtual, and they cooperate through shared-memory ring buffers called virtqueues. A split virtqueue has three parts — a descriptor table pointing at guest-memory buffers, an available ring where the guest publishes work, and a used ring where the host publishes completions. The guest fills a descriptor, posts it to the available ring, and sends one notification; the host processes it and posts the result to the used ring with one interrupt. It's far fewer context switches than emulating real silicon, and the emulator stays small because it only implements the virtio contract.
Why is Firecracker's device model so much smaller than QEMU's?
QEMU is general-purpose: it emulates hundreds of devices (multiple NIC models, SCSI/IDE/NVMe, VGA/virtio-gpu graphics, sound, USB, a full PCI bus, BIOS/UEFI) so it can boot almost any OS and emulate almost any hardware. Firecracker targets one job — running serverless functions and sandboxes — so it emulates only what that job needs: a disk, a NIC, a control channel, entropy, and a console. Breadth and a small trusted boundary are in tension, and Firecracker picks the small boundary. The trade is that Firecracker can't run arbitrary hardware setups, which is fine for its workload. (Verify QEMU specifics against the QEMU documentation, as its device set evolves.)
How does a smaller device model reduce VM-escape vulnerabilities?
A VM's host attack surface is essentially the sum of its device emulators, because each one parses input the untrusted guest controls — descriptors, offsets, buffer contents. Most historical VM-escape CVEs are device-emulation bugs (in graphics, USB, audio, or storage controllers). Firecracker doesn't emulate those devices at all, so that entire class of bug has nowhere to land. What remains — the virtqueue-handling code that validates guest-supplied addresses and lengths — is small enough to audit closely, and Firecracker wraps the whole VMM process in a seccomp syscall filter as a second layer. Fewer parsers plus a tight syscall jail is structurally safer than a large device tree.
How do Firecracker's devices map to host resources?
Each virtio device is a thin adapter onto an ordinary host primitive. virtio-block is backed by a regular file on the host (reads/writes become pread/pwrite on that file), which is why copy-on-write rootfs is just a reflink. virtio-net is backed by a host TAP device, so frames the guest sends surface on the TAP and the host's normal Linux stack (namespaces, routing, NAT) handles policy. virtio-vsock is bridged to a host Unix domain socket, giving a guest-to-host control channel that never touches the network. You configure them through Firecracker's API as arrays — a drive is a host file path plus an ID, a network interface is a host TAP name plus a guest MAC.