MicroVM Density: The Economics of Per-Tenant Isolation
If you've ever wanted to give every tenant — or every request — its own hardware-isolated VM, you've probably run into the same objection from whoever owns the cloud bill: "a VM per user? do you know how much RAM that costs?" For most of cloud history that objection was correct. Classic virtual machines were heavy enough that handing one to every tenant was a luxury you reserved for paying enterprise customers, not something you did per request. The interesting thing about microVMs isn't only that they're more secure than containers — it's that they quietly rewrote the cost model that made that objection true. This post is about that rewrite: what actually drives the per-tenant cost of VM isolation, why Firecracker changes the arithmetic, and where the real ceiling sits once it does.
Why classic VMs were too heavy to hand out
A traditional VM — the kind a full hypervisor boots — carries a lot of baggage. It emulates a fairly complete machine: a BIOS or UEFI firmware, a wide set of virtual devices (graphics, USB, sound, multiple disk and network controllers), and a general-purpose guest OS that expects all of it. That generality costs you on three axes at once, and all three are exactly the axes that decide density.
- Memory: a general-purpose VM reserves a meaningful chunk of RAM just to exist — guest kernel, drivers for devices it'll never use, a desktop-grade init. Multiply that fixed overhead by thousands of tenants and you're paying for a lot of nothing.
- Boot time: full firmware init plus a complete OS boot is seconds to tens of seconds. If creating an isolated environment is that slow, you can't make one per request — you're forced to keep a pool of them warm, which means paying for idle.
- Device surface: every emulated device is code, attack surface, and per-VM bookkeeping. More surface, more overhead, fewer guests you can responsibly pack onto one host.
So the classic answer was to amortize: one big VM per tenant that stays up for weeks, or one shared multi-tenant process with namespaces papered over it. Per-tenant, per-session, or per-request VM isolation simply wasn't on the menu, because the fixed cost per VM was too high to multiply by your user count. That's the constraint microVMs attack.
What Firecracker changes about the math
Firecracker is a Virtual Machine Monitor built for exactly this problem — it was written at AWS to run Lambda and Fargate, where the whole business depends on packing enormous numbers of isolated, untrusted workloads onto shared fleets cheaply. It gets there by throwing out the generality. A Firecracker microVM has a minimal device model: a virtio block device, a virtio net device, vsock, a serial console, and not much else. No emulated BIOS, no PCI, no graphics. The guest still gets its own kernel and hardware-virtualized isolation via KVM, but the fixed cost of having that kernel drops to a few megabytes of VMM-side overhead per guest rather than the hundreds you'd budget for a fat VM.
Two numbers move, and they're the two that gated density before. The per-guest memory floor collapses, so the same host holds far more guests before it runs out of RAM. And boot collapses too — a stripped device model and a tiny kernel mean a microVM can start in milliseconds rather than seconds. That second number is the sneaky one, because boot speed isn't just a latency win; it's a cost lever, and it's the hinge of the whole argument below.
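To make the memory half of that concrete, here is a back-of-the-envelope sketch of how the fixed per-guest floor scales with tenant count. The two floor values and the tenant count are illustrative assumptions standing in for "hundreds" and "a few" megabytes, not measured figures.

```python
# Fleet-wide RAM spent on per-guest fixed overhead alone.
# All three inputs below are illustrative assumptions, not measurements.

def fleet_fixed_overhead_gib(tenants: int, floor_mib: float) -> float:
    """RAM (GiB) consumed by the per-guest floor before any real work runs."""
    return tenants * floor_mib / 1024

TENANTS = 10_000
CLASSIC_FLOOR_MIB = 300  # assumed stand-in for "hundreds" of MB per fat VM
MICROVM_FLOOR_MIB = 5    # assumed stand-in for "a few" MB per microVM

classic_gib = fleet_fixed_overhead_gib(TENANTS, CLASSIC_FLOOR_MIB)
microvm_gib = fleet_fixed_overhead_gib(TENANTS, MICROVM_FLOOR_MIB)

print(f"classic VM floor: {classic_gib:,.1f} GiB")
print(f"microVM floor:    {microvm_gib:,.1f} GiB")
print(f"ratio:            {classic_gib / microvm_gib:.0f}x")
```

The absolute numbers matter less than the shape: the floor is multiplied by every tenant, so shrinking it is worth more than almost any per-workload optimization.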
Overcommit and copy-on-write: paying for RAM you actually touch
Density isn't only about the per-guest floor; it's about whether guests have to each own a private copy of memory they mostly share. This is where copy-on-write does the heavy lifting, and it's the part people underestimate. When you restore many microVMs from the same baked template snapshot, their memory images start out identical — same kernel, same booted userland, same warmed-up runtime. There's no reason to give each one a private physical copy of pages none of them have modified yet.
So you don't. Memory is mapped copy-on-write (MAP_PRIVATE on the snapshot): every restored guest reads the shared template pages, and the kernel only allocates a private physical page at the moment a guest writes to one. A hundred sandboxes restored from the same template don't cost a hundred times the template's RAM up front — they cost the shared image plus whatever each guest has actually dirtied since it started. The disk side works the same way: the rootfs is reflinked (XFS reflink, an O(metadata) clone) so the data blocks are shared until written. You provision for the working set, not the nominal sum of every guest's reserved size. That gap between "reserved" and "actually touched" is precisely the room overcommit lives in, and it's how a host carries more guests than a naive RAM ÷ guest-size division would predict.
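The semantics of a private mapping can be seen in miniature with Python's standard `mmap` module on Linux or macOS. This is only a sketch of the `MAP_PRIVATE` behavior, not PandaStack's or Firecracker's restore code: the temporary file is a hypothetical stand-in for a template snapshot, and the two mappings stand in for two restored guests. It shows that a write stays private to the guest that made it; the physical page sharing itself happens in the kernel and is not directly observable from this script.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def cow_demo():
    """Map one 'template' file privately twice and dirty only one mapping."""
    # Hypothetical stand-in for a baked template snapshot: one page of bytes.
    with tempfile.NamedTemporaryFile(delete=False) as template:
        template.write(b"T" * PAGE)
        path = template.name
    try:
        with open(path, "r+b") as fa, open(path, "r+b") as fb:
            prot = mmap.PROT_READ | mmap.PROT_WRITE
            guest_a = mmap.mmap(fa.fileno(), PAGE, flags=mmap.MAP_PRIVATE, prot=prot)
            guest_b = mmap.mmap(fb.fileno(), PAGE, flags=mmap.MAP_PRIVATE, prot=prot)

            # Guest A writes: the kernel gives it a private copy of this page.
            guest_a[0:5] = b"dirty"

            seen_by_a = guest_a[0:5]
            seen_by_b = guest_b[0:5]
            guest_a.close()
            guest_b.close()

        # The template on disk is never modified by a private mapping.
        with open(path, "rb") as f:
            on_disk = f.read(5)
        return seen_by_a, seen_by_b, on_disk
    finally:
        os.unlink(path)


seen_by_a, seen_by_b, on_disk = cow_demo()
print(seen_by_a, seen_by_b, on_disk)
```

Guest A sees its own write, while guest B and the file on disk still see the template bytes, which is the property that lets many guests start from one image.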
Forking leans on the same mechanism even harder. A same-host fork in PandaStack lands in 400–750ms because it maps the parent's guest memory copy-on-write and reflinks its rootfs — the child shares the parent's pages and data blocks until it diverges, so branching a warm state costs metadata, not a fresh boot and a fresh gigabyte. Cross-host forks run 1.2–3.5s because the memory and disk have to travel over the network first; the copy-on-write win is strongest when the parent's bytes are already sitting on the same machine.
The idle problem, and the snapshot-and-delete answer
Idle is where most multi-tenant compute bills quietly hemorrhage. An agent creates an environment, runs one command, then waits — on a model call, on a human, on a queue. If your isolation primitive is slow to create, you can't afford to tear it down during that wait, so it sits there warm, reserving CPU and RAM, billing you for the privilege of waiting. Across thousands of mostly-idle tenants, the idle tax can dwarf the cost of the work itself.
Fast snapshot-restore dissolves this. PandaStack has no warm pool of idle VMs: every create restores a baked Firecracker snapshot at p50 179ms (~203ms p99), of which the snapshot-load step is around 49ms. Only the first spawn of a fresh template pays the ~3s cold boot, after which the snapshot is baked and every later create restores from it. When create is that cheap, the economically correct move is to snapshot-and-delete: capture the state, destroy the VM, free the host, and recreate in about 200ms when the next request lands. Idle goes from a per-tenant standing charge to roughly free, because there's nothing alive to charge for. The thing that makes this strategy viable is, again, the boot number; slow boot forecloses it entirely.
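A rough way to size the idle tax is to compare VM-seconds alive under each strategy. The sketch below assumes a hypothetical session shape (ten 2-second bursts of work spread over ten minutes) and charges one p50 restore per burst; it ignores the time to capture a snapshot, which the figures above do not cover.

```python
RESTORE_P50_S = 0.179  # p50 create-by-restore latency quoted in this post


def warm_vm_seconds(session_s: float) -> float:
    """A warm VM is alive, and reserving RAM and CPU, for the whole session."""
    return session_s


def snapshot_delete_vm_seconds(bursts: int, busy_s: float,
                               restore_s: float = RESTORE_P50_S) -> float:
    """The VM exists only while working; each burst pays one restore."""
    return bursts * (busy_s + restore_s)


# Hypothetical agent session: mostly waiting on a model or a human.
SESSION_S = 600.0
BURSTS = 10
BUSY_S = 2.0

warm = warm_vm_seconds(SESSION_S)
ephemeral = snapshot_delete_vm_seconds(BURSTS, BUSY_S)
idle_share = 1 - ephemeral / warm

print(f"warm pool:           {warm:.1f} VM-seconds")
print(f"snapshot-and-delete: {ephemeral:.2f} VM-seconds")
print(f"share of warm time that was idle: {idle_share:.1%}")
```

The more a workload waits relative to how much it computes, the larger that idle share gets, and the more the restore latency matters as a cost lever rather than a latency number.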
This is what "pay for what you touch" looks like in code. Give the sandbox a short TTL so it reaps itself the moment the work is done, instead of lingering on the meter:
```python
from pandastack import Sandbox

# Ephemeral by design: spin up, do the work, let it reap itself.
# No warm pool to amortize, no idle box on the meter afterward.
with Sandbox.create(template="code-interpreter", ttl_seconds=120) as sbx:
    out = sbx.exec("python -c 'print(sum(range(1000)))'")
    print(out.stdout)

# Sandbox is gone; the host RAM/CPU is freed. Next request? ~179ms to recreate.
```

Classic VM vs microVM vs container, on the axes that decide cost
Lining the three models up on density, idle cost, and isolation makes the trade-off legible. None of these is strictly best; they sit at different points on the cost-vs-isolation curve, and microVMs exist to occupy the middle that used to be empty.
- Classic VM — Density: low (high fixed RAM and device overhead per guest). Idle cost: high (slow boot forces warm pools). Isolation: strong (own kernel, hardware-virtualized). The historically expensive way to get real isolation.
- Container — Density: very high (no second kernel, share the host's). Idle cost: low (near-instant start). Isolation: weak for untrusted code (shared host kernel — one kernel bug or escape reaches every neighbor). Cheap, but not a security boundary you'd bet a stranger's code against.
- MicroVM (Firecracker) — Density: high (few-MB per-guest floor, copy-on-write shared template memory). Idle cost: near-zero (millisecond restore enables snapshot-and-delete, no warm pool). Isolation: strong (own guest kernel, KVM hardware isolation). The point of the exercise: container-class density economics with VM-class isolation.
The honest caveat: a microVM will never be quite as dense as a bare container, because a separate guest kernel is never literally free. The claim isn't that microVMs match containers on raw packing — it's that they get close enough that the isolation upgrade stops being a luxury. You're no longer choosing between "cheap and shared-kernel" and "safe and unaffordable." The middle exists now.
The real binding constraint: memory and CPU, not network slots
A common assumption is that per-sandbox networking is what caps density — every isolated VM needs its own network namespace, virtual interfaces, and routing, and surely that's the bottleneck. In PandaStack it isn't. Each agent pre-allocates 16,384 /30 subnets (a dedicated netns, veth pair, and tap per sandbox), so the network plumbing is provisioned ahead of time and an allocation is essentially patching a MAC, not building a namespace from scratch. That 16,384 figure is a real ceiling, but you will almost never reach it.
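The 16,384 figure has a tidy arithmetic reading: that many /30 subnets is exactly one /16 worth of address space. The sketch below checks this with Python's `ipaddress` module. The `10.200.0.0/16` base range and the `slot` helper are hypothetical, chosen only for illustration; the post does not say which range PandaStack carves up or how it indexes slots.

```python
import ipaddress

# Hypothetical base range; any /16 yields the same count.
POOL = ipaddress.ip_network("10.200.0.0/16")
SLOT_PREFIX = 30

# Each /30 holds 4 addresses: network, two usable hosts, broadcast.
slot_count = 2 ** (SLOT_PREFIX - POOL.prefixlen)


def slot(index: int) -> ipaddress.IPv4Network:
    """Return the index-th /30 in the pool (illustrative helper)."""
    if not 0 <= index < slot_count:
        raise IndexError("slot index outside the pre-allocated pool")
    base = POOL.network_address + 4 * index
    return ipaddress.ip_network(f"{base}/{SLOT_PREFIX}")


first, last = slot(0), slot(slot_count - 1)
usable = list(first.hosts())

print(slot_count)   # number of /30 slots in a /16
print(first, last)  # first and last slot in the pool
print(usable)       # the two usable addresses in a slot
```

Two usable addresses per slot is what a point-to-point link between a host-side interface and a guest needs, which is a plausible reason for the /30 sizing.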
You run out of memory and CPU first. A host's RAM and core count are what actually decide how many concurrently running guests fit, because those are the resources each live guest genuinely consumes; copy-on-write shrinks that consumption, but it doesn't make it zero. So the practical capacity-planning question is never "how many network slots do I have," it's "what's my working-set memory per active guest, and how many of those fit in host RAM after overcommit." The network slots are sized generously precisely so they never become the thing you tune. This is good news for the economics: it means your density is governed by the resources you can actually shrink with copy-on-write and snapshot-and-delete, not by a hard structural cap you can't engineer around.
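That planning question can be written down as a small estimator that reports which resource binds first. Everything here is a sketch under stated assumptions: the host size, template size, per-guest working set, VMM overhead, and CPU overcommit ratio are placeholders to replace with numbers measured on your own workload. Only the 16,384 network-slot ceiling comes from this post.

```python
NETWORK_SLOTS = 16_384  # per-agent pre-allocated /30 subnets, from this post


def max_guests(host_ram_gib, shared_template_gib, working_set_mib,
               vmm_overhead_mib, host_vcpus, vcpus_per_guest, cpu_overcommit):
    """Return (binding_resource, guest_count, all_limits) for one host."""
    # The template image is paid for once; each guest adds only what it dirties
    # plus its VMM-side overhead.
    mem_budget_mib = (host_ram_gib - shared_template_gib) * 1024
    by_memory = int(mem_budget_mib // (working_set_mib + vmm_overhead_mib))
    by_cpu = int(host_vcpus * cpu_overcommit // vcpus_per_guest)

    limits = {"memory": by_memory, "cpu": by_cpu, "network": NETWORK_SLOTS}
    binding = min(limits, key=limits.get)
    return binding, limits[binding], limits


# Placeholder inputs: measure your own working set before trusting any of this.
binding, guests, limits = max_guests(
    host_ram_gib=256,
    shared_template_gib=1,
    working_set_mib=60,
    vmm_overhead_mib=5,
    host_vcpus=64,
    vcpus_per_guest=1,
    cpu_overcommit=8,
)

print(limits)
print(f"binding constraint: {binding} at {guests} guests")
```

With these placeholder inputs both the memory and CPU limits land far below the network ceiling, which matches the claim above; which of the two binds first depends entirely on the workload.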
So is per-tenant VM isolation too expensive?
The objection we started with — "a VM per user costs too much RAM" — was an artifact of a specific implementation, not a law of nature. It was true of fat, slow, generality-laden VMs because their fixed per-guest cost was high and their boot was slow enough to force warm pools. Strip the device model down, share template memory copy-on-write, and make restore fast enough that idle environments can simply not exist, and every term in that cost equation shrinks: lower per-guest floor, shared rather than duplicated memory, near-zero idle. What's left is a model where giving every tenant — or every request — its own hardware-isolated VM is a normal engineering choice, not an extravagance you have to justify to finance.
That's the whole bet behind PandaStack: snapshot-restore on every create with no warm pool, copy-on-write memory and disk, and per-sandbox networking pre-provisioned so it never bottlenecks — so VM-grade isolation costs you close to what a container would, and you stop having to choose between safe and affordable. For the mechanics, see /docs/internals/snapshot-restore for the boot path, /docs/internals/fork-cow and /docs/concepts/snapshots-and-forks for copy-on-write forking, and /docs/concepts/networking-natid for the pre-allocated network pool. If you're weighing the cost side specifically, /blog/e2b-cost-at-scale walks the hosted-vs-self-host break-even, and /blog/firecracker-vs-docker covers the isolation side of the same trade.
Frequently asked questions
How many microVMs can you run per host?
It depends almost entirely on host memory and CPU, not on a fixed cap. A Firecracker microVM adds only a few megabytes of per-guest overhead, and when many guests restore from the same template their memory is shared copy-on-write — so a host carries far more guests than a naive (RAM ÷ guest-size) division suggests, because you provision for each guest's actual working set, not its nominal reserved size. In PandaStack the network layer pre-allocates 16,384 per-sandbox subnets per agent, but that ceiling is generous enough that memory and CPU bind first. The only reliable density number is the one you measure against your own workload's working set.
Why were classic VMs too expensive to give every tenant one?
A traditional VM emulates a fairly complete machine — firmware, a wide device set, a general-purpose guest OS — so it reserves a large fixed chunk of RAM just to exist and boots in seconds to tens of seconds. Multiplied across thousands of tenants, that fixed per-guest overhead made per-user VM isolation a luxury, and the slow boot forced you to keep VMs warm rather than create them on demand. Firecracker microVMs strip the device model down and boot in milliseconds, collapsing both the memory floor and the boot time that made the classic model unaffordable.
How does copy-on-write reduce real memory use across many sandboxes?
When microVMs restore from the same baked template snapshot, their memory images start identical. The snapshot is mapped copy-on-write (MAP_PRIVATE), so every guest reads the shared template pages and the kernel only allocates a private physical page when a guest actually writes to one. A hundred sandboxes from one template therefore cost the shared image plus whatever each has dirtied since start — not a hundred private copies. The rootfs disk works the same way via XFS reflink, sharing data blocks until written. This is why real RAM use tracks the working set, not the sum of reserved sizes.
What is the idle cost of a microVM sandbox?
Near zero, if you snapshot-and-delete instead of keeping a warm pool. PandaStack has no pool of idle VMs: every create restores a baked snapshot at p50 179ms (~203ms p99), so the economically correct move is to destroy a sandbox the moment its work finishes and recreate it on the next request in about 200ms. That turns idle from a per-tenant standing charge into roughly nothing, because there's no live VM to bill for. Slow-booting VMs can't do this: they have to stay warm to hide the latency, which is exactly the idle cost microVMs eliminate.
Is per-sandbox networking the bottleneck for density?
No. In PandaStack each agent pre-allocates 16,384 /30 subnets, each with its own network namespace, veth pair, and tap device, so allocating networking for a new sandbox is essentially patching a MAC rather than building a namespace from scratch. That 16,384 figure is a hard ceiling, but in practice host memory and CPU run out long before network slots do. The binding constraint on how many guests fit is the resource each live guest genuinely consumes — RAM and cores — which copy-on-write and snapshot-and-delete are designed to shrink.