Serverless on Firecracker: How FaaS Really Works

Ajay Kumar · 10 min read

"Serverless" is a marketing word for an engineering reality: you hand a provider a function, and somewhere on hardware you'll never see, your code runs next to a stranger's code, both of you billed by the millisecond. The provider's entire job is to make that crowded room feel like a private one. Function-as-a-Service — Lambda, Cloud Functions, and everything shaped like them — is the productized version of one hard problem: run arbitrary code from thousands of tenants on shared machines, isolate it well enough that nobody can read or crash anybody else, start it fast enough that nobody notices the machine was empty a moment ago, and tear it down the instant it's done so you're not paying for idle. This post is about how that's actually built, why the answer turned out to be microVMs, and how PandaStack's serverless functions work on exactly that substrate.

Why a function needs a wall around it

Start with the uncomfortable premise. A FaaS platform is a multi-tenant code-execution service: it runs code it did not write, from customers it cannot vet, on hardware it shares across all of them to keep the economics sane. If you gave every function its own dedicated server, isolation would be trivial and the business would be bankrupt — most functions run for a few hundred milliseconds a day and would leave a whole machine idle the rest of the time. So you pack many tenants onto each host. The instant you do that, isolation stops being a nice-to-have and becomes the product.

What can go wrong if the wall is thin? A function can try to read another tenant's memory or files. It can saturate the CPU or balloon memory and starve its neighbors — the classic noisy-neighbor problem. Worst of all, it can attempt to break out of its container into the host kernel, and from there into every other tenant on the box. These aren't hypotheticals; container escapes are a recurring, well-documented class of vulnerability. The shared Linux kernel is a large, shared attack surface, and "shared" is doing a lot of dangerous work in that sentence.

So the real engineering question behind every serverless platform is: what's the isolation boundary? A plain process? A container? A full virtual machine? Each is a different trade between how strong the wall is and how fast and cheap it is to build one. Processes and containers are cheap and fast but share the host kernel. Traditional VMs give you a real boundary — a separate guest kernel, hardware-enforced — but historically booted in tens of seconds, which is a non-starter when a function is supposed to spin up in milliseconds and you might create millions of them an hour. For years that trade-off looked unwinnable. We cover the full spectrum in /blog/code-isolation-hierarchy, but the short version is: serverless needed VM-grade isolation at container-grade speed, and nothing on the shelf delivered both.

The defining constraint of FaaS isn't compute or storage — it's containment at speed. You're running many tenants' untrusted code on shared hardware, so the platform lives or dies on how strong its isolation boundary is and how fast it can stand a fresh one up.

How AWS Lambda pioneered Firecracker

AWS hit this wall harder than anyone, because Lambda's scale made every weakness expensive. Their first-generation isolation leaned on EC2 instances and containers, and the seams showed: to keep tenants apart safely they couldn't pack hosts as densely as the economics wanted. The team's conclusion was that they needed actual virtualization — a separate guest kernel per function, hardware-enforced — but without the multi-second boot and the heavyweight device emulation of a conventional hypervisor like QEMU. Nothing existing fit, so they built one.

That's Firecracker: a minimal virtual machine monitor written in Rust, open-sourced in 2018, purpose-built to run thousands of microVMs per host with strong isolation and fast startup. It strips the virtual hardware down to almost nothing — no BIOS or UEFI firmware, no PCI bus, no legacy device emulation, just a handful of virtio devices (network, block, vsock) and a serial console. A microVM is a real VM with its own guest kernel and hardware-enforced memory isolation, but with the boot-time bloat amputated. That combination — VM-grade boundary, container-grade footprint — is what made per-function VMs economically viable, and it's why Lambda and Fargate run on Firecracker today. If you want the deeper anatomy, /blog/what-is-a-microvm walks through exactly what a microVM is and isn't.
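To make the stripped-down device model concrete, here is a rough sketch of the kind of configuration a microVM is started from, built as a Python dictionary. The section and field names follow Firecracker's JSON config file format as I understand it; the paths, sizes, tap device name, and the `microvm_config` helper itself are placeholders for illustration, so check everything against the Firecracker version you actually run.

```python
import json


def microvm_config(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128) -> dict:
    """Build a minimal Firecracker-style config: a kernel, one disk, one NIC, one vsock.

    Hypothetical helper. Note what is absent: no firmware, no PCI devices,
    no display, no USB. That absence is the point of a microVM.
    """
    return {
        "boot-source": {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        "network-interfaces": [{"iface_id": "eth0", "host_dev_name": "tap0"}],
        "vsock": {"guest_cid": 3, "uds_path": "/tmp/fc-vsock.sock"},
    }


# Placeholder paths: substitute a real kernel image and root filesystem.
config = microvm_config("/path/to/vmlinux", "/path/to/rootfs.ext4")
config_json = json.dumps(config, indent=2)
```

The whole machine description fits in about twenty lines, which is a fair proxy for how little virtual hardware the guest has to probe at boot.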

The serverless insight wasn't "make containers safer" — it was "make VMs cheap enough to use one per function." Firecracker is what made that affordable.

The reason this matters beyond AWS is that Firecracker is open source under Apache-2.0. The same VMM that backs Lambda is something you can run yourself — which is exactly what PandaStack does. There's no proprietary magic in the isolation layer; the magic is in the orchestration around it.

The cold-start problem, and why it's the whole game

Solving isolation creates a new problem. If every function invocation gets a fresh, isolated environment, something has to create that environment — and if it isn't already running, you pay to start it. That's a cold start: the latency between a request arriving and your code actually executing, spent booting a guest kernel, bringing up userspace, loading your runtime and dependencies. For an interactive request, a multi-second cold start is the difference between "snappy" and "why is this spinner still here."

There are two broad strategies for dealing with it, and most platforms blend them. The first is to keep a warm pool: leave some environments running and idle so a request can land on one immediately. It hides the cold start at the cost of paying for capacity that's doing nothing — and it only helps if the right kind of warm environment happens to be sitting around when your request shows up. The second strategy is to make starting cold so fast that you don't need the pool at all. That second path is where microVM snapshots changed the math.
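The warm-pool strategy described above can be sketched as a toy model. This is not any platform's real scheduler; the `WarmPool` class and its counters are invented for illustration. It only shows the failure mode the paragraph describes: once a burst is larger than the pre-warmed count, the remaining requests fall through to cold starts.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WarmPool:
    """Toy warm pool: hand out an idle environment if one exists, else start cold."""
    idle: deque = field(default_factory=deque)
    hits: int = 0
    cold_starts: int = 0
    next_id: int = 0

    def _start(self) -> str:
        # Stand-in for the expensive part: booting a fresh environment.
        self.next_id += 1
        return f"env-{self.next_id}"

    def prewarm(self, count: int) -> None:
        # Paid for up front, whether or not a request ever arrives.
        for _ in range(count):
            self.idle.append(self._start())

    def acquire(self) -> str:
        if self.idle:
            self.hits += 1
            return self.idle.popleft()
        self.cold_starts += 1
        return self._start()


pool = WarmPool()
pool.prewarm(2)
burst = [pool.acquire() for _ in range(5)]  # 5 requests against 2 warm environments
```

With two environments pre-warmed and a burst of five, two requests land warm and three pay the full cold start, which is why pool sizing becomes a tuning problem of its own.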

How snapshot-restore kills the cold start

A cold boot is slow because it does real work: the kernel initializes, userspace comes up, your runtime loads, your dependencies import. A snapshot skips all of it. Firecracker can serialize a fully-booted, running microVM to disk — the guest's entire RAM plus the VMM's device and vCPU state — and later restore that frozen machine and resume it mid-instruction. Restoring isn't booting. The guest doesn't run init, doesn't re-probe devices, doesn't start systemd; it wakes up exactly where it was frozen, page cache warm, processes already running.
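As a rough illustration of that freeze-and-resume cycle, the sketch below builds the requests a controller would send to Firecracker's HTTP API over its Unix socket. The endpoint paths and field names reflect my understanding of the Firecracker snapshot API and have changed between releases (older versions took a top-level `mem_file_path` on load), so treat them as assumptions to verify. The `ApiCall` type, the helper functions, and the file paths are hypothetical, and nothing here actually sends a request.

```python
from typing import NamedTuple


class ApiCall(NamedTuple):
    """One request against the Firecracker API socket (hypothetical wrapper type)."""
    method: str
    path: str
    body: dict


def pause_request() -> ApiCall:
    # The guest has to be paused before its state can be captured.
    return ApiCall("PATCH", "/vm", {"state": "Paused"})


def create_snapshot_request(state_path: str, mem_path: str) -> ApiCall:
    # Writes two files: VMM/device/vCPU state, and the guest's RAM.
    return ApiCall("PUT", "/snapshot/create", {
        "snapshot_type": "Full",
        "snapshot_path": state_path,
        "mem_file_path": mem_path,
    })


def load_snapshot_request(state_path: str, mem_path: str, resume: bool = True) -> ApiCall:
    # Sent to a fresh Firecracker process; the guest resumes where it was frozen.
    return ApiCall("PUT", "/snapshot/load", {
        "snapshot_path": state_path,
        "mem_backend": {"backend_type": "File", "backend_path": mem_path},
        "resume_vm": resume,
    })


# Bake once per template, then restore on every create. Paths are placeholders.
snapshot_steps = [pause_request(), create_snapshot_request("/snap/vm.state", "/snap/vm.mem")]
restore_step = load_snapshot_request("/snap/vm.state", "/snap/vm.mem")
```

Notice that the restore side is a single call with no boot arguments and no kernel path: there is nothing to boot, only state to map back in.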

Because restore is closer to "map a memory file and unpause the vCPUs" than "start a computer," it lands in tens of milliseconds instead of seconds. On PandaStack the steady-state create — a fresh, isolated microVM ready to take commands — runs at a p50 of 179ms and a p99 around 203ms, which is a snapshot restore (the restore-and-resume itself is the ~49ms core of that), not a kernel boot. The genuine cold boot happens exactly once per template, costs around 3 seconds, and is then baked into a snapshot and amortized away across every later create. The full pipeline, stage by stage, is in /blog/how-firecracker-boots-fast, and the engineering reference lives at /docs/internals/snapshot-restore. The headline, though, is simple: snapshot-restore is what lets you give every invocation a fresh VM without paying the boot tax — which is what makes a warm pool optional rather than mandatory.
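The amortization claim is easy to check with arithmetic. The sketch below uses the two figures quoted above (a roughly 3s one-time cold boot and a 179ms p50 create) and assumes, as a simplification, that every create after the bake costs exactly the p50.

```python
COLD_BOOT_MS = 3000.0      # one-time cold boot per template (approximate, from the text)
RESTORE_CREATE_MS = 179.0  # p50 steady-state create via snapshot restore (from the text)


def amortized_create_ms(creates: int) -> float:
    """Average cost per create once the one-time cold boot is spread over all creates."""
    if creates < 1:
        raise ValueError("need at least one create")
    return RESTORE_CREATE_MS + COLD_BOOT_MS / creates


# The boot tax shrinks toward zero as the template is reused.
averages = {n: amortized_create_ms(n) for n in (1, 100, 1000)}
```

Under those assumptions the average is 3179ms for a template used once, 209ms after a hundred creates, and 182ms after a thousand, at which point the cold boot contributes only 3ms per create.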

Snapshot-restore reframes the cold-start fight. Instead of hiding boot latency behind idle warm capacity, you delete most of the boot. The ~3s cold boot is paid once per template; every create after that is a restore at roughly 179ms p50 (about 203ms at p99).

How PandaStack's serverless functions actually work

PandaStack is an open-source Firecracker platform — Apache-2.0, the same VMM as Lambda — with serverless functions built directly on the snapshot-restore substrate. The model is deliberately simple. You deploy a function by uploading a code bundle, which is stored in object storage (GCS). When the function is invoked, the platform restores a fresh microVM, drops your bundle in, runs it, returns the result, and tears the VM down. There is no warm pool of idle function VMs sitting around burning RAM between invocations — every invoke gets its own fresh microVM, and the platform overhead for standing one up is roughly 0.8 seconds. The isolation boundary is a real guest kernel per invocation, every time.

That fresh-VM-per-invoke model has a pleasant property: idle cost is essentially zero. A function that isn't being called isn't holding any compute. It also means each invocation is genuinely isolated from the last: no leaked global state, no neighbor's memory, no "did the previous tenant leave something behind" class of bug, because there is no previous tenant in your VM. Functions can run on a schedule too. PandaStack supports cron schedules, so a function can fire every five minutes or every night at midnight without you running anything to trigger it. Under the hood a scheduled run is the same fresh-microVM invoke, just kicked off by the cron engine instead of an HTTP request.
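The post does not show PandaStack's schedule syntax, so the following assumes the standard five-field cron format (minute, hour, day of month, month, day of week). Under that assumption, "every five minutes" is `*/5 * * * *` and "midnight every day" is `0 0 * * *`. The matcher below is a deliberately tiny sketch that supports only `*`, `*/n`, and plain integers; real cron implementations also handle ranges, lists, and a special either-or rule between the two day fields.

```python
from datetime import datetime


def field_matches(spec: str, value: int) -> bool:
    """Match one cron field. Supports '*', '*/n', and a single integer only."""
    if spec == "*":
        return True
    if spec.startswith("*/"):
        return value % int(spec[2:]) == 0
    return value == int(spec)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute that the expression selects."""
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
        and field_matches(weekday, when.isoweekday() % 7)
    )


EVERY_FIVE_MINUTES = "*/5 * * * *"
DAILY_AT_MIDNIGHT = "0 0 * * *"
```

A scheduler built on this would evaluate each expression once per minute and trigger the same create, run, tear-down cycle an HTTP invoke uses whenever it matches.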

The same primitives are exposed directly through the sandbox API, which is the clearest way to see the execution model. Here's the create-run-teardown loop a function invocation is built on, using the Python SDK against a sandbox:

from pandastack import Sandbox

# Each call gets its own fresh, isolated microVM — restored from a baked
# snapshot, not cold-booted. This is the same primitive a function invoke
# is built on: create -> run -> tear down.
def invoke(payload: dict) -> dict:
    with Sandbox.create(template="code-interpreter", ttl_seconds=120) as sbx:
        # Drop the (model- or user-supplied) code into the guest, then run it.
        sbx.filesystem.write("/workspace/handler.py", payload["code"])
        result = sbx.exec("python3 /workspace/handler.py", timeout_seconds=30)
        return {
            "exit_code": result.exit_code,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }
    # The sandbox — and everything the invocation touched — is destroyed on
    # block exit. No state survives to the next call. That's the isolation.

# print(invoke({"code": "print('hello from an isolated microVM')"}))

Notice the two safety rails in that snippet, because they're not decoration. The `timeout_seconds` on `exec` is a circuit breaker for code that loops forever; the `ttl_seconds` on create is a backstop so a VM you forget to reap kills itself. In a real function platform both are enforced for you, but the principle is the same: untrusted code gets a wall, a clock, and a hard cap, and then it gets thrown away.
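To see the "clock" half of that principle in isolation, here is a standard-library sketch that runs a snippet in a child Python process under a deadline. To be clear about the assumption: a plain subprocess is not an isolation boundary and is no substitute for a microVM. The `run_with_deadline` helper is hypothetical and only demonstrates the timeout behaving as a circuit breaker.

```python
import subprocess
import sys


def run_with_deadline(code: str, timeout_seconds: float) -> dict:
    """Run `code` in a child interpreter and kill it if it outlives the deadline."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "exit_code": proc.returncode,
        "timed_out": False,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


well_behaved = run_with_deadline("print('ok')", timeout_seconds=10)
runaway = run_with_deadline("while True: pass", timeout_seconds=0.5)
```

The well-behaved snippet returns normally, while the infinite loop is cut off at the deadline instead of holding the caller forever.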

Fresh-VM-per-invoke vs. warm pools: the honest trade-offs

Fresh-VM-per-invoke is a real architectural choice, not a free lunch, and it's worth being clear about what you gain and what you give up versus the warm-pool approach most legacy platforms use:

  • Idle cost is near zero. A function nobody is calling holds no compute. Warm pools, by contrast, pay continuously for capacity that's sitting idle so it can answer fast — you're renting an empty room just in case someone walks in.
  • Isolation is per-invocation by construction. Every call gets a clean guest kernel and filesystem; nothing leaks from the previous run because there is no previous run in that VM. Warm-pool reuse risks cross-invocation state bleed if the recycling isn't careful.
  • No capacity guesswork. There's no warm-pool size to tune, no scramble when traffic spikes past the pre-warmed count and suddenly everyone hits a cold start anyway. Each invoke stands up its own VM on demand.
  • The cost is a per-invoke floor. You pay the restore + setup overhead on every call (~0.8s platform overhead on PandaStack), where a warm-pool hit can be near-instant. For latency-critical, high-frequency paths, that floor is the trade you're accepting in exchange for zero idle cost and clean isolation.
  • Snapshot-restore is what makes the trade viable at all. Without a restore-based create in the ~200ms range, fresh-VM-per-invoke would mean multi-second cold starts on every call and the warm pool would be mandatory. Restore is the lever that turns "always cold" into "cold but fast enough."
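The trade-offs in the list above can be put into a back-of-the-envelope model. The only number taken from the post is the ~0.8s per-invoke platform overhead. The pool size, run time, and traffic figures are hypothetical, and VM-seconds are used as a stand-in for cost, which ignores pricing differences and the cold starts a warm pool still suffers when traffic overflows it.

```python
PLATFORM_OVERHEAD_S = 0.8  # per-invoke overhead quoted in the post


def fresh_vm_seconds(invocations: int, run_s: float) -> float:
    """VM-seconds consumed when every invoke stands up and tears down its own VM."""
    return invocations * (PLATFORM_OVERHEAD_S + run_s)


def warm_pool_seconds(pool_size: int, window_s: float) -> float:
    """VM-seconds consumed by a pool that stays up for the whole window, busy or not."""
    return pool_size * window_s


def break_even_invocations(pool_size: int, window_s: float, run_s: float) -> float:
    """Invocation count at which the two models consume the same VM-seconds."""
    return warm_pool_seconds(pool_size, window_s) / (PLATFORM_OVERHEAD_S + run_s)


DAY_S = 24 * 60 * 60
# Hypothetical workload: 10,000 calls a day, 200ms of real work each,
# compared against a pool of two always-on environments.
fresh = fresh_vm_seconds(10_000, run_s=0.2)
pooled = warm_pool_seconds(2, DAY_S)
crossover = break_even_invocations(2, DAY_S, run_s=0.2)
```

In this made-up scenario the fresh-VM model burns about 10,000 VM-seconds a day against 172,800 for the pool, and the pool only pulls ahead on raw compute past roughly 172,800 invocations a day. That says nothing about latency, which is the other axis of the trade.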

The right answer is workload-dependent. A latency-critical, constantly-hammered endpoint may genuinely want warm capacity. A bursty, spiky, or long-tail workload — the shape most serverless traffic actually has — is far better served by paying a small per-invoke floor and nothing at all in between. If you're evaluating where this fits among general-purpose execution backends, /blog/best-code-execution-sandboxes lays out the landscape, and the same fresh-VM economics are what make PandaStack a natural fit for /blog/ephemeral-ci-runners, where every job wants a clean machine and nobody wants to pay for idle runners overnight.

Serverless removes the floor on cost but not the ceiling. Because invocations scale automatically and you pay per call, a runaway loop, a retry storm, or a recursive function that invokes itself can quietly multiply into a "denial-of-wallet" event — your bill, not your uptime, is what fails. Always cap concurrency, set per-invocation timeouts, and put hard limits on anything that can trigger invocations (webhooks, schedules, self-calls) before you let it loose.
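A minimal sketch of those limits follows, assuming a single-process admission check that sits in front of whatever actually invokes the function. The `InvocationGuard` class and its thresholds are hypothetical, it is not thread-safe, and a real deployment would rely on the platform's own concurrency and rate controls rather than this.

```python
from dataclasses import dataclass


@dataclass
class InvocationGuard:
    """Refuse new invocations past a concurrency cap or a per-window budget."""
    max_concurrent: int
    max_per_window: int
    running: int = 0
    started_in_window: int = 0

    def try_start(self) -> bool:
        if self.running >= self.max_concurrent:
            return False  # too many in flight: stops fan-out and self-invocation loops
        if self.started_in_window >= self.max_per_window:
            return False  # budget spent: stops retry storms from compounding
        self.running += 1
        self.started_in_window += 1
        return True

    def finish(self) -> None:
        if self.running > 0:
            self.running -= 1

    def reset_window(self) -> None:
        # Call on a timer (for example once a minute) to refill the budget.
        self.started_in_window = 0


# Five back-to-back calls against a budget of three per window.
guard = InvocationGuard(max_concurrent=2, max_per_window=3)
decisions = []
for _ in range(5):
    allowed = guard.try_start()
    decisions.append(allowed)
    if allowed:
        guard.finish()
```

The first three calls are admitted and the last two are rejected until the window resets, so a runaway trigger hits a wall instead of your bill.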

The bigger picture: it's all the same substrate

Once you see serverless as "restore a fresh isolated microVM, run code, throw it away," a lot of adjacent products turn out to be the same machine wearing different clothes. A code sandbox for an AI agent is a microVM you keep around for a session. A serverless function is a microVM you create per invoke. A CI runner is a microVM per job. A managed Postgres database is a microVM with a durable volume that doesn't get thrown away. App hosting is a long-lived microVM behind a stable URL. PandaStack runs all of these on one substrate — sandboxes, serverless functions with cron, managed Postgres, and git-driven app hosting — because the hard part was never the product surface. It was building VM-grade isolation that creates in 179ms, and then pointing it at different jobs.

That's the real story of serverless. It looks like "no servers," but underneath it's a very precise answer to a very old question: how do you safely run code you don't trust, fast, on a machine you share with strangers? AWS answered it by building Firecracker. The fact that Firecracker is open source means the same answer is available to everyone — and snapshot-restore is the piece that turns "isolated" into "isolated and fast enough to do it on every single call." If you want to dig into the mechanics that make it fast, start with /docs/internals/snapshot-restore.

Frequently asked questions

How does AWS Lambda work under the hood?

Lambda runs each function inside a Firecracker microVM, a lightweight virtual machine with its own guest kernel and hardware-enforced isolation. AWS built Firecracker specifically because they needed VM-grade isolation between tenants without the multi-second boot and heavy device emulation of a traditional hypervisor. When an invocation arrives, the platform places it in a microVM (creating or reusing one) and runs your handler. That environment is isolated from every other tenant on the host by a real virtualization boundary rather than by container boundaries on a shared kernel.

Why is Firecracker used for serverless instead of containers?

Containers share the host's Linux kernel, which is a large shared attack surface — container escapes are a recurring class of vulnerability, and that's unacceptable when you're running thousands of untrusted tenants on one host. A Firecracker microVM gives each function its own guest kernel with hardware-enforced memory isolation, so a compromise is contained to one disposable VM. Firecracker strips the VM down (no firmware, minimal virtio devices) so you get that VM-grade boundary at container-grade footprint and startup speed — which is the combination serverless needs.

What causes serverless cold starts and how are they reduced?

A cold start is the latency spent standing up a fresh isolated environment before your code runs — booting a kernel, bringing up userspace, loading your runtime and dependencies. There are two ways to fight it: keep a warm pool of idle environments running (which costs money for idle capacity), or make starting cold fast enough that the pool isn't needed. MicroVM snapshot-restore takes the second path: a fully-booted VM is frozen to disk once, and every later create restores that frozen memory and device state and resumes — skipping boot entirely. On PandaStack that restore-based create is 179ms p50, versus a ~3s genuine cold boot paid once per template.

How do PandaStack serverless functions work?

You upload a code bundle, which is stored in object storage (GCS). On each invocation — whether triggered by an HTTP call or a cron schedule — PandaStack restores a fresh Firecracker microVM, runs your function, returns the result, and tears the VM down. There is no warm pool of idle function VMs; every invoke gets its own isolated microVM with roughly 0.8s of platform overhead. The upside is near-zero idle cost and per-invocation isolation by construction; the trade is a per-call setup floor, which is acceptable for the bursty, spiky workloads serverless typically handles.

What is the denial-of-wallet risk in serverless?

Because serverless scales automatically and bills per invocation, the failure mode shifts from downtime to cost. A runaway loop, an aggressive retry storm, a webhook that fires in a tight cycle, or a function that recursively invokes itself can multiply into millions of invocations and an enormous bill before anything visibly breaks — your wallet fails before your uptime does. Mitigate it by capping concurrency, enforcing per-invocation timeouts, and putting hard limits on every trigger source (HTTP, schedules, self-invocation) so unbounded invocations can't quietly compound.

Written by Ajay Kumar, Founder, PandaStack.