
Reproducible Builds in Disposable MicroVMs

Ajay Kumar · 10 min read

"Works on my machine" is the oldest joke in software, and the punchline is always the same: your machine is dirty. So is your CI runner. A build is supposed to be a pure function — same source in, same artifact out — but the moment you run it on a long-lived host, the environment becomes a hidden input. Yesterday's dependency cache, a leftover file in /tmp, a global config some previous job mutated: all of it silently feeds into today's build. The fix is structural, not procedural. Instead of scrubbing a shared runner and hoping you got everything, you give every build its own fresh Firecracker microVM, born from an identical baked template, and throw it away when the build finishes. On PandaStack that fresh VM costs about 179ms to create (p50), so "clean every time" stops being a tradeoff against speed.

Why does the same source produce different builds?

A long-lived CI runner is a shared apartment where every previous tenant left something in the fridge. The runner survives across jobs because that's convenient — the dependency cache is warm, the toolchain is already installed, the next build starts fast. That exact convenience is the bug. The machine accumulates state, and state that isn't part of your source tree is, by definition, an input you didn't declare. When the same commit builds green on Monday and red on Thursday with no code change, you're not debugging your code. You're debugging the residue.

Reproducibility — the property that the same inputs always yield the same output — is impossible to guarantee when the environment is mutable and shared. You can get close with lockfiles and pinned base images, but a persistent runner undermines all of it, because the runner's filesystem and process table are mutable inputs nobody pinned. Hermeticity is the stronger property you actually want: a build that depends only on its explicitly declared inputs and nothing from the surrounding host. You cannot bolt hermeticity onto a dirty machine. You have to start clean and stay isolated.
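One way to make the definition operational is to hash the artifact from each build of the same commit and compare the digests. The sketch below is a minimal illustration, not part of any PandaStack API; the function names are hypothetical, and it assumes you already have the artifact bytes from each run.

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(builds: list[bytes]) -> bool:
    """True when every build of the same commit produced byte-identical output."""
    return len({artifact_digest(b) for b in builds}) <= 1
```

If two builds of one commit disagree here, some undeclared input changed between them, and the environment is the first place to look.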

What actually leaks between non-disposable runners?

If you've never audited a persistent runner, the surface area is larger than you'd guess. Every one of these is a real, declared-nowhere input that can change a build's output:

  • Cached dependencies: a stale node_modules, a partially-populated ~/.m2 or ~/.cargo registry, a pip wheel cache holding a version your lockfile no longer pins. The build resolves against the cache and gets an artifact your fresh checkout would never produce.
  • Leftover files: build outputs from job N still sitting in the working tree for job N+1, a half-written file in /tmp, a lockfile a crashed job never released. Incremental build tools happily reuse them and skip work that should have re-run.
  • Mutated global config: a previous job ran 'git config --global', set a default npm registry, dropped a ~/.netrc, or exported an env var into a shell profile. Now every later build inherits a setting nobody chose for it.
  • Installed-but-unpinned tooling: a job did 'npm i -g' or 'pip install --user' and left a binary on PATH. Builds that 'work' because that tool happens to be present break the day you provision a clean replacement runner.
  • Background processes and ports: a test spun up a database or dev server and didn't reap it. The next job's integration tests connect to a stale process holding old data, and pass for the wrong reason.
  • Clock, locale, and filesystem ordering: timestamps baked into artifacts, a locale that changes sort order, directory-read order that differs from the last host. Subtle, but enough to break byte-for-byte reproducibility.

The standard mitigation is a cleanup script that runs between jobs: delete caches, kill processes, reset the working tree. It is slow, and it's only ever as good as the last engineer who remembered to add a line to it. The first thing the cleanup script forgets to reset is the bug you ship.
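The last item in that list (timestamps and directory-read order) is something a clean VM alone does not fix; the packaging step has to normalize it too. As a rough sketch using only Python's standard tarfile module, and assuming the artifact is a plain tar of regular files, you can sort the entries and zero out the host-specific metadata. The function name deterministic_tar is illustrative.

```python
import io
import os
import tarfile


def deterministic_tar(root: str) -> bytes:
    """Pack the files under root into a tar archive with normalized metadata."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    paths.sort()  # fixed entry order, independent of directory-read order

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in paths:
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, root))
            info.mtime = 0                    # drop the build-time timestamp
            info.uid = info.gid = 0           # drop host-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o755 if info.mode & 0o100 else 0o644
            with open(path, "rb") as fh:
                tar.addfile(info, fh)
    return buf.getvalue()
```

Compressing the result with gzip would reintroduce a timestamp in the gzip header unless that is pinned as well, so this sketch stops at the uncompressed tar.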

How a fresh microVM gives you a known-clean start

A microVM is a full virtual machine — its own guest kernel, its own filesystem, its own network namespace — that boots in milliseconds instead of the tens of seconds a traditional VM takes. (If the term is new, our explainer on what a microVM is covers the mechanics: /blog/what-is-a-microvm.) The reason it fixes reproducibility isn't speed, though. It's that the starting state is defined by a template, not by history.

On PandaStack you bake a template once — Ubuntu 24.04, your language toolchains, your pinned build tools — and freeze it into a Firecracker snapshot. Every build creates a sandbox by restoring that exact snapshot. There is no "previous build" for the new VM to inherit from, because the VM didn't exist a moment ago and won't exist a moment after. Two builds of the same commit, run a week apart on different physical hosts, restore byte-identical baked state. That's the hermetic starting line you can't get from scrubbing a shared host: not "cleaned back to clean," but "never dirty in the first place."

The disk side is just as important as the memory side. PandaStack's rootfs is a copy-on-write XFS reflink clone of the template (the mechanics are in /blog/copy-on-write-rootfs). Your build writes into its own private copy; the underlying template blocks are shared and read-only until the moment you touch them. So the build can churn the filesystem as hard as it likes — install packages, generate artifacts, trash the working tree — and none of it propagates back to the template or sideways to any other build. When the sandbox is deleted, every byte it wrote goes with it. The only thing that escapes is what you deliberately read out: the artifact, the logs, the test report.
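To make the copy-on-write behaviour concrete, here is a toy in-memory model. It is only an illustration of the sharing semantics described above, not how XFS reflinks or PandaStack are implemented; the Template and CowClone classes are hypothetical.

```python
class Template:
    """Read-only blocks shared by every clone."""

    def __init__(self, blocks: dict[int, bytes]):
        self._blocks = dict(blocks)

    def read(self, n: int) -> bytes:
        return self._blocks.get(n, b"")


class CowClone:
    """A private view of a template: reads fall through, writes stay local."""

    def __init__(self, template: Template):
        self._template = template
        self._private: dict[int, bytes] = {}

    def read(self, n: int) -> bytes:
        if n in self._private:
            return self._private[n]
        return self._template.read(n)

    def write(self, n: int, data: bytes) -> None:
        # The template block is never modified; only this clone sees the change.
        self._private[n] = data

    def private_block_count(self) -> int:
        return len(self._private)
```

Deleting a clone in this model means dropping its private dictionary, which mirrors the point above: a build's writes disappear with the sandbox, and the template never saw them.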

Hermetic-by-construction beats hermetic-by-cleanup. With a disposable VM you don't need defensive 'reset everything' steps between builds, because there is no shared state to reset. If a step corrupts the environment, it corrupts a VM that's about to be deleted anyway.

A build script is just untrusted code that happens to compile

Here's the part most CI setups underweight. The moment you build third-party code — an open-source dependency you vendored, a pull request from a fork, a contractor's branch — you are executing code you did not write and did not review. And a build is not a passive operation. 'npm install' runs postinstall scripts. A Makefile runs arbitrary shell. 'setup.py' is Python that executes at install time. A 'build.rs' is a program. The line between 'building untrusted code' and 'running untrusted code' does not exist. A build script is just untrusted code that happens to produce an artifact on its way to reading your environment variables.

On a shared, persistent runner that's a genuine supply-chain hole. A malicious fork PR's build step can read cached credentials, poison the dependency cache for the next job, exfiltrate whatever secrets the runner has in its environment, or — with a kernel or container-escape bug — reach the host itself. Containers don't close this, because containers share the host kernel; one kernel bug and the boundary is gone. (We get into why containers aren't an isolation boundary in /blog/firecracker-vs-docker.) A Firecracker microVM is a different category: each build runs behind a hardware virtualization boundary with its own guest kernel, the same isolation model AWS Lambda and Fargate are built on. PandaStack also isolates the network per-sandbox and supports egress control, so a build step can't quietly phone home if you don't let it.

Build fork-PR code as if it's hostile, because it is — anyone on the internet can open a PR. Run the clone and every build step inside the microVM, never on the orchestrator host. Pin the checkout to the immutable commit SHA from the webhook payload, not a branch name a contributor can force-push after your approval gate. Don't bake deploy keys or signing secrets into the build snapshot; give the build only what it needs to compile and test, and nothing it could steal.
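Because the SHA arrives in a webhook payload and typically ends up interpolated into a shell command, it is worth validating before use. The following is a minimal sketch under stated assumptions: the helper name checkout_command and the /work default are illustrative, and it accepts only a full 40-character SHA-1 or 64-character SHA-256 object name, which rejects both branch names and injected shell syntax.

```python
import re
import shlex

# Full object names only: 40 hex chars (SHA-1) or 64 hex chars (SHA-256 repos).
_FULL_SHA = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def checkout_command(repo_url: str, sha: str, workdir: str = "/work") -> str:
    """Build the shell command that clones repo_url and checks out an exact commit."""
    if not _FULL_SHA.fullmatch(sha):
        raise ValueError(f"not a full commit SHA: {sha!r}")
    repo = shlex.quote(repo_url)
    work = shlex.quote(workdir)
    return (
        f"git clone --depth 1 {repo} {work} && "
        f"cd {work} && git fetch --depth 1 origin {sha} && git checkout {sha}"
    )
```

The returned string is meant to be executed inside the sandbox, never on the orchestrator host.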

The pattern: create, clone, build, read the exit code, destroy

The control loop is the same whatever triggers it — a webhook, a queue worker, a CI step. Create a sandbox from your baked template, get the untrusted source in, run the build, and let the exit codes drive pass/fail. Each exec returns stdout, stderr, and an exit_code, so the orchestrator never has to trust the build's self-report — it reads the process result directly. Set PANDASTACK_API_KEY in the environment first.

from pandastack import Sandbox

REPO = "https://github.com/acme/widget.git"
PR_SHA = "a1b2c3d4"  # placeholder: use the full 40-character commit SHA from the fork PR event, never a branch name

# Fresh, hardware-isolated build VM restored from a baked template snapshot.
# Identical clean starting state every time — no leaked caches, no stale config.
with Sandbox.create(template="base", ttl_seconds=900) as sbx:
    # Clone runs INSIDE the microVM, so even a malicious repo hook is contained
    # to this disposable VM. Pin to the immutable SHA, not a mutable branch.
    clone = sbx.exec(
        f"git clone --depth 1 {REPO} /work && "
        f"cd /work && git fetch --depth 1 origin {PR_SHA} && git checkout {PR_SHA}"
    )
    assert clone.exit_code == 0, clone.stderr

    # The build itself is untrusted code: postinstall scripts, Makefiles, build.rs
    # all execute here, sandboxed. Exit codes drive pass/fail — no self-report.
    install = sbx.exec("cd /work && npm ci", timeout_seconds=300)
    # Skip the build when the install failed, so the report names the step that broke.
    build = None
    if install.exit_code == 0:
        build = sbx.exec("cd /work && make build", timeout_seconds=600)

    if build is not None and build.exit_code == 0:
        # Read out only what you trust: the artifact, deliberately, by path.
        artifact = sbx.filesystem.read("/work/dist/widget.tar.gz")  # bytes
        print(f"built {len(artifact)} bytes")
    else:
        failed = install if build is None else build
        print("FAIL\n", (failed.stderr or "")[-2000:])
# VM is destroyed here — every byte it wrote dies with it.

Nothing about this loop needs cleanup between steps. If a build step poisons the cache or leaves a daemon running, it does so inside a VM that ceases to exist on the next line. The ttl_seconds is a hard backstop: a build that tries to hang or mine resources self-destructs on the clock. For a fuller treatment of running one disposable VM per job — including streaming live build logs — see /blog/ephemeral-ci-runners.

Warm the toolchain once, fork per build

Reproducibility usually fights speed: the cleanest environment is the one you build from scratch every time, and building from scratch is slow. The snapshot-and-fork model dissolves that fight. You do the expensive, identical setup once — clone the repo, install dependencies, warm the toolchain — then snapshot that prepared VM. Every subsequent build forks from the snapshot instead of redoing the install.

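# Assumption: the original snippet used `ps` without defining it. The client
# class name and constructor below are a guess; check the SDK docs for the real ones.
from pandastack import PandaStack

ps = PandaStack()  # assumed to pick up PANDASTACK_API_KEY from the environment

# One-time (or whenever the lockfile changes): bake a prepared base.
base = ps.sandboxes.create(template="base", ttl_seconds=600)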
base.exec("git clone --depth 1 https://github.com/acme/widget.git /work")
base.exec("cd /work && npm ci")     # the slow, deterministic install — done once
snap = base.snapshot()               # freeze filesystem + memory
base.delete()

# Now every build forks this snapshot: deps already present, identical state.
# A same-host fork is ~400-750ms and shares memory copy-on-write, so the Nth
# build doesn't recopy gigabytes of node_modules — it shares the baked pages.
build_vm = ps.sandboxes.fork(snap.id)
build_vm.exec("cd /work && git pull && make build")  # only the changed source
build_vm.delete()

Because the fork shares the snapshot's memory pages copy-on-write and reflinks the rootfs, every build starts from the same frozen, post-install state — and starts from it cheaply. The install ran once; the determinism is captured in the snapshot; each build inherits it identically. This is the inversion that makes per-build VMs practical: the slow part is amortized, the clean part is free. The full mechanics of snapshot and fork — what's captured, how restore works — are in /blog/snapshot-and-fork-explained. Treat the snapshot like a cache key: re-bake it when your lockfile changes, and the determinism follows.
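The "snapshot as cache key" idea can be sketched as a hash over whatever determines the post-install state. This is an illustration rather than a PandaStack feature: the function names, the toolchain string, and the choice of inputs are assumptions, and storing the key alongside the snapshot id is left to your orchestrator.

```python
import hashlib


def snapshot_key(template: str, lockfile_bytes: bytes, toolchain: str = "") -> str:
    """Derive a short key from the inputs that define the baked state."""
    h = hashlib.sha256()
    for part in (template.encode(), toolchain.encode(), lockfile_bytes):
        # Length-prefix each part so adjacent fields cannot blur into each other.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()[:16]


def needs_rebake(current_key: str, baked_key: str) -> bool:
    """Re-bake whenever the key computed for this commit differs from the stored one."""
    return current_key != baked_key
```

A build would compute the key from the commit's lockfile, fork the stored snapshot when the keys match, and bake a new one otherwise.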

There's one honest caveat. The very first time you boot a brand-new template before any snapshot exists, it's a cold boot of around 3 seconds while PandaStack auto-bakes the snapshot. After that, every create is the ~179ms restore path and every fork is the ~400-750ms same-host path. You pay the cold boot once per template, not once per build.

When this is overkill

Disposable microVM builds are not free of operational cost — you own scheduling, snapshot hygiene, and artifact plumbing. Be honest about whether the properties are worth it. If all your code is first-party and trusted, your build steps don't execute meaningful third-party code, and your existing container-based CI already produces consistent artifacts, the supply-chain argument mostly evaporates and a managed runner is less to operate. The case for a fresh microVM per build is strongest when you build untrusted code — fork PRs, vendored dependencies, anything you didn't write — when reproducibility is a hard requirement rather than a nice-to-have, or when you need a real kernel boundary for compliance or multi-tenancy. In those cases the snapshot model gives you the one thing a shared runner structurally cannot: a known-clean, hardware-isolated starting line for every single build, without the speed penalty that usually comes with it.

Frequently asked questions

What makes a build reproducible in a disposable microVM?

Every build creates a fresh Firecracker microVM by restoring an identical baked template snapshot, so the starting state is defined by the template rather than by whatever previous jobs left behind. There is no shared, mutable runner to leak cached dependencies, leftover files, or mutated global config into the build. The rootfs is a copy-on-write clone the build writes into privately, and the whole VM is destroyed afterward, so the environment is a declared input every time instead of an accumulation of history.

What state leaks between builds on a persistent CI runner?

A long-lived runner accumulates inputs nobody declared: stale dependency caches (node_modules, ~/.m2, pip wheels), leftover build outputs and temp files, mutated global config (git config, npm registry, ~/.netrc), unpinned globally-installed tools on PATH, orphaned background processes and open ports, and even clock/locale/filesystem-ordering differences. Any of these can change a build's output even when the source is identical. Cleanup scripts try to reset them between jobs but are only as good as the last line someone remembered to add.

Why is building untrusted code a security risk, not just running it?

A build is not passive. npm postinstall scripts, Makefiles, setup.py, and build.rs all execute arbitrary code during the build, so building a fork PR or a vendored dependency means running code you didn't write or review. On a shared runner that code can read cached credentials, poison the cache for the next job, or exfiltrate secrets in the runner's environment. Running each build in its own Firecracker microVM with a hardware virtualization boundary and per-sandbox network isolation contains a hostile build script to a disposable VM.

How do I make per-build VMs fast enough for CI?

PandaStack creates a sandbox in about 179ms at p50 (roughly 203ms p99) by restoring a baked snapshot on demand rather than cold-booting, so a fresh VM per build is cheap. To skip dependency install, bake a snapshot of a VM that already has your toolchain and dependencies installed, then fork it per build — a same-host fork is around 400-750ms and shares memory copy-on-write, so each build inherits the post-install state without recopying gigabytes. You only pay a one-time cold boot of about 3 seconds when a brand-new template is first baked.

When should I not use disposable microVMs for builds?

If all your code is first-party and trusted, your build steps don't run meaningful third-party code, and your existing container-based CI already produces consistent artifacts, a managed runner is less to operate — you don't own scheduling, snapshot hygiene, or artifact plumbing. Disposable microVM builds pay off when you build untrusted code like fork PRs and vendored dependencies, when byte-level reproducibility is a hard requirement, or when you need a real kernel isolation boundary for compliance or multi-tenancy.

Written by Ajay Kumar, Founder, PandaStack.