Isolating CI/CD Build Steps in MicroVMs

Ajay Kumar·June 27, 2026·9 min read

Here is an uncomfortable definition of a pull request: it is code a stranger wrote that you have agreed to run on your infrastructure. We dress it up — "CI build," "test job," "lint step" — but the moment your pipeline checks out a branch and runs `make`, `npm ci`, or `pytest`, you are executing arbitrary code from whoever opened the PR. On most CI setups that code runs on a long-lived, shared build host, in the host's own kernel, next to your caches and your secrets. A malicious PR isn't an exotic attack; it's just you holding the door open and being surprised when someone walks in.

This post is about closing that door without slowing the pipeline to a crawl. The fix is isolation at the right layer: give each pipeline job — or each untrusted step within a job — its own Firecracker microVM with a fresh guest kernel, then destroy it when the job ends. I'm Ajay; I built PandaStack, which runs every sandbox as a Firecracker microVM, so I'll be concrete about the mechanics and honest about what isolation does and doesn't buy you.

Why a shared CI host is a soft target

Picture the common self-hosted setup: a beefy EC2 box or Kubernetes node runs a CI agent, and every job lands on it as a container or a bare process. Containers feel isolated, but a container is a Linux process with namespaces and cgroups running on the host's kernel. Every job on that machine shares one kernel — and that kernel is the security boundary you're betting other people's build scripts against.

The attack surface isn't theoretical. A build step is allowed to run anything: it installs dependencies, executes test code, runs codegen, shells out to `curl`. So a hostile (or merely compromised-upstream) PR can do a lot before any human reviews a diff:

Read secrets in the runner environment — registry tokens, cloud credentials, signing keys, the deploy token sitting in an env var "just for this job."
Poison a shared build cache — write a backdoored artifact into the Docker layer cache, the npm/pip cache, or a mounted ccache dir, so the next legitimate build (on main) silently consumes it.
Pivot via the host kernel — exploit a kernel or container-runtime bug to escape the container and own the runner, then every job that ever lands there.
Tamper with the next job — leave a cron, a poisoned PATH binary, or a modified toolchain on a runner that isn't wiped between jobs.
Exfiltrate quietly — the job has network egress by default, so anything it read can leave over plain HTTPS and look exactly like a package download.

The dangerous combination is shared kernel plus shared state plus ambient secrets. A fork PR's build script doesn't need a zero-day if your runner reuses caches and leaves credentials in the environment — it just needs to be the code you already agreed to run.
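"Ambient secrets" is easy to see for yourself, because any step in a job can enumerate its own environment. Below is a minimal sketch of a hypothetical audit helper (`ambient_secret_names` is not part of any CI product) that you could run as a step on a self-hosted runner. It prints only the names of variables that look like credentials, never their values. The name patterns are a heuristic and will miss secrets that have innocuous names.

```python
import os
import re
from collections.abc import Mapping

# Name fragments that commonly mark a credential. Illustrative heuristic only,
# not an exhaustive secret detector.
SECRET_HINTS = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY|CREDENTIAL)", re.IGNORECASE
)


def ambient_secret_names(env: Mapping[str, str]) -> list[str]:
    """Return non-empty env var names that look like credentials, sorted."""
    return sorted(name for name, value in env.items() if value and SECRET_HINTS.search(name))


if __name__ == "__main__":
    # Every name printed here is equally readable by any other step in the job,
    # including a PR's build script.
    for name in ambient_secret_names(os.environ):
        print(name)
```

If this prints anything on a runner that builds fork PRs, a PR's build script can read the same values.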

Managed CI providers mitigate the worst of this by giving you a clean VM per job and withholding secrets from fork PRs — and you should use those defaults. But self-hosted runners, which teams reach for to get bigger machines, GPUs, private-network access, or lower cost, frequently regress to the shared-host model. That's where the per-job-microVM pattern earns its keep.

A microVM per job: fresh kernel, then gone

A Firecracker microVM is not a container. It boots its own guest kernel under hardware virtualization (KVM) and can only talk to the outside world through a minimal set of emulated devices, mostly virtio. The build has no shared host kernel to attack directly. An escape would have to defeat KVM and the Firecracker VMM itself, which runs jailed and under seccomp filters. That is a far smaller and more heavily audited surface than the full Linux syscall interface a container has. This is the same VMM AWS Lambda uses to run untrusted functions from millions of customers on shared fleets.

Mapped onto CI, the model is simple: each job gets a fresh microVM, runs its untrusted build inside it, and the VM is destroyed afterward. State cannot leak forward because there is no "next job" on the same machine — there's just a new VM restored from a known-clean snapshot. Caches and secrets aren't on the host the build can reach; you hand them in deliberately, scoped to the job, and they die with the VM.

The historical objection to "a VM per job" was startup cost — nobody wants to wait seconds to boot a VM for a 20-second lint. That objection is mostly gone. PandaStack creates a sandbox by restoring a baked Firecracker snapshot rather than cold-booting: p50 is around 179ms (p99 ~203ms), with the snapshot-restore step itself near 49ms. The first cold boot of a never-before-used template is ~3s, after which every create takes the fast path. At those numbers, per-job VM isolation costs you a rounding error, not a coffee break.
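The arithmetic behind "rounding error" is worth making explicit. The sketch below applies the create latencies quoted above to a 20-second lint job. `isolation_overhead` is an illustrative name, and the 20-second job is the example from the paragraph.

```python
def isolation_overhead(create_seconds: float, job_seconds: float) -> float:
    """Fraction by which creating a fresh VM lengthens a job that runs job_seconds."""
    return create_seconds / job_seconds


# Create latencies quoted in this post, applied to a 20-second lint job.
LATENCIES = {"p50 restore": 0.179, "p99 restore": 0.203, "first cold boot": 3.0}
for label, seconds in LATENCIES.items():
    print(f"{label}: {isolation_overhead(seconds, 20.0):.1%} added to a 20s lint")
```

The snapshot-restore path adds about 1% (0.179 / 20 is roughly 0.9%). Only the one-time cold boot of a new template is noticeable, at 15%.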

Shared runner vs. microVM per job, side by side

Isolation boundary — Shared runner: namespaces + cgroups on one shared host kernel. MicroVM per job: hardware-virtualized guest with its own kernel.
Blast radius of a malicious PR — Shared runner: the host and every later job on it. MicroVM per job: one disposable VM that's thrown away.
State between jobs — Shared runner: caches, tmp, toolchains, and tampering can persist. MicroVM per job: none — every job starts from a clean snapshot.
Secret exposure — Shared runner: ambient env vars any step can read. MicroVM per job: injected per-job and destroyed with the VM.
Cache poisoning — Shared runner: a poisoned shared cache feeds the next legitimate build. MicroVM per job: caches are per-VM or explicitly read-only, so writes don't survive.
Cleanup — Shared runner: "best-effort" wipe scripts you hope ran. MicroVM per job: the VM ceases to exist; cleanup is structural.
Startup cost — Shared runner: ~0 (it's already running). MicroVM per job: ~179ms p50 via snapshot-restore — small enough to do every job.
Reproducibility — Shared runner: drifts as jobs mutate the box. MicroVM per job: identical environment from the same baked image every time.

Wiring it into a pipeline

The pattern is provider-agnostic: your existing CI (GitHub Actions, GitLab CI, Buildkite, Jenkins) becomes a thin orchestrator whose only job is to hand the untrusted build off to a fresh microVM and report the result. The orchestrator step itself runs trusted, first-party code — it never executes the PR's build commands directly. Here's the shape of that handoff in a shell step:

#!/usr/bin/env bash
# Runs on the trusted CI runner. It does NOT build the PR itself —
# it spins up an isolated microVM, builds there, and reports back.
set -euo pipefail

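PR_REPO="${1:?usage: $0 <repo-url> <ref>}"   # e.g. https://github.com/acme/widget
PR_REF="${2:?usage: $0 <repo-url> <ref>}"    # the untrusted branch / commit SHA

# Provision a fresh sandbox, run the build inside it, capture the exit code.
# ci/run_in_microvm.py must come from a trusted checkout (e.g. the base branch),
# never from the PR's own tree, or the PR could rewrite the orchestrator itself.
# The PR's build commands never run on this host, so host-level secrets stay out
# of their reach; anything the build needs is injected into the microVM per job.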
python3 ci/run_in_microvm.py \
  --repo "$PR_REPO" \
  --ref "$PR_REF" \
  --build 'npm ci && npm run build && npm test' \
  --timeout 600
# When this returns, the microVM is already destroyed.

The orchestrator that the shell step calls is where the isolation lives. With the PandaStack Python SDK it's a short script: create a sandbox on a build template, write the untrusted build command into the guest, exec it with a hard timeout, collect logs and exit code, then let the VM die. Because the sandbox is created fresh and destroyed after, there is no persistent runner state for the PR to corrupt.

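# ci/run_in_microvm.py
import argparse
import shlex
import sys

from pandastack import Sandbox

p = argparse.ArgumentParser()
p.add_argument("--repo", required=True)
p.add_argument("--ref", required=True)
p.add_argument("--build", required=True)
p.add_argument("--timeout", type=int, default=600)
args = p.parse_args()

# A build script: clone the PR ref, then run the untrusted build. Writing it to a
# file in the guest avoids quoting the whole script for exec. Repo and ref come
# from the PR, so they are still shell-quoted before landing in a command line.
repo = shlex.quote(args.repo)
ref = shlex.quote(args.ref)
build_script = f"""#!/usr/bin/env bash
set -euo pipefail
git clone --depth 1 {repo} /workspace/repo
cd /workspace/repo
# A shallow clone tracks only the default branch, so fetch the exact branch or
# SHA and check out FETCH_HEAD rather than a branch name that may not exist locally.
git fetch --depth 1 origin {ref}
git checkout --detach FETCH_HEAD
{args.build}
"""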

# ttl_seconds is a backstop: even if this process dies, the VM reaps itself.
with Sandbox.create(template="base", ttl_seconds=args.timeout + 120) as sbx:
    sbx.filesystem.write("/workspace/build.sh", build_script)
    result = sbx.exec("bash /workspace/build.sh", timeout_seconds=args.timeout)

    print(result.stdout)
    if result.stderr:
        print(result.stderr, file=sys.stderr)
    sys.exit(result.exit_code)
# The sandbox (and everything the PR's build did to it) is destroyed here.

Two SDK details matter for safety. The `with Sandbox.create(...) as sbx:` form kills the VM on block exit, so a clean run, an exception, or a build failure all converge on "VM destroyed." And the `timeout_seconds` on `exec` plus the `ttl_seconds` on create are belt-and-suspenders against the build script that runs `while true; do :; done` — the exec timeout trips first, and the TTL reaps the VM even if your orchestrator process is killed mid-run.
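It helps to see what a timeout alone can do on a plain host. The sketch below uses only the Python standard library and is not the PandaStack SDK. It runs a command with a hard deadline and, on expiry, SIGKILLs the command's whole process group, so background workers the build forked die with it. It is POSIX-only, and `run_with_hard_timeout` is an illustrative name.

```python
import os
import signal
import subprocess


def run_with_hard_timeout(cmd: list[str], timeout_seconds: float) -> int:
    """Run cmd with a hard deadline; on timeout, SIGKILL its whole process group.

    Returns the exit code, or a negative signal number (e.g. -9) if killed.
    """
    # start_new_session=True makes the child a session and process-group leader,
    # so killing the group also takes out workers it forked along the way.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited between the timeout and the kill
        # Reap the leader; its return code is -SIGKILL if our kill landed.
        return proc.wait()
```

Even this has gaps. A process that calls `setsid()` itself leaves the group and survives the kill, and anything the build wrote to disk stays behind. Destroying the VM closes both gaps, which is why the exec timeout is paired with a VM-level TTL.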

Default to no secrets in untrusted-PR builds. Fork PRs from outside contributors should build and test with zero credentials. If a step genuinely needs a token (e.g. a private dependency), inject the narrowest possible scoped token into that one VM via the guest environment, and treat it as burned the moment an untrusted build can read it.
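Here is a minimal sketch of the "narrowest possible" rule. It assumes your orchestrator builds the guest's environment as a plain dict before creating the VM; how that dict reaches the guest depends on your SDK and is not shown here. `guest_env` and `SAFE_VARS` are hypothetical names. The design choice is an allowlist: the function starts empty and adds only what is named, rather than trying to strip every secret from a copy of the host environment.

```python
from collections.abc import Mapping
from typing import Optional

# Non-credential variables a typical build needs. This allowlist is an
# illustrative assumption; extend it per toolchain.
SAFE_VARS = ("PATH", "HOME", "LANG", "TZ", "CI")


def guest_env(host_env: Mapping[str, str], scoped: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Environment for one untrusted build VM: allowlisted vars plus explicit scoped tokens.

    Everything else on the host (cloud credentials, deploy keys) is dropped by
    construction, instead of relying on a denylist that must name every secret.
    """
    env = {name: host_env[name] for name in SAFE_VARS if name in host_env}
    if scoped:
        env.update(scoped)
    return env
```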

Caches, secrets, and the supply-chain angle

Isolation per job fixes the host-kernel and state-leak problems, but "the build downloads dependencies" is its own supply-chain surface, and a microVM doesn't magically vet npm. The point is that the microVM gives you a clean place to enforce supply-chain discipline that a shared, mutable runner cannot:

Caches become read-only or per-VM. Seed the build VM from an immutable, pre-warmed snapshot (dependencies baked in) so the build reads a known-good cache and any writes it makes are thrown away with the VM — no path for one PR to poison the next build's cache.
Lockfiles are enforceable. Because the environment is identical every time, `npm ci` / `pip install --require-hashes` / pinned lockfiles behave deterministically; a drift between two builds is a real signal, not just runner entropy.
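Egress is controllable at the VM boundary. If your threat model includes exfiltration, don't trust the build not to phone home; restrict the guest's outbound network at the network layer instead. A Firecracker guest's network traffic leaves through a virtio-net device backed by a TAP interface on the host, so firewall rules applied there cover everything the build does.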
Provenance is cleaner. A build that starts from a known snapshot, runs to completion, and emits a signed artifact (SLSA-style) is far easier to reason about than one that ran on a host with six months of accumulated state.
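At its core, the check behind `pip install --require-hashes` compares a file's digest against a value pinned ahead of time (typically SHA-256). The sketch below applies the same check in Python. You might run it on the pre-warmed cache a VM is seeded from, or on the artifact a build emits before you sign it. `sha256_of` and `verify_pinned` are illustrative names, not part of any tool mentioned in this post.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> None:
    """Raise if the file's SHA-256 differs from the pinned hex digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"{path}: sha256 {actual} != pinned {expected_sha256}")
```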

There's a reproducibility dividend here that has nothing to do with security. "Works on the runner" stops being a coin flip when every build starts from the same baked image with the same toolchain versions. The microVM you use to contain a malicious PR is the same mechanism that gives an honest PR a clean, identical environment — isolation and reproducibility are the same feature viewed from two angles.

Fanning out a build matrix without fanning out risk

Build matrices multiply the problem: the same untrusted commit runs across Node 18/20/22, three Linux distributions, two architectures. On a shared runner that's the same code touching the same host many times. With per-job microVMs each matrix cell is its own VM, so a poisoned cell can't contaminate its siblings. And because creates are cheap, fanning out 20 cells is 20 fast snapshot-restores, not 20 cold VM boots.
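The sketch below shows one VM per cell. It enumerates the matrix and runs each cell concurrently through a caller-supplied `run_cell`. `run_cell`, `matrix_cells`, and `fan_out` are hypothetical names. In practice `run_cell` might wrap the `with Sandbox.create(...)` block from ci/run_in_microvm.py, parameterized by the cell, so that each call creates, builds in, and destroys its own VM.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Cell = tuple[str, str]  # (node_version, base_image): illustrative matrix axes


def matrix_cells(node_versions: list[str], images: list[str]) -> list[Cell]:
    """Every combination of the matrix axes, one entry per cell."""
    return list(itertools.product(node_versions, images))


def fan_out(cells: list[Cell], run_cell: Callable[[Cell], int], max_workers: int = 8) -> dict[Cell, int]:
    """Run every cell concurrently and map each cell to its exit code.

    run_cell is expected to create its own sandbox, build, and destroy it, so
    no two cells ever share a VM. Threads suffice because each call mostly
    waits on a remote VM rather than using local CPU.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(cells, pool.map(run_cell, cells)))
```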

If every matrix cell shares an expensive setup phase (the same clone, the same `npm ci`), you can do that once, snapshot the configured VM, and fork it per cell. A same-host fork lands in roughly 400–750ms and shares memory copy-on-write; a cross-host fork is 1.2–3.5s. Each fork is still a fully isolated VM — you get the speed of shared setup with the blast-radius containment of separate machines.
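Here is a back-of-the-envelope sketch of what the shared-setup fork saves. It counts VM-seconds spent on setup (compute, not wall-clock time), using the fork latencies above and a hypothetical 90-second clone plus `npm ci`. `setup_cost` is an illustrative name.

```python
from typing import Optional


def setup_cost(cells: int, setup_seconds: float, fork_seconds: Optional[float] = None) -> float:
    """Total VM-seconds spent on the shared setup phase across a matrix.

    Without forking, every cell repeats the setup. With forking, setup runs
    once and each cell pays only the fork latency.
    """
    if fork_seconds is None:
        return cells * setup_seconds
    return setup_seconds + cells * fork_seconds


print(setup_cost(20, 90.0))        # every cell repeats the 90s setup
print(setup_cost(20, 90.0, 0.75))  # set up once, same-host fork per cell (worst case)
print(setup_cost(20, 90.0, 3.5))   # set up once, cross-host fork per cell (worst case)
```

With these assumptions, repeating setup in 20 cells costs 1,800 VM-seconds. Forking costs 105 (90 + 20 × 0.75) on the same host, or 160 across hosts.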

When you don't need this

Isolation is a cost, and the honest answer is that not every pipeline needs microVMs. If your CI only ever runs first-party code from trusted committers on a private repo with branch protection — no fork PRs, no untrusted contributors, no third-party codegen — then a container on a clean-per-job managed runner is probably fine, and reaching for VMs is over-engineering. The line is trust: the moment a build executes code from someone who could be hostile (an external fork PR, a contractor, an upstream you can't fully vouch for), the shared-kernel runner is the wrong tool, and a disposable microVM per job is the right one.
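One way to make "the line is trust" mechanical is a predicate the orchestrator checks before deciding where a job runs. The sketch assumes your CI can tell you three things: whether a PR comes from a fork, whether the author is an org member, and whether the build runs third-party codegen. `JobContext` and `needs_microvm` are hypothetical names, and the real signals will vary by provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobContext:
    """Trust signals for one job; field names are illustrative, not a CI API."""
    from_fork: bool                 # PR branch lives in a fork of the repo
    author_is_member: bool          # author belongs to the org / has write access
    runs_third_party_codegen: bool  # build executes generators or scripts you don't control


def needs_microvm(job: JobContext) -> bool:
    """Isolate whenever the build may execute code from someone who could be hostile."""
    return job.from_fork or not job.author_is_member or job.runs_third_party_codegen
```

If a signal can't be determined, pass the value that makes the predicate return True, so an unknown falls on the safe side.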

The framing I'd leave you with: a CI pipeline is a remote-code-execution endpoint you built on purpose and pointed at your own secrets. That's fine — it's the job. But treat it like one. Run the untrusted part in something you're happy to throw away, give it nothing it doesn't need, and destroy it when it's done. A microVM per job makes "throw it away" the default instead of an afterthought.

Frequently asked questions

Why isn't a container per CI job enough isolation?

A container is a process with namespaces and cgroups running on the host's shared kernel. Every job on that host shares one kernel, so a container escape or kernel bug in one untrusted build can compromise the runner and every other job, plus any shared caches or secrets on the host. A Firecracker microVM boots its own guest kernel under hardware virtualization, so an untrusted build is confined to a disposable VM rather than sharing the host kernel with your other jobs.

How can a malicious pull request actually harm my CI?

A PR's build step runs arbitrary code: it can read secrets in the runner environment (registry tokens, cloud credentials), poison a shared build cache so the next legitimate build consumes a backdoored artifact, leave tampered binaries or cron jobs on a runner that isn't wiped between jobs, or exploit a kernel bug to escape its container. None of this requires an exotic attack — it's just the build script doing what build scripts are allowed to do. Running each job in a throwaway microVM removes the shared host and shared state those attacks rely on.

Doesn't booting a VM per job make CI too slow?

Not with snapshot-restore. PandaStack creates a sandbox by restoring a baked Firecracker snapshot rather than cold-booting, so a create is around 179ms p50 (p99 ~203ms), with the restore step itself near 49ms. Only the first-ever boot of a template is a full ~3s cold boot; after that every job takes the fast path. At those latencies a fresh, isolated VM per job is cheap enough to do for every build.

How do I keep build caches fast without letting one PR poison the next build?

Seed each build microVM from an immutable, pre-warmed snapshot that already contains your dependencies, so the build reads a known-good cache and any writes it makes are discarded when the VM is destroyed. There's no shared, mutable cache directory for one PR to tamper with on behalf of the next build. If a matrix shares an expensive setup phase, run it once, snapshot, and fork the VM per cell (a same-host fork is roughly 400–750ms) — each cell is still a fully isolated VM.

Should fork pull requests have access to CI secrets?

By default, no. Untrusted fork PRs should build and test with zero credentials, because any secret the build can read should be considered exfiltrated. If a step genuinely needs a token, inject the narrowest scoped credential into that single job's microVM and treat it as burned. Running each job in a disposable VM makes this enforceable: the secret is scoped to one VM and destroyed with it, rather than sitting as an ambient environment variable on a shared host.

Run code in a microVM in one API call.

~179ms p50 create via snapshot restore. Fork, snapshot, and scale to zero.


Written by Ajay Kumar, Founder, PandaStack.