Ephemeral CI Runners on MicroVMs: Fresh Isolation Per Job
Run each CI job in its own fresh Firecracker microVM, then throw it away when the job finishes. That single change eliminates the two worst failure modes of traditional CI — state leaking between jobs on a shared runner, and untrusted code from a fork pull request running next to your secrets behind nothing but a shared kernel. The usual objection is cost: a VM per job sounds expensive. On PandaStack it isn't, because a sandbox is created in about 179ms (p50) by restoring a baked snapshot, not cold-booting. At that price, ephemeral-per-job stops being a luxury and becomes the obvious default.
What's actually wrong with persistent, shared CI runners?
A self-hosted runner that survives across jobs is convenient — your dependency cache is warm, the toolchain is already installed, and the next job starts fast. That convenience is exactly the problem. The machine accumulates state, and CI runs other people's code.
- State leakage between jobs: a global npm or pip install from job N is still on disk for job N+1. A test that mutates ~/.config, leaves a daemon running, or writes to /tmp creates flaky failures and false passes that are brutal to reproduce. 'Works on the runner, fails on a clean checkout' is the signature of a dirty environment.
- Supply-chain risk from untrusted PRs: when an external contributor opens a pull request, your CI runs their code. On a shared runner that code can read cached credentials, poison the dependency cache for the next job, or — with a kernel or container escape — reach the host. This is not theoretical; it's how runner-host compromises happen.
- Slow, fragile 'clean' environments: the standard mitigation is to scrub the runner between jobs (delete caches, kill processes, reset the filesystem). That's slow, and it's only as good as your cleanup script. The first thing it forgets to reset is the bug you ship.
- Blast radius: one persistent host runs many jobs. A single compromised or runaway job affects every job scheduled after it on that machine.
Why does a fresh microVM per job fix this?
A microVM gives you a clean slate plus a real isolation boundary, and the snapshot-restore model makes both cheap. Three properties do the work.
- Hardware isolation for untrusted code: every PandaStack sandbox is its own Firecracker microVM with its own guest kernel — not a shared-kernel container. Untrusted fork-PR code runs behind a hardware virtualization boundary. A compromised job is trapped in a VM you're about to delete anyway, so the blast radius is one job.
- Clean state, guaranteed: the runner is born from a snapshot for this job and destroyed after. There is no 'previous job' to leak from, no cleanup script to get wrong. Every job sees an identical, known-good filesystem and process table.
- Per-job VMs are actually cheap: create is ~179ms at p50 because every create restores a baked snapshot on demand — there's no warm pool of idle VMs to pay for, and no cold boot. Memory is copy-on-write (MAP_PRIVATE) and the rootfs is an XFS reflink clone, so spinning up the Nth identical runner doesn't copy gigabytes. A same-host fork is ~400ms if you'd rather branch from a live VM.
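To put the ~179ms figure in proportion, here is a back-of-the-envelope sketch of what share of a job's wall-clock time goes to creating its sandbox. `create_overhead` is a hypothetical helper. The build-and-test durations are illustrative inputs, not measurements.

```python
def create_overhead(create_seconds: float, work_seconds: float) -> float:
    """Fraction of a job's wall-clock time spent creating its sandbox."""
    if create_seconds < 0 or work_seconds <= 0:
        raise ValueError("need create_seconds >= 0 and work_seconds > 0")
    return create_seconds / (create_seconds + work_seconds)


P50_CREATE_SECONDS = 0.179  # the ~179ms p50 create time quoted above

# Illustrative build+test durations (assumed inputs, not benchmarks).
for work in (30, 300, 1200):
    share = create_overhead(P50_CREATE_SECONDS, work)
    print(f"{work:>5}s of work -> sandbox create is {share:.3%} of the job")
```

Even with only 30 seconds of work, create stays under 1% of the job. The build and test work dominates, not VM provisioning.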
The pattern: one sandbox per job, exec the pipeline, capture, destroy
The control loop for ephemeral CI is the same regardless of which CI system triggers it. Your orchestrator (a GitHub Actions workflow, a webhook handler, a small queue worker) does five things per job:
- Create a sandbox from a template — ideally a snapshot you baked with your toolchain and dependencies already installed.
- Get the code in: git clone the PR ref, or write a tarball/diff over the filesystem API. For fork PRs, clone the exact commit SHA, never a mutable branch.
- Run pipeline steps via exec: clone, install (if not baked), build, test, lint. Each step returns stdout, stderr, and an exit code.
- Capture results: collect logs from each exec, read build artifacts and test reports off the filesystem, and record the overall pass/fail from exit codes.
- Destroy the sandbox. Nothing survives — that's the point.
Because the VM is disposable, you don't need defensive cleanup between steps. If a step corrupts the environment, it corrupts a VM that's about to be deleted. The only state that escapes is what you deliberately read out (logs, artifacts, reports).
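The five steps can be sketched without any particular SDK. In this minimal sketch, `Sandbox`, `ExecResult`, `run_job`, and `FakeSandbox` are hypothetical names, not PandaStack's API. The sketch shows the two properties the loop relies on. First, destroy runs in `finally` no matter what happens. Second, each step's timeout is clamped to whatever is left of the job's overall time budget, so the steps together can't outlast the VM's `ttl_seconds`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    exit_code: int


class Sandbox(Protocol):
    """Hypothetical minimal interface; not PandaStack's actual SDK surface."""

    def exec(self, cmd: str, timeout_seconds: float) -> ExecResult: ...
    def read_file(self, path: str) -> bytes: ...
    def delete(self) -> None: ...


@dataclass
class JobOutcome:
    passed: bool = True
    logs: dict[str, ExecResult] = field(default_factory=dict)
    artifacts: dict[str, bytes] = field(default_factory=dict)


def run_job(
    create: Callable[[], Sandbox],
    steps: list[tuple[str, str, float]],  # (name, shell command, step timeout in s)
    artifact_paths: list[str],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> JobOutcome:
    deadline = clock() + budget_seconds
    sb = create()  # 1. a fresh sandbox for this job only
    try:
        outcome = JobOutcome()
        for name, cmd, step_timeout in steps:  # 2-3. get code in, run the pipeline
            remaining = deadline - clock()
            if remaining <= 0:  # job budget exhausted: fail closed
                outcome.passed = False
                break
            result = sb.exec(cmd, timeout_seconds=min(step_timeout, remaining))
            outcome.logs[name] = result
            if result.exit_code != 0:  # later steps depend on this one; stop here
                outcome.passed = False
                break
        for path in artifact_paths:  # 4. extract only what you deliberately want
            try:
                outcome.artifacts[path] = sb.read_file(path)
            except FileNotFoundError:
                pass  # e.g. the build failed before tests wrote a report
        return outcome
    finally:
        sb.delete()  # 5. always destroy, even if a step raised


class FakeSandbox:
    """In-memory stand-in for trying the loop without any real VM."""

    def __init__(self, files: dict[str, bytes] | None = None,
                 failing: frozenset[str] = frozenset()) -> None:
        self.files = dict(files or {})
        self.failing = failing
        self.ran: list[tuple[str, float]] = []
        self.deleted = False

    def exec(self, cmd: str, timeout_seconds: float) -> ExecResult:
        self.ran.append((cmd, timeout_seconds))
        code = 1 if cmd in self.failing else 0
        return ExecResult(stdout=f"ran: {cmd}", stderr="", exit_code=code)

    def read_file(self, path: str) -> bytes:
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]

    def delete(self) -> None:
        self.deleted = True


fake = FakeSandbox(files={"/work/report.xml": b"<testsuite/>"})
outcome = run_job(
    lambda: fake,
    [("build", "npm run build", 600), ("test", "npm test", 600)],
    ["/work/report.xml"],
    budget_seconds=900,
)
print(outcome.passed, list(outcome.logs), fake.deleted)  # True ['build', 'test'] True
```

Stopping at the first failing step is a design choice. A failed checkout or build makes later steps meaningless, and the VM is thrown away regardless, so there is nothing to clean up.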
A self-hosted CI runner sandbox in Python
Here's the whole loop with the Python SDK: create, clone an untrusted PR commit, build, test, read the results, and tear down. Set PANDASTACK_TOKEN in the environment first. The exec calls return a result object with stdout, stderr, and exit_code so your orchestrator can decide pass/fail.
```python
from pandastack import PandaStack

ps = PandaStack()  # reads PANDASTACK_TOKEN, base url https://api.pandastack.ai

REPO = "https://github.com/acme/widget.git"
# Placeholder: the exact commit from the fork PR event, never a branch name. It must be
# the full 40-character SHA; `git fetch origin <sha>` does not accept abbreviated SHAs.
PR_SHA = "a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6a7b8c9d0"

# 1. Fresh, hardware-isolated runner. Use a snapshot that already has deps baked
#    (see below) to skip install, or the 'base' template for a clean toolchain.
sb = ps.sandboxes.create(
    template="ci-node-baked",  # or "base" for a clean toolchain
    ttl_seconds=1800,          # hard cap: VM self-destructs after 30 min; keep it
                               # above the sum of the step timeouts below (1620s)
    metadata={"repo": REPO, "sha": PR_SHA, "kind": "ci"},
)
try:
    steps = [
        # 2. Pin to the exact untrusted commit. The clone runs INSIDE the microVM,
        #    so even a malicious repo hook is contained to this disposable VM.
        ("checkout", f"git clone --depth 1 {REPO} /work && cd /work && "
                     f"git fetch --depth 1 origin {PR_SHA} && git checkout {PR_SHA}", 120),
        # 3. Pipeline steps. If deps are baked into the snapshot, drop the install.
        ("install", "cd /work && npm ci", 300),
        ("build", "cd /work && npm run build", 600),
        ("test", "cd /work && npm test -- --reporter=junit --outputFile=report.xml", 600),
    ]
    results = {}
    for name, cmd, timeout in steps:
        results[name] = sb.exec(cmd, timeout_seconds=timeout)
        if results[name].exit_code != 0:
            break  # later steps depend on this one; stop early

    # 4. Capture results: exit codes drive pass/fail; read artifacts off the FS.
    passed = len(results) == len(steps) and all(r.exit_code == 0 for r in results.values())
    last = results[name]  # the last step that ran (the failing one, if any)
    print("PASS" if passed else "FAIL")
    print(last.stdout[-2000:])  # tail of the log for the PR comment
    print(last.stderr[-2000:])
    if "test" in results:  # no report unless the test step ran
        report = sb.filesystem.read("/work/report.xml")  # bytes -> store / annotate PR
finally:
    # 5. Always destroy. No state survives the job.
    sb.delete()
```

Stream long steps instead of blocking if you want live log output: the SDK exposes a streaming exec that emits stdout, stderr, and exit events, which maps cleanly onto a live CI log pane. The non-streaming exec above is fine when you only need the final logs and exit code.
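The report read in step 4 comes back as raw bytes. Here is a minimal sketch of turning a JUnit XML report into a one-line PR summary with the standard library's `xml.etree.ElementTree`. `classify`, `summarize_junit`, and the size cap are hypothetical. The sketch assumes the common layout of `<testcase>` elements whose children may include `<failure>`, `<error>`, or `<skipped>`. Reporters differ in which suite-level totals they emit, so it counts test cases directly. Untrusted code wrote this file, so treat it as hostile input. The sketch only enforces a size cap; the Python docs recommend the third-party `defusedxml` package for parsing untrusted XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a size cap for reports produced by untrusted jobs; tune to taste.
MAX_REPORT_BYTES = 5 * 1024 * 1024


def classify(case: ET.Element) -> str:
    """Bucket one <testcase> by its first child marker; no marker means passed."""
    for tag in ("error", "failure", "skipped"):
        if case.find(tag) is not None:
            return tag
    return "passed"


def summarize_junit(report: bytes) -> str:
    """One-line PR summary from JUnit XML bytes (<testsuite> or <testsuites> root)."""
    if len(report) > MAX_REPORT_BYTES:
        raise ValueError("report too large to parse")
    root = ET.fromstring(report)
    counts = Counter(classify(case) for case in root.iter("testcase"))
    total = sum(counts.values())
    # Design choice: zero test cases counts as FAIL (runner crash, empty glob, ...).
    ok = total > 0 and counts["failure"] == 0 and counts["error"] == 0
    return (f"{'PASS' if ok else 'FAIL'}: {counts['passed']} passed, "
            f"{counts['failure']} failed, {counts['error']} errors, "
            f"{counts['skipped']} skipped ({total} total)")


sample = (b'<testsuites><testsuite name="unit">'
          b'<testcase name="adds"/>'
          b'<testcase name="divides"><failure message="boom"/></testcase>'
          b'<testcase name="slow"><skipped/></testcase>'
          b'</testsuite></testsuites>')
print(summarize_junit(sample))  # FAIL: 1 passed, 1 failed, 0 errors, 1 skipped (3 total)
```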
Bake a base snapshot to skip dependency install
Reinstalling node_modules or a Python venv on every job is the slowest part of most pipelines, and it's wasted work when dependencies rarely change. The microVM model gives you a clean fix: do the install once, snapshot the VM, and restore from that snapshot for every subsequent job.
```python
from pandastack import PandaStack

ps = PandaStack()  # reads PANDASTACK_TOKEN
# One-time (or whenever your lockfile changes): bake a prepared base.
base = ps.sandboxes.create(template="base", ttl_seconds=600)
try:
    for cmd in ("git clone --depth 1 https://github.com/acme/widget.git /work",
                "cd /work && npm ci"):  # full install, once
        r = base.exec(cmd)
        if r.exit_code != 0:  # never bake a half-installed environment
            raise RuntimeError(f"bake step failed: {cmd}\n{r.stderr}")
    snap = base.snapshot()  # capture filesystem + memory state
finally:
    base.delete()
print("restore CI jobs from:", snap.id)
# Now each job restores this snapshot in well under a second with deps already
# present. Pull only the changed source over the filesystem API and run build+test.
```

Restore uses copy-on-write memory plus a reflink clone of the rootfs, so each job shares the baked pages until it writes to them. Installing once and restoring many times is dramatically cheaper than installing per job. Refresh the snapshot when your lockfile changes; treat it like a cache key. For the mechanics of capturing and restoring state, see the snapshots and forks docs.
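You can take the cache-key idea literally. Derive a key from everything that should invalidate the baked environment, and rebake only when no snapshot exists for the current key. This is a minimal sketch under stated assumptions. The key's inputs (lockfile bytes, a toolchain label, the base template name) are illustrative choices. `snapshot_key` is a hypothetical helper. The `baked` dict stands in for wherever your orchestrator records key-to-snapshot-id mappings.

```python
import hashlib


def snapshot_key(lockfile: bytes, toolchain: str, template: str = "base") -> str:
    """Cache key that changes whenever the lockfile, toolchain, or base template does."""
    h = hashlib.sha256()
    for part in (template.encode(), toolchain.encode(), lockfile):
        h.update(len(part).to_bytes(8, "big"))  # length-prefix so parts can't run together
        h.update(part)
    return h.hexdigest()[:16]


lock_v1 = b'{"name": "widget", "lockfileVersion": 3}'
key = snapshot_key(lock_v1, toolchain="node 20")
baked = {key: "snap-from-last-bake"}  # hypothetical key -> snapshot id record

print(baked.get(key))  # snap-from-last-bake: reuse the existing snapshot
print(baked.get(snapshot_key(lock_v1 + b" ", toolchain="node 20")))  # None: rebake
```

A miss (`None`) is the signal to run the bake step again and record the new snapshot id under the new key. In practice you would pass the bytes of your real lockfile rather than a literal.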
Secure CI for fork pull requests
Fork PRs are the case that justifies all of this. Anyone on the internet can open one, and your CI will run their code. The safe posture is to assume that code is hostile and design so that hostility doesn't matter.
- Run the clone and every step inside the microVM, never on the orchestrator host. The VM is the trust boundary.
- Pin to the immutable commit SHA from the PR event, never a branch name a contributor can re-point after checks start (a classic time-of-check-to-time-of-use race).
- Don't bake real secrets into the CI snapshot. A fork PR shouldn't have access to deploy credentials or signing keys; give it only what it needs to build and test.
- Set a ttl_seconds so a job that tries to hang or mine resources self-destructs. Treat the VM as cattle and the time budget as a hard limit.
- Read results out explicitly (logs, JUnit report, coverage). Anything you don't deliberately extract dies with the VM — including anything malicious the job tried to stash.
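Two of those rules can be enforced mechanically before any command is built. First, accept only a full 40-character hex commit SHA, which rejects branch names and abbreviated SHAs alike. Second, shell-quote every value that came from the PR event. This is a minimal standard-library sketch. `checkout_command` is a hypothetical helper that builds the same clone, fetch, and checkout sequence used in the pipeline example above. It assumes SHA-1 object IDs; repositories using the SHA-256 object format have 64-character IDs.

```python
import re
import shlex

# Assumes SHA-1 object IDs: 40 lowercase hex chars, as GitHub PR events report them.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def checkout_command(repo_url: str, sha: str, workdir: str = "/work") -> str:
    """Shell command that checks out exactly `sha`; raises on anything else."""
    if not FULL_SHA.fullmatch(sha):
        # Rejects branch names ("main") and abbreviated SHAs alike.
        raise ValueError(f"refusing ref {sha!r}: pin the PR's full commit SHA")
    repo, wd, commit = shlex.quote(repo_url), shlex.quote(workdir), shlex.quote(sha)
    # "--" stops git from reading a URL that starts with "-" as an option.
    return (f"git clone --depth 1 -- {repo} {wd} && cd {wd} && "
            f"git fetch --depth 1 origin {commit} && git checkout {commit}")


print(checkout_command("https://github.com/acme/widget.git", "a1b2c3d4" * 5))
```

Validation happens on the orchestrator, before anything reaches the VM. A bad ref fails the job immediately instead of being resolved inside the sandbox.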
When a managed CI service is the better call
Ephemeral microVM runners are not free of operational cost — you own scheduling, scaling, snapshot hygiene, and artifact plumbing. Be honest about whether you need them.
- All your repos are private and trusted, and you have no fork PRs. The supply-chain argument mostly evaporates; container-based managed CI is probably enough.
- Your pipelines are short and infrequent. The operational overhead of self-hosting outweighs the per-job savings.
- You want zero infrastructure to run. GitHub Actions, GitLab CI, and similar handle the runner fleet for you; that's a real feature.
Reach for ephemeral microVM runners when you run untrusted fork PRs, when you need a hardware isolation boundary for compliance or multi-tenancy, when you want byte-for-byte control over the runtime image, or when your job volume makes managed per-minute pricing painful. In those cases the snapshot-restore model gives you what containers can't — a real kernel boundary per job — without the speed penalty that usually comes with it. Match the tool to your threat model and volume, not to the hype.
The shape of the win is simple: clean state by construction, hardware isolation for code you don't trust, and a ~179ms create that makes 'a fresh VM per job' the cheap option rather than the expensive one. For the create path and the isolation model under the hood, see the snapshot-restore and networking internals docs.
Frequently asked questions
What are ephemeral CI runners?
Ephemeral CI runners are build/test environments created fresh for a single job and destroyed when it finishes, instead of reusing a long-lived shared machine. Each job starts from a known-clean state, so leftover files, environment variables, caches, or installed packages from a previous run can't affect it. On PandaStack, each ephemeral runner is its own Firecracker microVM with a dedicated guest kernel, giving hardware-level isolation rather than the shared-kernel isolation of containers.
Why are microVMs safer than containers for running untrusted pull requests?
Untrusted code from a fork PR runs with whatever permissions the runner has, and a container shares its kernel with the host. A kernel exploit or container escape from that PR can reach the host or other jobs. A Firecracker microVM gives each job its own guest kernel and a hardware virtualization boundary, so a compromised job is contained to a VM that gets thrown away. That boundary is why microVMs are the right primitive for CI on untrusted fork pull requests.
Doesn't creating a new VM per CI job make builds slower?
Not meaningfully. PandaStack creates a sandbox in about 179ms (p50) by restoring a baked snapshot on demand rather than cold-booting a VM, so the per-job creation cost is sub-second. You can also snapshot a base image that already has your toolchain and dependencies installed, then restore from it each job to skip the install step entirely. The dominant cost stays your actual build and test work, not VM provisioning.
When should I NOT build my own ephemeral runners on microVMs?
If your repos are all private and trusted, your build times are short, and GitHub Actions or GitLab CI already meet your needs, a managed CI service is less work to operate — you don't run scheduling, autoscaling, or artifact plumbing yourself. Self-hosting ephemeral microVM runners pays off when you have untrusted fork PRs, need hardware isolation for compliance, want to control the exact runtime image, or run enough jobs that per-minute managed pricing hurts. Match the tool to the threat model and the volume.