Multi-Tenant Code Execution: Isolation Requirements
If your platform runs code that other people wrote — a notebook product, an AI agent that executes model-generated commands, a CI service, a per-customer scripting surface — then you are a multi-tenant code-execution platform whether you planned to be one or not. The defining property is that two mutually distrusting parties have their code running on the same physical hosts at the same time, and you are the only thing standing between them. That single fact rewrites your security requirements. The question is no longer "can this code do something bad to itself?" but "can tenant A reach tenant B, the host, or your control plane?"
This post is the requirements checklist for that setting: what you actually need, why each item is non-negotiable, and where the honest limits are. It builds on two companion pieces — /blog/why-docker-is-not-a-sandbox for why a shared kernel is the load-bearing weakness, and /blog/code-isolation-hierarchy for the full ladder of boundaries — and applies them to the multi-tenant case specifically.
The five requirements, up front
There are five things a multi-tenant execution platform has to get right. They compound: weakening any one of them undermines the others, and most real incidents are a chain through two or three. The list, then the reasoning.
- Hardware isolation between tenants — the boundary between one tenant's code and another's cannot depend on the integrity of a shared kernel.
- Per-tenant network isolation — a tenant's network namespace, egress path, and reachability must be its own, not a shared bridge where tenants can see or reach each other.
- Resource limits — CPU, memory, disk, and process/file-descriptor caps per tenant, so one tenant cannot starve or crash the host for everyone else.
- Ephemerality — workloads are disposable and reset to a known-clean state, so a compromise or leaked secret does not persist across runs or across tenants.
- Blast-radius containment — when (not if) something gets through, the damage is bounded to one tenant's slice and cannot reach the control plane or the rest of the fleet.
1. Hardware isolation between tenants
This is the requirement everything else rests on, and it's the one teams most often get wrong by reaching for a container. A container is a normal Linux process that the host kernel has been asked to treat specially — namespaces control what it sees, cgroups what it consumes, capability drops and seccomp-bpf what it's allowed to do. Every one of those mechanisms is enforced by the host kernel, the same kernel that every other tenant's container also calls into. The Linux syscall surface that's shared is enormous — well over 300 syscalls on x86-64, plus ioctls, the filesystem layer, and reachable device drivers. One kernel privilege-escalation bug reachable through a syscall any tenant is still allowed to call, and that tenant is root on the host, sitting next to everyone else.
Hardening helps and is worth doing — keep the default seccomp profile, drop every capability you can, run rootless, enable user namespaces. But it moves you along a spectrum of reduced shared-kernel surface; it does not make the kernel not-shared. The runtime is also in scope: runc, containerd, and the Docker daemon run with host privileges, and the canonical illustration is CVE-2019-5736, where a malicious container overwrote the host runc binary via a /proc/self/exe file descriptor — patched in 2019, but a clean example of the runtime itself being host-privileged code between hostile guests and the host.
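As a rough illustration of what that hardening looks like in practice, here is a minimal sketch that assembles a `docker run` command line with the usual flags. The image name, limits, and uid are placeholder assumptions, and the helper `hardened_docker_args` is hypothetical. The default seccomp profile stays on simply because nothing here overrides it.

```python
def hardened_docker_args(image: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv with common hardening flags applied.

    Illustrative only: the numeric limits and uid below are assumed values.
    """
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                       # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege gain via setuid binaries
        "--read-only",                          # immutable root filesystem
        "--network", "none",                    # no network unless explicitly granted
        "--pids-limit", "128",                  # cap the process count
        "--memory", "256m",
        "--cpus", "0.5",
        "--user", "65534:65534",                # unprivileged uid:gid (assumed)
        # No --privileged and no seccomp=unconfined: the default profile stays on.
        image, *command,
    ]
```

Every flag here narrows what the container can ask the kernel to do. None of them changes which kernel answers.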
The categorically different boundary is to stop sharing the kernel. A microVM — Firecracker or Cloud Hypervisor — gives each tenant workload its own guest kernel, isolated by CPU hardware virtualization (Intel VT-x / AMD-V) exposed through Linux KVM. The host no longer presents the full syscall ABI to the guest; what it exposes is the VMM plus the KVM ioctl interface plus a deliberately minimal virtio device model. Firecracker is written in Rust, emulates essentially virtio-net, virtio-block, and virtio-vsock, and runs behind a jailer that chroots the process and applies its own cgroups and a tight seccomp filter as defense-in-depth. That's a small, heavily audited surface versus the hundreds of syscalls a container shares — which is exactly why the microVM is the multi-tenant default. The mechanics are in /blog/what-is-a-microvm and /blog/firecracker-vs-docker.
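To make the "minimal device model" concrete, the sketch below builds the kind of JSON document Firecracker accepts as a config file: one kernel, one root drive, one network interface, and a fixed vCPU and memory size. The field names follow Firecracker's config-file format as best I know it, so check them against the version you run. The helper name, paths, and defaults are assumptions for illustration, not PandaStack's configuration.

```python
import json


def microvm_config(kernel: str, rootfs: str, tap: str,
                   vcpus: int = 1, mem_mib: int = 256) -> dict:
    """Return a Firecracker-style config for a single-tenant microVM (sketch)."""
    return {
        "boot-source": {
            "kernel_image_path": kernel,  # the guest's own kernel image
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs,       # per-VM disk image, never shared
            "is_root_device": True,
            "is_read_only": False,
        }],
        # Sizing is fixed from outside the guest.
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        "network-interfaces": [{"iface_id": "eth0", "host_dev_name": tap}],
    }


if __name__ == "__main__":
    # Hypothetical paths and tap name.
    print(json.dumps(microvm_config("/srv/vmlinux", "/srv/rootfs.ext4", "tap0"), indent=2))
```

The whole description of what the guest can touch fits in a couple of dozen lines, which is a fair proxy for how small the exposed surface is.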
2. Per-tenant network isolation
A hardware boundary on compute does nothing if all tenants share one network. The default Docker bridge, for instance, puts containers on a shared L2 segment where they can reach each other directly; that's a lateral-movement path and an east-west reachability problem before you've even discussed escapes. The requirement is that each tenant gets its own network namespace, its own interface, and its own egress path, with no implicit reachability to peers or to the host's internal services.
PandaStack's approach is NATID: every sandbox gets a dedicated Linux network namespace, a veth pair, and a tap device, carved from a pool of 16,384 /30 subnets per agent. Each sandbox sits in its own namespace with per-sandbox egress isolation — there is no shared bridge for one tenant to sniff or pivot through, and teardown of a tenant's networking is atomic with teardown of its namespace. The design is documented at /docs/concepts/networking-natid. The general principle holds regardless of implementation: per-tenant netns, not a shared switch, and egress controls (allowlists, metadata-endpoint blocking, rate limits) applied per tenant rather than globally.
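The subnet arithmetic is easy to check: 16,384 /30 subnets is exactly what a single /16 yields, and each /30 has two usable addresses, enough for one host-side and one guest-side endpoint. The sketch below works that through with Python's `ipaddress` module. The pool `10.200.0.0/16`, the helper names, and the choice of which address goes to which side are assumptions for illustration, not NATID's actual layout.

```python
import ipaddress

POOL = "10.200.0.0/16"  # assumed pool; the source only states the subnet count


def subnet_count(pool: str = POOL) -> int:
    """Number of /30 subnets that fit in the pool."""
    net = ipaddress.ip_network(pool)
    return 2 ** (30 - net.prefixlen)


def addresses_for(index: int, pool: str = POOL):
    """Return (subnet, host_side_ip, guest_side_ip) for the sandbox at `index`."""
    if not 0 <= index < subnet_count(pool):
        raise ValueError("sandbox index outside the subnet pool")
    net = ipaddress.ip_network(pool)
    base = int(net.network_address) + index * 4   # a /30 spans 4 addresses
    subnet = ipaddress.ip_network((base, 30))
    host_side, guest_side = subnet.hosts()        # the 2 usable addresses
    return subnet, host_side, guest_side
```

Because every sandbox gets its own point-to-point /30, there is no broadcast domain for a neighbor to share.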
3. Resource limits
Isolation is not only about confidentiality and integrity — availability is a cross-tenant property too. A tenant that fork-bombs, fills the disk, or pins every core is attacking everyone else on the host even without escaping anything. So each tenant needs hard ceilings on CPU, memory, disk, and process/file-descriptor counts, enforced below the workload and not negotiable by it.
A microVM gives you a natural enforcement point here: vCPU count and guest RAM are fixed at the VM boundary by the hypervisor, so a runaway guest can saturate its own allocation and no more. Disk is bounded by the rootfs and any attached volume rather than the host filesystem. This is cleaner than cgroup limits inside a shared kernel because the limit is enforced by the VMM, outside the tenant's reachable surface. The operational caveat: per-VM limits protect the host from any single tenant, but you still need scheduling and admission control so the sum of tenants doesn't oversubscribe the host into thrashing — capacity planning is a requirement, not an afterthought.
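The admission-control half of that caveat can be sketched in a few lines: before placing a VM, check that the sum of what is already placed plus the new request still fits the host. The `Host` class, the no-memory-overcommit rule, and the 2x vCPU overcommit ratio are all assumptions chosen for illustration. A real scheduler would also weigh disk, network, and headroom for the host itself.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    mem_mib: int
    cpus: int
    cpu_overcommit: float = 2.0  # assumed policy: vCPUs may total 2x physical cores
    placed: list = field(default_factory=list)  # (vcpus, mem_mib) per admitted VM

    def can_admit(self, vcpus: int, mem_mib: int) -> bool:
        used_vcpus = sum(v for v, _ in self.placed)
        used_mem = sum(m for _, m in self.placed)
        # Memory is never overcommitted in this sketch; CPU is, up to the ratio.
        return (used_mem + mem_mib <= self.mem_mib
                and used_vcpus + vcpus <= self.cpus * self.cpu_overcommit)

    def admit(self, vcpus: int, mem_mib: int) -> bool:
        """Place the VM if it fits; otherwise refuse and leave state unchanged."""
        if not self.can_admit(vcpus, mem_mib):
            return False
        self.placed.append((vcpus, mem_mib))
        return True
```

For example, a host with 1024 MiB and 2 cores accepts two 2-vCPU, 512 MiB VMs under these rules and refuses a third, however small.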
4. Ephemerality
The fourth requirement is that workloads are disposable. A long-lived, mutable execution environment shared or reused across tenants accumulates state: leaked secrets in environment variables and temp files, a tenant's data left on disk, a foothold an attacker planted on a previous run. Ephemerality says every workload starts from a known-clean, immutable base and is destroyed afterward, so contamination cannot carry forward.
The historical objection to fresh-VM-per-task was startup cost, and that's the part worth engineering away. PandaStack keeps no warm pool of idle VMs; every create restores a baked Firecracker snapshot on demand, at a p50 of 179ms (about 203ms p99). That's the snapshot-restore create number — not a cold kernel boot, which on first spawn of a new template still takes around 3 seconds before the snapshot is baked. Because restore is cheap, a clean VM per task is affordable, which is what makes ephemerality practical rather than aspirational. Forking from a snapshot is first-class too: a same-host fork — copy-on-write guest memory via MAP_PRIVATE plus an XFS-reflink rootfs — lands around 400ms (cross-host 1.2–3.5s). See /blog/snapshot-and-fork-explained and /docs/internals/snapshot-restore for how that path works.
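The copy-on-write property that makes forking cheap can be seen with the same primitive in miniature. Python's `mmap.ACCESS_COPY` gives a private, copy-on-write mapping (MAP_PRIVATE on Unix): writes land in the process's own pages and never reach the backing file. This is a toy demonstration of the primitive only, with a temp file standing in for a memory snapshot. It is not Firecracker's or PandaStack's restore path.

```python
import mmap
import os
import tempfile


def cow_demo() -> tuple[bytes, bytes]:
    """Write through a private mapping; return (private view, bytes on disk)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"clean snapshot")      # stand-in for a baked snapshot
        path = f.name
    try:
        with open(path, "r+b") as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            m[:5] = b"DIRTY"            # the "fork" dirties its own copy of the page
            private_view = bytes(m[:])
            m.close()
        with open(path, "rb") as f:
            on_disk = f.read()          # the shared base is untouched
    finally:
        os.unlink(path)
    return private_view, on_disk
```

Many forks can map the same snapshot this way and pay only for the pages each one actually modifies, while the base stays known-clean for the next restore.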
5. Blast-radius containment
The first four requirements reduce the probability of a breach. The fifth assumes one happens anyway and asks: how far does it get? You design for the boundary being broken, because over a long enough timeline and a large enough fleet, something will get through. Good blast-radius containment means a compromise of one tenant's VM yields one tenant's VM — not the host, not the agent, not the control plane, not the other tenants on the box.
Concretely that means the per-host agent and the control-plane API are not reachable from inside a tenant VM except through a narrow, authenticated interface; that a tenant VM can't read other tenants' rootfs or memory (the hardware boundary and per-VM disks handle this); that credentials are scoped per tenant so a stolen one is useless elsewhere; and that the jailer wrapping each VMM means even a VMM compromise lands in a chrooted, capability-stripped, seccomp-filtered process rather than as host root. Defense-in-depth here is the whole game: the microVM boundary is the primary wall, and the jailer plus per-tenant network and credential scoping are the walls behind it.
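Per-tenant credential scoping can be illustrated with a token that is cryptographically bound to one tenant ID, so presenting it against another tenant's resources fails verification. This is a minimal sketch using only the standard library. The token format and function names are hypothetical, and a production design would add expiry, key rotation, and narrower scopes.

```python
import hashlib
import hmac


def issue_token(secret: bytes, tenant_id: str) -> str:
    """Mint a token bound to exactly one tenant (hypothetical format: id.mac)."""
    mac = hmac.new(secret, tenant_id.encode(), hashlib.sha256).hexdigest()
    return f"{tenant_id}.{mac}"


def verify_token(secret: bytes, token: str, expected_tenant: str) -> bool:
    """Accept only if the MAC is valid AND the token belongs to expected_tenant."""
    tenant_id, _, mac = token.rpartition(".")
    good = hmac.new(secret, tenant_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the MAC through timing.
    mac_ok = hmac.compare_digest(mac.encode(), good.encode())
    return mac_ok and tenant_id == expected_tenant
```

A token stolen from tenant A's VM still verifies as tenant A's, but any service acting on tenant B's resources passes `expected_tenant="tenant-b"` and rejects it.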
The residual risk you can't hand-wave away
A microVM is a meaningfully stronger boundary than a shared kernel. It is not an unbreakable one, and a multi-tenant platform that claims otherwise is lying to its customers. The honest residual risks:
- VMM device-model bugs — the virtio device emulation is the primary thing a hostile guest probes. Rust rules out most memory-safety bugs, but not logic bugs.
- KVM escape CVEs — the hypervisor layer itself has had guest-to-host escape vulnerabilities. Google's kvmCTF pays up to $250,000 for one, which signals both that they're real and that the surface is small enough to be a focused target.
- Microarchitectural side channels — Spectre-class branch-target injection, MDS, and newer guest-to-host variants can cross the VM boundary in principle, because branch predictors and caches are shared at the hardware level. A microVM is a memory-isolation boundary, not a microarchitectural-isolation one. If your threat model includes a tenant exfiltrating another's data via side channels, you may need physical or NUMA-level separation of sensitive tenants, or to move up to a different threat model entirely.
- The control plane and shared services — your scheduler, database, and object storage are shared by definition. They're authenticated and outside the tenant's VM, but they're still a surface, and a logic bug there (a missing tenant-scoping check on an API) is a cross-tenant breach with no kernel exploit required.
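The control-plane risk in the last item is worth seeing in code, because the bug is usually one missing comparison. Below is a sketch of a lookup that enforces tenant scoping. The in-memory store, record shape, and names are hypothetical stand-ins for whatever your API and database look like.

```python
class NotFound(Exception):
    """Raised for missing records and for records owned by another tenant."""


def get_sandbox(store: dict, caller_tenant: str, sandbox_id: str) -> dict:
    record = store.get(sandbox_id)
    # The vulnerable version stops at `record is None`. The ownership check
    # is the tenant-scoping step that must never be skipped.
    if record is None or record["tenant"] != caller_tenant:
        raise NotFound(sandbox_id)  # same error either way: don't leak existence
    return record


if __name__ == "__main__":
    store = {"sb-1": {"tenant": "tenant-a", "state": "running"}}
    print(get_sandbox(store, "tenant-a", "sb-1"))
```

Returning the same error for "does not exist" and "not yours" keeps one tenant from enumerating another's resource IDs. In a real system the same filter belongs in the query itself, so an unscoped read is impossible to write by accident.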
Isolation also isn't the whole job. A multi-tenant platform needs quotas and rate limits per tenant (so abuse is bounded economically, not just technically), abuse handling for the things isolation doesn't stop — crypto mining inside a tenant's own legitimate allocation, outbound spam or scanning, content policy violations — and audit logging so you can answer "what did tenant X do" after the fact. Those are operational requirements that sit alongside the five technical ones, and skipping them is how a technically-secure platform still ends up on an abuse blocklist.
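A per-tenant rate limit is one of the simpler pieces to sketch. The token bucket below keeps separate state per tenant ID, so one tenant exhausting its budget has no effect on another's. The class name, rate, and burst values are assumptions for illustration, and the injectable clock exists only to make the behavior testable.

```python
import time


class TenantRateLimiter:
    """Token bucket per tenant: `rate` tokens/second, up to `burst` banked."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last_seen)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(tenant_id, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

With `rate=1` and `burst=2`, a tenant gets two immediate requests, is refused the third, and earns one more after a second, while every other tenant's bucket is unaffected.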
Where PandaStack fits
PandaStack is built around exactly this checklist. Every sandbox, database, and hosted app is its own Firecracker microVM — its own guest kernel (5.10, Ubuntu 24.04), isolated by KVM, not a shared-kernel container — which is requirement one. NATID gives each sandbox a dedicated network namespace with per-sandbox egress isolation, which is requirement two. Per-VM CPU/memory ceilings enforced by the hypervisor cover requirement three. Snapshot-restore-per-create at ~179ms p50, with no warm pool, makes a clean VM per task affordable, which is requirement four. And the jailer-wrapped VMM plus per-tenant network and credential scoping bound the blast radius, requirement five.
The part that matters for a platform builder weighing this: the core is open source and Apache-2.0, so you can self-host it on your own Linux KVM hosts and keep the multi-tenant boundary on infrastructure you control and audit — running the control-plane API and a per-host agent, with tenant code executing on your machines. A hosted offering exists too, but self-hosting is first-class. None of this removes the residual risk above — side channels, KVM CVEs, and your own control-plane logic are still yours to reason about — but it does mean the foundational isolation requirement isn't something you're bolting onto a shared kernel and hoping. For the broader decision, start at /blog/how-to-sandbox-untrusted-code; for the per-rung trade-offs, /blog/code-isolation-hierarchy.
Frequently asked questions
What are the isolation requirements for multi-tenant code execution?
Five things, and they compound: hardware isolation between tenants (the boundary can't depend on a shared kernel, so containers alone aren't enough for mutually distrusting tenants); per-tenant network isolation (its own namespace and egress path, not a shared bridge); resource limits (CPU, memory, disk, and process caps per tenant); ephemerality (disposable workloads reset to a known-clean state so contamination doesn't carry forward); and blast-radius containment (a breach of one tenant's environment yields only that environment, not the host, control plane, or other tenants). On top of those, you also need per-tenant quotas, abuse handling, and audit logging.
Can I use containers for multi-tenant code execution?
Containers isolate cooperating workloads well, but for mutually distrusting tenants they're a weak boundary because every container shares the host kernel. A kernel privilege-escalation bug reachable through a syscall any tenant is still allowed to call becomes a cross-tenant breach — tenant A reaching tenant B through the kernel they share. Hardening (seccomp, dropped capabilities, user namespaces, rootless) reduces the shared-kernel surface but cannot make it not-shared. For untrusted multi-tenant code, the default is a microVM, which gives each tenant its own guest kernel isolated by hardware virtualization (KVM).
Why are microVMs the default for multi-tenant isolation?
Because they stop sharing the host kernel. A microVM (Firecracker or Cloud Hypervisor) gives each tenant its own guest kernel, isolated by CPU hardware virtualization (Intel VT-x / AMD-V) through KVM. The host exposes only a minimal VMM plus the KVM ioctl interface plus a small virtio device model instead of the full Linux syscall ABI — a small, heavily audited surface versus the 300-plus syscalls a container shares. Firecracker also runs behind a jailer that drops privileges as defense-in-depth. The historical objection was startup cost, which snapshot-restore (PandaStack creates a microVM in ~179ms p50) removes, making a fresh isolated VM per task affordable.
Does a microVM eliminate cross-tenant risk entirely?
No, and a platform that claims it does is overselling. A microVM is a meaningfully stronger boundary than a shared kernel but not an absolute one. Residual risks include VMM device-model logic bugs, KVM guest-to-host escape CVEs (Google's kvmCTF pays up to $250,000 for one), and microarchitectural side channels (Spectre-class, MDS, and newer variants) that can cross the VM boundary because caches and branch predictors are shared at the hardware level — a microVM is a memory-isolation boundary, not a microarchitectural one. Your shared control plane and services are also a surface, where a missing tenant-scoping check is a breach with no kernel exploit needed. Match the boundary to your threat model and don't skip quotas, abuse handling, and audit logging.