The Real Cost of Hosted Sandboxes at Scale

Ajay Kumar · 8 min read

There's a recurring thread on Hacker News and the agent-builder subreddits, and it always reads roughly the same: "we love the hosted sandbox, it took ten minutes to wire up — but the bill at scale is getting silly, what else is there?" It's a good question and it deserves a real answer rather than a pitch. The honest version of that answer starts with cost structure, not a price tag. If you understand exactly what a per-second hosted sandbox is charging you for, you can predict your own bill, decide whether you're past the point where it stops being a bargain, and know what a "cheap E2B alternative" actually has to be cheaper at. So this post is about the shape of the bill, the break-even reasoning, and the cases where the right move is still to stay hosted.

I'm the founder of PandaStack, so read this as a vendor's argument — and one with an obvious interest in you self-hosting. I keep it honest the only way that works: I state specific numbers (latency, license, fork times) only for PandaStack, and I describe every other product's pricing in general, structural terms — billing model and cost drivers, never invented per-second rates or dollar figures. I name E2B in the title because that's the search, but nothing here is an attack on it; E2B is a solid, open-source, Firecracker-based product, and below some scale it is genuinely the cheaper choice once you price in your own time. The exact numbers for any product belong on that vendor's own pricing page, dated to the day you read it.

Why hosted is cheap first

Start by giving the hosted model its due, because it earns it. A hosted sandbox API removes an enormous amount of work: you don't provision KVM hosts, you don't build a network fabric, you don't run an agent fleet, you don't carry the on-call pager when a host wedges at 3 a.m. You make an HTTP call and code runs in isolation. For a team shipping a product where the sandbox is a means and not the end, that's the correct trade for a long time. Every hour you'd spend operating infrastructure is an hour not spent on the thing your customers actually pay you for, and at low volume the dollar cost of the hosted meter is rounding error against an engineer's salary.

The reason the bill stays small early isn't generosity — it's arithmetic. A hosted sandbox is, structurally, the underlying compute plus a margin. At low volume the margin is a small number times a small number, so it disappears into the noise. The convenience is worth everything and the markup costs almost nothing. The thread-starters above aren't wrong that it was cheap; they're noticing that it stopped being cheap, and to understand why, you have to look at what the meter is actually counting.

What the meter is actually counting

Almost every per-second hosted sandbox bills some combination of the same handful of things. The marketing unit — a "session," a "sandbox-hour," a "run" — is usually one of these wearing a costume. Knowing the real drivers is most of the work, because it tells you which of your usage patterns is cheap and which is quietly expensive.

  • Sandbox-seconds — the dominant line item. You pay for wall-clock time a sandbox is alive, usually metered per second or finer, multiplied by the size you reserved. A 4 GiB / 4 vCPU box alive for a minute costs more than a 1 GiB / 1 vCPU box for the same minute. This is where the bulk of a real bill comes from.
  • Idle time — the silent killer for agents. If your agent creates a sandbox, runs a command, then waits on a model call or a human before the next command, every second of that wait may be billed at the full sandbox rate. Whether it is depends entirely on whether the vendor bills full-lifetime or active-CPU only — read that line carefully, because for a conversational agent it can dominate everything else.
  • Per-create overhead — the cost you pay for churn. If boot is slow, you keep boxes warm to hide the latency, and warm boxes burn sandbox-seconds doing nothing. If boot is fast and teardown is cheap, you can run truly ephemeral sandboxes and stop the meter the instant work finishes. The boot speed of the platform is, indirectly, a pricing lever.
  • Egress and storage — the add-ons people forget. Network egress (pulling large dependencies on every cold run, pushing big outputs), persistent disk, and snapshot storage are frequently billed separately from compute. A code interpreter that runs pip install or npm install on every fresh box pays that egress over and over.

There's often a fifth thing that isn't a resource at all: a platform or seat fee — a monthly minimum or plan price that sits on top of usage. Separate it from the metered cost when you compare, because a low per-second rate behind a high monthly floor is a different deal than the rate alone suggests. For a fuller taxonomy of these billing shapes, I wrote a companion piece at /blog/code-interpreter-api-pricing that walks each model end to end.
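To make the idle-time driver concrete, here is a minimal Python sketch that compares billed seconds for one conversational agent session under a full-lifetime meter and under an active-only meter. The step durations are made-up illustrative values, and both metering functions are simplified models of the two billing shapes described above, not any vendor's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One agent turn: the sandbox runs a command, then sits waiting."""
    active_s: float  # seconds spent actually executing
    wait_s: float    # seconds spent waiting on a model call or a human


def billed_full_lifetime(steps: list[Step]) -> float:
    """Full-lifetime meter: every wall-clock second the sandbox is alive is billed."""
    return sum(s.active_s + s.wait_s for s in steps)


def billed_active_only(steps: list[Step]) -> float:
    """Active-only meter: only the seconds spent executing are billed."""
    return sum(s.active_s for s in steps)


def idle_ratio(steps: list[Step]) -> float:
    """Fraction of the sandbox lifetime spent waiting rather than working."""
    total = billed_full_lifetime(steps)
    if total == 0:
        return 0.0
    return (total - billed_active_only(steps)) / total


# Hypothetical three-turn session: short commands, long waits between them.
session = [Step(2.0, 8.0), Step(1.0, 29.0), Step(3.0, 7.0)]

if __name__ == "__main__":
    print("full-lifetime seconds:", billed_full_lifetime(session))
    print("active-only seconds:  ", billed_active_only(session))
    print("idle ratio:           ", idle_ratio(session))
```

In this made-up session the sandbox works for 6 of its 50 seconds, so a full-lifetime meter bills roughly eight times what an active-only meter would. Your own ratio is the number to measure before comparing plans.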

Where the math flips

Here's the argument the pricing pages don't make for you, because it's not in their interest to. Your hosted bill is, to a first approximation, (sandbox-seconds × size × per-second rate) + egress + platform fees, and that per-second rate already contains the vendor's markup over their raw compute. Your self-hosted bill is the amortized cost of the KVM hosts you run (reserved instances, spot, or metal you already own), plus your egress, plus your operational time. The two cross when the hosted total grows past the all-in cost of running the hosts yourself; put another way, when the markup you're handing over every month exceeds the operational time and idle capacity you'd take on in exchange.
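The comparison above can be sketched in a few lines of Python. Everything numeric here is a placeholder in abstract "cost units", not a real rate from any vendor, and the class and function names are my own for illustration. The sketch also assumes the self-hosted fleet cost is fixed for the month, which only holds while your volume fits on the hosts you priced.

```python
from dataclasses import dataclass


@dataclass
class HostedPlan:
    rate_per_unit_second: float    # price per sandbox-second per size unit (placeholder)
    egress_per_month: float        # separately billed egress
    platform_fee_per_month: float  # monthly minimum or seat fee


@dataclass
class SelfHostedFleet:
    hosts_per_month: float   # amortized cost of the KVM hosts
    egress_per_month: float  # you still pay for bytes out
    ops_per_month: float     # engineering and on-call time, priced honestly


def hosted_monthly(plan: HostedPlan, sandbox_seconds: float, size_units: float) -> float:
    """(sandbox-seconds x size x per-second rate) + egress + platform fees."""
    metered = sandbox_seconds * size_units * plan.rate_per_unit_second
    return metered + plan.egress_per_month + plan.platform_fee_per_month


def self_hosted_monthly(fleet: SelfHostedFleet) -> float:
    """Amortized hosts + egress + operational time."""
    return fleet.hosts_per_month + fleet.egress_per_month + fleet.ops_per_month


def break_even_sandbox_seconds(plan: HostedPlan, fleet: SelfHostedFleet,
                               size_units: float) -> float:
    """Monthly sandbox-seconds at which the hosted bill equals the self-hosted bill."""
    per_second = plan.rate_per_unit_second * size_units
    if per_second <= 0:
        raise ValueError("metered rate must be positive")
    fixed_gap = self_hosted_monthly(fleet) - plan.egress_per_month - plan.platform_fee_per_month
    return max(0.0, fixed_gap / per_second)


# Placeholder inputs only; substitute figures from live pricing pages and real quotes.
plan = HostedPlan(rate_per_unit_second=0.001, egress_per_month=100.0,
                  platform_fee_per_month=50.0)
fleet = SelfHostedFleet(hosts_per_month=3000.0, egress_per_month=100.0,
                        ops_per_month=2000.0)

if __name__ == "__main__":
    print("break-even sandbox-seconds/month:",
          round(break_even_sandbox_seconds(plan, fleet, size_units=2)))
```

Below the break-even volume the hosted total is lower; above it the self-hosted total is. A real fleet grows in steps as you add hosts, so rerun the comparison for each fleet size you would actually buy.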

The variable that moves the crossover most is the shape of your load, not its size. Two teams with identical total sandbox-seconds can land on opposite sides of the line:

  • Flat and sustained load favors self-host. If your sandboxes run near-continuously and your hosts would sit near capacity most of the day, you pay for the hardware once and skip the markup on every sandbox-second after that. The markup is the largest controllable line on the bill, and you delete it.
  • Spiky and low load favors hosted. If your usage is bursty — quiet nights, sharp daytime peaks — self-hosting means provisioning for the peak and eating the cost of idle hosts the rest of the time. The hosted vendor amortizes that idleness across all their customers; you can't. Their meter, markup and all, is cheaper than your under-utilized fleet.
  • Unpredictable or early-stage load favors hosted. If you can't forecast volume yet, every host you buy is a bet. Pay the markup and stay liquid until your load is legible enough to size a fleet against.
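A small sketch makes the load-shape point measurable. It assumes you provision a self-hosted fleet for peak demand, so utilization is mean demand divided by peak demand. The two hourly profiles are invented for illustration and deliberately sum to the same total.

```python
def fleet_utilization(hourly_demand: list[float]) -> float:
    """Mean demand divided by peak demand, for a fleet sized to the peak."""
    if not hourly_demand:
        raise ValueError("need at least one hour of demand")
    peak = max(hourly_demand)
    if peak <= 0:
        return 0.0
    return sum(hourly_demand) / (len(hourly_demand) * peak)


def overprovision_multiplier(utilization: float) -> float:
    """How much more each used sandbox-second costs than on a fully busy fleet."""
    if utilization <= 0:
        return float("inf")
    return 1.0 / utilization


# Two hypothetical teams with identical daily totals (1200 concurrent-sandbox-hours).
flat = [50.0] * 24                    # steady around the clock
spiky = [10.0] * 16 + [130.0] * 8     # quiet nights, sharp daytime peak

if __name__ == "__main__":
    for name, profile in (("flat", flat), ("spiky", spiky)):
        u = fleet_utilization(profile)
        print(f"{name}: utilization={u:.2f}, cost multiplier={overprovision_multiplier(u):.2f}")
```

With these invented profiles the flat team uses all of the capacity it pays for, while the spiky team pays about 2.6 times as much per used sandbox-second on its own hosts. That multiplier is what a hosted markup has to beat.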

And the costs on the self-host side of the ledger are real — do not hand-wave them, because they're exactly what a hosted vendor is absorbing for you. You now operate KVM hosts, a network fabric, an agent fleet, and snapshot storage. You carry the on-call. You eat the under-utilization whenever load dips below the capacity you provisioned. For a team without infrastructure appetite, that operational weight can easily exceed the markup you'd save — which is the whole reason the hosted business exists and thrives. Self-host wins on cost only past a volume threshold and only if running infrastructure is something your team is set up to do well. If those two conditions aren't both true, the honest recommendation is to stay hosted and revisit when your load is flatter and bigger.

Do not trust any dollar figure or per-second rate from a blog post — including this one — as current. Rates, free-tier limits, idle-vs-active rules, egress charges, and monthly minimums in this category change often and quietly. Before you decide anything, model your own workload: estimate your real sandbox-seconds, the size you reserve, your idle ratio, and your egress, then run that against each vendor's live pricing page on the day you check it and against a concrete bill of materials for hosts you'd actually buy. Benchmark and price it for yourself — the headline rate alone will not tell you your bill, and neither will I.

PandaStack as the at-scale answer

If you've run the numbers above and landed on the self-host side — flat, sustained, predictable load and a team that can operate infrastructure — the question becomes which substrate to run. PandaStack's core is open source under Apache-2.0, and self-hosting is a first-class path rather than an afterthought bolted onto a hosted product. You run the control-plane API and a per-host agent on your own Linux KVM hosts (anything with /dev/kvm); the sandboxes execute on your infrastructure, not a vendor's. Every sandbox is a Firecracker microVM with its own guest kernel (5.10, Ubuntu 24.04), isolated by hardware virtualization rather than a shared container kernel. The pricing consequence is the entire point: with no vendor meter in the request path, there is no per-second markup. Your cost is the compute you actually run.

The architecture is built to keep the exact cost drivers from the meter section low, which is what decides whether your own hosts run cheap. There is no warm pool: every create restores a baked Firecracker snapshot at 179 ms p50, ~203 ms p99 (the first spawn of a new template cold-boots in ~3 s, then bakes the snapshot for everyone after). That speed is what makes truly ephemeral sandboxes practical: you tear a box down the moment it goes idle and recreate it in about 200 ms, so you never keep boxes warm to dodge a slow boot. You stop the meter and, on your own fleet, free the host. Forking is first-class copy-on-write: same-host forks land in ~400 ms by mapping guest memory MAP_PRIVATE and reflinking the rootfs on XFS, so branching an already-warm state costs metadata, not a fresh pip install and its egress. Optional UFFD (userfaultfd) memory streaming pages the memory image in from object storage on demand (HTTP Range GET, 4 MiB chunks, zero-page elision, a prefetch trace, and a shared per-host chunk cache), so an agent boots without first downloading a multi-gigabyte memory image. Per-sandbox networking is NATID: each sandbox gets its own Linux netns, veth, and tap, with egress isolated per sandbox.
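For readers who haven't used private mappings, here is a toy illustration of the copy-on-write semantics that MAP_PRIVATE provides, using Python's standard mmap module. It is not PandaStack's fork implementation: a small temp file stands in for a guest memory snapshot, and `ACCESS_COPY` is Python's portable way to request a private, copy-on-write mapping.

```python
import mmap
import os
import tempfile


def fork_view(path: str) -> mmap.mmap:
    """Map a file copy-on-write: reads come from the file, writes stay private."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays valid after the file closes.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def demo() -> tuple[bytes, bytes, bytes]:
    """Create two 'forks' of one base image, write to one, and compare all three views."""
    with tempfile.NamedTemporaryFile(delete=False) as base:
        base.write(b"snapshot-state-v1")  # stand-in for a memory snapshot
        path = base.name
    try:
        fork_a = fork_view(path)
        fork_b = fork_view(path)
        fork_a[0:8] = b"FORK-A!!"  # dirties only fork_a's private pages
        a, b = bytes(fork_a[:]), bytes(fork_b[:])
        fork_a.close()
        fork_b.close()
        with open(path, "rb") as f:
            on_disk = f.read()
        return a, b, on_disk
    finally:
        os.unlink(path)


if __name__ == "__main__":
    a, b, on_disk = demo()
    print("fork A sees:", a)
    print("fork B sees:", b)
    print("base file:  ", on_disk)
```

Only the fork that wrote sees its change. The sibling fork and the base file are untouched, which is why creating a fork costs page-table bookkeeping up front and copies memory only for pages that are later written.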

Two more things worth knowing before you price it out. First, the same microVM substrate runs more than a code interpreter: managed PostgreSQL 16, git-driven app hosting with scale-to-zero, serverless functions with cron, and durable volumes all sit on one platform — so consolidating workloads onto self-hosted infra can delete more than one vendor bill at once. Second, a hosted PandaStack offering exists too, for teams that want the convenience and aren't past the break-even yet. The SDKs (pandastack for Python, @pandastack/sdk for TypeScript, the pandastack CLI) read PANDASTACK_API_KEY and take a configurable base URL, so the same code points at the hosted endpoint today and your own fleet tomorrow — you don't rewrite anything when the math flips. That's the migration path most teams actually want: start hosted, move the meter onto your own hosts when, and only when, the scale justifies it.
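The hosted-today, self-hosted-tomorrow switch comes down to one configuration value. The sketch below shows the pattern with the Python standard library only, rather than the SDK, whose exact API isn't reproduced here. Only `PANDASTACK_API_KEY` comes from the text above. The `PANDASTACK_BASE_URL` variable, both URLs, the `/v1/sandboxes` path, and the bearer-token header are assumptions made up for illustration.

```python
import os
import urllib.request
from collections.abc import Mapping
from typing import Optional

# Hypothetical default endpoint, used when no override is configured.
HOSTED_BASE_URL = "https://api.pandastack.example"


def build_request(path: str, env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build an authenticated request against whichever control plane is configured."""
    env = os.environ if env is None else env
    api_key = env.get("PANDASTACK_API_KEY")
    if not api_key:
        raise RuntimeError("PANDASTACK_API_KEY is not set")
    # Assumed override variable: point it at your own control-plane API to self-host.
    base = env.get("PANDASTACK_BASE_URL", HOSTED_BASE_URL).rstrip("/")
    url = f"{base}/{path.lstrip('/')}"
    # Assumed auth scheme; check the real API reference before relying on it.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


if __name__ == "__main__":
    hosted = build_request("/v1/sandboxes", {"PANDASTACK_API_KEY": "demo-key"})
    own_fleet = build_request("/v1/sandboxes", {
        "PANDASTACK_API_KEY": "demo-key",
        "PANDASTACK_BASE_URL": "http://10.0.0.5:8080/",
    })
    print(hosted.full_url)
    print(own_fleet.full_url)
```

The application code calling `build_request` is identical in both cases. Only the environment changes, which is the property that lets the migration be a deployment change rather than a rewrite.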

For the deeper mechanics behind the cost levers above, see /docs/internals/snapshot-restore for the boot path, /docs/concepts/snapshots-and-forks and /docs/internals/fork-cow for forking, /docs/internals/streaming-restore for UFFD memory streaming, and /docs/concepts/networking-natid for per-sandbox egress. For a step-by-step on standing up the self-hosted path, /blog/self-hosted-code-execution-sandbox is the build guide; to compare the field by isolation model and hosting rather than price, /blog/e2b-alternatives is the honest-broker map, and /blog/code-interpreter-api-pricing breaks down the billing shapes in full. If your hosted bill is climbing and your load is flat, the only question that matters is whether that bill has crossed the all-in cost of hosts you'd run yourself. Price that out for real, and if it has, self-hosting on an open-source substrate is how you stop paying the markup.

Frequently asked questions

What's a cheap E2B alternative for sandboxes at scale?

The cheapest alternative at high, steady volume is usually self-hosting an open-source sandbox substrate, because a hosted per-second rate always includes the vendor's markup over raw compute. PandaStack's core is open source under Apache-2.0 and runs on your own Linux KVM hosts, so there is no per-second meter in the path and your cost is the compute you actually run. But below some scale — especially for spiky or unpredictable load — staying hosted (E2B included) is genuinely cheaper once you price in your own operational time. Model your real sandbox-seconds, idle ratio, and egress before deciding.

Why does my hosted sandbox bill grow faster than expected at scale?

Three drivers usually do it. First, idle time: if your agent leaves a sandbox alive while it waits on a model call or a user, a full-lifetime meter bills every idle second at the full rate. Second, per-create overhead: slow boots push you to keep boxes warm, and warm boxes burn sandbox-seconds doing nothing. Third, egress: installing the same dependencies on every cold run pulls bytes you pay for repeatedly. At low volume the vendor's markup is rounding error; as your sandbox-seconds climb, that markup becomes the largest controllable line on the bill.

When should I self-host sandboxes instead of using a hosted API?

Self-host when two conditions are both true: your load is flat and sustained enough that your own KVM hosts would run near capacity most of the time, and your team is set up to operate infrastructure (hosts, networking, agent fleet, snapshot storage, on-call). Under those conditions you pay for hardware once and skip the per-second markup on every sandbox-second after. If your load is spiky, low, or unpredictable, or if running infra isn't a strength, stay hosted — the markup is cheaper than an under-utilized fleet plus the operational weight.

Does switching from a hosted sandbox to self-hosted PandaStack require a code rewrite?

No. The PandaStack SDKs (pandastack for Python, @pandastack/sdk for TypeScript, and the pandastack CLI) read PANDASTACK_API_KEY and accept a configurable base URL, so the same application code can point at the hosted endpoint or at your own self-hosted control-plane API by changing the base URL. The practical path is to start hosted while your load is small or unpredictable, then move the meter onto your own KVM hosts once your volume is flat and large enough to clear the self-host break-even — without rewriting your integration.

Written by Ajay Kumar, Founder, PandaStack.