
Database-per-Tenant Isolation with MicroVMs

Ajay Kumar · 10 min read

Every multi-tenant SaaS eventually has the database-isolation conversation. It usually starts the day a single tenant's runaway analytical query pins a CPU and the latency graph for every other customer on that shared Postgres tips over at once. Suddenly "we put a tenant_id on every row" stops feeling like a clever space-saver and starts feeling like a single blast radius with a thousand customers inside it. The question underneath is always the same: how strong does the wall between tenant A and tenant B actually need to be, and what does it cost you to build it that strong?

This post walks the multi-tenancy spectrum — from one shared table with a tenant_id column all the way to a dedicated database per customer on its own isolated VM — and is honest about where each rung makes sense. Then it covers how PandaStack's managed Postgres-16 makes the strongest rung practical by giving each database its own Firecracker microVM and durable volume, and the cost trade-off you have to respect so you don't end up paying for fifty thousand idle VMs. It's the data-layer companion to /blog/multi-tenant-code-execution, which makes the same argument for the compute layer.

The multi-tenancy spectrum

There isn't one "multi-tenant database" design — there's a ladder of them, each trading isolation against density and operational simplicity. The lower rungs pack thousands of tenants into one database and amortize everything; the higher rungs give each tenant a real boundary and pay for it in resources and orchestration. Most companies start low and climb only as far as their largest, most demanding customers force them to.

  • Shared schema, shared database (tenant_id column) — every tenant's rows live in the same tables, separated by a tenant_id filter on every query. Maximum density and the cheapest to run, but the isolation is a WHERE clause: one missing filter is a cross-tenant data leak, and one tenant's load is everyone's load. This is where almost every SaaS begins.
  • Schema-per-tenant (one database, many schemas) — each tenant gets its own set of tables inside a shared database. Cleaner separation than a tenant_id column and per-tenant migrations get easier, but you still share a connection pool, a buffer cache, a WAL, and a single process — so noisy-neighbor and blast-radius are only marginally better. Postgres also starts to creak past a few thousand schemas.
  • Database-per-tenant (separate databases, possibly shared server) — each tenant gets a logically separate database. Real separation of data, backups, and credentials, and you can drop a tenant by dropping a database. But if those databases share one Postgres instance, they still share CPU, RAM, and the page cache — the noisy neighbor is quieter, not gone.
  • VM-per-tenant (database on its own isolated machine) — each tenant's database runs in its own virtual machine with its own kernel, CPU and memory allocation, disk, and network. This is where noisy-neighbor and blast-radius are actually solved rather than reduced, because the boundary is enforced by hardware virtualization instead of a shared process. The historical cost was a heavyweight VM per tenant; a microVM is what makes this rung affordable.
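
To make "one missing filter is a cross-tenant data leak" concrete, here is a minimal sketch of the bottom rung. It uses Python's built-in sqlite3 as a stand-in for a shared Postgres table; the table, tenant names, and helper functions are hypothetical.

```python
import sqlite3

# One shared table for every tenant; tenant_id is the only separation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id TEXT, id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("acme", 1, 100), ("acme", 2, 250), ("globex", 3, 999)],
)

def large_invoices_buggy(conn, min_amount):
    # Missing tenant filter: silently returns every tenant's rows.
    return conn.execute(
        "SELECT tenant_id, id FROM invoices WHERE amount >= ? ORDER BY id",
        (min_amount,),
    ).fetchall()

def large_invoices_scoped(conn, tenant_id, min_amount):
    # The filter lives inside the helper, so callers of it cannot forget it.
    return conn.execute(
        "SELECT tenant_id, id FROM invoices"
        " WHERE tenant_id = ? AND amount >= ? ORDER BY id",
        (tenant_id, min_amount),
    ).fetchall()

print(large_invoices_buggy(conn, 200))           # [('acme', 2), ('globex', 3)]
print(large_invoices_scoped(conn, "acme", 200))  # [('acme', 2)]
```

The scoped helper narrows the risk but is still application discipline: any query written outside it bypasses the filter, which is exactly why the isolation on this rung is only as strong as a WHERE clause.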

The thing to notice is that the first three rungs all share something load-bearing: a column, a process, or a host kernel and page cache. "Database-per-tenant" on one shared Postgres server sounds like strong isolation, but a tenant that floods shared_buffers with its own pages or saturates the instance's I/O still degrades its neighbors. Real per-tenant isolation only arrives when the unit of separation is the machine, not a logical construct inside a shared one.

The two problems you're actually solving

Strip away the architecture diagrams and there are two distinct failure modes that drive teams up the ladder. They sound similar but they're not — one is about availability, the other is about security, and a design can fix one while leaving the other wide open.

Noisy neighbor (an availability problem)

On shared infrastructure, one tenant's resource consumption is every tenant's problem. A single customer ships a query that hits a missing index and falls back to a sequential scan over a hundred-million-row table: the scan churns the page cache and saturates disk I/O, other tenants' hot pages get pushed out of memory, and a connection pool that was comfortably sized fills with that one tenant's long-running transactions. No boundary was breached and no data leaked, but every other customer just got a latency spike or a timeout because of work they didn't do. The deeper you share (one process, one cache, one connection pool), the more directly one tenant's bad day becomes a fleet-wide incident.
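
The connection-pool part of this failure is easy to model. Below is a toy, hypothetical pool (not modeled on any particular pooler) showing that when slots are shared, whoever holds them decides who else gets served.

```python
class SharedPool:
    """A toy fixed-size connection pool shared by every tenant (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.in_use = {}  # tenant -> connections currently held

    def acquire(self, tenant):
        if sum(self.in_use.values()) >= self.size:
            return False  # pool exhausted: the caller waits or times out
        self.in_use[tenant] = self.in_use.get(tenant, 0) + 1
        return True

pool = SharedPool(size=20)

# One tenant's long-running transactions grab every slot...
runaway = [pool.acquire("tenant-a") for _ in range(20)]

# ...so an unrelated tenant with a one-row lookup can't get a connection.
print(pool.acquire("tenant-b"))  # False
```

Nothing in tenant B's own workload changed; it is starved purely because the pool is shared.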

Blast radius (a security and correctness problem)

Blast radius asks: when something goes wrong — a query bug, a credential leak, a corrupted index, a botched migration, a compromised connection — how many tenants does it touch? With a shared-schema design the answer is, in principle, all of them. A single application query that forgets its tenant_id filter returns another customer's rows. One leaked connection string is a key to the whole dataset. A migration that locks a table locks it for everyone. Per-tenant databases bound the data-leak surface (separate databases, separate credentials), and per-tenant VMs bound the resource and corruption surface too: a wedged or compromised database VM is one tenant's outage, not the platform's.
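
One way to reason about blast radius is to ask, for each credential, how many tenants it can reach. The sketch below models that as a plain mapping; the credential names and tenants are hypothetical, and a real audit would read grants from the database rather than from a dict.

```python
# Which tenants each credential can read, under two hypothetical layouts.
shared_schema = {"app_user": {"acme", "globex", "initech"}}
per_tenant_db = {
    "acme_user": {"acme"},
    "globex_user": {"globex"},
    "initech_user": {"initech"},
}

def blast_radius(grants, leaked_credential):
    """Number of tenants exposed if this one credential leaks."""
    return len(grants.get(leaked_credential, set()))

print(blast_radius(shared_schema, "app_user"))   # 3: every tenant
print(blast_radius(per_tenant_db, "acme_user"))  # 1: just acme
```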

These two problems pull toward different solutions. Noisy-neighbor is mostly fixed by resource isolation — hard CPU/memory/IO ceilings the tenant can't exceed. Blast-radius is mostly fixed by boundary isolation — separate data, credentials, and a wall a fault can't cross. A microVM-per-database happens to deliver both at once: the hypervisor caps resources and enforces the boundary in the same move. That's why it's a genuinely clean unit, not just a stricter version of the same shared thing.
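
To see why a hard ceiling fixes noisy-neighbor, compare two toy allocators. The proportional-share model of the shared case is a deliberate simplification (it is not how the Linux scheduler or Postgres actually divides work), and the demand numbers are made up.

```python
def shared_allocation(capacity, demand):
    """Split one shared capacity in proportion to demand (no per-tenant ceiling)."""
    total = sum(demand.values())
    if total <= capacity:
        return dict(demand)
    return {t: capacity * d / total for t, d in demand.items()}

def partitioned_allocation(ceilings, demand):
    """Each tenant gets at most its own fixed ceiling, like a VM's vCPU/RAM limit."""
    return {t: min(demand[t], ceilings[t]) for t in demand}

demand = {"runaway": 96, "b": 2, "c": 2}  # hypothetical vCPU-seconds wanted
print(shared_allocation(16, demand))      # b and c get far less than they asked for
print(partitioned_allocation({"runaway": 4, "b": 4, "c": 4}, demand))
```

In the shared case, tenants b and c each receive a small fraction of what they asked for because the runaway tenant's demand dilutes everyone's share; with ceilings, the runaway tenant is capped at 4 and b and c get their full 2.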

When full per-tenant DB isolation is worth it

Per-tenant database isolation is not the right default for everyone, and selling it as one would be dishonest. For a freemium product with a million low-value accounts, a shared schema with a rigorously enforced tenant_id and good query discipline is the correct, economical answer — you do not want a million databases. The dedicated-database (and dedicated-VM) pattern earns its cost in a specific set of situations:

  • Compliance and contractual isolation — when an enterprise contract, HIPAA, PCI, or a SOC 2 control requires that one customer's data is physically separated from another's, a WHERE clause is not an answer you want to defend in an audit. A dedicated database with dedicated credentials is.
  • Data residency — a customer who needs their data to live in a specific region or jurisdiction is far easier to satisfy when their database is a discrete thing you can place on a host in that region, rather than rows interleaved with everyone else's in a shared table.
  • Hostile or semi-trusted tenants — if tenants can run their own queries, functions, or extensions (anything approaching a database-as-a-product), you're now running untrusted database workloads side by side, and you want a hardware boundary between them for the same reason you'd want it between untrusted code. See /blog/firecracker-vs-docker for why the kernel boundary matters.
  • Large enterprise customers paying for it — the big logo that wants its own backup schedule, its own performance SLA insulated from your free tier, its own encryption keys, and a clean "delete everything" story at contract end. Per-tenant isolation is often a line item they're happy to fund.
  • Per-tenant operational independence — independent backups, restores, point-in-time recovery, version pinning, and migrations. Restoring one tenant to last Tuesday without touching anyone else is trivial with a dedicated database and genuinely hard with a shared one.

A common and sensible pattern is hybrid: shared schema for the long tail of small accounts, and dedicated per-tenant databases (or VMs) provisioned on demand for the enterprise tier and anyone who contractually requires it. You climb the ladder per customer, not for the whole platform at once.
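
The hybrid pattern boils down to a per-tenant placement decision plus routing. A minimal sketch, assuming the triggers from the list above are stored as flags on a tenant record; the field names, DSNs, and policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tenant:
    tenant_id: str
    needs_physical_separation: bool = False  # compliance or contract
    residency_region: Optional[str] = None   # e.g. "eu-west"
    runs_own_queries: bool = False           # semi-trusted workloads
    enterprise_tier: bool = False
    needs_per_tenant_pitr: bool = False

def placement(t: Tenant) -> str:
    """Return 'dedicated' or 'shared' for one tenant (hypothetical policy)."""
    if (t.needs_physical_separation or t.residency_region is not None
            or t.runs_own_queries or t.enterprise_tier or t.needs_per_tenant_pitr):
        return "dedicated"
    return "shared"

def connection_target(t: Tenant, dedicated_urls: dict, shared_url: str):
    """Dedicated tenants get their own DSN; everyone else shares one DB plus a tenant_id."""
    if placement(t) == "dedicated":
        return dedicated_urls[t.tenant_id], None
    return shared_url, t.tenant_id
```

In practice the dedicated_urls map would be filled in as each dedicated database is provisioned, so moving a tenant up the ladder means migrating its rows out of the shared schema and then changing its placement.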

Why a microVM is a clean unit of isolation

If you've decided a tenant needs real separation, the next question is what enforces it. A separate database on a shared Postgres instance gives you separate data and credentials but still shares the process, the cache, and the host. A separate full VM per tenant gives you everything but has historically cost gigabytes of RAM and tens of seconds to boot, which is fine for a handful of whales and ruinous at any scale. A microVM is the middle ground that resolves this: a real VM with its own guest kernel, isolated by CPU hardware virtualization through KVM, but stripped down to a minimal device model so it's small and fast. The general mechanics are in /blog/what-is-a-microvm.

For a database specifically, the microVM boundary buys you three things at once. First, resource isolation: vCPU count and guest RAM are fixed at the VM boundary by the hypervisor, so a tenant's runaway query saturates its own allocation and no one else's. The noisy neighbor is contained by hardware, not by hopeful capacity planning. Second, fault and security isolation: the database runs on its own guest kernel with its own filesystem and network stack, so a wedged process, a corrupted page cache, or a compromised connection is bounded to that one tenant's VM. Third, a clean lifecycle: provisioning a tenant means creating a VM and tearing one down means destroying it, with no risk of leaving orphaned rows or schemas behind in a shared store.
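
For a sense of where those ceilings live, Firecracker takes them as a machine configuration before boot, through its API socket (PUT /machine-config with vcpu_count and mem_size_mib). The helper below only builds that request body; the sizes are hypothetical, and on PandaStack these values come from the postgres-16 template snapshot rather than being set per database.

```python
import json

def firecracker_machine_config(vcpu_count: int, mem_size_mib: int) -> str:
    """Body for Firecracker's PUT /machine-config call (sent before InstanceStart).

    These two numbers are the tenant's hard ceiling: the guest gets this many
    vCPUs and this much RAM, however busy its database gets.
    """
    if vcpu_count < 1 or mem_size_mib < 1:
        raise ValueError("a microVM needs at least one vCPU and some memory")
    return json.dumps({"vcpu_count": vcpu_count, "mem_size_mib": mem_size_mib})

body = firecracker_machine_config(vcpu_count=2, mem_size_mib=1024)
print(body)  # {"vcpu_count": 2, "mem_size_mib": 1024}
```

With Firecracker started with --api-sock, this body could be sent with curl --unix-socket /tmp/firecracker.socket -X PUT http://localhost/machine-config -H 'Content-Type: application/json' -d "$BODY" (the socket path here is a placeholder).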

How PandaStack does it: managed Postgres-16 per microVM

PandaStack's managed databases are built on exactly this model: every managed Postgres-16 database is its own dedicated Firecracker microVM with its own durable volume. There is no shared Postgres instance underneath with logical databases carved out of it — the unit of isolation is the VM, so each tenant database gets its own guest kernel, its own fixed CPU and memory allocation, and its own disk. Resource isolation and blast-radius containment come from the substrate rather than from query discipline. The full design is documented at /docs/concepts/databases.

A few specifics worth knowing. The data lives on a durable volume, not the ephemeral rootfs, so unlike a throwaway sandbox, the database persists across restarts. These databases are marked persistent: they're exempt from the idle reaper that recycles ordinary sandboxes, and they're pinned to their host (this pinning is a beta constraint). You connect two ways: a native TLS PostgreSQL connection for normal application traffic, and an HTTP query broker for environments that can't open a raw socket, such as edge functions and serverless runtimes that speak HTTP but not the Postgres wire protocol. Creating a database and connecting to it natively looks like this:

# Create a managed Postgres-16 database (its own microVM + durable volume).
# CPU/memory are fixed by the postgres-16 template snapshot, not per-DB.
curl -sS -X POST https://api.pandastack.ai/v1/databases \
  -H "Authorization: Bearer $PANDASTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "tenant-acme-prod"}'

# The response includes a connection_url. Native TLS connection:
#   postgres://pandastack:<pw>@<id>.db.pandastack.ai:5432/pandastack
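# (TLS is required.) Connect with any standard client; sslmode=require makes
# the client refuse to fall back to an unencrypted connection:
psql "postgres://pandastack:<pw>@<id>.db.pandastack.ai:5432/pandastack?sslmode=require"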
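
If your application receives the create response programmatically, a small helper can pull out connection_url and pin the TLS requirement in the DSN itself. This is a sketch: it assumes connection_url is a top-level field of the JSON response (the comment above says the response includes it, but not where), and the sample password and ID are placeholders.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def tls_dsn_from_create_response(response_body: str) -> str:
    """Take the create response's connection_url and make TLS non-negotiable.

    Adds sslmode=require unless the URL already asks for an equal or stricter
    mode, so a misconfigured client can't silently fall back to plaintext.
    """
    url = json.loads(response_body)["connection_url"]
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    if query.get("sslmode") not in ("require", "verify-ca", "verify-full"):
        query["sslmode"] = "require"
    return urlunsplit(parts._replace(query=urlencode(query)))

sample = '{"connection_url": "postgres://pandastack:s3cret@abc123.db.pandastack.ai:5432/pandastack"}'
print(tls_dsn_from_create_response(sample))
```

verify-full additionally checks the server certificate against a trusted CA and the hostname, which is the stronger choice if your client has the right root certificates.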

Provisioning a managed database is not instant — a create takes 30 to 90 seconds, because it blocks until Postgres has bootstrapped and is accepting connections, not just until the VM is up. That's the honest cost of handing back a database that's actually ready to use rather than one that's still initializing. It's a different latency class from PandaStack's plain sandbox create (snapshot-restore, p50 179ms) precisely because a database has to finish its own startup before it's useful.
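
The practical consequence for client code is the HTTP timeout: a short or default timeout can abandon a create that was going to succeed. A minimal standard-library sketch, reusing the endpoint, headers, and body from the curl example; the 120-second figure is an assumption chosen to sit above the stated 90-second upper bound, and the opener parameter exists only so the call can be tested without a network.

```python
import json
import urllib.request

API_URL = "https://api.pandastack.ai/v1/databases"
CREATE_TIMEOUT_S = 120  # create blocks 30 to 90 s; leave headroom above the upper bound

def build_create_request(api_key: str, label: str) -> urllib.request.Request:
    body = json.dumps({"label": label}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def create_database(api_key, label, opener=urllib.request.urlopen):
    """POST the create call and wait long enough for Postgres to finish bootstrapping."""
    req = build_create_request(api_key, label)
    with opener(req, timeout=CREATE_TIMEOUT_S) as resp:
        return json.loads(resp.read())
```

If a create does time out client-side, be careful about blindly retrying: unless the API deduplicates by label (the post doesn't say), a retry could provision a second database.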

The cost and ops trade-off

Strong isolation is not free, and the failure mode of over-applying it is real. Each per-tenant database VM holds memory and CPU for as long as it exists, whether the tenant is hammering it or hasn't logged in for a month. Provision one per account indiscriminately and you've built a fleet of fifty thousand mostly-idle VMs, paying full resource cost for tenants generating no value — the noisy-neighbor problem traded for a much larger idle-capacity bill.

Database-per-tenant is a tool for the tenants who need it, not a default for all of them. Reserve dedicated database VMs for the customers where isolation, compliance, residency, or contract value justifies the standing cost — and keep the long tail of small accounts on a well-disciplined shared schema. The right architecture for most platforms is a hybrid, sized so you're never paying for fifty thousand idle VMs to isolate fifty that mattered.
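
The idle-capacity bill is simple arithmetic. A back-of-the-envelope sketch, assuming a hypothetical 512 MiB guest per database VM and counting RAM only (CPU, disk, and per-VM overhead would add to both sides).

```python
MIB_PER_GIB = 1024

def standing_memory_gib(dedicated_vms: int, mem_mib_per_vm: int) -> float:
    """Guest RAM held by dedicated database VMs whether or not tenants are active."""
    return dedicated_vms * mem_mib_per_vm / MIB_PER_GIB

PER_VM_MIB = 512  # hypothetical size of one small database VM

everyone = standing_memory_gib(50_000, PER_VM_MIB)  # one VM per account
hybrid = standing_memory_gib(50, PER_VM_MIB)        # only the tenants that need it
print(everyone, hybrid)  # 25000.0 25.0
```

The absolute numbers depend entirely on the assumed VM size; the ratio is the point, because standing cost scales with how many tenants you isolate, not with how many are active.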

PandaStack's substrate softens this trade-off rather than eliminating it. Because the platform keeps no warm pool and restores from snapshots, sandbox-class workloads are cheap to spin up on demand (a same-host fork lands around 400–750ms), and an agent can address a very large number of isolated tenants per host — the NATID network model alone pre-allocates 16,384 /30 subnets per agent, so per-tenant network isolation isn't the scaling ceiling. But managed databases are deliberately persistent and pinned, so they don't auto-reap the way ephemeral sandboxes do. That's the right call for a database — you don't want your tenant's data VM garbage-collected because it was quiet overnight — but it does mean the discipline is on you: provision dedicated databases for the tenants who earn them, and don't confuse "we can isolate every tenant" with "we should."
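
The 16,384 figure is consistent with carving a single /16 into /30s: each /30 is four addresses (network, two usable hosts, broadcast), and 65,536 / 4 = 16,384. The sketch below checks that with Python's ipaddress module; the 10.200.0.0/16 block is a hypothetical choice, and the post doesn't say which range NATID actually uses or how it assigns the two usable addresses.

```python
import ipaddress

# Hypothetical per-agent block carved from private address space.
agent_block = ipaddress.ip_network("10.200.0.0/16")

subnets = list(agent_block.subnets(new_prefix=30))
print(len(subnets))  # 16384

# Each /30 leaves two usable addresses, e.g. one per end of a point-to-point link.
first = subnets[0]
print([str(h) for h in first.hosts()])  # ['10.200.0.1', '10.200.0.2']
```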

Choosing your rung

Pick the lowest rung that satisfies your actual isolation requirement, and climb per customer when a specific tenant forces it. A shared schema with a disciplined tenant_id is the correct, economical default for the long tail. Schema-per-tenant buys cleaner migrations but barely moves noisy-neighbor or blast-radius. A dedicated database bounds the data-leak surface. And a dedicated database on its own microVM — what PandaStack's managed Postgres-16 gives you — is the rung where noisy-neighbor and blast-radius are genuinely solved, because the boundary is hardware-enforced and the resource ceiling is set by the hypervisor. Use it where compliance, residency, hostile tenants, or a paying enterprise demands it; keep everyone else on the shared rung; and you'll never wake up to one tenant's runaway query having become everyone's incident. Start with /docs/concepts/databases for the API and the operational details.

Frequently asked questions

What is the database-per-tenant pattern?

Database-per-tenant means each customer (tenant) of a multi-tenant SaaS gets its own logically separate database, rather than sharing tables with every other tenant behind a tenant_id filter. It gives real separation of data, backups, and credentials, and lets you drop a tenant by dropping a database. The strongest version runs each tenant's database on its own isolated VM, so resource limits and the fault boundary are enforced by the machine rather than by a shared process. PandaStack implements this with a dedicated Firecracker microVM and durable volume per managed Postgres-16 database.

When should I use database-per-tenant instead of a shared schema?

Use a shared schema with a disciplined tenant_id for the long tail of small accounts — it's the most economical default and you don't want thousands of idle databases. Move a specific tenant to a dedicated database (or dedicated VM) when something forces it: a compliance or contractual requirement for physical data separation (HIPAA, PCI, SOC 2 controls), data residency in a specific region, semi-trusted tenants who run their own queries or extensions, large enterprise customers paying for an isolated SLA and independent backups, or a need for per-tenant point-in-time restore. The common pattern is hybrid — shared schema for small accounts, dedicated databases provisioned on demand for the tier that requires it.

How does a microVM solve the noisy-neighbor problem for databases?

On shared infrastructure, one tenant's runaway query churns the shared caches, backs up I/O, and fills the connection pool, degrading every other tenant even though no boundary was breached and no data leaked. A microVM fixes this because vCPU count and guest RAM are fixed at the VM boundary by the hypervisor, so a tenant's heavy query saturates only its own allocation. The database also runs on its own guest kernel with its own filesystem, so a corrupted cache or wedged process is bounded to that one VM. Resource isolation (the availability fix) and boundary isolation (the blast-radius fix) come from the same hardware boundary at once.

How do I connect to a PandaStack managed database?

Two ways. For normal application traffic, use the native PostgreSQL connection string the API returns on create — postgres://pandastack:<pw>@<id>.db.pandastack.ai:5432/pandastack — with any standard client (psql, your ORM, a driver). TLS is required. For environments that can't open a raw Postgres socket, such as edge functions and serverless runtimes that only speak HTTP, there's an HTTP query broker at the database's proxy endpoint. A managed database create takes 30 to 90 seconds because it blocks until Postgres has finished bootstrapping and is accepting connections.

What's the downside of giving every tenant their own database VM?

Cost and operational overhead. Each per-tenant database VM holds CPU and memory for as long as it exists, regardless of whether the tenant is active — provision one per account indiscriminately and you've built a fleet of mostly-idle VMs, trading the noisy-neighbor problem for a large idle-capacity bill. PandaStack's managed databases are deliberately persistent and pinned (they don't auto-reap like ephemeral sandboxes), which is correct for a data store but means the discipline is on you: reserve dedicated database VMs for tenants where isolation, compliance, residency, or contract value justifies the standing cost, and keep the long tail on a well-disciplined shared schema.

Written by Ajay Kumar, Founder, PandaStack.