Database Advanced

Scaling Django Databases: Read Replicas, PgBouncer Connection Pooling, and Database Routers

When one PostgreSQL box stops keeping up, scale reads horizontally. Add streaming replicas, route reads with a Django database router, pool connections through PgBouncer, and handle replication lag without serving stale data.

DjangoZen Team Jun 06, 2026 15 min read 4 views

A single PostgreSQL primary handles a surprising amount of load — until it doesn't. Before you reach for sharding (hard) or a different database (risky), you can usually multiply your read capacity with replicas and fix connection exhaustion with PgBouncer. This is the production playbook for scaling Django's database tier without rewriting your app: replicas, routing, pooling, and the subtle bugs each one introduces.

Diagnose before you scale

Scaling the wrong thing wastes weeks. First find out why the database is the bottleneck. Is it read-bound (most web apps), write-bound, connection-bound, or just running unindexed queries? Check pg_stat_statements for the queries consuming the most total time, watch pg_stat_activity for connection counts and lock waits, and confirm your hot queries actually use indexes. Half the time "the database is slow" is a missing index or an N+1 in the ORM, and no amount of replicas will fix that — it will just spread the same bad query across more machines.

Streaming read replicas

PostgreSQL streams the primary's write-ahead log (WAL) to one or more read-only replicas in near real time. Writes still go to the primary; reads spread across replicas. For read-heavy workloads — which is almost every web app — this is the highest-leverage scaling move available, and it requires no application rewrite, only routing. You can add replicas behind a load balancer and scale reads roughly linearly until the primary's write throughput becomes the ceiling.

A Django database router

Declare the connections, then route reads to a replica and writes to the primary with a router:

DATABASES = {
    "default": {...},                      # primary, read-write
    "replica": {..., "TEST": {"MIRROR": "default"}},
}

class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):  return "replica"
    def db_for_write(self, model, **hints): return "default"
    def allow_relation(self, a, b, **hints): return True
    def allow_migrate(self, db, app_label, **hints):
        return db == "default"             # migrate only the primary
DATABASE_ROUTERS = ["app.routers.PrimaryReplicaRouter"]

With several replicas, randomize the read target inside db_for_read or, better, put the replicas behind a single load-balanced endpoint and let it spread the load. Keep allow_migrate pinned to the primary so schema changes never run against a read-only node.

Replication lag — the read-your-writes trap

Replicas are milliseconds behind, but milliseconds are enough to break a user's mental model: they save their profile (primary), the next page reads from a replica that has not caught up, and their change appears to have vanished. This "read-your-writes" violation is the number-one bug introduced by replicas. The fix is to pin reads to the primary immediately after a write:

from django.db import transaction

with transaction.atomic():        # everything inside hits the primary
    profile.save()
    return render_profile(Profile.objects.get(pk=profile.pk))

Or force a single query onto the primary explicitly when you know you just wrote:

Profile.objects.using("default").get(pk=pk)

A common pattern is "sticky primary": for a few seconds after a user writes, route all their reads to the primary, then relax back to replicas. Store a short per-user flag in Redis and check it in your router. Always monitor lag with pg_stat_replication and alert if it grows — a replica that falls minutes behind is serving stale data to everyone.

PgBouncer: connection pooling

Each gunicorn worker holds its own database connections, and PostgreSQL forks a backend process per connection. At a few hundred connections the server spends more time context-switching between processes than doing work, and memory balloons. PgBouncer sits between Django and PostgreSQL and multiplexes thousands of client connections onto a small pool of real backend connections:

[databases]
djzen = host=127.0.0.1 port=5432 dbname=djzen

[pgbouncer]
pool_mode = transaction       # the sweet spot for web apps
max_client_conn = 2000
default_pool_size = 20        # real backend connections per database

Twenty real connections can now serve two thousand clients, because web requests hold a connection only for the brief duration of a transaction.

The transaction-mode caveat

Transaction pooling is what makes PgBouncer so effective, but it changes the rules: a connection is handed back to the pool after each transaction, so anything that relies on session state across transactions breaks. Session-level SET, advisory locks held across statements, server-side prepared statements, and LISTEN/NOTIFY will not behave as expected. The Django-specific consequence: disable persistent connections, because PgBouncer is doing the pooling now:

DATABASES = {"default": {..., "CONN_MAX_AGE": 0}}

If you genuinely need session features, route those specific operations to a second connection in session pool mode, and keep the bulk of your traffic on the transaction-mode pool.

When the primary's writes are the ceiling

Replicas scale reads, not writes — every write still lands on the single primary. When write throughput is the limit, your options escalate in cost: batch writes to reduce transaction overhead, move high-volume append-only data (events, logs) to a separate store, introduce a write-behind queue so spikes are absorbed asynchronously, and only as a last resort shard by tenant or key across multiple primaries. Sharding is a large project that touches every query, so exhaust the cheaper options first — most apps never need it.

Indexing is still the cheapest win

No amount of hardware beats a missing index. Before and during any scaling work, keep auditing: every column you filter or join on should be indexed; composite indexes should lead with the most selective column; and partial indexes (WHERE active) keep hot subsets small. EXPLAIN (ANALYZE, BUFFERS) on your slowest queries tells you whether you are scanning or seeking. A correctly indexed single primary often outperforms a poorly indexed cluster.

Monitoring the database tier

You cannot scale what you cannot see. Track replication lag per replica, connection counts at both PgBouncer and PostgreSQL, cache-hit ratio, the slowest queries from pg_stat_statements, and lock-wait events. Alert on lag crossing a threshold and on the connection pool saturating — both are early warnings of a tier about to fall over. Put these on the same dashboard as your app latency so you can correlate a slow endpoint with a database event.

Rollout checklist

  • Confirm the bottleneck is reads (and not a missing index) with pg_stat_statements.
  • Add a replica; verify pg_stat_replication shows low, stable lag.
  • Ship the router; hunt for read-your-writes bugs on every post-write page.
  • Add sticky-primary for recently-writing users if those bugs appear.
  • Front everything with PgBouncer in transaction mode; set CONN_MAX_AGE = 0.
  • Load-test, then add replicas behind a load balancer as traffic grows.
  • Wire lag and connection-saturation alerts before you rely on the setup.

Caching: the scaling layer before replicas

The fastest query is the one you never run. Before adding replicas, make sure you are not asking the database the same question thousands of times a second. A Redis cache in front of expensive, slow-changing reads — a homepage's product list, a user's permission set, a dashboard's aggregates — can remove the majority of read load that would otherwise push you toward replicas in the first place. Cache at the right granularity: per-view for whole pages, fragment caching for expensive template pieces, and low-level cache.get_or_set for individual querysets. The hard part is invalidation; key entries by the data they depend on and bust them on write, or use short TTLs where slightly-stale is acceptable. Often a day of caching work buys more headroom than a month of database scaling — and it is cheaper to operate.

Materialized views for expensive reads

Some reads are inherently heavy: multi-table aggregations for reporting, leaderboards, analytics rollups. Running them on every request is wasteful when the underlying data changes slowly. A PostgreSQL materialized view computes the result once and stores it like a table, and you refresh it on a schedule. Reads against it are fast and predictable, and the refresh runs off the request path. Use REFRESH MATERIALIZED VIEW CONCURRENTLY so reads are not blocked during the refresh. This pattern moves the cost of an expensive query from "every read" to "once per refresh interval," which for reporting data is exactly the trade you want.

Partitioning large tables

When a single table grows into the hundreds of millions of rows, even well-indexed queries slow down and maintenance operations crawl. PostgreSQL table partitioning splits one logical table into many physical partitions — commonly by time range (one partition per month) for append-heavy data like events, orders, or logs. Queries that filter on the partition key touch only the relevant partitions (partition pruning), so a query for last week reads one small partition instead of scanning everything. Old partitions can be detached and archived cheaply, which keeps the hot set small. Partitioning is more operational work than indexing, so reserve it for genuinely huge, naturally time-ordered tables — but for those, it is transformative.

Vacuum, bloat, and autovacuum tuning

PostgreSQL's MVCC model means updates and deletes leave behind dead row versions that VACUUM reclaims. On a busy database, default autovacuum settings often cannot keep up, and tables bloat — growing on disk, slowing scans, and degrading index efficiency — until performance mysteriously declines. Monitor dead-tuple counts and table bloat, and tune autovacuum to run more aggressively on your hottest tables (lower the scale factor so it triggers sooner). This is invisible maintenance that quietly determines whether a write-heavy database stays fast over months. A surprising number of "we need to scale the database" situations are really "autovacuum has been losing the race for weeks."

Isolating analytics on a dedicated replica

Analysts and dashboards run heavy, long-running queries that can starve your transactional workload of resources. Rather than let a sprawling report lock up the database serving customers, point all reporting and BI traffic at a dedicated replica tuned for it. The transactional primary and its low-latency replicas stay responsive for the app, while the reporting replica absorbs the analytical load. This separation of workloads is one of the cleanest uses of replication — it is not about read capacity at all, but about preventing two very different query patterns from interfering with each other.

Failover and high availability

Scaling and availability overlap: once you run replicas, you are most of the way to a high-availability setup. With a tool like Patroni or your cloud provider's managed failover, a replica can be promoted to primary automatically when the primary fails, turning a multi-hour outage into a brief blip. The application side matters too — your connection layer (often PgBouncer plus a virtual IP or service endpoint) must repoint to the new primary, and your app must tolerate the short window of failed writes gracefully with retries. Design for the primary disappearing, because eventually it will, and the difference between a planned failover and an unplanned one is whether you tested it.

Capacity planning

Scaling is not a one-time event; it is a continuous read of headroom against growth. Watch the trend lines — connection saturation, replication lag under peak, CPU and IO on the primary, cache hit ratio — and act before they hit the ceiling, not after. Knowing your growth rate tells you whether you have months or weeks of runway, and which lever to pull next: another replica, a bigger primary, more caching, partitioning, or finally sharding. The teams that scale calmly are the ones watching these numbers continuously; the ones firefighting are the ones who looked only when the database was already on fire.

Summary

Read replicas plus a Django database router give you horizontal read scaling almost for free, and PgBouncer in transaction mode stops connection storms from killing PostgreSQL. The two bugs to watch are replication lag — pin post-write reads to the primary, optionally with sticky-primary, and your users never notice the replicas exist — and transaction-mode's loss of session state, which means setting CONN_MAX_AGE = 0 and keeping session-dependent features off the pooled path. Diagnose before you scale, keep indexing relentlessly, monitor lag and connections, and save sharding for the rare day replicas and pooling genuinely run out — most apps never reach it.