Database Advanced

PostgreSQL Advanced for Django: Indexing, JSONB, and Full-Text Search

Push your Django app beyond basic queries. Master B-tree vs GIN indexes, JSONB for semi-structured data, full-text search with ranking, and window functions for analytics.

DjangoZen Team Apr 17, 2026 18 min read 322 views

Most Django apps use PostgreSQL as a plain row store and leave its real power untouched. Yet the difference between a query that scans a million rows and one that seeks straight to ten is a single well-chosen index, and the difference between a slow JSON soup and a fast, queryable document store is knowing how JSONB works. This tutorial goes deep on the PostgreSQL features that matter most to Django developers: indexing, JSONB, full-text search, and the query-planning knowledge that ties them together.

Why the database is where performance is won

Application code is rarely the bottleneck in a web app — the database is. A view that feels slow is almost always waiting on a query that scans too much, lacks an index, or fires N+1 times. This means the highest-leverage performance skill a Django developer can have is understanding what PostgreSQL is actually doing with your queries. The ORM is a convenient abstraction, but it generates SQL that the database executes according to a plan, and when that plan is bad, no amount of caching or faster hardware fixes the root cause cleanly. Learning to see through the ORM to the SQL and the plan beneath is what turns mysterious slowness into a specific, fixable problem.

Reading the query plan with EXPLAIN

The single most important tool is EXPLAIN (ANALYZE, BUFFERS), which shows how PostgreSQL executes a query: the plan it chose, whether it used an index or scanned the whole table, how long each step took, and how its row estimates compared to reality. The cardinal sign of trouble is a sequential scan on a large table where you expected an index scan — PostgreSQL is reading every row because no useful index exists or its statistics are stale. Get into the habit of running EXPLAIN on any query that feels slow; it replaces speculation with a precise account of where the time and IO go, and it tells you exactly which fix — an index, a rewrite, fresher statistics — will help.

Indexes: the foundation

An index is a data structure that lets PostgreSQL find rows without scanning the whole table, and it is the difference between a query that takes milliseconds and one that takes seconds. The default B-tree index suits equality and range queries on a column — exactly what most filters and joins do. In Django you declare them in Meta.indexes or via db_index=True:

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField()
    class Meta:
        indexes = [models.Index(fields=["status", "created_at"])]

Every column you regularly filter or join on wants an index. The cost is that indexes slow writes slightly and take disk space, so index deliberately for your real query patterns rather than indexing everything.

Composite and partial indexes

A composite index covers multiple columns, and column order matters: an index on (status, created_at) efficiently serves queries filtering on status, or on status and a created-at range, but not one filtering on created_at alone — the leading column must be used. Order the columns to match how you actually query, most selective first. A partial index covers only a subset of rows defined by a condition, like WHERE status = 'active', which keeps the index small and fast when you only ever query that subset. Partial indexes are a quietly powerful tool: an index on just the active orders is a fraction of the size of one over all orders, and proportionally faster.

JSONB: structured flexibility

PostgreSQL's JSONB type stores JSON in a binary, indexed, queryable form, giving you schemaless flexibility inside a relational database. It is ideal for data whose shape varies or evolves — settings, metadata, event payloads, flexible product attributes — without forcing a migration for every new field. Django exposes it as JSONField:

class Product(models.Model):
    attributes = models.JSONField(default=dict)

# query into the JSON
Product.objects.filter(attributes__color="red")
Product.objects.filter(attributes__dimensions__width__gte=10)

You query nested keys with the double-underscore syntax, and it all happens in the database, not in Python. JSONB is the pragmatic middle ground between rigid columns and a separate document database.

Indexing JSONB

JSONB queries can be slow without the right index, because by default the database must inspect each row's JSON. A GIN index on a JSONB column makes containment and key-existence queries fast, letting PostgreSQL find matching documents without scanning. For queries that always hit a specific key, an expression index on just that key path is even tighter. The lesson mirrors ordinary columns: JSONB is only as fast as its indexing, so if you query a JSONB field at scale, add a GIN index or you have built a slow full scan. Used with the right index, JSONB delivers document-store query performance without leaving PostgreSQL.

When to use JSONB versus columns

JSONB is powerful but not a default. Data you query, filter, join, and constrain heavily belongs in proper columns, where types, foreign keys, and constraints protect integrity and the planner optimizes best. JSONB shines for the genuinely variable or sparse — attributes that differ per row, payloads you store but rarely query deeply, evolving structures you do not want to migrate constantly. A common and sound pattern is hybrid: the core, queried fields as columns, and a JSONB column for the long tail of flexible attributes. Reaching for JSONB to avoid designing a schema is a trap; using it where flexibility is genuinely needed is excellent engineering.

Full-text search

Before adding Elasticsearch, know that PostgreSQL has capable built-in full-text search — enough for most applications. It tokenizes text into lexemes, handles stemming so "running" matches "run", ranks results by relevance, and Django wraps it cleanly:

from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank

vector = SearchVector("title", weight="A") + SearchVector("body", weight="B")
query = SearchQuery("django performance")
Article.objects.annotate(rank=SearchRank(vector, query)).filter(
    rank__gte=0.1).order_by("-rank")

Weighting lets a title match outrank a body match. For real use, store the search vector in a column with a GIN index and keep it updated with a trigger, so search does not recompute vectors on every query.

Fuzzy matching with trigrams

Full-text search handles words; it does not handle typos or partial matches well. For "did you mean" behavior, autocomplete, and fuzzy matching, the pg_trgm extension breaks text into three-character chunks and measures similarity, so "djnago" still matches "django". A GIN or GiST trigram index makes these similarity queries fast. Trigram matching complements full-text search: use full-text for relevance-ranked document search, and trigrams for fuzzy name lookups, autocomplete, and tolerating the misspellings real users type. Together they cover most search needs without a separate search engine and its operational burden.

Constraints: integrity in the database

Validation in your application is necessary but not sufficient — a bug, a shell, or a second writer can bypass it, so your most important invariants belong in the database where nothing can escape them. PostgreSQL offers check constraints, unique constraints (including partial and multi-column), and exclusion constraints, all expressible in Django:

class Booking(models.Model):
    class Meta:
        constraints = [
            models.CheckConstraint(check=models.Q(end__gt=models.F("start")),
                                   name="end_after_start"),
            models.UniqueConstraint(fields=["room", "date"],
                                    name="one_booking_per_room_per_day"),
        ]

A constraint guarantees the rule holds for every row, no matter how the data got there. Treat the database as the final guardian of integrity, not just a passive store.

Transactions and locking

Transactions group operations so they all commit or all roll back, and getting them right is essential for correctness under concurrency. Wrap multi-step changes in transaction.atomic so a partial failure leaves no half-written state. For data that concurrent requests might modify together — inventory, balances — use select_for_update to lock the rows for the transaction's duration, so two requests cannot both read, both decide, and both write a conflicting result. Understanding when you need a lock, and keeping the locked section short to avoid contention, is the difference between a system that stays correct under load and one that silently corrupts data when two users act at once.

Connection management

Each PostgreSQL connection costs memory and a backend process, and at scale connection count itself becomes a bottleneck. Django's CONN_MAX_AGE reuses connections across requests rather than opening one each time, but the real answer at scale is a pooler like PgBouncer sitting between your app and the database, multiplexing many client connections onto a few real ones. Knowing how your connections are managed — persistent connections, a pooler, and the interaction between them — prevents the surprising failure where a perfectly fast database refuses new connections because too many are open. Connection management is invisible until it is suddenly the thing that takes you down.

VACUUM and table health

PostgreSQL's design leaves dead row versions behind after updates and deletes, and VACUUM reclaims that space. On busy tables, autovacuum can fall behind, causing bloat — tables growing on disk, scans slowing, indexes losing efficiency — until performance mysteriously degrades over weeks. Monitor dead-tuple counts and tune autovacuum to run more aggressively on your hottest tables. This is invisible maintenance that quietly determines whether a write-heavy database stays fast, and a surprising number of "we need to scale the database" situations are really "autovacuum has been losing the race." Keeping tables healthy is as important as indexing them.

Common table expressions and window functions

Two SQL features unlock queries that are awkward or impossible with the basic ORM. Common table expressions (CTEs, the WITH clause) let you name and compose query steps, and support recursion for hierarchical data like category trees or org charts. Window functions compute values across a set of rows related to the current one — running totals, rankings, moving averages — without collapsing them into a group. Django exposes both through expressions and Window, letting you push analytical queries into the database instead of computing them in Python. Reaching for these when a query gets convoluted often replaces a tangle of Python loops with one efficient SQL statement the database executes optimally.

Pooling and prepared statements

At scale, how connections and statements are managed matters as much as the queries themselves. A connection pooler like PgBouncer multiplexes many client connections onto a few backend ones, preventing the connection exhaustion that kills a database long before query load would. Prepared statements let PostgreSQL reuse a query plan across executions, saving planning overhead for hot queries — though their interaction with transaction-mode pooling needs care. Understanding the layer between your app and the database engine is part of advanced PostgreSQL: the fastest queries still fail if connections are mismanaged, and a database that seems slow is sometimes simply starved of available connections.

Monitoring what the database is doing

You cannot tune what you cannot see, and PostgreSQL exposes rich introspection. The pg_stat_statements extension records which queries consume the most total time — usually a small number of culprits dominate — pointing you straight at what to optimize. pg_stat_activity shows current connections and what they are waiting on, revealing lock contention and long-running queries. Watching cache hit ratios, index usage, and table bloat tells you whether the database is healthy. Making these signals visible turns database performance from a black box into something you can observe and reason about, so you optimize the queries that actually matter rather than guessing.

Partitioning large tables

When a single table grows into hundreds of millions of rows, even good indexes strain and maintenance slows. Table partitioning splits one logical table into many physical pieces, commonly by time range for append-heavy data like events or logs, so a query filtered on the partition key touches only the relevant partitions rather than scanning everything. Old partitions detach and archive cheaply, keeping the active set small. Partitioning is more operational work than indexing, so it is reserved for genuinely huge, naturally-ranged tables — but for those it is transformative, turning queries that would crawl over a monolithic table into fast scans of a small, relevant slice.

Summary

PostgreSQL is far more than a row store, and using it well is the highest-leverage performance work a Django developer can do. Learn to read query plans with EXPLAIN (ANALYZE, BUFFERS) so slowness becomes diagnosable, and index deliberately for your real query patterns — composite indexes ordered to match your filters, partial indexes for hot subsets. Reach for JSONB where flexibility is genuinely needed, always with a GIN index, but keep queried data in proper columns. Use built-in full-text search and trigram matching before adding a separate search engine, push your critical invariants into database constraints where nothing can bypass them, and handle concurrency with atomic transactions and row locks. Finally, mind connection management and table health, because a fast database undone by connection exhaustion or autovacuum bloat is just as down as a slow one. Master these and PostgreSQL will carry your app far further than most teams ever realize.

Ready to Build?

Skip the boilerplate. Get production-ready Django packages.

Browse Products