
Choosing a Vector Database — pgvector vs Pinecone vs Weaviate vs Qdrant

When your existing Postgres is enough, when to graduate to a dedicated vector DB, and how the major options compare for Django apps.

DjangoZen Team, May 09, 2026

The honest answer

For 90% of Django apps building RAG or semantic search: use pgvector. Your existing Postgres is fine, you avoid extra infrastructure, your data stays in one place, and the performance is more than enough up to millions of vectors.

The other vector databases exist for the 10% of cases where pgvector hits a real wall. This tutorial helps you tell which case you're in.

What a vector database does

It stores high-dimensional vectors (typically 256–3072 floats per item) and answers queries like "give me the K vectors most similar to this query vector." The non-trivial part is doing this fast on millions of items. A naive scan compares the query against every stored vector, which is too slow at that size, so vector databases use approximate nearest neighbour (ANN) indexes such as HNSW or IVF, which trade a little recall for a large speedup.
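To make "the K vectors most similar" concrete, here is a minimal sketch of the naive scan that ANN indexes are designed to avoid. It uses cosine similarity, which is only one of several common metrics, and the document ids and three-dimensional vectors are made up for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_exact(query, items, k):
    """Return the k (id, score) pairs most similar to `query`, best first.

    `items` maps an id to its vector. Every vector is scored, so the cost
    grows linearly with the number of stored items.
    """
    scored = ((item_id, cosine_similarity(query, vec)) for item_id, vec in items.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data: real embeddings have hundreds or thousands of dimensions.
docs = {
    "django-orm": [0.9, 0.1, 0.0],
    "postgres-indexes": [0.7, 0.7, 0.0],
    "sourdough-recipe": [0.0, 0.1, 0.9],
}
results = top_k_exact([1.0, 0.0, 0.0], docs, k=2)
```

This is exact but touches every item on every query. An HNSW or IVF index answers the same question while visiting only a small fraction of the vectors.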

Beyond raw search, modern vector DBs offer:

  • Metadata filtering — "find similar vectors where category = 'X' and date > 2024"
  • Hybrid search — combine vector similarity with keyword (BM25) search
  • Versioning / namespaces — separate indexes for different tenants or environments
  • Distributed scaling — horizontal scale beyond one machine
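As an illustration of what "combine" can mean for hybrid search, the sketch below merges a vector ranking and a keyword ranking with reciprocal rank fusion (RRF). RRF is one widely used fusion method, chosen here because it is simple; it is not presented as what any particular database does internally. The document ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists into one list, best first.

    Each id earns 1 / (k + rank) for every list it appears in, with rank
    starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["a", "b", "c"]   # ids ordered by vector similarity
keyword_hits = ["c", "a", "d"]  # ids ordered by keyword (e.g. BM25) score
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that rank well in both lists rise to the top, and neither score scale has to be normalised against the other because only ranks are used.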

The four players

pgvector (Postgres extension)

  • Pros: zero new infrastructure, transactional with your other data, free, simple, supports HNSW and IVFFlat indexes, metadata filtering via standard SQL WHERE clauses, mature
  • Cons: vertical scaling only (one Postgres instance), tuning HNSW is a learning curve, joins with very large vector tables can be slow
  • Best for: Django apps under ~10M vectors, single-tenant or modest multi-tenant, teams that already run Postgres
  • Cost: free (you pay for Postgres anyway)
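The "metadata filtering via standard SQL WHERE clauses" point can be sketched as a small query builder. This is an illustration under assumptions: the table name document_chunk and its columns (id, content, category, embedding) are hypothetical, and the helper names are invented here. The `<=>` operator (cosine distance) and the `hnsw` index with `vector_cosine_ops` are pgvector features.

```python
# Assumed schema (hypothetical names): table `document_chunk` with columns
# id, content, category, and embedding of type vector(1536).
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS document_chunk_embedding_hnsw "
    "ON document_chunk USING hnsw (embedding vector_cosine_ops)"
)


def to_vector_literal(embedding):
    """Format a list of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_similarity_query(embedding, k, category=None):
    """Build a parameterised top-k query ordered by cosine distance.

    Returns (sql, params) with %s placeholders, in the order the
    placeholders appear in the SQL text.
    """
    params = []
    where = ""
    if category is not None:
        where = "WHERE category = %s "  # ordinary SQL metadata filter
        params.append(category)
    sql = (
        "SELECT id, content FROM document_chunk "
        + where
        + "ORDER BY embedding <=> %s::vector LIMIT %s"
    )
    params.extend([to_vector_literal(embedding), k])
    return sql, params


sql, params = build_similarity_query([0.1, 0.2], k=5, category="docs")
```

The resulting pair can be passed to cursor.execute(sql, params) on a Django database cursor or a psycopg cursor. Nothing here touches the database, so the builder is easy to unit test.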

Pinecone

  • Pros: fully managed, fast onboarding, scales to billions of vectors, good metadata filtering, namespaces for multi-tenancy
  • Cons: SaaS lock-in, additional service to monitor, costs add up at scale, your vector data lives outside your Postgres
  • Best for: very large indexes, multi-tenant SaaS where each customer has millions of vectors, teams that want zero database management
  • Cost: ~$70/month minimum for production tier; usage-based above

Weaviate

  • Pros: open source, can self-host or use cloud, strong hybrid search (vector + keyword), GraphQL API, modular ML model integration
  • Cons: more operational overhead than Pinecone, conceptual model is heavier than pgvector
  • Best for: teams that want open-source with hybrid search baked in, complex multi-modal use cases (text + images)
  • Cost: free if self-hosted; cloud tier varies

Qdrant

  • Pros: open source, written in Rust (very fast), simple API, excellent metadata filtering, can self-host or use cloud
  • Cons: newer than the others, smaller ecosystem, separate service to operate if self-hosted
  • Best for: high-throughput search, performance-critical applications, teams willing to run a separate service (you don't need to know Rust; there is an official Python client)
  • Cost: free if self-hosted; cloud starts ~$25/month

When to upgrade from pgvector

Real signals you've outgrown it:

  1. Vector table > 10M rows AND latency p95 > 500ms — even with HNSW, very large tables on a single Postgres start to hurt
  2. You need true horizontal scaling: a pgvector index lives on one Postgres instance, read replicas add query capacity but not index capacity, and sharding it manually is painful
  3. Multi-tenant where each tenant has millions of vectors — namespace management in Pinecone/Qdrant is purpose-built for this
  4. Hybrid search with sophisticated reranking is core to your product — Weaviate makes this easier than rolling your own
  5. Your DBA wants the vector workload off the main OLTP database — totally valid, vector search can be CPU-heavy and disrupt other queries

If none of these apply, stay on pgvector. Migrating later is a real but bounded project (a few weeks).
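Signal 1 is the only one with hard numbers, so it is easy to check from data you likely already log. The sketch below computes p95 with the nearest-rank method (other percentile definitions give slightly different values) and applies both thresholds from the list. The function names and the sample latencies are illustrative.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def outgrown_pgvector(row_count, latencies_ms, max_rows=10_000_000, max_p95_ms=500):
    """True only when BOTH the row threshold and the p95 threshold are exceeded."""
    p95 = percentile_nearest_rank(latencies_ms, 95)
    return row_count > max_rows and p95 > max_p95_ms


# Hypothetical measurements: 90 fast queries and 10 slow ones.
latencies = [100] * 90 + [900] * 10
signal = outgrown_pgvector(12_000_000, latencies)
```

Requiring both conditions matters: a large table with good latency is fine, and a small table with bad latency usually points to a missing or untuned index rather than a need for a new database.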

A rough cost comparison at common sizes

For 1M vectors of dimension 1536, queried at 10 QPS:

Option                                          | Monthly cost          | Operational overhead
pgvector on existing Postgres                   | $0 (incremental)      | Low
pgvector on dedicated Postgres (4 vCPU / 16GB)  | ~$70 (Hetzner CPX41)  | Low
Pinecone Standard                               | ~$140+                | Zero
Weaviate Cloud                                  | ~$100+                | Low
Qdrant Cloud                                    | ~$50+                 | Low
Self-hosted Qdrant on Hetzner                   | ~$30 (CPX31)          | Medium

For a Django app early in its life, the gap between "free pgvector" and "$70+ managed service" is meaningful. For a mature product with millions of users, $140 a month is a rounding error and the operational savings of a managed service usually win.
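A quick back-of-the-envelope check shows why a 16GB machine is a plausible fit for the 1M x 1536 scenario. pgvector documents the storage for a vector as 4 bytes per dimension plus 8 bytes. The sketch below covers only that raw vector data; row overhead, other columns, and the HNSW index itself come on top and are not estimated here.

```python
def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for pgvector `vector` values: 4 bytes per dimension
    plus 8 bytes per vector. Excludes row overhead and index size."""
    return num_vectors * (4 * dimensions + 8)


size_bytes = raw_vector_bytes(1_000_000, 1536)
size_gib = size_bytes / 2**30  # roughly 5.7 GiB of raw vector data
```

Scaling the same arithmetic to 10M vectors gives roughly 57 GiB of raw vectors, which is one way to see why the 10M mark keeps coming up as the point where a single modest instance starts to strain.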

A pragmatic Django decision tree

Are you building a vector search feature?
├── No → don't pick anything yet
└── Yes
    ├── Already running Postgres? (almost always yes for Django)
    │   └── Yes → start with pgvector
    └── Vector count > 10M and growing fast?
        └── Yes → evaluate Pinecone or Qdrant

That's it. Don't pick a vector database before you have a problem; pgvector is good enough for the first chapter of almost every product.
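For teams that like their decisions executable, the tree fits in a few lines. This is only a restatement of the tree above under the assumption that you already run Postgres; the function name and return strings are invented for illustration.

```python
def choose_vector_backend(building_vector_search, vector_count, growing_fast):
    """Encode the decision tree; assumes Postgres is already in the stack."""
    if not building_vector_search:
        return None  # don't pick anything yet
    if vector_count > 10_000_000 and growing_fast:
        return "evaluate Pinecone or Qdrant"
    return "pgvector"


choice = choose_vector_backend(True, vector_count=250_000, growing_fast=True)
```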

Migrating later isn't fatal

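If you start with pgvector and outgrow it, the change stays contained as long as retrieval sits behind one function: your DocumentChunk model and the code that calls retrieve stay the same, and only the backend module changes. You can swap the retrieval implementation: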

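# myapp/rag/retrieve.py
# Pick the retrieval backend once, at import time, from a Django setting.
from django.conf import settings
from django.core.exceptions import ImproperlyConfigured

if settings.VECTOR_BACKEND == "pgvector":
    from .backends.pgvector_backend import retrieve
elif settings.VECTOR_BACKEND == "qdrant":
    from .backends.qdrant_backend import retrieve
else:
    # Fail loudly instead of leaving `retrieve` undefined.
    raise ImproperlyConfigured(f"Unknown VECTOR_BACKEND: {settings.VECTOR_BACKEND!r}")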

The work to migrate is mostly a bulk export and import of vectors plus a rewrite of the retrieve function. Plan for a few weeks, not a quarter.
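The bulk export and import step usually comes down to streaming rows out of the old store and writing them to the new one in batches. Here is a minimal sketch of that loop, assuming rows arrive as (id, vector, metadata) tuples and that `upsert_batch` is whatever write call your target backend provides; both are placeholders, not a real client API.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items without loading everything into memory."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def migrate(rows, upsert_batch, batch_size=500):
    """Copy rows to a new store in batches; returns the number of rows copied.

    `rows` is any iterable of (id, vector, metadata) tuples.
    `upsert_batch` is a hypothetical callable that writes one batch.
    """
    copied = 0
    for batch in batched(rows, batch_size):
        upsert_batch(batch)
        copied += len(batch)
    return copied


# Stand-in for a real target: collect batches in a list.
received = []
source_rows = ((i, [0.0, 1.0], {"category": "docs"}) for i in range(1050))
total = migrate(source_rows, received.append, batch_size=500)
```

Because the source is consumed lazily, the same loop works with a Django queryset iterator, and the returned count gives you something to compare against the source table before you flip VECTOR_BACKEND.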

Summary

Vector database FOMO is real. The honest answer is that pgvector handles most Django workloads, costs nothing additional, and keeps your stack simple. Reach for the bigger systems when you have measurable problems pgvector can't solve — not before.