
Profiling Django in Production: py-spy, django-silk, and Flame Graphs

Stop guessing why your app is slow. Profile a live Django process without restarting it using py-spy, trace per-request queries and timings with django-silk, read flame graphs, and turn findings into concrete fixes.

DjangoZen Team, Jun 06, 2026

"The site feels slow" is not a bug report you can act on. Profiling turns a vague complaint into a precise answer: this function, on this query, for 800ms. The good news — you can profile a running production process without restarting it or adding a single line of code. Here's how.

py-spy: zero-instrumentation sampling

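py-spy is a sampling profiler that attaches to a live Python process by PID and reads its stacks from the outside. No code changes, no restart, and low overhead, which makes it generally safe to run on production. By default it briefly pauses the target for each sample; --nonblocking avoids the pause at some cost in accuracy. On Linux, attaching to an already-running process usually requires root (sudo), and inside Docker the container needs the SYS_PTRACE capability.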

pip install py-spy

# live top-like view of where a worker spends time
py-spy top --pid 2372612

# record 30s and emit an interactive flame graph
py-spy record -o profile.svg --pid 2372612 --duration 30

Use py-spy dump --pid <PID> to instantly see the stack of a hung worker — invaluable for diagnosing a request stuck on a lock or a slow external call.
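To make "sampling" concrete, here is a minimal in-process sketch of the same idea: snapshot a thread's stack at a fixed interval and count which functions show up. This is not how py-spy is implemented (py-spy reads the target's memory from a separate process); sample_thread, hot_loop, and profile_hot_loop are hypothetical names used only for illustration.

```python
import collections
import sys
import threading
import time


def sample_thread(thread_id, duration=0.5, interval=0.005):
    """Periodically snapshot one thread's stack and count the functions seen.

    Returns (counts, samples): counts[name] is the number of samples in which
    `name` appeared anywhere on the stack (inclusive time, like bar width).
    """
    counts = collections.Counter()
    samples = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            samples += 1
            seen = set()
            while frame is not None:  # walk from the leaf frame to the root
                seen.add(frame.f_code.co_name)
                frame = frame.f_back
            counts.update(seen)
        time.sleep(interval)
    return counts, samples


def hot_loop(stop):
    # Deliberately CPU-heavy stand-in for a slow view.
    while not stop.is_set():
        sum(i * i for i in range(1000))


def profile_hot_loop():
    stop = threading.Event()
    worker = threading.Thread(target=hot_loop, args=(stop,))
    worker.start()
    try:
        return sample_thread(worker.ident)
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    counts, samples = profile_hot_loop()
    for name, n in counts.most_common(5):
        print(f"{name:20s} {n / samples:6.1%} of samples")
```

Because the profiler only looks at the stack every few milliseconds, its cost is independent of how many function calls the program makes, which is why the approach stays cheap on a busy worker.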

Reading a flame graph

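A flame graph stacks call frames: the x-axis is the share of samples (wider = more time), not chronological order; the y-axis is call depth. Read it by scanning for the widest bars; those are your hot paths. A wide plateau at the tip of a stack, the frame farthest from the root, is a single function eating CPU; a wide bar that's all database driver code means you're query-bound, not CPU-bound. py-spy skips idle threads by default, so pass --idle to py-spy record if you want time spent blocked on the database or network to show up.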
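The "widest bar" rule can also be applied to numbers instead of a picture. py-spy record accepts --format raw, which writes collapsed stacks, one line per unique stack with a sample count at the end. The sketch below, assuming that root-to-leaf "a;b;c 12" layout, computes each frame's inclusive width and self time. The frame_widths helper and the SAMPLE data are made up for illustration.

```python
import collections


def frame_widths(folded_lines):
    """Return (inclusive, self_time) Counters from 'a;b;c 12' collapsed stacks."""
    inclusive = collections.Counter()
    self_time = collections.Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # Frame names may contain spaces, so split on the last space only.
        stack, _, count = line.rpartition(" ")
        n = int(count)
        frames = stack.split(";")
        for name in set(frames):  # count a recursive frame once per sample
            inclusive[name] += n
        self_time[frames[-1]] += n  # the leaf frame is where time was spent
    return inclusive, self_time


# Hypothetical profile: 100 samples of one request handler.
SAMPLE = [
    "handle;view;render 10",
    "handle;view;execute_sql 70",
    "handle;view 5",
    "handle;middleware 15",
]

if __name__ == "__main__":
    inclusive, self_time = frame_widths(SAMPLE)
    print("widest bars:", inclusive.most_common(3))
    print("hottest leaves:", self_time.most_common(3))
```

Inclusive counts correspond to bar width in the graph; self time corresponds to the plateau at the tip of a stack. In this made-up profile the hottest leaf is execute_sql, which would point to the database rather than Python code.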

django-silk: per-request query tracing

Where py-spy shows CPU, django-silk shows the request lifecycle: every SQL query a request ran, how long each took, and, crucially, the runs of near-identical queries that are the N+1 smoking gun.

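pip install django-silk
# settings.py
INSTALLED_APPS += ["silk"]
MIDDLEWARE = ["silk.middleware.SilkyMiddleware"] + MIDDLEWARE
# urls.py (needs: from django.urls import include, path)
urlpatterns += [path("silk/", include("silk.urls", namespace="silk"))]
# create Silk's tables
python manage.py migrate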

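When Silk's SQL view lists 23 queries for a page that should run two, most of them the same SELECT repeated with a different ID, you have an instant N+1 diagnosis. Fix it with select_related/prefetch_related and watch the count collapse. Run Silk on staging or behind admin-only access (SILKY_AUTHENTICATION = True and SILKY_AUTHORISATION = True restrict the UI to logged-in staff); it adds overhead and stores request data.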
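To show why the count collapses, here is a minimal sketch that uses the standard library's sqlite3 in place of the ORM, with made-up author and book tables. The first function issues one query per row, the N+1 shape. The second fetches the same data with a single JOIN, which is what select_related does for foreign keys (prefetch_related instead runs one extra IN query per relation). The run and count_queries helpers are hypothetical and only stand in for a query log.

```python
import collections
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY, title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
    """
)

query_log = []  # every SQL string issued, in order


def run(sql, params=()):
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def books_n_plus_one():
    # One query for the books, then one more per book for its author.
    rows = []
    for title, author_id in run("SELECT title, author_id FROM book ORDER BY id"):
        name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        rows.append((title, name))
    return rows


def books_joined():
    # Same result in a single query, the shape select_related produces.
    return run(
        "SELECT b.title, a.name FROM book b "
        "JOIN author a ON a.id = b.author_id ORDER BY b.id"
    )


def count_queries(fn):
    """Run fn and return (result, total queries, repeated queries)."""
    query_log.clear()
    result = fn()
    counts = collections.Counter(query_log)
    duplicates = sum(n - 1 for n in counts.values())
    return result, len(query_log), duplicates


if __name__ == "__main__":
    for fn in (books_n_plus_one, books_joined):
        _, total, dupes = count_queries(fn)
        print(f"{fn.__name__}: {total} queries, {dupes} duplicates")
```

With three books the difference is 4 queries versus 1; with a few hundred rows on a list page it is the difference between a fast response and a timeout.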

A real profiling workflow

  1. Confirm the symptom with metrics (p95 latency, slow-query log) so you profile the right thing.
  2. Run py-spy top on a busy worker: is it burning CPU in Python code, or mostly idle and waiting on I/O?
  3. If waiting on the DB, django-silk the slow endpoint to find the offending queries.
  4. If CPU-bound, record a flame graph and attack the widest bar.
  5. Fix one thing, measure again. Never optimize without a before/after number.
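The before/after number in step 5 can be as simple as a p95 comparison. The sketch below is one way to compute it with the standard library. The measure, p95, and compare helpers are hypothetical, and in production you would normally take the latencies from your metrics system rather than timing calls by hand.

```python
import statistics
import time


def measure(fn, runs=50):
    """Call fn repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def p95(latencies_ms):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(latencies_ms, n=20, method="inclusive")[-1]


def compare(before_ms, after_ms):
    """Summarize a change as before/after p95 and the relative difference."""
    before, after = p95(before_ms), p95(after_ms)
    return {
        "before_p95": before,
        "after_p95": after,
        "change_pct": (after - before) / before * 100,
    }


if __name__ == "__main__":
    # Made-up latencies: the "after" run is exactly twice as fast.
    before = [float(ms) for ms in range(1, 101)]
    after = [ms / 2 for ms in before]
    print(compare(before, after))
```

Comparing a high percentile rather than the mean keeps the focus on the slow requests users actually complain about, and a negative change_pct is the evidence that a fix worked.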

Summary

Profile, don't guess. py-spy attaches to live production processes for CPU and hung-worker diagnosis with no code changes; django-silk exposes per-request queries and the duplicate-query N+1 patterns that dominate Django slowness; flame graphs point you straight at the widest, hottest path. Change one thing at a time and always measure the delta — that discipline is what separates real optimization from superstition.