Move slow work off the request/response cycle. Architect Celery + Redis for production: workers, queues, retries with exponential backoff, periodic jobs with Beat, and live monitoring with Flower.
If a request takes longer than a second, your users feel it — and email sending, PDF generation, third-party API calls, image processing, and report exports all routinely take far longer. These belong off the request/response cycle entirely. Celery is the de-facto answer for Django, and with Redis as the broker it scales from one worker on a small VPS to hundreds across a cluster. This tutorial covers Celery the way production demands: architecture, reliable retries, scheduling, routing, monitoring, and the failure modes that bite teams who treat it as fire-and-forget.
The request/response cycle should do the minimum needed to answer the user, and nothing more. Anything slow, unreliable, or non-essential to the immediate response — sending a confirmation email, generating an invoice, calling a payment provider, resizing an upload — should happen asynchronously, after the response is already on its way. This keeps pages fast and the user experience snappy regardless of how slow the underlying work is. It also isolates failure: if the email service is down, the order still completes and the email retries later, instead of the whole request failing. Moving slow work to the background is one of the most impactful architectural decisions for both performance and resilience.
Celery has three core moving parts. Your Django app is the producer: it creates tasks. The broker (Redis or RabbitMQ) is a queue that holds task messages until a worker is ready. Workers are separate processes, one or many, that pull tasks off the queue and execute them. Two optional pieces complete the picture: a result backend stores task return values if you need them, and Beat is a scheduler process that emits periodic tasks on a cron-like schedule. Understanding these pieces clarifies everything else. Tasks flow from your app through the broker to workers, decoupled in time and space, and that decoupling is exactly what makes background processing both scalable and resilient.
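To make that flow concrete, here is a toy, single-process model of the producer, broker, and worker path using only the standard library. It is an illustration, not how Celery is implemented: queue.Queue stands in for Redis, a thread stands in for a worker process, and the JSON message shape and the task registry are assumptions made for the sketch.

```python
import json
import queue
import threading

# Hypothetical task registry: task name -> plain function.
REGISTRY = {}
results = []


def task(fn):
    REGISTRY[fn.__name__] = fn
    return fn


@task
def send_email(order_id: int) -> None:
    results.append(f"emailed order {order_id}")


broker = queue.Queue()  # stands in for Redis


def enqueue(name: str, *args) -> None:
    # Producer: serialize a message (task name + args) and hand it to the broker.
    broker.put(json.dumps({"task": name, "args": list(args)}))


def worker() -> None:
    # Worker: pull messages until a None sentinel arrives, dispatching by name.
    while True:
        raw = broker.get()
        if raw is None:
            break
        msg = json.loads(raw)
        REGISTRY[msg["task"]](*msg["args"])


t = threading.Thread(target=worker)
t.start()
enqueue("send_email", 42)  # returns immediately; the caller is not blocked
enqueue("send_email", 43)
broker.put(None)  # tell the worker to stop
t.join()
print(results)  # ['emailed order 42', 'emailed order 43']
```

enqueue returns as soon as the message is on the queue; the work happens whenever the worker gets to it, which is the decoupling in time the architecture relies on.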
pip install "celery[redis]==5.4.*" django-celery-beat django-celery-results
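# djzen/celery.py
import os
from celery import Celery

# Make sure Django settings are importable before the app reads them.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "djzen.settings")
app = Celery("djzen")
# Read every Django setting prefixed CELERY_ (CELERY_BROKER_URL becomes broker_url).
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in all installed Django apps.
app.autodiscover_tasks()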
# djzen/__init__.py
from .celery import app as celery_app
__all__ = ("celery_app",)
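Importing the app in djzen/__init__.py ensures it is loaded whenever Django starts, so @shared_task functions bind to it. With autodiscovery wired up, a tasks.py module in any app listed in INSTALLED_APPS is found automatically, and you are ready to define and run tasks.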
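# djzen/settings.py
# "django-db" needs "django_celery_results" (and the Beat scheduler needs
# "django_celery_beat") in INSTALLED_APPS, then run manage.py migrate.
CELERY_BROKER_URL = "redis://127.0.0.1:6379/1"
CELERY_RESULT_BACKEND = "django-db"
CELERY_TASK_TIME_LIMIT = 60 * 5  # hard kill at 5 min
CELERY_TASK_SOFT_TIME_LIMIT = 60 * 4  # SoftTimeLimitExceeded raised in the task at 4 min
CELERY_TASK_ACKS_LATE = True  # ack after the task has run, not on receipt
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # each worker process reserves one task at a time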
Two of these are quietly critical. acks_late means a task is acknowledged only after it finishes running, so if a worker crashes mid-task the message is redelivered rather than lost. The prefetch setting stops a worker from greedily reserving many tasks that then sit idle while it runs a slow one. Together these settings make Celery reliable rather than lossy.
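The difference acks_late makes shows up when a worker dies mid-task. The sketch below is a simplified, in-memory model of a broker that tracks unacknowledged messages; ToyBroker and run_and_crash are hypothetical names, and real brokers (including Celery's Redis transport, which relies on a visibility timeout) implement redelivery differently.

```python
from collections import deque


class ToyBroker:
    """Hypothetical broker: ready messages plus messages reserved but not yet acked."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}
        self._next_tag = 0

    def reserve(self):
        msg = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = msg
        return self._next_tag, msg

    def ack(self, tag):
        self.unacked.pop(tag)

    def worker_died(self):
        # Anything reserved but never acknowledged goes back on the queue.
        self.ready.extend(self.unacked.values())
        self.unacked.clear()


def run_and_crash(broker, acks_late):
    tag, _ = broker.reserve()
    if not acks_late:
        broker.ack(tag)  # early ack: the broker forgets the message right away
    # ... the worker is killed here, before the task finishes ...
    broker.worker_died()


early = ToyBroker(["charge order 7"])
run_and_crash(early, acks_late=False)
print(list(early.ready))  # [] : the task is lost

late = ToyBroker(["charge order 7"])
run_and_crash(late, acks_late=True)
print(list(late.ready))  # ['charge order 7'] : redelivered
```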
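# orders/tasks.py (the "orders" app and the invoice helpers are illustrative)
from celery import shared_task

from .invoices import render_invoice, smtp  # hypothetical project helpers
from .models import Order

@shared_task(bind=True, max_retries=5, autoretry_for=(IOError,),
             retry_backoff=True, retry_backoff_max=600, retry_jitter=True)
def send_invoice_email(self, order_id: int) -> None:
    # Re-fetch by primary key so the task works with current data.
    order = Order.objects.get(pk=order_id)
    pdf = render_invoice(order)
    smtp.send(order.email, attachment=pdf)

# Enqueueing, e.g. from a view that already has `order`:
send_invoice_email.delay(order.id)  # fire and forget
send_invoice_email.apply_async(args=[order.id], countdown=30)  # in 30s
send_invoice_email.apply_async(args=[order.id], queue="email")  # specific queue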
The delay shortcut and the more configurable apply_async are how you enqueue work from views, signals, or other tasks.
A rule that prevents a whole class of bugs: pass primitive identifiers to tasks, never ORM instances. A task runs later, in another process. With Celery's default JSON serializer a model instance cannot even be serialized, and with a serializer that can handle it (such as pickle) the task receives a snapshot that may be stale, because the database may have changed since you enqueued it. Passing the primary key and re-fetching inside the task means the task reads the row as it is when the task runs. It also keeps task messages small and serializable. This small discipline avoids subtle, hard-to-reproduce bugs where a task acts on outdated state, and it is one of the first things to check when a task behaves strangely. Serialize references, not objects.
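A standard-library sketch of the difference, assuming a dict stands in for the database and json for Celery's default serializer. The Order dataclass and both task functions are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Order:
    pk: int
    status: str


DB = {1: Order(pk=1, status="pending")}  # stand-in for the orders table

# A model-like object is not JSON-serializable at all.
try:
    json.dumps(DB[1])
except TypeError:
    print("Order instances cannot be sent as-is")

# Enqueue time: one message carries a snapshot, the other only the ID.
snapshot_msg = json.dumps({"order": asdict(DB[1])})
id_msg = json.dumps({"order_id": 1})

# Between enqueue and execution, the order is cancelled.
DB[1].status = "cancelled"


def task_with_snapshot(raw: str) -> str:
    return json.loads(raw)["order"]["status"]  # sees the enqueue-time copy


def task_with_id(raw: str) -> str:
    return DB[json.loads(raw)["order_id"]].status  # re-fetches current state


print(task_with_snapshot(snapshot_msg))  # pending (stale)
print(task_with_id(id_msg))  # cancelled (current)
```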
Tasks fail: networks blip, APIs rate-limit, services restart. A good task retries intelligently. The combination of autoretry_for, retry_backoff, and retry_jitter retries on the listed exceptions with exponentially increasing delays (1s, 2s, 4s, 8s… up to retry_backoff_max), and jitter replaces each delay with a random value between zero and that figure. The randomness means a flood of failed tasks does not retry in lockstep and hammer a struggling downstream service into the ground, the "thundering herd." Cap retries with max_retries so a permanently broken task does not loop forever. Critically, retry only on transient errors; retrying on a programming bug or a permanent failure just wastes resources and can amplify damage. Thoughtful retry policy is the difference between resilient background work and a self-inflicted denial of service.
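The delay schedule fits in a few lines. This sketch mirrors the behaviour Celery documents for retry_backoff, retry_backoff_max, and retry_jitter (doubling from a base of one second, a cap, and full jitter between zero and the computed delay), but it is written for this article and is not Celery's own code. The is_transient helper and its exception list are assumptions; classify errors to suit your own tasks.

```python
import random
from typing import Optional


def backoff_delay(retries: int, factor: int = 1, maximum: int = 600,
                  jitter: bool = True, rng: Optional[random.Random] = None) -> int:
    """Seconds to wait before a retry; `retries` is 0 for the first retry."""
    delay = min(maximum, factor * 2 ** retries)
    if jitter:
        # Full jitter: any whole second in [0, delay], so failed tasks spread out.
        delay = (rng or random).randint(0, delay)
    return delay


# Hypothetical classification: only these are considered worth retrying.
TRANSIENT = (ConnectionError, TimeoutError)


def is_transient(exc: BaseException) -> bool:
    return isinstance(exc, TRANSIENT)


print([backoff_delay(n, jitter=False) for n in range(5)])  # [1, 2, 4, 8, 16]
print(backoff_delay(12, jitter=False))  # 600 (capped)
print(is_transient(TimeoutError()), is_transient(ValueError()))  # True False
```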
Because acks_late means a task can be redelivered after a worker crash, and retries re-run tasks, a task may execute more than once. If running it twice causes harm — two emails, two charges, two inventory decrements — you have a bug. Design tasks to be idempotent: running them again produces the same result as running them once. Check whether the work was already done before doing it, or key the side effect on a unique value so duplicates are no-ops. This mirrors the guarantee you need in any at-least-once system, and in Celery it is essential precisely because the reliability features that prevent lost tasks also make duplicate execution possible.
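One way to key a side effect on a unique value is to write a marker row and perform the database change in the same transaction, so a unique constraint turns a duplicate run into a no-op. This sketch uses SQLite from the standard library so it runs anywhere; the table names and decrement_stock are hypothetical. In Django the same idea maps onto a unique constraint plus transaction.atomic().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL);
    CREATE TABLE processed (order_id INTEGER PRIMARY KEY);  -- one row per handled order
    INSERT INTO stock VALUES ('widget', 10);
""")


def decrement_stock(order_id: int, sku: str, qty: int) -> bool:
    """Apply the decrement once per order_id; return False for a duplicate run."""
    try:
        # One transaction: marker and decrement commit or roll back together.
        with conn:
            conn.execute("INSERT INTO processed (order_id) VALUES (?)", (order_id,))
            conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
        return True
    except sqlite3.IntegrityError:
        return False  # marker already exists: an earlier run did the work


print(decrement_stock(501, "widget", 3))  # True
print(decrement_stock(501, "widget", 3))  # False (redelivered duplicate)
print(conn.execute("SELECT qty FROM stock WHERE sku = 'widget'").fetchone()[0])  # 7
```

For side effects outside the database, such as an email, a marker only narrows the window: if the process dies between sending and committing, a retry can still send twice, so pair it with a provider-side idempotency key where one exists.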
Scheduled work — nightly cleanups, hourly syncs, daily reports — runs through Celery Beat. With django-celery-beat, schedules live in the database and are editable from the Django admin:
CELERY_BEAT_SCHEDULER = "django_celery_beat.schedulers:DatabaseScheduler"
You add a crontab schedule in the admin and bind it to a periodic task. Run Beat as a single, separate process: celery -A djzen beat -l info. The one rule that matters: never run more than one Beat instance, or every scheduled task fires multiple times. Beat is the scheduler; the workers still do the actual work it triggers.
A single queue works until a slow task type starves a fast one — a batch of multi-minute PDF renders blocking quick notification emails. Routing different task types to different queues, served by different worker pools, keeps them isolated. Fast, latency-sensitive work gets its own queue and dedicated workers; slow, heavy work gets another. This prevents head-of-line blocking and lets you scale each kind of work independently — more workers for the heavy queue without touching the fast one. Queue routing is how a Celery deployment grows from "it works" to "it works predictably under mixed load," and it is worth setting up before a slow task type degrades everything behind it.
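A minimal routing setup might look like the following settings fragment, assuming the CELERY_ namespace from djzen/celery.py. The task paths, queue names, and worker sizes are hypothetical; task_routes accepts exact task names as well as glob patterns.

```python
# djzen/settings.py (fragment)
CELERY_TASK_DEFAULT_QUEUE = "default"
CELERY_TASK_ROUTES = {
    "orders.tasks.send_invoice_email": {"queue": "email"},  # fast, latency-sensitive
    "reports.tasks.*": {"queue": "heavy"},  # slow renders and exports
}

# Then run one worker pool per queue, sized for its workload, e.g.:
#   celery -A djzen worker -Q email,default -c 8 -n fast@%h
#   celery -A djzen worker -Q heavy -c 2 -n heavy@%h
```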
Beyond separate queues, Celery supports task rate limits and, with the right broker setup, priorities. Rate limiting a task type ("no more than ten of these per minute") protects a fragile downstream service so background work does not overwhelm an external API. Note that Celery enforces a task's rate_limit per worker instance, not globally: three workers each allowed ten per minute can together send thirty. Priorities let urgent tasks jump ahead of a backlog of routine ones. These controls matter when your task workload is heterogeneous and some work is more time-sensitive, or more dangerous to a dependency, than other work. Used together with queue routing, they give you fine control over how your background workload consumes resources and pressures the systems it touches.
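In Celery a rate limit is a task option, for example @shared_task(rate_limit="10/m"). To show what such a limit does, here is a standalone token-bucket sketch, a common technique for rate limiting. The TokenBucket class is written for this article, with an injectable clock so it can be tested deterministically; it is not Celery's internal implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` operations per `per` seconds, with bursts up to `capacity`."""

    def __init__(self, rate: float, per: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.fill_rate = rate / per  # tokens added per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait; a worker would delay the task


# Fake clock so the example is deterministic.
t = [0.0]
bucket = TokenBucket(rate=10, per=60, capacity=10, clock=lambda: t[0])
print(sum(bucket.try_acquire() for _ in range(15)))  # 10: the burst is spent
t[0] += 7.0  # 7 s later, a little over one token has refilled
print(bucket.try_acquire(), bucket.try_acquire())  # True False
```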
Real processes are often multi-step, and Celery's canvas primitives compose tasks into workflows. A chain runs tasks in sequence, passing each result to the next. A group runs many tasks in parallel. A chord runs a group and then a callback once all of them finish — fan out work, then aggregate. These let you express "process these hundred items in parallel, then send a summary" declaratively rather than wiring it by hand. Understanding the canvas turns Celery from a way to run isolated tasks into a way to orchestrate whole asynchronous workflows, while keeping each step independently retryable and observable.
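In Celery the fan-out-then-aggregate case reads roughly as chord(process_item.s(i) for i in ids)(send_summary.s()), where process_item and send_summary are hypothetical tasks. To make the semantics concrete without a broker, the sketch below models chain, group, and chord as plain functions over a thread pool. It illustrates what each primitive means; it is not how Celery executes them.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Any, Callable, Iterable, List


def chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    # Run steps in order, feeding each result into the next.
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def group(fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
    # Run fn over all items in parallel; results come back in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, items))


def chord(fn: Callable[[Any], Any], items: Iterable[Any],
          callback: Callable[[List[Any]], Any]) -> Any:
    # A group whose complete result list is handed to one callback.
    return callback(group(fn, items))


def process_item(n: int) -> int:  # hypothetical per-item task
    return n * n


def send_summary(results: List[int]) -> str:  # hypothetical callback
    return f"{len(results)} items, total {sum(results)}"


print(chain(lambda x: x + 1, lambda x: x * 10)(4))  # 50
print(chord(process_item, range(1, 101), send_summary))  # 100 items, total 338350
```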
Background work is invisible without monitoring: you cannot see a queue backing up or tasks failing the way you see a slow page. Flower, installed separately with pip install flower, is a web dashboard for Celery showing active tasks, queue lengths, success and failure rates, and worker status in real time:
celery -A djzen flower --port=5555
It lets you watch a deployment's task flow, spot a growing backlog before it becomes a problem, and inspect failures. Pair it with metrics on queue depth and task duration in your main observability stack, and alert when a queue grows unbounded — the classic sign that workers cannot keep up with producers.
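With the Redis broker in its default setup, each Celery queue is a Redis list named after the queue, so its depth can be sampled with LLEN (for the default queue and the database in CELERY_BROKER_URL above, redis-cli -n 1 LLEN celery). The check below is a hypothetical alerting rule over such samples: it flags a queue whose depth rose on every sample in a window and is already large. The window and threshold are assumptions to tune.

```python
from typing import Sequence


def backlog_growing(depths: Sequence[int], window: int = 5, min_depth: int = 100) -> bool:
    """True if the last `window` samples rise strictly and end at or above `min_depth`."""
    if len(depths) < window:
        return False
    recent = depths[-window:]
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    return rising and recent[-1] >= min_depth


print(backlog_growing([40, 35, 50, 42, 38]))  # False: busy but draining
print(backlog_growing([120, 180, 260, 390, 510]))  # True: producers outpace workers
```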
Some tasks will never succeed — bad data, a deleted record, a genuine bug. Without a plan, these either retry forever or vanish silently. Configure a dead-letter destination or a failure handler so exhausted tasks are recorded for human attention rather than lost, and monitor that failure stream. A task that has exhausted its retries is telling you something is wrong, and you want to know. Handling permanent failure deliberately — capturing it, alerting on it, and giving yourself a way to inspect and reprocess — is what separates a robust background system from one that quietly drops work and leaves you to discover the gap when a customer complains.
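Celery exposes hooks for this: a task class can override on_failure(self, exc, task_id, args, kwargs, einfo), and the task_failure signal fires when a task raises instead of retrying, which with autoretry means once its retries are exhausted. What you record there is up to you. The sketch below models the pattern in plain Python: retry transient errors a bounded number of times, skip retries for permanent ones, and record the failure in a dead-letter store. run_with_dead_letter and the DEAD_LETTERS list are hypothetical; in production the store might be a database table you can inspect and replay from.

```python
from typing import Any, Callable, Dict, List

DEAD_LETTERS: List[Dict[str, Any]] = []  # stand-in for a failures table


def run_with_dead_letter(fn: Callable[..., Any], *args: Any, max_retries: int = 3,
                         transient: tuple = (ConnectionError, TimeoutError)) -> Any:
    calls = 0
    while True:
        calls += 1
        try:
            return fn(*args)
        except transient as exc:
            if calls > max_retries:  # first call plus max_retries retries used up
                failure = exc
                break
            # (a real task would wait with backoff before the next attempt)
        except Exception as exc:  # permanent error: retrying will not help
            failure = exc
            break
    # Record enough to inspect and replay later instead of dropping the work.
    DEAD_LETTERS.append({"task": fn.__name__, "args": args,
                         "error": repr(failure), "calls": calls})
    return None


def flaky_sync(order_id):  # hypothetical: the upstream API never recovers
    raise TimeoutError("upstream timed out")


def missing_order(order_id):  # hypothetical: the record was deleted
    raise LookupError(f"order {order_id} does not exist")


run_with_dead_letter(flaky_sync, 7)
run_with_dead_letter(missing_order, 8)
for entry in DEAD_LETTERS:
    print(entry["task"], entry["calls"], entry["error"])
# flaky_sync 4 TimeoutError('upstream timed out')
# missing_order 1 LookupError('order 8 does not exist')
```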
Workers are long-running processes managed like any service, typically under systemd or in containers, with concurrency tuned to the work: high concurrency for I/O-bound tasks that mostly wait, and roughly one process per core for CPU-heavy ones that saturate cores. Scale horizontally by adding workers, and scale specific queues by adding workers dedicated to them. Watch memory, since some tasks leak or accumulate, and configure workers to restart after a number of tasks to bound the damage. A deploy must drain or restart workers gracefully so in-flight tasks are not lost: on SIGTERM, a Celery worker performs a warm shutdown, taking no new tasks and waiting for running ones to finish. Treating workers as first-class production services, monitored and scaled deliberately, is what keeps background processing reliable as load grows.
A few mistakes recur. Passing model objects instead of IDs, leading to stale data. Non-idempotent tasks that misbehave when retried or redelivered. Retrying on permanent errors and wasting resources. Running multiple Beat instances and firing scheduled tasks repeatedly. A single queue where slow tasks starve fast ones. No monitoring, so a backlog grows unseen until the queue is hopeless. Each of these is avoided by understanding that Celery is an at-least-once, distributed system where tasks run later, elsewhere, possibly more than once — design for that reality and the pitfalls disappear.
The broker is the heart of a Celery deployment, and the two main choices trade simplicity against guarantees. Redis is simple, fast, and likely already in your stack, making it the common default — but it is primarily an in-memory store, so its durability guarantees are weaker and a Redis failure can lose queued tasks unless configured carefully. RabbitMQ is a purpose-built message broker with stronger delivery guarantees and richer routing, at the cost of being another system to operate. For most applications Redis is the pragmatic choice; reach for RabbitMQ when you need its stronger durability and routing, and understand the durability implications of whichever you pick.
Celery can store each task's return value in a result backend, but many tasks are fire-and-forget and do not need it — and storing results you never read is pure overhead, filling storage with data nobody consumes. Enable the result backend only for tasks whose outcome you actually retrieve, and set result expiry so stored results do not accumulate forever. For the common case of a task that just does work and reports nothing back, skipping result storage is both faster and cleaner. Being deliberate about whether each task needs its result stored avoids a quiet source of wasted resources in busy Celery deployments.
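A settings fragment for that policy, assuming the CELERY_ namespace used earlier: ignore results by default, opt in per task, and make expiry explicit. The 12-hour figure and the build_report task are illustrative. With a database result backend, expired rows are deleted by Celery's built-in celery.backend_cleanup task, which Beat schedules, so Beat must be running for expiry to take effect.

```python
# djzen/settings.py (fragment)
from datetime import timedelta

CELERY_TASK_IGNORE_RESULT = True  # most tasks: nothing to store
CELERY_RESULT_EXPIRES = timedelta(hours=12)  # tune to how long callers need results

# Opt a task back in when a caller really reads its result (hypothetical task):
# @shared_task(ignore_result=False)
# def build_report(report_id: int) -> str: ...
```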
Long-running worker processes can accumulate memory over many tasks — through leaks in task code or libraries, or fragmentation — until a worker grows large enough to cause problems. The standard defense is recycling workers after a configured number of tasks, so each is periodically replaced fresh, bounding how much memory any single worker can accumulate. This trades a little restart overhead for stable memory usage over time. Watching worker memory as a first-class metric, and recycling on a sensible interval, prevents the slow memory creep that otherwise leads to mysterious worker restarts or out-of-memory kills in a long-running Celery deployment.
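Celery has settings for both recycling triggers in the default prefork pool. A sketch with illustrative numbers, assuming the CELERY_ namespace; tune them against the memory you actually observe.

```python
# djzen/settings.py (fragment); the numbers are illustrative, not recommendations
CELERY_WORKER_MAX_TASKS_PER_CHILD = 500  # replace a pool process after 500 tasks
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 300_000  # or past ~300 MB resident (value in KiB)

# Equivalent command-line flags:
#   celery -A djzen worker --max-tasks-per-child=500 --max-memory-per-child=300000
```

The memory limit is checked after a task finishes, so a single task that balloons still runs to completion before its process is replaced.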
With Redis as the broker, a subtle setting catches teams out: the visibility timeout (one hour by default), which governs how long a reserved message may stay unacknowledged before Redis treats it as abandoned and redelivers it to another worker. With acks_late, that window covers the whole run of the task, so if a legitimate long task outlasts the visibility timeout, Redis hands it to a second worker while the first is still working, and the task runs twice. Tasks scheduled with a countdown or ETA beyond the timeout are affected the same way, because they sit reserved and unacknowledged while they wait. Set the visibility timeout comfortably above your longest task's duration and your longest countdown, and design tasks to be idempotent as a backstop. This interaction between task duration and broker configuration is a classic source of mysterious duplicate execution, and understanding it is part of running Celery on Redis reliably.
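A settings sketch, assuming the Redis broker and the CELERY_ namespace used earlier. The visibility timeout is passed to the Redis transport through broker_transport_options; the two-hour value is illustrative, and CELERY_TASK_TIME_LIMIT is repeated from the earlier settings only so the fragment can be checked on its own.

```python
# djzen/settings.py (fragment)
CELERY_TASK_TIME_LIMIT = 60 * 5  # hard limit, as set earlier

CELERY_BROKER_TRANSPORT_OPTIONS = {
    # Seconds a reserved, unacknowledged message may wait before Redis redelivers it.
    # Keep it well above the longest task runtime and the longest countdown/ETA used.
    "visibility_timeout": 60 * 60 * 2,
}
```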
Celery moves slow, unreliable, and non-essential work off the request/response cycle, keeping pages fast and isolating failure. Its architecture (producer, broker, workers, Beat) decouples task creation from execution. The reliability settings work together: acks_late ensures a crashed task is redelivered rather than lost, a prefetch multiplier of one keeps workers from hoarding tasks, and time limits stop runaway tasks. Pass IDs not objects, make tasks idempotent because they can run more than once, and retry only transient errors with backoff and jitter to avoid hammering downstream services. Schedule recurring work with a single Beat instance, route task types to separate queues so slow work does not starve fast work, compose multi-step processes with chains and chords, and monitor everything with Flower and queue-depth alerts so a backlog never grows unseen. Handle permanent failures deliberately, and deploy workers as the first-class services they are. Treat Celery as the distributed, at-least-once system it is, and it will carry your background work reliably from one worker to a cluster.