Python Advanced

RPA with Python and Playwright: Browser Automation, Async Workflows, and Scheduling from Django

Build production RPA bots in Python: log into vendor portals, scrape dashboards, fill forms, and persist sessions — orchestrated from Django via Celery. The modern, license-free alternative to UiPath and Automation Anywhere.

DjangoZen Team Apr 25, 2026 20 min read

RPA — Robotic Process Automation — is the practice of automating repetitive UI-driven tasks: logging into vendor portals, downloading invoices, filling forms, scraping dashboards, monitoring competitor pricing. Commercial tools (UiPath, Automation Anywhere, Blue Prism) charge per "bot" and add up fast. Python + Playwright gives you the same capability for free, integrated directly with your Django app.

Why Playwright in 2026

Playwright (by Microsoft) has effectively replaced Selenium as the modern browser-automation library. It drives Chromium, Firefox, and WebKit out of the box, has a clean async API, auto-waits for elements (no flaky time.sleep), and ships with a recorder that generates code from your clicks. Selenium still works; Playwright is just nicer to live with.

Setup

pip install playwright
playwright install chromium    # downloads the browser binary (~120 MB)

First automation: login and scrape a table

import os

from playwright.async_api import async_playwright

async def fetch_invoices() -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()

        # Log in with credentials from the environment, never hardcoded
        await page.goto("https://vendor.example.com/login")
        await page.fill("#email", os.environ["VENDOR_USER"])
        await page.fill("#password", os.environ["VENDOR_PASSWORD"])
        await page.click("button[type=submit]")
        await page.wait_for_url("**/dashboard")

        # Scrape the invoice table row by row
        rows = await page.query_selector_all("table.invoices tbody tr")
        out = []
        for r in rows:
            cells = await r.query_selector_all("td")
            out.append({
                "number": (await cells[0].inner_text()).strip(),
                "amount": (await cells[1].inner_text()).strip(),
            })
        await browser.close()
        return out

Auto-wait does the heavy lifting: Playwright waits for elements to be visible, stable, and actionable before interacting. No time.sleep(2), no flaky tests.

Persist auth across runs

Logging in every time is slow, triggers MFA, and looks bot-like. Save the browser context to disk after one successful login:

# First run: login + save state
context = await browser.new_context()
# ... login ...
await context.storage_state(path="/var/data/vendor_auth.json")

# Subsequent runs: reuse
context = await browser.new_context(storage_state="/var/data/vendor_auth.json")

Cookies, localStorage, and sessionStorage are restored. Encrypt that file at rest — it's the session keys to a third-party site. Never commit it to git.
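In practice you want one code path that reuses the saved state when it exists and falls back to a fresh login otherwise. A minimal sketch, assuming an async Playwright Browser plus your own login(page) coroutine (a hypothetical callback that drives the login form), and an assumed location for the state file:

```python
import os

AUTH_PATH = "/var/data/vendor_auth.json"   # assumed location; encrypt at rest

async def get_context(browser, login):
    """Return a browser context, reusing saved auth state when available.

    `browser` is an async Playwright Browser; `login(page)` is your own
    coroutine that fills credentials and submits the form.
    """
    if os.path.exists(AUTH_PATH):
        # Cookies and localStorage are restored: no login, no MFA prompt
        return await browser.new_context(storage_state=AUTH_PATH)
    # No saved state yet: log in once and persist the session
    context = await browser.new_context()
    page = await context.new_page()
    await login(page)
    await context.storage_state(path=AUTH_PATH)
    return context
```

If the saved session has expired, delete the file (or catch the failed navigation) and let the next run fall through to the login branch.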

Run multiple workflows in parallel

async def run_all():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        results = await asyncio.gather(
            scrape_vendor_a(browser),
            scrape_vendor_b(browser),
            scrape_vendor_c(browser),
        )
        await browser.close()
        return results

Each workflow gets its own context (isolated cookies). Three vendors in the time of one.

Schedule from Django via Celery

Run automations as Celery tasks so failures retry, results land in your DB, and you get observability for free:

from celery import shared_task
from playwright.sync_api import sync_playwright

from myapp.models import Invoice, Vendor  # your app's models

@shared_task(bind=True, autoretry_for=(Exception,), max_retries=3,
             retry_backoff=True, retry_jitter=True)
def scrape_vendor_invoices(self, vendor_id: int) -> int:
    vendor = Vendor.objects.get(pk=vendor_id)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context(storage_state=vendor.auth_path)
            page = context.new_page()
            page.goto(vendor.invoices_url)
            rows = page.query_selector_all("table.invoices tbody tr")
            for r in rows:
                cells = r.query_selector_all("td")
                Invoice.objects.update_or_create(
                    vendor=vendor, number=cells[0].inner_text().strip(),
                    defaults={"amount": cells[1].inner_text().strip()})
            return len(rows)
        finally:
            browser.close()

Use Celery Beat to fire it hourly. Failures retry automatically with exponential backoff. Sentry captures every exception with the full Playwright stack trace.
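The hourly schedule itself lives in your Celery Beat config. A sketch, assuming the task is registered under a hypothetical myapp.tasks module path:

```python
# settings.py (sketch): adjust the task path to your own module layout
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "scrape-vendor-1-hourly": {
        "task": "myapp.tasks.scrape_vendor_invoices",
        "schedule": crontab(minute=0),   # top of every hour
        "args": (1,),                    # vendor_id
    },
}
```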

Screenshot + trace on failure (your future self thanks you)

from datetime import datetime

context.tracing.start(screenshots=True, snapshots=True, sources=True)
try:
    page.click("#submit")
    page.wait_for_url("**/success")
except Exception:
    # Save the full trace only when something goes wrong
    context.tracing.stop(path=f"/var/log/playwright/{datetime.now():%Y%m%d-%H%M%S}.zip")
    raise

Open the zip with playwright show-trace trace.zip — you get a time-traveling debugger of the entire run: every action, every network call, screenshots between actions, and the DOM at each step. This single feature is worth the switch from Selenium.

Headless detection and stealth

Some sites block headless browsers via fingerprinting (navigator.webdriver, missing plugins, telltale flags). Mitigations:

  • Use Chromium's modern headless mode (the --headless=new implementation), which is much harder to detect than legacy headless. Recent Playwright releases use it by default; on older versions you could opt in with chromium.launch(headless=False, args=["--headless=new"]).
  • The playwright-stealth plugin patches the obvious tells.
  • Set a realistic user_agent, viewport (1920×1080), and locale.
  • If the site is hostile, run a headed browser inside Xvfb on the server.
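The realistic-settings bullet can live in one place. A sketch with illustrative values only (the user agent string and timezone are assumptions, not a guaranteed fingerprint fix):

```python
# Human-plausible context settings; tune values to your target audience
REALISTIC_CONTEXT = {
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "viewport": {"width": 1920, "height": 1080},
    "locale": "en-US",
    "timezone_id": "America/New_York",
}

def new_realistic_context(browser):
    """Create a context with human-plausible settings (sync API)."""
    return browser.new_context(**REALISTIC_CONTEXT)
```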

Resilient selectors — semantics over CSS

Sites change #login-btn-v3-final overnight. Semantic locators survive redesigns:

await page.get_by_role("button", name="Sign in").click()
await page.get_by_label("Email address").fill("you@example.com")
await page.get_by_text("Welcome back").wait_for()

These match how a screen-reader sees the page — accessible markup is your stable API.

Ethics and legality

Automating your own account on a third-party site you have a contract with is generally fine. Scraping a competitor's pricing might violate their Terms of Service. In hiQ v. LinkedIn (US, 2022), the Ninth Circuit held that scraping publicly accessible pages likely isn't a CFAA violation, but a ToS breach can still be a breach of contract. Read the site's ToS, respect robots.txt, and rate-limit your bot to human-plausible speeds. Don't overload someone's server because you can, and don't be the reason a site adds a Cloudflare challenge that breaks legitimate users.

Production gotchas

  • Resource hungry. Each Chromium instance ≈ 150 MB RAM. Cap concurrency.
  • Timeouts everywhere. page.set_default_timeout(30_000). Slow vendor sites hang forever otherwise.
  • Always close browsers. Use context managers or try/finally — leaked browsers fill the disk with profile dirs and eventually OOM the host.
  • Run in a container. The mcr.microsoft.com/playwright/python image has every system lib preinstalled. Don't fight libnss3 dependencies on bare Ubuntu.
  • Idempotent steps. Design every workflow so re-running it is safe — use update_or_create, not blind create.
  • Backoff on rate limits. If the site returns 429, sleep — don't hammer.
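The last bullet deserves a concrete shape: a retry wrapper that backs off exponentially (with jitter) whenever the site answers 429. A framework-free sketch, where fetch is any callable of yours returning a (status_code, body) pair:

```python
import random
import time

def fetch_with_backoff(fetch, max_attempts: int = 5, base: float = 1.0):
    """Retry `fetch` while it reports HTTP 429, sleeping longer each time."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return body
        # Exponential backoff with jitter: roughly 1s, 2s, 4s, ...
        time.sleep(base * (2 ** attempt) * (0.5 + random.random()))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

The same wrapper works for sync Playwright navigations: have fetch return the response status from page.goto.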

When to reach for a full RPA framework

If you're building dozens of bots needing shared infrastructure (orchestration UI, audit logs, secret vault, business-user editing of workflows), look at rpaframework or Robocorp's stack. For 1–10 automations integrated with your Django app, plain Playwright + Celery + Django ORM is simpler and good enough — and you keep everything in one codebase.

Summary

Modern RPA in Python is Playwright + Celery + a sensible failure story. Auto-waiting locators, persistent auth state, async parallelism, screenshot-on-failure, and tracing get you most of the way to commercial-grade automation — at zero license cost and fully under your Django app's control. If a human can do it in a browser, you can automate it.