Build production RPA bots in Python: log into vendor portals, scrape dashboards, fill forms, and persist sessions — orchestrated from Django via Celery. The modern, license-free alternative to UiPath and Automation Anywhere.
RPA — Robotic Process Automation — is the practice of automating repetitive UI-driven tasks: logging into vendor portals, downloading invoices, filling forms, scraping dashboards, monitoring competitor pricing. Commercial tools (UiPath, Automation Anywhere, Blue Prism) charge per "bot" and add up fast. Python + Playwright gives you the same capability for free, integrated directly with your Django app.
Playwright (by Microsoft) has effectively replaced Selenium as the modern browser-automation library. It drives Chromium, Firefox, and WebKit out of the box, has a clean async API, auto-waits for elements (no flaky time.sleep), and ships with a recorder that generates code from your clicks. Selenium still works; Playwright is just nicer to live with.
```bash
pip install playwright
playwright install chromium  # downloads the browser binary (~120 MB)
```
```python
import asyncio, os
from playwright.async_api import async_playwright

async def fetch_invoices() -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()

        await page.goto("https://vendor.example.com/login")
        await page.fill("#email", os.environ["VENDOR_USER"])
        await page.fill("#password", os.environ["VENDOR_PASSWORD"])
        await page.click("button[type=submit]")
        await page.wait_for_url("**/dashboard")

        rows = await page.query_selector_all("table.invoices tbody tr")
        out = []
        for r in rows:
            cells = await r.query_selector_all("td")
            out.append({
                "number": (await cells[0].inner_text()).strip(),
                "amount": (await cells[1].inner_text()).strip(),
            })
        await browser.close()
        return out
```
Auto-wait does the heavy lifting: Playwright waits for elements to be visible, stable, and actionable before interacting. No time.sleep(2), no flaky tests.
Logging in every time is slow, triggers MFA, and looks bot-like. Save the browser context to disk after one successful login:
```python
# First run: login + save state
context = await browser.new_context()
# ... login ...
await context.storage_state(path="/var/data/vendor_auth.json")

# Subsequent runs: reuse
context = await browser.new_context(storage_state="/var/data/vendor_auth.json")
```
Cookies, localStorage, and sessionStorage are restored. Encrypt that file at rest — it's the session keys to a third-party site. Never commit it to git.
```python
async def run_all():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        results = await asyncio.gather(
            scrape_vendor_a(browser),
            scrape_vendor_b(browser),
            scrape_vendor_c(browser),
        )
        await browser.close()
        return results
```
Each workflow gets its own context (isolated cookies). Three vendors in the time of one.
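One caveat with `asyncio.gather`: by default the first exception cancels the remaining workflows. Passing `return_exceptions=True` keeps one broken vendor from sinking the batch. A browser-free sketch, where the `scrape` stub stands in for a real Playwright workflow:

```python
import asyncio

async def scrape(vendor: str) -> dict:
    # Stub: a real version would open its own browser context here.
    if vendor == "vendor_b":
        raise RuntimeError("login page changed")
    return {"vendor": vendor, "invoices": 3}

async def run_all(vendors: list[str]) -> list:
    # return_exceptions=True: failures come back as values, not crashes
    return await asyncio.gather(*(scrape(v) for v in vendors),
                                return_exceptions=True)

results = asyncio.run(run_all(["vendor_a", "vendor_b", "vendor_c"]))
```

Check each result with `isinstance(r, Exception)` before persisting, and log the failures instead of losing the whole run.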
Run automations as Celery tasks so failures retry, results land in your DB, and you get observability for free:
```python
from celery import shared_task
from playwright.sync_api import sync_playwright

from myapp.models import Invoice, Vendor  # your Django models

@shared_task(bind=True, max_retries=3, retry_backoff=True, retry_jitter=True)
def scrape_vendor_invoices(self, vendor_id: int) -> int:
    vendor = Vendor.objects.get(pk=vendor_id)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context(storage_state=vendor.auth_path)
            page = context.new_page()
            page.goto(vendor.invoices_url)
            rows = page.query_selector_all("table.invoices tbody tr")
            for r in rows:
                cells = r.query_selector_all("td")
                Invoice.objects.update_or_create(
                    vendor=vendor,
                    number=cells[0].inner_text().strip(),
                    defaults={"amount": cells[1].inner_text().strip()},
                )
            return len(rows)
        finally:
            browser.close()  # always release the browser, even on failure
```
Use Celery Beat to fire it hourly. Failures retry automatically with exponential backoff. Sentry captures every exception with the full Playwright stack trace.
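A minimal Beat entry in your Django settings might look like this (the task module path and vendor id are placeholders):

```python
# settings.py -- run the scrape at the top of every hour
CELERY_BEAT_SCHEDULE = {
    "scrape-vendor-invoices": {
        "task": "myapp.tasks.scrape_vendor_invoices",  # hypothetical module path
        "schedule": 3600.0,  # seconds; celery.schedules.crontab(minute=0) also works
        "args": (1,),        # vendor_id
    },
}
```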
For post-mortem debugging, wrap fragile steps in a Playwright trace and save it only when something fails:

```python
from datetime import datetime

context.tracing.start(screenshots=True, snapshots=True, sources=True)
try:
    page.click("#submit")
    page.wait_for_url("**/success")
except Exception:
    # Save the trace only on failure -- successful runs stay cheap
    context.tracing.stop(
        path=f"/var/log/playwright/{datetime.now():%Y%m%d-%H%M%S}.zip")
    raise
```
Open the zip with playwright show-trace trace.zip — you get a time-traveling debugger of the entire run: every action, every network call, screenshots between actions, and the DOM at each step. This single feature is worth the switch from Selenium.
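Traces are the full firehose; for quick triage a plain screenshot on failure is often enough. A small wrapper (a sketch: `screenshot_on_failure` and the output path are assumed names, while `page.screenshot` itself is real Playwright sync API):

```python
from contextlib import contextmanager
from datetime import datetime

@contextmanager
def screenshot_on_failure(page, out_dir: str = "/var/log/playwright"):
    """Capture the page when the wrapped block raises, then re-raise."""
    try:
        yield
    except Exception:
        stamp = f"{datetime.now():%Y%m%d-%H%M%S}"
        page.screenshot(path=f"{out_dir}/fail-{stamp}.png", full_page=True)
        raise
```

Use it around any fragile sequence: `with screenshot_on_failure(page): page.click("#submit")`.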
Some sites block headless browsers via fingerprinting (navigator.webdriver, missing plugins, telltale flags). Mitigations:
- `chromium.launch(headless=True, args=["--headless=new"])` — the new headless mode is much harder to detect than legacy headless.
- The `playwright-stealth` plugin patches the obvious tells.
- Set a realistic `user_agent`, viewport (1920×1080), and locale.
- Run a headful browser under Xvfb on the server.

Sites change `#login-btn-v3-final` overnight. Semantic locators survive redesigns:
```python
await page.get_by_role("button", name="Sign in").click()
await page.get_by_label("Email address").fill("you@example.com")
await page.get_by_text("Welcome back").wait_for()
```
These match how a screen-reader sees the page — accessible markup is your stable API.
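Circling back to fingerprinting: the context-level tweaks (user agent, viewport, locale) fit in one reusable kwargs dict. A sketch; the user-agent string is an arbitrary example, and all four keys are real `browser.new_context()` parameters.

```python
# Realistic-looking context settings; pass as browser.new_context(**CONTEXT_KWARGS)
CONTEXT_KWARGS = {
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "viewport": {"width": 1920, "height": 1080},
    "locale": "en-US",
    "timezone_id": "America/New_York",
}
```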
Automating your own account on a third-party site you have a contract with is generally fine. Scraping a competitor's pricing might violate their Terms of Service. In hiQ v. LinkedIn (US, 2022), the courts said scraping public pages isn't a CFAA violation, but ToS breach is still a contract issue. Read the site's ToS, respect robots.txt, and rate-limit your bot to human-plausible speeds. Don't overload someone's server because you can — and don't be the reason a site adds a Cloudflare challenge that breaks legitimate users.
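"Human-plausible speeds" is easy to implement: jitter every pause so the request pattern isn't metronomic. A sketch, where `polite_pause` and its defaults are assumptions:

```python
import asyncio
import random

async def polite_pause(base: float = 2.0, jitter: float = 1.5) -> float:
    """Sleep base + uniform(0, jitter) seconds between page actions."""
    delay = base + random.uniform(0, jitter)
    await asyncio.sleep(delay)
    return delay
```

Call it between `goto`/`click` steps so three pages take roughly human time, not 300 ms.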
- Set `page.set_default_timeout(30_000)`. Slow vendor sites hang forever otherwise.
- Close the browser in `try/finally` — leaked browsers fill the disk with profile dirs and eventually OOM the host.
- Deploy with the `mcr.microsoft.com/playwright/python` image, which has every system lib preinstalled. Don't fight libnss3 dependencies on bare Ubuntu.
- Write idempotently: `update_or_create`, not blind `create`.

If you're building dozens of bots needing shared infrastructure (orchestration UI, audit logs, secret vault, business-user editing of workflows), look at rpaframework or Robocorp's stack. For 1–10 automations integrated with your Django app, plain Playwright + Celery + Django ORM is simpler and good enough — and you keep everything in one codebase.
Modern RPA in Python is Playwright + Celery + a sensible failure story. Auto-waiting locators, persistent auth state, async parallelism, screenshot-on-failure, and tracing get you most of the way to commercial-grade automation — at zero license cost and fully under your Django app's control. If a human can do it in a browser, you can automate it.