How WAFs work, the classes of bypass techniques attackers use, and the defensive controls that don't rely solely on signature matching.
A Web Application Firewall sits in front of your application and inspects HTTP traffic. For each request, it makes a decision: allow, log, or block — based on rules. Rules range from simple regex matches to ML-based anomaly detection.
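The allow/log/block decision can be sketched as a tiny rule engine. This is a minimal illustration, not how any particular WAF is implemented: the `Rule` type, the two example patterns, and the "strictest action wins" policy are all assumptions made for the sketch.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    LOG = "log"
    BLOCK = "block"


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule identifier
    pattern: re.Pattern  # compiled regex applied to the inspected text
    action: Action       # what to do when the pattern matches


# Illustrative rules only; real rule sets such as OWASP CRS are far larger.
RULES = [
    Rule("sqli-union", re.compile(r"\bunion\b.+\bselect\b", re.I), Action.BLOCK),
    Rule("script-tag", re.compile(r"<\s*script", re.I), Action.LOG),
]

# Strictness order used to pick the final decision.
_SEVERITY = {Action.ALLOW: 0, Action.LOG: 1, Action.BLOCK: 2}


def decide(request_text: str) -> Action:
    """Return the strictest action among all matching rules."""
    decision = Action.ALLOW
    for rule in RULES:
        if rule.pattern.search(request_text):
            if _SEVERITY[rule.action] > _SEVERITY[decision]:
                decision = rule.action
    return decision
```

Calling `decide("id=1 UNION SELECT password FROM users")` yields `Action.BLOCK`, while a request that matches no rule falls through to `Action.ALLOW`.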
What WAFs are good at:
What WAFs are bad at:
A WAF is a force multiplier for your other defenses, not a substitute. The best mental model: WAF buys you time and filters noise; your application code provides the real defense.
Sit in front of your origin servers. Decrypt TLS, inspect, re-encrypt to your origin. Operate at line rate, scale horizontally, integrate with CDN and DDoS protection.
Run as a module in your reverse proxy. Self-hosted, open-source rule sets (OWASP CRS).
Run inside the application process. See the parsed request, the framework's interpretation, the actual SQL queries being run. Better signal but more invasive.
Native to the cloud platform. Tight integration with load balancers, IAM, observability.
The most widely deployed open WAF rule set. Categories:
Paranoia levels 1-4 control sensitivity. Higher = more false positives. Production typically runs at 1-2.
The same way they get past every static filter: encoding, encoding, encoding.
SELECT vs SeLeCt vs select
Defense: case-insensitive matching (standard in CRS).
URL encoding, double URL encoding, Unicode normalization, HTML entities:
<script> → vulnerable string
%3Cscript%3E → URL encoded, often caught
%253Cscript%253E → double URL encoded, sometimes passes
&lt;script&gt; → HTML entities, sometimes passes
Defense: normalize input before matching. Multiple decoding passes.
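A minimal sketch of multi-pass normalization, assuming the only encodings in play are URL encoding and HTML entities. The `MAX_PASSES` cap and the helper names are illustrative choices, not taken from any WAF.

```python
import html
import re
from urllib.parse import unquote

MAX_PASSES = 5  # assumed cap so crafted input cannot force unbounded work


def normalize(value: str) -> str:
    """Repeatedly URL-decode and HTML-unescape until the value stops changing."""
    for _ in range(MAX_PASSES):
        decoded = html.unescape(unquote(value))
        if decoded == value:
            break
        value = decoded
    return value


SCRIPT_TAG = re.compile(r"<\s*script", re.IGNORECASE)


def looks_like_script_tag(raw: str) -> bool:
    """Match the signature against the normalized form, not the raw bytes."""
    return bool(SCRIPT_TAG.search(normalize(raw)))
```

With this in place the double-encoded `%253Cscript%253E` and the entity-encoded form both reduce to the same string before matching. A stricter variant might reject any value that is still changing after the last pass instead of matching on a partially decoded result.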
UNION SELECT password FROM users
UN/**/ION SELECT password FROM users
UNION%0aSELECT password FROM users -- newline injection
UNION%23%0aSELECT -- comment then newline
Defense: strip comments before matching, but careful — comments aren't always safe to remove from genuine queries.
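One way to act on that caution is to strip comments only from a copy used for matching and leave the forwarded request untouched. The sketch below assumes MySQL-style comment syntax; the regexes and function names are illustrative.

```python
import re
from urllib.parse import unquote

# Inline /* ... */ comments, plus "#" or "--<whitespace>" comments to end of line.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"(?:#|--(?=\s))[^\n]*")
UNION_SELECT = re.compile(r"\bunion\s+select\b", re.IGNORECASE)


def strip_sql_comments(text: str) -> str:
    """Remove SQL comments from a copy of the input used only for matching."""
    text = BLOCK_COMMENT.sub("", text)
    return LINE_COMMENT.sub("", text)


def matches_union_select(raw: str) -> bool:
    """Check both the decoded input and its comment-stripped form."""
    decoded = unquote(raw)
    candidates = (decoded, strip_sql_comments(decoded))
    return any(UNION_SELECT.search(c) for c in candidates)
```

Because the original value is never rewritten, legitimate input that happens to contain `#` or `--` reaches the application unchanged; only the detection view is altered.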
ad' + 'min — string concatenation in JavaScript
admi%6e — partial URL encoding
char(97,100,109,105,110) — char-by-char SQL
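Defense: match on the normalized value and on the construction patterns themselves (concatenation operators, char() calls, partial encodings) rather than on literal strings such as admin.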
Content-Type: application/json; charset=utf-7
Force a charset that the application decodes differently than the WAF expects, so the attack string only appears after the WAF has inspected the request.
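A simple countermeasure is to refuse charsets the application never expects. The sketch below uses the standard library's header parameter parsing; the `ALLOWED_CHARSETS` policy is an assumption for illustration.

```python
from email.message import Message

ALLOWED_CHARSETS = {"utf-8", "us-ascii"}  # assumed policy for this sketch


def charset_is_acceptable(content_type_header: str) -> bool:
    """Accept a Content-Type only if its charset is absent or allowlisted."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    charset = msg.get_param("charset")
    if charset is None:
        return True  # no charset declared; the parser default applies
    # RFC 2231 encoded parameters come back as a tuple and are rejected here.
    return str(charset).strip().lower() in ALLOWED_CHARSETS
```

Running the same check at the edge and in the application keeps both sides decoding the body the same way.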
Two parsers disagree on request boundaries → smuggle hidden request past the WAF.
HTTP/0.9 GET request — many WAFs don't handle this
HTTP/2 with smuggled headers
WebSocket framing tricks
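For HTTP/1.1, the classic boundary disagreement comes from conflicting Content-Length and Transfer-Encoding headers. The following is a minimal sketch of a front-end check that flags such requests; the function name and the exact set of conditions are illustrative and not exhaustive.

```python
def has_ambiguous_framing(headers: list[tuple[str, str]]) -> bool:
    """Flag requests whose body length could be read two different ways.

    `headers` is the raw header list in arrival order, so duplicates survive.
    """
    content_lengths = [
        v.strip() for k, v in headers if k.strip().lower() == "content-length"
    ]
    has_te = any(k.strip().lower() == "transfer-encoding" for k, _ in headers)

    if content_lengths and has_te:
        return True  # both framing headers present: parsers may pick different ones
    if len(set(content_lengths)) > 1:
        return True  # conflicting Content-Length values
    if any(not (v.isascii() and v.isdigit()) for v in content_lengths):
        return True  # e.g. "+5", "5, 6", or an empty value
    # Whitespace around a header name ("Transfer-Encoding : chunked") is a
    # known obfuscation; a strict front end would refuse it.
    return any(k != k.strip() for k, _ in headers)
```

Rejecting ambiguous requests outright is usually safer than trying to guess which interpretation the back end will choose.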
WAF parses multipart/form-data differently than your app → fields the WAF didn't see make it to the app.
Content-Type: multipart/form-data; boundary=---boundary

-----boundary
Content-Disposition: form-data; name="file"; filename="x.txt"
Content-Type: text/plain

UNION SELECT password FROM users
-----boundary--
Defense: process body the same way at WAF and app. Use the same parser.
If your origin server's IP is discoverable (via DNS history, certificate transparency logs, or accidental disclosure), attackers connect directly, bypassing the WAF.
Defense:
- Allowlist Cloudflare/AWS WAF IPs at your firewall, drop all other inbound on port 443
- Use AWS WAF's strict origin protection
- Cloudflare's authenticated origin pulls
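The IP allowlist is normally enforced at the network firewall, but the same check can be sketched in application code as a second line of defense. The ranges below are placeholders from the documentation address blocks; in practice they would be loaded from the list your WAF or CDN provider publishes.

```python
import ipaddress

# Placeholder ranges (documentation address blocks), NOT real provider ranges.
TRUSTED_EDGE_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_trusted_edge(peer_ip: str) -> bool:
    """True only if the TCP peer address falls inside a trusted edge range."""
    try:
        addr = ipaddress.ip_address(peer_ip)
    except ValueError:
        return False  # unparseable address: treat as untrusted
    return any(addr in net for net in TRUSTED_EDGE_RANGES)
```

The check must use the socket peer address, not a client-supplied header, or it can be spoofed by anyone who reaches the origin directly.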
Stacking is the point. Each layer alone is bypassable. The combination is hard.
from pydantic import BaseModel, Field, EmailStr  # EmailStr requires the email-validator package

class RegistrationRequest(BaseModel):
    email: EmailStr
    password: str = Field(min_length=12, max_length=128)
    # Letters, digits, underscore, whitespace, hyphen, apostrophe
    name: str = Field(min_length=1, max_length=100, pattern=r"^[\w\s'-]+$")
Validate everything before it touches business logic. Reject early.
Django's ORM does this automatically. For raw SQL:
# WRONG
cursor.execute(f"SELECT * FROM users WHERE email = '{email}'")
# RIGHT
cursor.execute("SELECT * FROM users WHERE email = %s", [email])
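The difference is easy to demonstrate with an in-memory SQLite database. This is a standalone sketch, not Django code: the table and payload are made up for illustration, and `sqlite3` uses `?` placeholders where Django's cursor uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice@example.com', 'hunter2')")

payload = "nobody@example.com' OR '1'='1"

# String formatting: the payload rewrites the WHERE clause and matches every row.
unsafe_rows = conn.execute(
    f"SELECT email FROM users WHERE email = '{payload}'"
).fetchall()

# Bound parameter: the payload is compared as a literal string and matches nothing.
safe_rows = conn.execute(
    "SELECT email FROM users WHERE email = ?", (payload,)
).fetchall()
```

The formatted query returns Alice's row even though the supplied email does not exist, while the parameterized query returns an empty list.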
Django templates auto-escape. Where you need to opt out:
{{ user_input }} <!-- escaped -->
{{ user_input|safe }} <!-- not escaped — DANGEROUS -->
{{ user_input|escape }} <!-- explicit -->
{{ user_input|escapejs }} <!-- for JavaScript context -->
Use the right filter for the right context. HTML, JS, attribute, URL all have different escape rules.
A strict Content Security Policy can stop injected scripts from executing even when a payload gets past your output encoding:
# settings.py, assuming django-csp 3.x setting names
CSP_DEFAULT_SRC = ("'self'",)
CSP_SCRIPT_SRC = ("'self'",)
CSP_STYLE_SRC = ("'self'",)
# django-csp generates a per-request nonce and adds it to these directives;
# templates read it as {{ request.csp_nonce }}
CSP_INCLUDE_NONCE_IN = ("script-src", "style-src")
CSP_IMG_SRC = ("'self'", "data:", "https:")
CSP_FONT_SRC = ("'self'", "https://fonts.gstatic.com")
CSP_CONNECT_SRC = ("'self'", "https://api.stripe.com")
CSP_REPORT_URI = "/csp-report/"
Nonce-based CSP is significantly stronger than allowlist-based. Plus you get CSP violation reports, which are great for detecting XSS attempts.
<script src="https://cdn.example.com/lib.js"
integrity="sha384-..."
crossorigin="anonymous"></script>
The browser verifies that the script content matches the hash and refuses to run it if it does not. This defends against a compromised CDN delivering modified content.
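The integrity value is the base64-encoded digest of the exact file bytes, prefixed with the algorithm name. A minimal sketch for generating it, with a hypothetical helper name:

```python
import base64
import hashlib


def sri_sha384(content: bytes) -> str:
    """Return a Subresource Integrity value for the given file bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")
```

Typical usage is to read the file you intend to pin, for example `sri_sha384(open("lib.js", "rb").read())`, and paste the result into the `integrity` attribute. Any change to the file, including whitespace, produces a different value.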
# settings.py
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
USE_X_FORWARDED_HOST = False # Use ALLOWED_HOSTS validation
If you trust XFF for client IP, validate that the request came through a known proxy.
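A sketch of that validation, assuming the proxy ranges are known in advance. `TRUSTED_PROXIES` uses a documentation address block as a placeholder, and the function names are illustrative.

```python
import ipaddress
from typing import Optional

# Assumed proxy ranges for illustration (documentation address block).
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]


def _is_trusted(ip: str) -> bool:
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Resolve the client IP, honoring XFF only when the peer is a known proxy."""
    if not x_forwarded_for or not _is_trusted(remote_addr):
        return remote_addr  # direct connection or untrusted peer: ignore XFF
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk right to left; the first hop that is not one of our proxies
    # is the address our own infrastructure actually observed.
    for hop in reversed(hops):
        if not _is_trusted(hop):
            return hop
    return remote_addr
```

Entries to the left of the returned hop were supplied by the client and can be forged, which is why the walk starts from the right.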
WAFs see syntactic patterns. Your application sees semantic events. Detection rules that the WAF can't see:
These are application-level signals. Implement them as application middleware or in the SIEM (covered in tutorial 10).
In some cases the WAF causes more pain than benefit:
Don't disable. Tune. Disabling = no protection. Tuning = working protection.
WAFs help. WAFs alone are not enough. The defense-in-depth model puts WAF at the edge, hardening at every layer behind it, and observability across all of them. Each layer catches what others miss.
Attackers will probe the boundaries between layers — request smuggling, parser differentials, encoding chains. The defenders who do well are the ones who control those boundaries: same parsing assumptions, same character handling, same trust model from edge to origin.
Tutorial 7 covers what attackers actually use to find the gaps in your stack.