Why this matters
Incidents will happen. The companies that survive them well aren't the ones with the fewest incidents — they're the ones with the most practiced response. This tutorial is the operational playbook for handling a web app breach end-to-end.
The phases of incident response
Standard model (the six-phase SANS breakdown; NIST SP 800-61 folds the same steps into four consolidated phases):
- Preparation — before anything happens
- Identification — "is this an incident?"
- Containment — stop the bleeding
- Eradication — remove the attacker
- Recovery — restore safe operations
- Lessons learned — improve
Each has specific actions, specific dangers, specific deliverables.
Phase 1 — Preparation
The single highest-leverage investment in IR. Before an incident:
Documented playbook
A written guide covering common scenarios. For a web app shop:
- Suspected credential compromise
- Active web shell on a server
- Suspected data exfiltration
- DDoS attack ongoing
- Discovered vulnerability being exploited in the wild
Each scenario has: trigger, who's involved, first 30 minutes of actions, decision points, escalation criteria, communication templates.
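One way to keep scenario pages consistent is to treat them as structured data and generate the playbook pages from it. A minimal sketch of that structure in Python (the field names simply mirror the list above; nothing here is mandated by any standard):

```python
from dataclasses import dataclass, field

@dataclass
class PlaybookScenario:
    """One scenario page in the IR playbook (illustrative structure)."""
    name: str                       # e.g. "Active web shell on a server"
    trigger: str                    # observation that activates the scenario
    responders: list[str]           # roles involved (IC, tech lead, comms, ...)
    first_30_minutes: list[str]     # ordered first actions
    decision_points: list[str]      # e.g. "contain now vs. observe briefly?"
    escalation_criteria: list[str]  # when to pull in legal / DFIR / leadership
    comms_templates: list[str] = field(default_factory=list)  # template paths

web_shell = PlaybookScenario(
    name="Active web shell on a server",
    trigger="Unknown executable script found under the web root",
    responders=["Incident Commander", "Technical Lead"],
    first_30_minutes=[
        "Isolate the host at the network layer (do not power off)",
        "Snapshot disk and capture memory if possible",
        "Preserve web server and application logs",
    ],
    decision_points=["Contain now vs. observe briefly to map attacker activity"],
    escalation_criteria=["Any evidence of data access: involve the Legal Lead"],
)
```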
Roles defined
- Incident Commander — runs the response, makes decisions, owns timelines
- Technical Lead — directs investigation and remediation
- Communications Lead — internal updates, external customer comms, regulatory notification
- Legal Lead — counsel for notification obligations
- Liaison — external (DFIR firm, law enforcement, customers)
Small teams: one person wears multiple hats. The roles still need to be assigned and acknowledged.
Tooling ready
- Centralized log access (Sentry, your aggregator)
- Forensic image collection process documented
- Out-of-band communication channel (Signal, separate Slack workspace)
- Contact list for legal counsel, DFIR firms, key customers, regulators
- Pre-drafted communication templates
Backups verified
Quarterly restore tests. Documentation of what's stored where. Immutable copies (so an attacker with admin access can't tamper with them).
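If backups live in S3, Object Lock in compliance mode is one way to get those immutable copies. A hedged boto3 sketch (the bucket name is hypothetical; Object Lock can only be enabled at bucket creation, and outside us-east-1 you'd also pass a CreateBucketConfiguration):

```python
import boto3

s3 = boto3.client("s3")

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(
    Bucket="example-backups-immutable",  # hypothetical name
    ObjectLockEnabledForBucket=True,
)

# COMPLIANCE mode: no principal, including root, can delete or overwrite
# locked object versions until the retention window expires.
s3.put_object_lock_configuration(
    Bucket="example-backups-immutable",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 35}},
    },
)
```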
Tabletop exercises
Annual at minimum. Run hypothetical incidents through the team. Find gaps.
Phase 2 — Identification
"Is this an incident?" Initial triage.
Common alert sources
- Sentry exception spike
- Anomalous traffic in logs (large data downloads, many 500s, unusual API patterns; a triage sketch follows this list)
- Customer report ("my account did something I didn't do")
- Threat intelligence notification (your domain in a breach dump)
- Third-party notification (Stripe, GitHub, AWS detecting suspicious activity)
- Self-discovery during routine work
- Public disclosure (security researcher tweet, news article)
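For the anomalous-traffic case, a quick script against the access logs often beats eyeballing dashboards. A minimal triage sketch, assuming nginx/Apache combined log format (the log path is an assumption):

```python
import re
from collections import Counter

# Common/combined log format: client IP, timestamp, request, status, bytes.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def triage_5xx(path: str, top: int = 10) -> None:
    """Count 5xx responses per (client IP, minute) to localize a spike."""
    buckets: Counter = Counter()
    with open(path) as fh:
        for line in fh:
            m = LINE.match(line)
            if not m or not m["status"].startswith("5"):
                continue
            minute = m["ts"][:17]  # "10/Oct/2025:13:55" -> per-minute bucket
            buckets[(m["ip"], minute)] += 1
    for (ip, minute), count in buckets.most_common(top):
        print(f"{count:5d}  5xx  {ip}  {minute}")

triage_5xx("/var/log/nginx/access.log")  # adjust to your log location
```

The same bucketing works for bytes transferred (exfiltration) or per-endpoint request counts.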
Initial questions
Document timestamps for each:
- What is the indicator? (Be precise — "500 errors" isn't enough; "500 errors on /api/orders/ from IP X spiking at HH:MM")
- Could this be a false positive? (Sometimes the answer is yes after 10 minutes of investigation. Document the reasoning.)
- What's the severity? (Production outage? Data exposure? Suspected compromise of specific account?)
- Who needs to know in the first hour? (Engineering on-call? Leadership? Legal? Customers?)
- What's our confidence level? ("Suspect" vs "confirmed" matters for what actions you can take.)
Triggers for declaring an incident
Decide in advance. Examples:
- Evidence of unauthorized access to any system
- Customer data accessed by unauthorized party
- Encryption activity on file servers (ransomware indicator)
- Significant unexplained outbound traffic
- Public security disclosure naming your company
When a trigger fires: declare. Convene the response team. Open a dedicated incident channel. Start the clock.
Phase 3 — Containment
Stop the active damage. Two goals: prevent expansion AND preserve evidence.
Short-term containment
The first hour. Actions taken under uncertainty, with the goal of limiting blast radius.
- Isolate compromised hosts — pull from network, NOT power off (powering off loses memory state). For cloud, swap security groups to drop all traffic; a sketch follows this list.
- Disable compromised accounts — but document first; revert if false positive
- Revoke active sessions — for Django's database-backed sessions, `Session.objects.all().delete()` logs everyone out; note that `python manage.py clearsessions` only purges *expired* sessions, so it won't end an attacker's live session
- Block known malicious IPs at edge — Cloudflare, AWS WAF, ufw
- Take database snapshot — preserve state for forensics
- Capture memory images of affected servers if you have the forensic capability
- Preserve logs — copy to immutable storage, lock down access
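For AWS, one concrete way to do the "drop all traffic" isolation is to swap the instance's security groups for an empty quarantine group created in advance (no inbound rules, default egress rule removed). A minimal boto3 sketch; the instance and group IDs are hypothetical:

```python
import boto3

ec2 = boto3.client("ec2")

def quarantine_instance(instance_id: str, quarantine_sg: str) -> None:
    """Replace all security groups with a quarantine group.

    This drops traffic without stopping the instance, so memory state
    survives for forensics.
    """
    # Record current groups first: the action stays reversible and logged.
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    current = [
        g["GroupId"]
        for g in desc["Reservations"][0]["Instances"][0]["SecurityGroups"]
    ]
    print(f"{instance_id}: replacing {current} with {quarantine_sg}")
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg])

quarantine_instance("i-0123456789abcdef0", "sg-0aaaabbbbcccc1111")
```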
What NOT to do in short-term containment
- Don't reboot servers — destroys memory state that has forensic value
- Don't delete files — destroys evidence
- Don't change passwords on all accounts immediately — mass rotation tips off the attacker before you've mapped their access, and you lose the signal of which accounts they were actively using
- Don't communicate externally until you know what happened
- Don't "clean up" — the urge to restore normalcy will destroy the investigation
Long-term containment
After the initial chaos settles (hours to days). Goal: hold a controlled state while preparing for eradication.
- Patch the entry vector if known
- Maintain enhanced monitoring on affected systems
- Limit administrative access
- Document changes in a single source of truth
Phase 4 — Eradication
Remove the attacker entirely. Common mistake: declaring victory too early.
Investigation first
Before eradication, you must know:
- Initial access vector (how did they get in?)
- Timeline of activity (when did it start? how long were they present?)
- Scope of access (what did they touch?)
- Persistence mechanisms (where are their backdoors?)
- Data exfiltration (what did they take?)
Without this, eradication is whack-a-mole. The attacker is back tomorrow via the same or a similar vector.
Forensic discipline
- Chain of custody — document every action, who took it, when
- Working copies, not originals — never mutate evidence
- Hash everything — every artifact gets a SHA-256 hash for integrity verification (a manifest sketch follows this list)
- Timestamp logs — UTC, ISO 8601, source-stamped
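A minimal manifest generator for the "hash everything" rule, assuming evidence is collected into one directory (the paths are hypothetical):

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def write_manifest(evidence_dir: str, manifest_path: str) -> None:
    """Append a SHA-256 line for every file under the evidence directory."""
    with open(manifest_path, "a") as manifest:
        for path in sorted(Path(evidence_dir).rglob("*")):
            if not path.is_file():
                continue
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)  # chunked: memory images are large
            stamp = datetime.now(timezone.utc).isoformat()  # UTC, ISO 8601
            manifest.write(f"{stamp}  {digest.hexdigest()}  {path}\n")

write_manifest("/evidence/incident-001", "/evidence/incident-001-manifest.txt")
```

Store the manifest somewhere the working copies can't reach, and re-verify hashes before relying on any artifact.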
Tools
- Standard forensic tools — Volatility (memory), Plaso/log2timeline (timeline), Autopsy (disk)
- Application logs — your Sentry + log aggregator
- Cloud audit logs — CloudTrail (AWS), Cloud Logging (GCP), Activity Log (Azure); a CloudTrail query sketch follows this list
- Endpoint EDR — CrowdStrike, SentinelOne, Defender if deployed
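As one concrete example, CloudTrail's lookup API is often the fastest way to answer "who logged into the console during the window." A boto3 sketch (the seven-day window is arbitrary; LookupAttributes also accepts keys like Username and EventSource):

```python
from datetime import datetime, timedelta, timezone

import boto3

ct = boto3.client("cloudtrail")

# Page through recent console logins to spot unexpected principals.
paginator = ct.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
)
for page in pages:
    for event in page["Events"]:
        print(event["EventTime"], event.get("Username"), event["EventName"])
```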
What to eradicate
- Web shells, malicious cron jobs, suspicious binaries
- Backdoor accounts (added admin accounts, SSH keys, OAuth grants)
- Modified configurations (changed firewall rules, disabled security tools)
- Persistence mechanisms (scheduled tasks, systemd services, init scripts)
- Modified application code (injected JavaScript, backdoored views)
Validation
After eradication, validate. Re-scan. Re-image where appropriate. Don't trust a "cleaned" system without verifying.
Phase 5 — Recovery
Bring systems back. Two priorities: business resumption AND defending against the same attack happening tomorrow.
Restoration sources
- Clean backups (verified to predate the compromise)
- Rebuilt from scratch (especially compromised servers — never trust a host you've cleaned, only one you've rebuilt)
- Patched and reconfigured existing systems
Validation before bringing systems back
- Vulnerability scan
- Configuration review
- Patch level confirmation
- Account audit (no unexpected accounts, all MFA enrolled; a Django sketch follows this list)
- Log monitoring tuned for anomalies
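A sketch of the account-audit step for a Django app, assuming django-otp for MFA (swap in whatever your MFA library exposes); run it from `manage.py shell` or a management command:

```python
from django.contrib.auth import get_user_model
from django_otp import devices_for_user  # assumes django-otp is installed

User = get_user_model()

def audit_accounts() -> None:
    """Flag active accounts without a confirmed MFA device."""
    for user in User.objects.filter(is_active=True).order_by("date_joined"):
        if not any(devices_for_user(user, confirmed=True)):
            print(f"NO MFA: {user.get_username()} "
                  f"(joined {user.date_joined:%Y-%m-%d})")
```

Sorting by join date makes recently created accounts (a common backdoor) stand out at the bottom of the list.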
Phased return
Critical systems first, broader systems later. Monitor each tier as it comes online.
Password and credential rotation
- Force password reset for all users
- Revoke all sessions
- Rotate API keys, SSH keys, certificates
- Rotate secrets in secret manager (database passwords, third-party API keys)
- Re-enroll MFA if compromise included MFA bypass mechanisms
This is the boring, painful work. It's also where many recoveries fail — the attacker had a credential you forgot to rotate.
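A sketch of the Django side of this, plus secret rotation if secrets live in AWS Secrets Manager. The `force_password_reset` field and the secret name are hypothetical; the pattern is what matters:

```python
# Run from a management command or `manage.py shell`.
# Assumes database-backed sessions.
import boto3
from django.contrib.auth import get_user_model
from django.contrib.sessions.models import Session

# 1. Revoke every session: everyone, attacker included, is logged out.
Session.objects.all().delete()

# 2. Require a new password at next login via an application-level flag
#    (hypothetical field that your login flow checks before proceeding).
get_user_model().objects.update(force_password_reset=True)

# 3. Rotate managed secrets. rotate_secret() invokes the rotation Lambda
#    already attached to the secret; the name here is hypothetical.
sm = boto3.client("secretsmanager")
sm.rotate_secret(SecretId="prod/app/database-credentials")
```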
Phase 6 — Lessons learned
The post-incident review. Done within 1-2 weeks of incident closure, before memory fades.
Blameless review
The goal is improvement, not punishment. People who fear consequences hide information. The review must explicitly be blameless.
Standard agenda
- Timeline reconstruction — what happened, when, in what order
- Root cause analysis — beyond "the firewall let it through"; the human and process factors
- Decision review — what calls were made, what worked, what didn't
- Detection review — could we have caught this earlier? What signal should have fired?
- Response review — was the playbook followed? Was it sufficient?
- Action items — specific, owned, time-bounded improvements
Deliverables
- Internal incident report — full timeline, root cause, response, action items
- Executive summary — for leadership and board
- Customer communication — if data was exposed, what happened and what's been done
- Regulatory filing — if required by jurisdiction (GDPR within 72 hours)
- Public disclosure — if appropriate, often months later once the full picture is known
Specific action items typically include
- Technical: patch the specific vulnerability, add the missing detection rule, harden the broken control
- Process: update playbook for the gap that hurt, add the missing escalation path
- People: hire if capacity was the issue, train if knowledge was the issue
- Tooling: deploy the missing monitoring, integrate the right systems
Regulatory and customer notification
Often time-sensitive. Don't wait for full investigation.
GDPR
- Notify supervisory authority within 72 hours of becoming aware (not within 72 hours of resolving)
- Notify affected individuals if high risk to rights and freedoms
- "Becoming aware" interpretation: when you have reasonable certainty an incident occurred, not when you have full forensic certainty
Sector regulators
Financial services (e.g., DNB for Dutch financial institutions), healthcare, critical infrastructure — each has sector-specific notification timelines.
Contractual obligations
Your customer contracts may have stricter notification windows than law requires (24-48 hours common in enterprise B2B). Check before you need to know.
What to say
- What happened (factual, not speculative)
- What information was affected
- What you've done to contain and remediate
- What recipients should do
- How they can contact you for questions
What NOT to say:
- Detailed attribution ("It was nation-state actor X") — usually wrong, always speculative
- Promises you can't keep ("This will never happen again")
- Minimizing language ("a minor incident," "limited impact")
A first-hour checklist
For when an alert just fired and you need a playbook on the wall:
- [ ] Document the alert: timestamp, source, indicator
- [ ] Triage: is this a real incident? (5-10 minutes)
- [ ] Declare if it is. Open the incident channel.
- [ ] Page the on-call team
- [ ] Identify Incident Commander
- [ ] Start a timeline document — every action gets logged with a timestamp (a minimal logger sketch follows this checklist)
- [ ] Capture immediate evidence: log queries, screenshots, memory dumps if possible
- [ ] Implement short-term containment (isolate, revoke, block)
- [ ] Notify internal stakeholders (engineering leadership, legal counsel)
- [ ] Reach out to DFIR firm if scope is unclear
- [ ] Schedule the next checkpoint (30 minutes? 1 hour?)
- [ ] Do not communicate externally yet
That's enough structure to avoid the worst mistakes while you figure out what's actually happening.
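For the timeline document specifically, anything append-only with UTC timestamps works; what matters is that logging an action costs seconds. A minimal sketch (the path is an assumption):

```python
from datetime import datetime, timezone

TIMELINE = "/secure/incident-001/timeline.log"  # hypothetical location

def log_action(actor: str, action: str) -> None:
    """Append one timeline entry: UTC ISO 8601 timestamp, who, what."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(TIMELINE, "a") as fh:
        fh.write(f"{stamp}\t{actor}\t{action}\n")

log_action("alice", "Moved web-01 into the quarantine security group")
log_action("bob", "Started database snapshot for forensics")
```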
Closing the series
These ten tutorials cover the web application security landscape from threat models to incident response. The thread connecting them: defense in depth, applied with discipline, tested under realistic adversary pressure.
You will not prevent every incident. You will not detect every attack. Your goal is to make attacks expensive enough that most adversaries pick another target, detect the ones who persist before they finish, and respond effectively when something does succeed.
That outcome is achievable. The companies that achieve it are not the ones with the biggest security budget — they're the ones that prioritize, practice, and improve continuously. Tools matter. Discipline matters more.
Good luck out there.