Why this matters
Incidents will happen. The companies that survive them well aren't the ones with the fewest incidents — they're the ones with the most practiced response. This tutorial is the operational playbook for handling a web app breach end-to-end.
The phases of incident response
Standard model (the six-phase SANS breakdown; NIST SP 800-61 folds the same steps into four consolidated phases):
- Preparation — before anything happens
- Identification — "is this an incident?"
- Containment — stop the bleeding
- Eradication — remove the attacker
- Recovery — restore safe operations
- Lessons learned — improve
Each has specific actions, specific dangers, specific deliverables.
Phase 1 — Preparation
The single highest-leverage investment in IR. Before an incident:
Documented playbook
A written guide covering common scenarios. For a web app shop:
- Suspected credential compromise
- Active web shell on a server
- Suspected data exfiltration
- DDoS attack ongoing
- Discovered vulnerability being exploited in the wild
Each scenario has: trigger, who's involved, first 30 minutes of actions, decision points, escalation criteria, communication templates.
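One way to keep scenario pages consistent is to treat them as structured data and generate the playbook pages from it. A minimal sketch of that structure in Python (the field names simply mirror the list above; nothing here is mandated by any standard):

```python
from dataclasses import dataclass, field

@dataclass
class PlaybookScenario:
    """One scenario page in the IR playbook (illustrative structure)."""
    name: str                       # e.g. "Active web shell on a server"
    trigger: str                    # observation that activates the scenario
    responders: list[str]           # roles involved (IC, tech lead, comms, ...)
    first_30_minutes: list[str]     # ordered first actions
    decision_points: list[str]      # e.g. "contain now vs. observe briefly?"
    escalation_criteria: list[str]  # when to pull in legal / DFIR / leadership
    comms_templates: list[str] = field(default_factory=list)  # template paths

web_shell = PlaybookScenario(
    name="Active web shell on a server",
    trigger="Unknown executable script found under the web root",
    responders=["Incident Commander", "Technical Lead"],
    first_30_minutes=[
        "Isolate the host at the network layer (do not power off)",
        "Snapshot disk and capture memory if possible",
        "Preserve web server and application logs",
    ],
    decision_points=["Contain now vs. observe briefly to map attacker activity"],
    escalation_criteria=["Any evidence of data access: involve the Legal Lead"],
)
```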
Roles defined
- Incident Commander — runs the response, makes decisions, owns timelines
- Technical Lead — directs investigation and remediation
- Communications Lead — internal updates, external customer comms, regulatory notification
- Legal Lead — counsel for notification obligations
- Liaison — external (DFIR firm, law enforcement, customers)
Small teams: one person wears multiple hats. The roles still need to be assigned and acknowledged.
Tooling ready
- Centralized log access (Sentry, your aggregator)
- Forensic image collection process documented
- Out-of-band communication channel (Signal, separate Slack workspace)
- Contact list for legal counsel, DFIR firms, key customers, regulators
- Pre-drafted communication templates
Backups verified
Quarterly restore tests. Documentation of what's stored where. Immutable copies (so an attacker with admin access can't tamper with them).
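If backups live in S3, Object Lock in compliance mode is one way to get those immutable copies. A hedged boto3 sketch (the bucket name is hypothetical; Object Lock can only be enabled at bucket creation, and outside us-east-1 you'd also pass a CreateBucketConfiguration):

```python
import boto3

s3 = boto3.client("s3")

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(
    Bucket="example-backups-immutable",  # hypothetical name
    ObjectLockEnabledForBucket=True,
)

# COMPLIANCE mode: no principal, including root, can delete or overwrite
# locked object versions until the retention window expires.
s3.put_object_lock_configuration(
    Bucket="example-backups-immutable",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 35}},
    },
)
```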
Tabletop exercises
Annual at minimum. Run hypothetical incidents through the team. Find gaps.
Phase 2 — Identification
"Is this an incident?" Initial triage.
Common alert sources
- Sentry exception spike
- Anomalous traffic in logs (large data downloads, many 500s, unusual API patterns; a triage sketch follows this list)
- Customer report ("my account did something I didn't do")
- Threat intelligence notification (your domain in a breach dump)
- Third-party notification (Stripe, GitHub, AWS detecting suspicious activity)
- Self-discovery during routine work
- Public disclosure (security researcher tweet, news article)
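For the anomalous-traffic case, a quick script against the access logs often beats eyeballing dashboards. A minimal triage sketch, assuming nginx/Apache combined log format (the log path is an assumption):

```python
import re
from collections import Counter

# Common/combined log format: client IP, timestamp, request, status, bytes.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def triage_5xx(path: str, top: int = 10) -> None:
    """Count 5xx responses per (client IP, minute) to localize a spike."""
    buckets: Counter = Counter()
    with open(path) as fh:
        for line in fh:
            m = LINE.match(line)
            if not m or not m["status"].startswith("5"):
                continue
            minute = m["ts"][:17]  # "10/Oct/2025:13:55" -> per-minute bucket
            buckets[(m["ip"], minute)] += 1
    for (ip, minute), count in buckets.most_common(top):
        print(f"{count:5d}  5xx  {ip}  {minute}")

triage_5xx("/var/log/nginx/access.log")  # adjust to your log location
```

The same bucketing works for bytes transferred (exfiltration) or per-endpoint request counts.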
Initial questions
Document timestamps for each:
- What is the indicator? (Be precise — "500 errors" isn't enough; "500 errors on /api/orders/ from IP X spiking at HH:MM")
- Could this be a false positive? (Sometimes the answer is yes after 10 minutes of investigation. Document the reasoning.)
- What's the severity? (Production outage? Data exposure? Suspected compromise of specific account?)
- Who needs to know in the first hour? (Engineering on-call? Leadership? Legal? Customers?)
- What's our confidence level? ("Suspect" vs "confirmed" matters for what actions you can take.)
Triggers for declaring an incident
Decide in advance. Examples:
- Evidence of unauthorized access to any system
- Customer data accessed by unauthorized party
- Encryption activity on file servers (ransomware indicator)
- Significant unexplained outbound traffic
- Public security disclosure naming your company
When a trigger fires: declare. Convene the response team. Open a dedicated incident channel. Start the clock.
Phase 3 — Containment
Stop the active damage. Two goals: prevent expansion AND preserve evidence.
Short-term containment
The first hour. Actions taken under uncertainty, with the goal of limiting blast radius.
- Isolate compromised hosts — pull from network, NOT power off (powering off loses memory state). For cloud, swap security groups to drop all traffic; a sketch follows this list.
- Disable compromised accounts — but document first; revert if false positive
- Revoke active sessions — for Django's database-backed sessions, `Session.objects.all().delete()` logs everyone out; note that `python manage.py clearsessions` only purges *expired* sessions, so it won't end an attacker's live session
- Block known malicious IPs at edge — Cloudflare, AWS WAF, ufw
- Take database snapshot — preserve state for forensics
- Capture memory images of affected servers if you have the forensic capability
- Preserve logs — copy to immutable storage, lock down access
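For AWS, one concrete way to do the "drop all traffic" isolation is to swap the instance's security groups for an empty quarantine group created in advance (no inbound rules, default egress rule removed). A minimal boto3 sketch; the instance and group IDs are hypothetical:

```python
import boto3

ec2 = boto3.client("ec2")

def quarantine_instance(instance_id: str, quarantine_sg: str) -> None:
    """Replace all security groups with a quarantine group.

    This drops traffic without stopping the instance, so memory state
    survives for forensics.
    """
    # Record current groups first: the action stays reversible and logged.
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    current = [
        g["GroupId"]
        for g in desc["Reservations"][0]["Instances"][0]["SecurityGroups"]
    ]
    print(f"{instance_id}: replacing {current} with {quarantine_sg}")
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg])

quarantine_instance("i-0123456789abcdef0", "sg-0aaaabbbbcccc1111")
```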
What NOT to do in short-term containment
- Don't reboot servers — destroys memory state that has forensic value
- Don't delete files — destroys evidence
- Don't change passwords on all accounts immediately — mass rotation tips off the attacker before you've mapped their access, and you lose the signal of which accounts they were actively using
- Don't communicate externally until you know what happened
- Don't "clean up" — the urge to restore normalcy will destroy the investigation
Long-term containment
After the initial chaos settles (hours to days). Goal: hold a controlled state while preparing for eradication.
- Patch the entry vector if known
- Maintain enhanced monitoring on affected systems
- Limit administrative access
- Document changes in a single source of truth
Phase 4 — Eradication
Remove the attacker entirely. Common mistake: declaring victory too early.
Investigation first
Before eradication, you must know:
- Initial access vector (how did they get in?)
- Timeline of activity (when did it start? how long were they present?)
- Scope of access (what did they touch?)
- Persistence mechanisms (where are their backdoors?)
- Data exfiltration (what did they take?)
Without this, eradication is whack-a-mole. The attacker is back tomorrow via the same or a similar vector.
Forensic discipline
- Chain of custody — document every action, who took it, when
- Working copies, not originals — never mutate evidence
- Hash everything — every artifact gets a SHA-256 hash for integrity verification (a manifest sketch follows this list)
- Timestamp logs — UTC, ISO 8601, source-stamped
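A minimal manifest generator for the "hash everything" rule, assuming evidence is collected into one directory (the paths are hypothetical):

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def write_manifest(evidence_dir: str, manifest_path: str) -> None:
    """Append a SHA-256 line for every file under the evidence directory."""
    with open(manifest_path, "a") as manifest:
        for path in sorted(Path(evidence_dir).rglob("*")):
            if not path.is_file():
                continue
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)  # chunked: memory images are large
            stamp = datetime.now(timezone.utc).isoformat()  # UTC, ISO 8601
            manifest.write(f"{stamp}  {digest.hexdigest()}  {path}\n")

write_manifest("/evidence/incident-001", "/evidence/incident-001-manifest.txt")
```

Store the manifest somewhere the working copies can't reach, and re-verify hashes before relying on any artifact.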
Tools
- Standard forensic tools — Volatility (memory), Plaso/log2timeline (timeline), Autopsy (disk)
- Application logs — your Sentry + log aggregator
- Cloud audit logs — CloudTrail (AWS), Cloud Logging (GCP), Activity Log (Azure); a CloudTrail query sketch follows this list
- Endpoint EDR — CrowdStrike, SentinelOne, Defender if deployed
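As one concrete example, CloudTrail's lookup API is often the fastest way to answer "who logged into the console during the window." A boto3 sketch (the seven-day window is arbitrary; LookupAttributes also accepts keys like Username and EventSource):

```python
from datetime import datetime, timedelta, timezone

import boto3

ct = boto3.client("cloudtrail")

# Page through recent console logins to spot unexpected principals.
paginator = ct.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
)
for page in pages:
    for event in page["Events"]:
        print(event["EventTime"], event.get("Username"), event["EventName"])
```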
What to eradicate
- Web shells, malicious cron jobs, suspicious binaries
- Backdoor accounts (added admin accounts, SSH keys, OAuth grants)
- Modified configurations (changed firewall rules, disabled security tools)
- Persistence mechanisms (scheduled tasks, systemd services, init scripts)
- Modified application code (injected JavaScript, backdoored views)
Validation
After eradication, validate. Re-scan. Re-image where appropriate. Don't trust a "cleaned" system without verifying.
Phase 5 — Recovery
Bring systems back. Two priorities: business resumption AND defending against the same attack happening tomorrow.
Restoration sources
- Clean backups (verified to predate the compromise)
- Rebuilt from scratch (especially compromised servers — never trust a host you've cleaned, only one you've rebuilt)
- Patched and reconfigured existing systems
Validation before bringing systems back
- Vulnerability scan
- Configuration review
- Patch level confirmation
- Account audit (no unexpected accounts, all MFA enrolled; a Django sketch follows this list)
- Log monitoring tuned for anomalies
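A sketch of the account-audit step for a Django app, assuming django-otp for MFA (swap in whatever your MFA library exposes); run it from `manage.py shell` or a management command:

```python
from django.contrib.auth import get_user_model
from django_otp import devices_for_user  # assumes django-otp is installed

User = get_user_model()

def audit_accounts() -> None:
    """Flag active accounts without a confirmed MFA device."""
    for user in User.objects.filter(is_active=True).order_by("date_joined"):
        if not any(devices_for_user(user, confirmed=True)):
            print(f"NO MFA: {user.get_username()} "
                  f"(joined {user.date_joined:%Y-%m-%d})")
```

Sorting by join date makes recently created accounts (a common backdoor) stand out at the bottom of the list.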
Phased return
Critical systems first, broader systems later. Monitor each tier as it comes online.
Password and credential rotation
- Force password reset for all users
- Revoke all sessions
- Rotate API keys, SSH keys, certificates
- Rotate secrets in secret manager (database passwords, third-party API keys)
- Re-enroll MFA if compromise included MFA bypass mechanisms
This is the boring, painful work. It's also where many recoveries fail — the attacker had a credential you forgot to rotate.
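A sketch of the Django side of this, plus secret rotation if secrets live in AWS Secrets Manager. The `force_password_reset` field and the secret name are hypothetical; the pattern is what matters:

```python
# Run from a management command or `manage.py shell`.
# Assumes database-backed sessions.
import boto3
from django.contrib.auth import get_user_model
from django.contrib.sessions.models import Session

# 1. Revoke every session: everyone, attacker included, is logged out.
Session.objects.all().delete()

# 2. Require a new password at next login via an application-level flag
#    (hypothetical field that your login flow checks before proceeding).
get_user_model().objects.update(force_password_reset=True)

# 3. Rotate managed secrets. rotate_secret() invokes the rotation Lambda
#    already attached to the secret; the name here is hypothetical.
sm = boto3.client("secretsmanager")
sm.rotate_secret(SecretId="prod/app/database-credentials")
```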
Phase 6 — Lessons learned
The post-incident review. Done within 1-2 weeks of incident closure, before memory fades.
Blameless review
The goal is improvement, not punishment. People who fear consequences hide information. The review must explicitly be blameless.
Standard agenda
- Timeline reconstruction — what happened, when, in what order
- Root cause analysis — beyond "the firewall let it through"; the human and process factors
- Decision review — what calls were made, what worked, what didn't
- Detection review — could we have caught this earlier? What signal should have fired?
- Response review — was the playbook followed? Was it sufficient?
- Action items — specific, owned, time-bounded improvements
Deliverables
- Internal incident report — full timeline, root cause, response, action items
- Executive summary — for leadership and board
- Customer communication — if data was exposed, what happened and what's been done
- Regulatory filing — if required by jurisdiction (GDPR within 72 hours)
- Public disclosure — if appropriate, often months later once the full picture is known
Specific action items typically include
- Technical: patch the specific vulnerability, add the missing detection rule, harden the broken control
- Process: update playbook for the gap that hurt, add the missing escalation path
- People: hire if capacity was the issue, train if knowledge was the issue
- Tooling: deploy the missing monitoring, integrate the right systems
Regulatory and customer notification
Often time-sensitive. Don't wait for full investigation.
GDPR
- Notify supervisory authority within 72 hours of becoming aware (not within 72 hours of resolving)
- Notify affected individuals if high risk to rights and freedoms
- "Becoming aware" interpretation: when you have reasonable certainty an incident occurred, not when you have full forensic certainty
Sector regulators
Financial services (e.g., DNB for Dutch financial institutions), healthcare, critical infrastructure — each has sector-specific notification timelines.
Contractual obligations
Your customer contracts may have stricter notification windows than law requires (24-48 hours common in enterprise B2B). Check before you need to know.
What to say
- What happened (factual, not speculative)
- What information was affected
- What you've done to contain and remediate
- What recipients should do
- How they can contact you for questions
What NOT to say:
- Detailed attribution ("It was nation-state actor X") — usually wrong, always speculative
- Promises you can't keep ("This will never happen again")
- Minimizing language ("a minor incident," "limited impact")
A first-hour checklist
For when an alert just fired and you need a playbook on the wall:
- [ ] Document the alert: timestamp, source, indicator
- [ ] Triage: is this a real incident? (5-10 minutes)
- [ ] Declare if it is. Open the incident channel.
- [ ] Page the on-call team
- [ ] Identify Incident Commander
- [ ] Start a timeline document — every action gets logged with a timestamp (a minimal logger sketch follows this checklist)
- [ ] Capture immediate evidence: log queries, screenshots, memory dumps if possible
- [ ] Implement short-term containment (isolate, revoke, block)
- [ ] Notify internal stakeholders (engineering leadership, legal counsel)
- [ ] Reach out to DFIR firm if scope is unclear
- [ ] Schedule the next checkpoint (30 minutes? 1 hour?)
- [ ] Do not communicate externally yet
That's enough structure to avoid the worst mistakes while you figure out what's actually happening.
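For the timeline document specifically, anything append-only with UTC timestamps works; what matters is that logging an action costs seconds. A minimal sketch (the path is an assumption):

```python
from datetime import datetime, timezone

TIMELINE = "/secure/incident-001/timeline.log"  # hypothetical location

def log_action(actor: str, action: str) -> None:
    """Append one timeline entry: UTC ISO 8601 timestamp, who, what."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(TIMELINE, "a") as fh:
        fh.write(f"{stamp}\t{actor}\t{action}\n")

log_action("alice", "Moved web-01 into the quarantine security group")
log_action("bob", "Started database snapshot for forensics")
```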
Closing the series
These ten tutorials cover the web application security landscape from threat models to incident response. The thread connecting them: defense in depth, applied with discipline, tested under realistic adversary pressure.
You will not prevent every incident. You will not detect every attack. Your goal is to make attacks expensive enough that most adversaries pick another target, detect the ones who persist before they finish, and respond effectively when something does succeed.
That outcome is achievable. The companies that achieve it are not the ones with the biggest security budget — they're the ones that prioritize, practice, and improve continuously. Tools matter. Discipline matters more.
Good luck out there.