Advanced Web App Recon — JS Crawling, Subdomain Takeover, API Discovery

Beyond nmap and dirbuster: how modern attackers map a target's web attack surface using JavaScript analysis, subdomain enumeration, and API discovery.

DjangoZen Team, May 10, 2026

Why this matters

The first phase of any web app attack is reconnaissance. Modern web apps expose far more attack surface than the visible UI: API endpoints, internal admin paths, embedded JavaScript with secrets, forgotten subdomains. Attackers and defenders both need tools to enumerate this surface comprehensively.

This tutorial takes the offensive view: what attackers do, which tools they use, and where you would be wise to look at your own infrastructure first.

The recon mindset

Two principles guide modern web app recon:

Anything publicly accessible is publicly enumerable. If you have it on the internet, someone can find it.
Inventory beats sophistication. Most successful attacks come from comprehensive enumeration finding something forgotten, not clever exploitation of well-defended endpoints.

Spend most of your time on coverage. Sophistication comes later, once you've found something interesting.

Subdomain enumeration

Most web apps have many more subdomains than their main marketing site reveals. Each is a separate attack surface.

Passive (no traffic to target)

Certificate Transparency logs: publicly trusted CAs log nearly every TLS certificate they issue, which makes CT the cheapest and most comprehensive single passive source.

curl -s 'https://crt.sh/?q=%25.djangozen.com&output=json' | \
  jq -r '.[].name_value' | sort -u | head -20
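Raw CT output usually needs cleaning before it is useful: names appear in mixed case, wildcard entries start with `*.`, and some entries may fall outside the target domain. A minimal sketch of that cleanup, assuming one hostname per line on stdin (the function name `normalize_names` is hypothetical):

```shell
# normalize_names DOMAIN: read hostnames on stdin, one per line, and print
# the unique in-scope names, lowercased, with any leading "*." removed.
normalize_names() {
    domain=$1
    tr '[:upper:]' '[:lower:]' \
        | sed 's/^\*\.//' \
        | awk -v d="$domain" '
            # keep the apex itself
            $0 == d { print; next }
            # keep anything that ends in ".<domain>"
            length($0) > length(d) && substr($0, length($0) - length(d)) == ("." d) { print }
          ' \
        | sort -u
}
```

It can be appended to the pipeline above in place of `sort -u`, for example `... | jq -r '.[].name_value' | normalize_names djangozen.com`.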

Subfinder — aggregates multiple passive sources:

subfinder -d djangozen.com -all -silent

Amass passive mode:

amass enum -passive -d djangozen.com -src

SecurityTrails, VirusTotal, Shodan — APIs that have indexed subdomains. Some free tiers.

Active (sends traffic to target)

DNS brute force — try common subdomain names:

gobuster dns -d djangozen.com -w subdomains-top1million-5000.txt

HTTP-level discovery — once you have a list of subdomains, see which are alive:

printf '%s\n' www.djangozen.com staging.djangozen.com api.djangozen.com | \
  httpx -silent -title -status-code

Subdomain takeover — the high-impact bug

Pattern: a subdomain points (via CNAME) to a service that no longer exists. An attacker registers that service name and can then serve their own content under your subdomain.

Common targets: GitHub Pages, AWS S3, Azure Storage, Heroku, Surge, Netlify, Cloudflare Workers — anywhere you can claim a name.

Detection (do this against yourself):

# Check every subdomain's CNAME and verify the target is owned
# (-w takes a file of subdomains; -d checks a single domain)
subjack -w domains.txt -ssl -t 100
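The underlying check is simple enough to sketch by hand. The function below assumes you already have `host cname` pairs, one per line (a hypothetical format you could build with `dig +short CNAME`), and flags targets that end in a suffix where anyone can register a name. The suffix list is a small illustrative sample based on the services named above, not a complete fingerprint set, and a match only means the record deserves a manual look to confirm whether the resource is really unclaimed.

```shell
# flag_claimable_cnames: read "host cname" pairs on stdin and print the pairs
# whose CNAME target ends in a suffix where names can be claimed by anyone.
# The suffix list is an illustrative sample, not a complete set.
flag_claimable_cnames() {
    awk '
        BEGIN {
            n = split("github.io herokuapp.com s3.amazonaws.com netlify.app surge.sh blob.core.windows.net", suffixes, " ")
        }
        {
            target = tolower($2)
            sub(/\.$/, "", target)   # drop the trailing dot that dig prints
            for (i = 1; i <= n; i++) {
                s = "." suffixes[i]
                if (length(target) > length(s) && substr(target, length(target) - length(s) + 1) == s) {
                    print $1, target
                    break
                }
            }
        }
    '
}
```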

Defense:

Audit DNS regularly — remove records pointing to services you no longer use
Use scoped credentials for cloud resources, not your root account
When tearing down services, remove DNS first, then the service

Wayback Machine — finding the historical attack surface

Sites change. Old endpoints that still exist but aren't linked anymore are gold for attackers.

# Get every URL ever archived for your domain
waybackurls djangozen.com | sort -u > urls.txt

# Filter for interesting patterns
grep -E '\.php|\.bak|\.sql|\.zip|\.git|admin|api/v1|debug' urls.txt

Surprisingly often this reveals working admin paths, debug endpoints, or old API versions that never got removed.
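Archived URL lists are noisy, since the same path tends to appear hundreds of times with different query values. Two small helpers can make the list reviewable; this is a sketch with hypothetical function names, assuming one URL per line on stdin:

```shell
# unique_paths: read URLs on stdin, drop the query string and fragment,
# and print each distinct URL once.
unique_paths() {
    sed -e 's/[?#].*$//' | sort -u
}

# historical_params: read URLs on stdin and print every query parameter
# name seen, most frequent first, as "count name".
historical_params() {
    grep -oE '[?&][^=&#]+=' \
        | sed -e 's/^[?&]//' -e 's/=$//' \
        | sort | uniq -c | sort -k1,1nr -k2,2 \
        | awk '{ print $1, $2 }'
}
```

For example, `unique_paths < urls.txt` gives the deduplicated path list, and `historical_params < urls.txt` yields parameter names that can seed a wordlist for parameter discovery later on.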

Defenses

Remove old endpoints when deprecating, don't just stop linking to them
410 Gone or 404 Not Found is acceptable; 200 OK with old content is dangerous
Monitor for unauthorized access to known-old paths in your logs
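The log monitoring step can be sketched with awk. This assumes the common/combined access log layout, where the request path is the seventh whitespace-separated field and the status code the ninth; adjust the field numbers if your log format differs. The function name and the path pattern are hypothetical.

```shell
# retired_path_hits PATTERN: read access-log lines on stdin and print
# "status path count" for requests whose path matches PATTERN (an awk regex)
# and whose status is anything other than 404 or 410.
retired_path_hits() {
    awk -v pat="$1" '
        $7 ~ pat && $9 != 404 && $9 != 410 { hits[$9 " " $7]++ }
        END { for (k in hits) print k, hits[k] }
    ' | sort
}
```

Something like `retired_path_hits '^/api/v1/' < access.log` would then list retired paths that still answer with a status other than 404 or 410, which is the situation the list above warns about.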

JavaScript analysis — secrets and endpoints

Modern web apps push significant logic to the browser. The JavaScript bundles often contain:

API endpoint URLs (including admin endpoints not linked from the UI)
Comments and TODOs revealing intent
Hardcoded credentials (often leaked by accident)
Internal IDs, debug flags, feature toggles
Third-party service integration details

Extracting

# List the URL of every JS file the crawler finds (nothing is downloaded yet)
katana -u https://djangozen.com -d 5 -jc -silent | grep -E '\.js$' > js_urls.txt

# Or crawl with gospider, which also extracts links from JS
gospider -s https://djangozen.com -d 3 --js

Searching for interesting strings

# Download all JS files
mkdir -p js && cd js
xargs -n1 -P10 curl -s -O < ../js_urls.txt

# Look for endpoints
grep -hoE '"(/api/[^"]*)"' *.js | sort -u

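# Look for likely secrets (case-insensitive; prints 40 characters of context
# on each side so minified one-line bundles stay readable)
grep -hoiE '.{0,40}(api[_-]?key|secret|token|password).{0,40}' *.js | head -20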

# AWS access key pattern
grep -hE 'AKIA[0-9A-Z]{16}' *.js

Specialized tools: TruffleHog, gitleaks, secretfinder (Python tool for JS).
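The endpoint grep above only matches double-quoted strings, while bundles often use single quotes or template literals. A slightly broader sketch (the function name is hypothetical, and the `/api/` prefix is the same assumption the original command makes):

```shell
# extract_endpoints: read JavaScript on stdin and print the unique string
# literals that start with /api/, whether quoted with ", ' or a backtick.
extract_endpoints() {
    grep -oE "[\"'\`]/api/[^\"'\` ]*[\"'\`]" \
        | sed -e "s/^[\"'\`]//" -e "s/[\"'\`]\$//" \
        | sort -u
}
```

Run it over the downloaded files with `cat *.js | extract_endpoints`. Template literals come out with their placeholders intact, for example `/api/v2/items/${id}`, which still reveals the route shape.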

Defenses

Don't embed any secret in client-side code — even minified, even obfuscated
Use environment variables on the server, never on the client
For client-side service integration, use scoped/short-lived tokens issued by your backend
For analytics keys and other identifiers that must ship to the browser, assume they're public and make sure they grant nothing sensitive
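These rules are easier to keep when a build step enforces them. Below is a rough sketch of a pre-deploy check that scans a build directory for the same kinds of patterns an attacker would grep for. The function name and the patterns are illustrative assumptions; a dedicated scanner such as gitleaks or TruffleHog covers far more cases.

```shell
# check_bundle DIR: scan every .js file under DIR for secret-like patterns.
# Prints "file:line" for each hit and returns 1 if anything matched.
check_bundle() {
    # AWS access key IDs, or key/secret/password assigned a quoted value of 8+ chars
    pattern="AKIA[0-9A-Z]{16}|(api[_-]?key|secret|password)[\"']?[[:space:]]*[:=][[:space:]]*[\"'][^\"']{8,}[\"']"
    hits=$(find "$1" -type f -name '*.js' -exec grep -nEHi "$pattern" {} + | cut -d: -f1,2)
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
    return 0
}
```

In CI this could run as `check_bundle dist/ || exit 1`, where `dist/` stands in for whatever directory your production build writes to.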

API discovery — both REST and GraphQL

Modern apps are increasingly API-driven. The API often has more attack surface than the UI.

Finding REST APIs

# Common API paths
ffuf -u 'https://djangozen.com/api/FUZZ' -w api-wordlist.txt -fc 404

# Versioned APIs
for v in v1 v2 v3 v4; do
    echo "Checking /api/$v/"
    curl -s -o /dev/null -w "%{http_code}\n" https://djangozen.com/api/$v/
done

# Common API endpoints
ffuf -u 'https://djangozen.com/api/v1/FUZZ' -w api-endpoints.txt

OpenAPI / Swagger documents

Many APIs accidentally expose their own documentation:

/swagger/
/swagger.json
/swagger-ui/
/openapi.json
/redoc/
/api/docs/
/api/schema/

If found, that's the API contract — every endpoint, every parameter, every authentication requirement, gift-wrapped.
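Checking your own deployment for these paths takes only a few lines. The sketch below builds the candidate URLs from the list above and reports any that do not return 404; the function names are hypothetical, and `probe_docs` needs curl and network access.

```shell
# doc_candidates BASE: print one candidate documentation URL per line.
doc_candidates() {
    base=${1%/}   # tolerate a trailing slash on the base URL
    for path in /swagger/ /swagger.json /swagger-ui/ /openapi.json /redoc/ /api/docs/ /api/schema/; do
        printf '%s%s\n' "$base" "$path"
    done
}

# probe_docs BASE: request each candidate and print "status url" for every
# response that is not a 404 (000 means the request itself failed).
probe_docs() {
    doc_candidates "$1" | while IFS= read -r url; do
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
        case "$code" in
            404|000) ;;
            *) printf '%s %s\n' "$code" "$url" ;;
        esac
    done
}
```

Calling `probe_docs https://djangozen.com` against your own site shows which documentation routes are reachable without authentication.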

GraphQL endpoints

GraphQL has its own discovery patterns:

# Common locations
curl https://djangozen.com/graphql
curl https://djangozen.com/api/graphql

# Introspection query — gets the full schema
curl -X POST -H "Content-Type: application/json" \
  -d '{"query":"{__schema{types{name,fields{name}}}}"}' \
  https://djangozen.com/graphql

Introspection should be disabled in production. If it's not, the entire schema is exposed.
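To turn that into a repeatable check, the response to the introspection query can be tested for schema data. This is a minimal sketch with a hypothetical function name; it assumes a server with introspection enabled answers with a `"__schema": {...}` object, while a server with it disabled returns an error or null instead.

```shell
# introspection_exposed: read a GraphQL JSON response on stdin; exit 0 if it
# contains a "__schema" object, exit 1 otherwise.
introspection_exposed() {
    grep -q '"__schema"[[:space:]]*:[[:space:]]*{'
}
```

Piping the curl command above into it, for example `curl -s -X POST ... | introspection_exposed && echo "introspection is enabled"`, gives a check you can schedule against production.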

API parameter discovery

Once you find an API, brute force parameter names:

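# Try known parameter names on a parameter-less endpoint.
# -fs filters out responses of the given size; replace 0 with the size of the
# baseline (no-parameter) response so only parameters that change the output remain.
ffuf -u 'https://djangozen.com/api/users/?FUZZ=test' -w params.txt -fs 0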

Tools that do this systematically: Param Miner (Burp extension), Arjun.
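The idea behind these tools is a baseline comparison: a parameter is interesting when adding it changes the response. A simplified sketch of that logic, using response size as the only signal (real tools also compare status codes, headers, and reflected values). Both function names are hypothetical, and `response_size` needs curl and network access.

```shell
# response_size URL: print the response body size in bytes.
response_size() {
    curl -s -o /dev/null -w '%{size_download}' --max-time 10 "$1"
}

# changed_params BASELINE: read "name size" pairs on stdin and print the
# names whose response size differs from BASELINE.
changed_params() {
    awk -v base="$1" '$2 != base { print $1 }'
}
```

You would first record the baseline with `response_size 'https://djangozen.com/api/users/'`, then collect one `name size` pair per candidate parameter and pass the list through `changed_params`.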

Content discovery — beyond directory brute force

Old-school dirbuster: brute-force common paths. Modern alternatives are smarter.

Recursive crawling with intelligence

katana (ProjectDiscovery):

katana -u https://djangozen.com -d 5 -jc -aff -o crawled.txt

Crawls recursively, parses JavaScript for additional URLs, follows forms, handles single-page apps.

gospider:

gospider -s https://djangozen.com -d 3 --js --robots --sitemap

Smart wordlists

Assetnote wordlists — curated by experience: https://wordlists.assetnote.io/

SecLists — the everything-and-the-kitchen-sink collection.

Use technology-specific wordlists. If the target runs Django (visible in headers/cookies), use Django-specific paths. If WordPress, WP paths. Etc.
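As a rough illustration of spotting Django from cookies: its default CSRF and session cookies are named `csrftoken` and `sessionid`. The sketch below (hypothetical function name) looks for them in raw response headers. A site that renames its cookies will not match, so a negative result proves nothing.

```shell
# looks_like_django: read raw response headers on stdin; exit 0 if they set
# a cookie with one of the default Django names (csrftoken, sessionid).
looks_like_django() {
    grep -iqE '^set-cookie:[[:space:]]*(csrftoken|sessionid)='
}
```

For example, `curl -sI https://djangozen.com/ | looks_like_django && echo "probably Django"`.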

Status code interpretation

200 — exists, content
301/302 — redirect (interesting if the redirect target is suspicious)
401 — exists, requires auth (often valuable)
403 — exists, forbidden (sometimes bypassable; often valuable)
404 — doesn't exist (or is hidden behind 404)
500 — server error — bug surface

A 403 on /admin/ is often more interesting than a 200 on the home page. It tells you the admin exists, you just need to authenticate or bypass.
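When a scan returns thousands of results, sorting them by how interesting the status code is saves review time. The sketch below assumes `status url` lines on stdin (a hypothetical format; adapt it to your tool's output). The ranking is one possible reading of the list above, not a fixed rule.

```shell
# triage_by_status: read "status url" lines on stdin and print them ordered
# by review priority: 401/403 first, then 5xx, 2xx, 3xx, everything else.
triage_by_status() {
    awk '
        {
            code = $1 + 0
            if (code == 401 || code == 403)      rank = 1
            else if (code >= 500 && code < 600)  rank = 2
            else if (code >= 200 && code < 300)  rank = 3
            else if (code >= 300 && code < 400)  rank = 4
            else                                 rank = 5
            print rank, $0
        }
    ' | sort -k1,1n -k2 | cut -d' ' -f2-
}
```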

Technology fingerprinting

Knowing the stack tells you what bugs to try.

# Wappalyzer browser extension — easiest manual check
# whatweb — command line
whatweb https://djangozen.com

# httpx with tech detection
httpx -u https://djangozen.com -tech-detect -title

What you'll learn:

Framework (Django, Rails, Laravel, .NET) → known framework-specific bugs
Web server (nginx, Apache, IIS) → version-specific issues
CDN (Cloudflare, Akamai) → WAF presence
Analytics, ads, fonts → third-party attack surface
JS libraries → dependency vulnerabilities

Putting it together — a full recon for a Django target

A bug bounty hunter's morning workflow:

# 1. Subdomains
subfinder -d djangozen.com -all -silent > subs.txt
amass enum -passive -d djangozen.com >> subs.txt
sort -u subs.txt > subs-unique.txt

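# 2. Alive hosts (details for review, bare URLs for the tools that follow)
httpx -l subs-unique.txt -silent -no-color -title -tech-detect -status-code > alive-details.txt
awk '{print $1}' alive-details.txt > alive.txt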

# 3. Historical URLs
waybackurls djangozen.com > wayback.txt
gau djangozen.com >> wayback.txt
sort -u wayback.txt > urls.txt

# 4. Active crawling
katana -u https://djangozen.com -d 5 -jc -aff -silent | tee -a urls.txt

# 5. Interesting patterns
grep -E '/api/|/admin|debug|swagger|\.git|\.env|backup' urls.txt > interesting.txt

# 6. Take screenshots of every alive URL
gowitness file -f alive.txt -P screenshots/

# 7. Manual review of interesting URLs and screenshots

By the time you've done this for an app, you've seen everything publicly visible — admin panels, API endpoints, old debug pages, dev/staging environments forgotten in DNS, JS bundles with hidden secrets.

Defenses — what to do if you're the target

Every item on this list comes down to running the recon above against your own infrastructure and acting on what it finds:

DNS hygiene — quarterly audit, remove records for retired services
Subdomain takeover monitoring — automated alerts on dangling CNAMEs
JS bundle audit — strip secrets, comments revealing intent, debug flags from production builds
API discoverability control — disable Swagger in production, disable GraphQL introspection
Old endpoint cleanup — 410 Gone, not still-working endpoints
Asset inventory — know what you have, monitor for what shouldn't exist
Bug bounty or responsible disclosure program — let researchers tell you what they find
Continuous external assessment — run these tools against yourself monthly
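For the inventory and continuous assessment items, the useful signal is the difference between two runs. A minimal sketch, assuming each run writes one hostname per line to a file (the function and file names are hypothetical):

```shell
# new_assets PREVIOUS CURRENT: print hosts that appear in the CURRENT file
# but not in the PREVIOUS one.
new_assets() {
    prev_sorted=$(mktemp)
    curr_sorted=$(mktemp)
    sort -u "$1" > "$prev_sorted"
    sort -u "$2" > "$curr_sorted"
    comm -13 "$prev_sorted" "$curr_sorted"   # keep lines unique to the second file
    rm -f "$prev_sorted" "$curr_sorted"
}
```

A scheduled job could run `new_assets subs-last-month.txt subs-unique.txt` and alert whenever the output is not empty.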

If you find things you didn't know you had, that's the point of the exercise. Better to find them yourself than read about them in a breach disclosure.

Why reconnaissance comes first

Every serious attack begins with reconnaissance, because you cannot exploit what you have not found. Attackers invest heavily in mapping a target before touching it — enumerating subdomains, crawling JavaScript for hidden endpoints, discovering APIs, fingerprinting technologies — and the breadth of what they uncover often determines whether they find a way in. For defenders, this means your exposure is the sum of everything discoverable, including the things you forgot you exposed. Understanding recon from the attacker's side is what lets you reduce your own discoverable footprint, because the staging server, the old subdomain, and the undocumented API endpoint are exactly what a thorough recon phase will surface.

The danger of forgotten assets

The most common recon win is not a clever technique but a forgotten asset: a subdomain pointing at a service that was decommissioned, a staging environment left exposed, an old API version still live, a debug endpoint never removed. These exist because organizations add far more than they retire, and what is forgotten is not monitored or patched. Maintaining an accurate inventory of your assets and decommissioning them properly — not just turning them off but removing the DNS records and access that point to them — closes off the easy discoveries that reconnaissance relies on, denying attackers the soft targets at the edges of your estate.

Reducing your discoverable footprint

Defending against recon is about minimizing and monitoring what can be found. Keep an inventory of subdomains and decommission stale ones to prevent takeovers, avoid leaking internal endpoints in client-side code, require authentication on anything not meant to be public, and monitor for unexpected exposure. You cannot stop reconnaissance, but you can ensure it finds a small, hardened, well-understood surface rather than a sprawl of forgotten assets. Treating your attack surface as something to actively shrink and watch, rather than let grow unchecked, turns the attacker's recon phase from a productive treasure hunt into a frustrating dead end.

Closing

Recon is patient, methodical, comprehensive. The good attackers spend disproportionate time here because the bug is usually findable; the trick is finding what others miss.

Defenders win by knowing their own attack surface better than attackers do. The tools above are open source and free. Run them. Schedule them. Treat "new asset detected" as an actionable alert. An unknown asset tends to become a problem within months, not years.

Tutorial 8 covers what happens when an attacker has finished reconnaissance and starts going after the auth layer.
