A Bloodhound-style web crawler for penetration testing that maps web application attack surfaces.
✅ Current (CLI v1.0)
- Recursive web crawling with depth control
- HTTP method detection (GET, POST, etc.)
- Form input extraction
- API endpoint discovery from JavaScript
- Attack surface identification (admin panels, file uploads, APIs)
- Graph data structure (ready for visualization)
- JSON export for analysis
- Statistics and reporting
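To illustrate what depth control means in practice, here is a minimal sketch of a depth-limited crawl. It is an iterative breadth-first variant, not necessarily how `crawler.py` recurses internally. The `crawl_links` function, the `get_links` callable, and the in-memory `SITE` map are all hypothetical stand-ins so the example runs without network access, and the request delay is left out.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl_links(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 3,
) -> Dict[str, int]:
    """Breadth-first crawl recording the depth at which each URL is first seen.

    get_links is a hypothetical stand-in for "fetch the page and extract links".
    """
    depths: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depths[url]
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in depths:  # the visited check prevents crawl loops
                depths[link] = depth + 1
                queue.append(link)
    return depths


# Toy in-memory site used in place of real HTTP requests.
SITE: Dict[str, List[str]] = {
    "https://example.com": ["https://example.com/about", "https://example.com/admin"],
    "https://example.com/about": ["https://example.com", "https://example.com/team"],
    "https://example.com/admin": ["https://example.com/admin/users"],
    "https://example.com/team": ["https://example.com/team/alice"],
}

found = crawl_links("https://example.com", lambda u: SITE.get(u, []), max_depth=2)
for url, depth in sorted(found.items(), key=lambda item: item[1]):
    print(depth, url)
```

With `max_depth=2`, the page `/team` is discovered at depth 2 but its own links are never followed, so `/team/alice` does not appear in the result.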
🚧 Planned (Web v2.0)
- FastAPI backend wrapper
- WebSocket live updates
- React + Cytoscape.js visualization
- Interactive graph exploration
- Authentication handling
- Advanced JavaScript parsing
pip install -r requirements.txt
python crawler.py https://example.com
python crawler.py https://example.com 5 1.0
# max_depth=5, delay=1.0s between requests
- Console: Summary statistics and progress
- JSON file: crawl_<domain>.json with full results
{
"base_url": "https://example.com",
"total_endpoints": 42,
"endpoints": {
"https://example.com/admin": {
"url": "...",
"methods": ["GET", "POST"],
"params": ["id", "filter"],
"attack_surface": ["admin_panel", "query_params"],
"type": "page"
}
},
"graph": {
"nodes": [...],
"edges": [...]
},
"stats": {...}
}
The tool automatically flags:
- 🔐 Admin panels (/admin, /dashboard)
- 📁 File upload endpoints
- 🔌 API endpoints (/api/, REST, GraphQL)
- 🎯 Query parameters (potential injection points)
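The flagging rules above could be approximated with simple path and query-string heuristics. The sketch below is an assumption about how such rules might look, not the tool's actual logic: the `classify` function and the hint tuples are hypothetical, and the `file_upload` tag name is invented here (the other three tag names come from the sample output).

```python
from typing import List
from urllib.parse import urlparse

# Path fragments taken from the list above; real rules are likely broader.
ADMIN_HINTS = ("/admin", "/dashboard")
API_HINTS = ("/api/", "/graphql")


def classify(url: str, has_file_input: bool = False) -> List[str]:
    """Return attack-surface tags for a URL using substring heuristics."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    tags: List[str] = []
    if any(hint in path for hint in ADMIN_HINTS):
        tags.append("admin_panel")
    if has_file_input:
        # Assumes the caller already saw an <input type="file"> on the page.
        tags.append("file_upload")  # hypothetical tag name
    if any(hint in path for hint in API_HINTS):
        tags.append("api_endpoint")
    if parsed.query:
        tags.append("query_params")
    return tags


print(classify("https://example.com/admin?id=1&filter=active"))
print(classify("https://example.com/api/users"))
```

Substring matching is deliberately loose, so it will also flag paths such as `/administrator`; treat the tags as leads to verify rather than findings.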
WebCrawler class
├── crawl() - Main entry point (reusable for API)
├── extract_endpoints() - HTML parsing
├── analyze_endpoint() - Deep analysis
└── get_results() - Structured JSON output
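As a rough picture of the HTML parsing step, the sketch below pulls links and form inputs out of a page using only the standard library. `EndpointParser` is a hypothetical class written for this example; the actual `extract_endpoints()` method may use a different parser and return a different shape.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional
from urllib.parse import urljoin


class EndpointParser(HTMLParser):
    """Collect absolute link URLs and form details from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.forms: List[Dict[str, Any]] = []
        self._current_form: Optional[Dict[str, Any]] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attr["href"]))
        elif tag == "form":
            self._current_form = {
                "action": urljoin(self.base_url, attr.get("action") or ""),
                "method": (attr.get("method") or "GET").upper(),
                "inputs": [],
            }
            self.forms.append(self._current_form)
        elif tag in ("input", "textarea", "select") and self._current_form is not None:
            # Only named fields are submitted, so unnamed ones are skipped.
            if attr.get("name"):
                self._current_form["inputs"].append(attr["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current_form = None


PAGE = """
<a href="/about">About</a>
<form action="/login" method="post">
  <input name="username"><input name="password" type="password">
  <input type="submit" value="Go">
</form>
"""

parser = EndpointParser("https://example.com/")
parser.feed(PAGE)
print(parser.links)
print(parser.forms)
```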
The code is designed to be wrapped by FastAPI with minimal changes. The callback parameter already supports progress updates for WebSocket streaming.
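The callback idea can be sketched without the real crawler. In the example below, `run_crawl` is a stand-in for the crawl loop and the event dictionary shape is an assumption; the actual `crawl()` signature and callback arguments may differ. The callback pushes events onto a thread-safe queue, which a WebSocket handler could later drain and forward to clients.

```python
import queue
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

ProgressEvent = Dict[str, Any]


def run_crawl(
    pages: Iterable[Tuple[str, int]],
    callback: Optional[Callable[[ProgressEvent], None]] = None,
) -> List[str]:
    """Stand-in for a crawl loop; it makes no network requests."""
    visited: List[str] = []
    for url, depth in pages:
        visited.append(url)
        if callback is not None:
            # Hypothetical event shape, mirroring the console progress lines.
            callback({"event": "crawling", "url": url, "depth": depth})
    return visited


# The queue decouples the crawler from whoever consumes the updates.
events: "queue.Queue[ProgressEvent]" = queue.Queue()
run_crawl(
    [("https://example.com", 0), ("https://example.com/about", 1)],
    callback=events.put,
)

drained: List[ProgressEvent] = []
while not events.empty():
    drained.append(events.get())
print(drained)
```

Passing `events.put` as the callback keeps the crawl code unaware of any web framework, which is what makes a later FastAPI wrapper a thin layer.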
🚀 Starting crawl of https://example.com
Max depth: 3, Delay: 0.5s
Crawling: https://example.com (depth: 0)
Crawling: https://example.com/about (depth: 1)
Crawling: https://example.com/api/users (depth: 1)
...
🎯 Web Attack Surface Map for: https://example.com
============================================================
📊 Statistics:
Total URLs discovered: 47
Total endpoints: 42
🔍 By Type:
page: 28
api: 8
static: 6
🌐 By HTTP Method:
GET: 35
POST: 7
⚠️ Attack Surface Areas:
api_endpoint: 8
query_params: 12
admin_panel: 2
============================================================
💾 Full results saved to: crawl_example.com.json
- Start shallow: Use max_depth=2 for initial recon
- Analyze JSON: Look for hidden API endpoints in the graph
- Check attack_surface: Focus on flagged endpoints first
- Forms = Gold: POST forms often have interesting vulnerabilities
- Review params: Each param is a potential injection point
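The tips above can be applied to the JSON export with a few lines of Python. This sketch assumes the export follows the structure shown earlier; `SAMPLE` is invented toy data, and `flagged_endpoints` and `injection_candidates` are hypothetical helper names. For a real run, load the file with `json.load` instead of using the inline sample.

```python
import json
from typing import Any, Dict, List, Tuple

# Invented sample in the documented shape; replace with
# json.load(open("crawl_example.com.json")) for a real crawl.
SAMPLE: Dict[str, Any] = json.loads("""
{
  "base_url": "https://example.com",
  "endpoints": {
    "https://example.com/admin": {
      "methods": ["GET", "POST"],
      "params": ["id", "filter"],
      "attack_surface": ["admin_panel", "query_params"],
      "type": "page"
    },
    "https://example.com/about": {
      "methods": ["GET"],
      "params": [],
      "attack_surface": [],
      "type": "page"
    }
  }
}
""")


def flagged_endpoints(results: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Endpoints with at least one attack_surface tag, most tags first."""
    flagged = [
        (url, info["attack_surface"])
        for url, info in results.get("endpoints", {}).items()
        if info.get("attack_surface")
    ]
    return sorted(flagged, key=lambda item: len(item[1]), reverse=True)


def injection_candidates(results: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Every (url, param) pair, since each param is a potential injection point."""
    return [
        (url, param)
        for url, info in results.get("endpoints", {}).items()
        for param in info.get("params", [])
    ]


for url, tags in flagged_endpoints(SAMPLE):
    print(url, tags)
for url, param in injection_candidates(SAMPLE):
    print(f"{url} -> {param}")
```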
- Add authentication support (cookies, JWT)
- JavaScript rendering (Selenium/Playwright)
- Subdomain enumeration
- Technology fingerprinting
- CORS policy detection
- Web UI with graph visualization