██████╗ ██╗ ██╗ ██████╗ ███████╗████████╗██╗ ██╗███╗ ██╗██╗ ██╗
██╔════╝ ██║ ██║██╔═══██╗██╔════╝╚══██╔══╝██║ ██║████╗ ██║██║ ██╔╝
██║ ███╗███████║██║ ██║███████╗ ██║ ██║ ██║██╔██╗ ██║█████╔╝
██║ ██║██╔══██║██║ ██║╚════██║ ██║ ██║ ██║██║╚██╗██║██╔═██╗
╚██████╔╝██║ ██║╚██████╔╝███████║ ██║ ███████╗██║██║ ╚████║██║ ██╗
╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚══════╝╚═╝╚═╝ ╚═══╝╚═╝ ╚═╝
Hunt Down Dead Links Before They Haunt Your Users
Find and exorcise the ghost links haunting your website.
Dead links are ghost links - invisible problems that haunt your website, damage SEO, and frustrate users. GhostLink hunts them down before they can do damage:
- Deep Scanning - Crawls your entire site recursively
- Smart Detection - Finds broken links, images, scripts, stylesheets
- Fix Suggestions - Wayback Machine links, typo detection, similar URLs
- Instant Alerts - Email, Slack, or webhook notifications
- History Tracking - Only alerts on NEW ghosts, not old ones
- Security First - SSRF protection, input validation, rate limiting
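To make the History Tracking idea concrete, here is a minimal sketch of how "only alert on NEW ghosts" could work: keep the previous scan's broken URLs in a JSON file and diff against it. The function name `new_ghosts` and the history file layout are assumptions for illustration, not GhostLink's actual internals.

```python
import json
from pathlib import Path


def new_ghosts(current, history_file):
    """Return broken URLs not seen in the previous scan, then save the current set.

    `history_file` is a hypothetical JSON file holding a list of URLs.
    """
    path = Path(history_file)
    previous = set(json.loads(path.read_text())) if path.exists() else set()
    fresh = sorted(set(current) - previous)
    # Persist the current scan so the next run diffs against it.
    path.write_text(json.dumps(sorted(set(current))))
    return fresh
```

With a scheme like this, a link that was already broken yesterday stays in the report but does not trigger a second alert.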
| Ghost Type | Detection |
|---|---|
| Dead Links | Full recursive crawl |
| Broken Images | <img>, srcset, CSS backgrounds |
| Missing Scripts | <script src> |
| Lost Stylesheets | <link href> |
| Vanished Favicons | All icon types |
| External Ghosts | Optional external link checking |
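As a rough illustration of the detection table, the sketch below pulls candidate URLs out of a page using only the standard library. It covers a subset of the table (`<a href>`, `<img src>`, `<script src>`, `<link href>`) and skips `srcset` and CSS backgrounds. The class and function names are hypothetical; GhostLink's own parser may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute that carries a URL (a subset of the detection table).
URL_ATTRS = {"a": "href", "img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collects (tag, absolute_url) pairs for every tag listed in URL_ATTRS."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr = URL_ATTRS.get(tag)
        if attr is None:
            return
        value = dict(attrs).get(attr)
        if value:
            # Resolve relative references against the page URL.
            self.found.append((tag, urljoin(self.base_url, value)))


def collect_resources(html, base_url):
    parser = ResourceCollector(base_url)
    parser.feed(html)
    return parser.found
```

Each collected URL would then be requested and its status code checked.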
GhostLink doesn't just find ghosts - it helps you banish them:
👻 GHOST FOUND: https://example.com/blog/my-psot.html
Status: 404 Not Found
🔮 Suggestions:
- Fix typo: https://example.com/blog/my-post.html
- Similar URL: https://example.com/blog/my-posts.html
- Wayback Machine: https://web.archive.org/web/*/...
- Try HTTPS version
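Suggestions like the ones above could be produced along these lines. This is a sketch under assumptions: it uses `difflib` string similarity against URLs already seen during the crawl, which is one plausible way to catch typos, not necessarily the method GhostLink uses. The function name and the `0.8` cutoff are illustrative.

```python
import difflib


def suggest_fixes(broken_url, known_urls, limit=2):
    """Build human-readable fix suggestions for a broken URL."""
    # Similar URLs: closest matches among pages the crawler already knows.
    matches = difflib.get_close_matches(broken_url, known_urls, n=limit, cutoff=0.8)
    suggestions = [f"Similar URL: {url}" for url in matches]
    # Wayback Machine listing of all captures for this URL.
    suggestions.append(f"Wayback Machine: https://web.archive.org/web/*/{broken_url}")
    # Plain-HTTP links may simply have moved to HTTPS.
    if broken_url.startswith("http://"):
        suggestions.append("Try HTTPS version: https://" + broken_url[len("http://"):])
    return suggestions
```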
| Channel | Setup |
|---|---|
| Email | SMTP (Gmail, SendGrid, etc.) |
| Slack | Webhook URL |
| Webhook | Any HTTP endpoint |
| Console | Default output |
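For the Slack channel, an alert boils down to a JSON POST with a `text` field sent to the incoming-webhook URL. The sketch below only builds the request so it can be inspected without network access. The message wording and the function name `build_slack_request` are assumptions, not GhostLink's actual output.

```python
import json
import urllib.request


def build_slack_request(webhook_url, site_url, ghosts):
    """Build (but do not send) a Slack incoming-webhook request.

    `ghosts` is a list of (url, status_code) pairs.
    """
    lines = [f"GhostLink found {len(ghosts)} new ghost link(s) on {site_url}:"]
    lines += [f"- [{status}] {url}" for url, status in ghosts]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it:
# urllib.request.urlopen(build_slack_request(...), timeout=10)
```

A generic webhook alert could reuse the same pattern with a different payload shape.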
# Clone the repository
git clone https://github.com/Sheraz-k/ghostlink.git
cd ghostlink
# Or download directly
curl -O https://github.com/Sheraz-k/ghostlink/main/ghostlink.py

python ghostlink.py https://yourwebsite.com

python ghostlink.py https://yourwebsite.com --output ghost-report.html

python ghostlink.py https://yourwebsite.com --depth 5 --max-pages 1000

python ghostlink.py https://yourwebsite.com \
--slack-webhook https://hooks.slack.com/services/XXX/YYY/ZZZ

Usage: ghostlink.py [OPTIONS] URL
Scan Options:
--depth, -d Maximum crawl depth (default: 3, max: 20)
--max-pages, -m Maximum pages to scan (default: 500, max: 50000)
--timeout, -t Request timeout in seconds (default: 10, max: 120)
--delay Delay between requests (default: 0.2s)
--no-external Don't check external links
--no-images Don't check images
--ignore URL patterns to ignore (regex)
--threads Number of threads (default: 5)
Output Options:
--output, -o Output file path (.html or .json)
--format, -f Output format: html, json, console
Alert Options:
--email Email address for alerts
--smtp-host SMTP host (default: smtp.gmail.com)
--smtp-port SMTP port (default: 587)
--smtp-user SMTP username
--smtp-pass SMTP password
--slack-webhook Slack webhook URL
--webhook Generic webhook URL
--alert-all Alert on all ghosts, not just new ones
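The documented limits (for example `--depth` capped at 20) suggest range-checked arguments. Here is a minimal `argparse` sketch of a few of the scan options with those bounds enforced at parse time. The lower bound of 1 and the helper name `bounded_int` are assumptions, and the real `ghostlink.py` parser may be organized differently.

```python
import argparse


def bounded_int(low, high):
    """Return an argparse type that accepts integers in [low, high]."""
    def parse(text):
        value = int(text)  # ValueError is reported by argparse as invalid input
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return value
    return parse


def build_parser():
    parser = argparse.ArgumentParser(prog="ghostlink.py")
    parser.add_argument("url", metavar="URL")
    # Defaults and maxima taken from the option list above; minimum of 1 is assumed.
    parser.add_argument("--depth", "-d", type=bounded_int(1, 20), default=3)
    parser.add_argument("--max-pages", "-m", type=bounded_int(1, 50000), default=500)
    parser.add_argument("--timeout", "-t", type=bounded_int(1, 120), default=10)
    parser.add_argument("--no-external", action="store_true")
    parser.add_argument("--ignore", action="append", default=[])
    return parser
```

Rejecting out-of-range values up front keeps a typo such as `--depth 200` from turning into an unexpectedly large crawl.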
GhostLink is security-hardened with multiple layers of protection:
- Blocks scanning of localhost, 127.0.0.1
- Blocks private IP ranges (10.x, 172.16.x, 192.168.x)
- Blocks dangerous URL schemes (file:, javascript:, data:)
- URL length limits (2048 chars)
- Config value validation (depth, timeout, etc.)
- Regex pattern validation for ignore rules
- Email format validation
- Webhook URL validation
- Response size limits (10MB max)
- Redirect loop detection (max 5 redirects)
- Rate limiting between requests
- Timeout enforcement
- HTML escaping in reports
- Path traversal prevention
- System directory write protection
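The SSRF-related items in the list above can be sketched with the standard library: allow only `http`/`https`, cap the URL length, resolve the hostname, and refuse any address that is loopback, private, or otherwise non-public. This illustrates the technique under stated assumptions and is not GhostLink's actual code. The names `is_safe_target` and `is_public_ip` are hypothetical, and the `resolve` parameter exists only so the check can be exercised without DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
MAX_URL_LENGTH = 2048  # matches the documented URL length limit


def is_public_ip(ip_text):
    """True if the address is not loopback, private, link-local, etc."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def is_safe_target(url, resolve=socket.getaddrinfo):
    """Reject URLs that are too long, use a non-HTTP scheme, or resolve internally."""
    if len(url) > MAX_URL_LENGTH:
        return False
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError):
        return False
    # Every resolved address must be public, not just the first one.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)
```

A check like this only covers the moment of resolution. A stricter design would also re-validate after each redirect and connect to the exact address that was checked.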
# Run daily at 6 AM
0 6 * * * python /path/to/ghostlink.py https://yoursite.com --slack-webhook https://hooks.slack.com/...

name: Daily Ghost Hunt
on:
  schedule:
    - cron: '0 6 * * *'
jobs:
  hunt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          python ghostlink.py https://yoursite.com \
            --output report.html \
            --slack-webhook ${{ secrets.SLACK_WEBHOOK }}
      - uses: actions/upload-artifact@v4
        with:
          name: ghost-report
          path: report.html

python ghostlink.py https://example.com --no-external

python ghostlink.py https://example.com \
--ignore "/admin/*" \
--ignore "/api/*" \
--ignore "*.pdf"

python ghostlink.py https://example.com \
--depth 10 \
--max-pages 5000 \
--output full-audit.html

# Exit code 1 if ghosts found
python ghostlink.py https://staging.example.com || exit 1

The HTML report is a shareable page with:
- Summary statistics with visual indicators
- Ghost links with error details
- Fix suggestions for each ghost
- Pages where ghosts were found
{
"site_url": "https://example.com",
"scan_completed": "2024-01-15T10:30:00",
"total_pages_scanned": 150,
"broken_links": [
{
"url": "https://example.com/missing",
"status_code": 404,
"found_on": ["https://example.com/blog"],
"suggestions": ["Similar: https://example.com/missing-page"]
}
]
}

============================================================
GhostLink Scan Report
============================================================
Site: https://example.com
Pages Scanned: 150
Ghosts Found: 3
============================================================
[X] GHOST LINKS:
----------------------------------------
[404] https://example.com/old-page.html
Found on: https://example.com/sitemap
Suggestions:
- Similar URL found: https://example.com/new-page.html
- Check Wayback Machine
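The JSON report format shown earlier is easy to post-process, for example to group ghosts by status code in a dashboard or CI step. The sketch below assumes the field names from that sample (`total_pages_scanned`, `broken_links`, `status_code`, `url`). The helper name `summarize_report` is hypothetical.

```python
import json


def summarize_report(report_text):
    """Return (pages_scanned, {status_code: [urls]}) from a GhostLink JSON report."""
    report = json.loads(report_text)
    by_status = {}
    for link in report.get("broken_links", []):
        by_status.setdefault(link["status_code"], []).append(link["url"])
    return report["total_pages_scanned"], by_status
```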
Contributions welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE for details.
Sheraz Khan - @Sheraz-k
If GhostLink helps you hunt down ghosts, please give it a ⭐
No ghosts were harmed in the making of this software 👻