Sheraz-k/ghostlink

GhostLink

 ██████╗ ██╗  ██╗ ██████╗ ███████╗████████╗██╗     ██╗███╗   ██╗██╗  ██╗
██╔════╝ ██║  ██║██╔═══██╗██╔════╝╚══██╔══╝██║     ██║████╗  ██║██║ ██╔╝
██║  ███╗███████║██║   ██║███████╗   ██║   ██║     ██║██╔██╗ ██║█████╔╝
██║   ██║██╔══██║██║   ██║╚════██║   ██║   ██║     ██║██║╚██╗██║██╔═██╗
╚██████╔╝██║  ██║╚██████╔╝███████║   ██║   ███████╗██║██║ ╚████║██║  ██╗
 ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝   ╚═╝   ╚══════╝╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝

Hunt Down Dead Links Before They Haunt Your Users

Python 3.8+ | License: MIT | Zero Dependencies | Security Hardened

Find and exorcise the ghost links haunting your website.

Features | Quick Start | Alerts | Security | Examples


Why GhostLink?

Dead links are ghost links - invisible problems that haunt your website, damage SEO, and frustrate users. GhostLink hunts them down before they can do damage:

  • Deep Scanning - Crawls your entire site recursively
  • Smart Detection - Finds broken links, images, scripts, stylesheets
  • Fix Suggestions - Wayback Machine links, typo detection, similar URLs
  • Instant Alerts - Email, Slack, or webhook notifications
  • History Tracking - Only alerts on NEW ghosts, not old ones
  • Security First - SSRF protection, input validation, rate limiting

Features

What It Hunts

Ghost Type         Detection
Dead Links         Full recursive crawl
Broken Images      <img>, srcset, CSS backgrounds
Missing Scripts    <script src>
Lost Stylesheets   <link href>
Vanished Favicons  All icon types
External Ghosts    Optional external link checking
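
As a rough illustration of how these asset types could be collected from a single page, here is a minimal sketch built on the standard library's html.parser. The class name AssetCollector and its grouping keys are assumptions made for this example, not GhostLink's actual internals, and CSS background extraction is left out. The srcset handling is deliberately simple and would mis-split URLs that contain commas.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Hypothetical collector: gathers URLs by ghost type from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = {"link": [], "image": [], "script": [], "stylesheet": [], "icon": []}

    def _add(self, kind, url):
        # Resolve relative references against the page URL
        if url:
            self.found[kind].append(urljoin(self.base_url, url.strip()))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a":
            self._add("link", a.get("href"))
        elif tag == "img":
            self._add("image", a.get("src"))
            # srcset is a comma-separated list of "url descriptor" pairs
            for candidate in (a.get("srcset") or "").split(","):
                parts = candidate.split()
                if parts:
                    self._add("image", parts[0])
        elif tag == "script":
            self._add("script", a.get("src"))
        elif tag == "link":
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add("stylesheet", a.get("href"))
            elif "icon" in rel:
                self._add("icon", a.get("href"))
```

Feeding a page's HTML through collector.feed(html_text) fills collector.found, and each URL in it can then be requested and checked for an error status.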

Exorcism Suggestions

GhostLink doesn't just find ghosts - it helps you banish them:

👻 GHOST FOUND: https://example.com/blog/my-psot.html
   Status: 404 Not Found

   🔮 Suggestions:
   - Fix typo: https://example.com/blog/my-post.html
   - Similar URL: https://example.com/blog/my-posts.html
   - Wayback Machine: https://web.archive.org/web/*/...
   - Try HTTPS version
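
Suggestions like the ones above could be produced along the following lines. This is only a sketch: the function name suggest_fixes, the 0.8 similarity cutoff, and the use of difflib for typo matching are assumptions for illustration, and GhostLink's real matching logic may differ.

```python
import difflib
from urllib.parse import urlsplit


def suggest_fixes(dead_url, known_urls, limit=2):
    """Hypothetical helper: propose replacements for a dead URL."""
    suggestions = []
    # Closest-matching URLs seen during the crawl (catches typos such as "psot")
    for match in difflib.get_close_matches(dead_url, known_urls, n=limit, cutoff=0.8):
        suggestions.append(f"Similar URL: {match}")
    # Wayback Machine listing of all captures for the URL
    suggestions.append(f"Wayback Machine: https://web.archive.org/web/*/{dead_url}")
    # Only suggest HTTPS when the dead URL was plain HTTP
    if urlsplit(dead_url).scheme == "http":
        suggestions.append("Try HTTPS version: https://" + dead_url[len("http://"):])
    return suggestions
```

The known_urls argument is assumed to be the set of working URLs discovered during the same crawl, so a suggestion never points at another dead page.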

Alert Channels

Channel  Setup
Email    SMTP (Gmail, SendGrid, etc.)
Slack    Webhook URL
Webhook  Any HTTP endpoint
Console  Default output
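
To show what a Slack alert involves, the sketch below builds a POST request for a Slack incoming webhook using only the standard library. Slack incoming webhooks accept a JSON body with a "text" field; the function names and the message layout here are assumptions for illustration, not GhostLink's own alert format.

```python
import json
import urllib.request


def build_slack_request(webhook_url, ghosts):
    """Hypothetical helper: build a POST request for a Slack incoming webhook."""
    lines = [f"GhostLink found {len(ghosts)} ghost link(s):"]
    lines += [f"- [{g['status_code']}] {g['url']}" for g in ghosts]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    # Performs the network call and returns the HTTP status code
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to inspect without touching the network. A generic webhook could reuse the same shape with a different JSON body.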

Quick Start

Installation

# Clone the repository
git clone https://github.com/Sheraz-k/ghostlink.git
cd ghostlink

# Or download directly
curl -O https://raw.githubusercontent.com/Sheraz-k/ghostlink/main/ghostlink.py

Basic Hunt

python ghostlink.py https://yourwebsite.com

Generate HTML Report

python ghostlink.py https://yourwebsite.com --output ghost-report.html

Deep Scan

python ghostlink.py https://yourwebsite.com --depth 5 --max-pages 1000

With Slack Alerts

python ghostlink.py https://yourwebsite.com \
  --slack-webhook https://hooks.slack.com/services/XXX/YYY/ZZZ

Command Line Options

Usage: ghostlink.py [OPTIONS] URL

Scan Options:
  --depth, -d        Maximum crawl depth (default: 3, max: 20)
  --max-pages, -m    Maximum pages to scan (default: 500, max: 50000)
  --timeout, -t      Request timeout in seconds (default: 10, max: 120)
  --delay            Delay between requests (default: 0.2s)
  --no-external      Don't check external links
  --no-images        Don't check images
  --ignore           URL patterns to ignore (regex)
  --threads          Number of threads (default: 5)

Output Options:
  --output, -o       Output file path (.html or .json)
  --format, -f       Output format: html, json, console

Alert Options:
  --email            Email address for alerts
  --smtp-host        SMTP host (default: smtp.gmail.com)
  --smtp-port        SMTP port (default: 587)
  --smtp-user        SMTP username
  --smtp-pass        SMTP password
  --slack-webhook    Slack webhook URL
  --webhook          Generic webhook URL
  --alert-all        Alert on all ghosts, not just new ones
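
The difference between the default behaviour (alert only on new ghosts) and --alert-all can be pictured as a set difference against a stored history. The sketch below assumes a JSON history file and a helper named new_ghosts; both are hypothetical, and GhostLink may store its history differently.

```python
import json
from pathlib import Path


def new_ghosts(current_urls, history_path, alert_all=False):
    """Hypothetical helper: return the ghosts worth alerting on, then update history."""
    path = Path(history_path)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    current = set(current_urls)
    to_alert = current if alert_all else current - seen
    # Remember everything seen so far for the next run
    path.write_text(json.dumps(sorted(seen | current)))
    return sorted(to_alert)
```

In this sketch a ghost that was fixed and later breaks again stays in the history and would not trigger a second alert; pruning fixed URLs from the file is one way to change that.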

Security

GhostLink is security-hardened with multiple layers of protection:

SSRF Protection

  • Blocks scanning of localhost, 127.0.0.1
  • Blocks private IP ranges (10.x, 172.16.x, 192.168.x)
  • Blocks dangerous URL schemes (file://, javascript:, data:)

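
A check of this kind can be sketched with the standard ipaddress and socket modules. The function name is_safe_target, the scheme allow-list, and the injectable resolve parameter are assumptions for illustration; this is not GhostLink's actual code.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed allow-list


def is_safe_target(url, resolve=socket.getaddrinfo):
    """Hypothetical check: reject URLs pointing at local or private hosts."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, None)
    except OSError:
        return False  # unresolvable hosts are treated as unsafe
    for info in infos:
        # info[4][0] is the address; drop any IPv6 scope id such as "%eth0"
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking the resolved addresses rather than the hostname string also catches public names that point at internal IPs. A stricter design would connect to the exact address that was validated, since a second DNS lookup could return a different answer.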
Input Validation

  • URL length limits (2048 chars)
  • Config value validation (depth, timeout, etc.)
  • Regex pattern validation for ignore rules
  • Email format validation
  • Webhook URL validation
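
The checks in this list could look roughly like the sketch below. The upper bounds come from the command line options documented earlier and the 2048-character limit from the list above; the lower bound of 1 and the function name validate_config are assumptions.

```python
import re

MAX_URL_LENGTH = 2048
# (min, max) per option; the minimum of 1 is an assumption
LIMITS = {"depth": (1, 20), "max_pages": (1, 50000), "timeout": (1, 120)}


def validate_config(url, ignore_patterns=(), **values):
    """Hypothetical validator: return a list of error strings (empty if valid)."""
    errors = []
    if len(url) > MAX_URL_LENGTH:
        errors.append(f"URL longer than {MAX_URL_LENGTH} characters")
    for name, value in values.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    for pattern in ignore_patterns:
        # Compile each ignore rule up front so a bad regex fails before the scan
        try:
            re.compile(pattern)
        except re.error as exc:
            errors.append(f"invalid ignore pattern {pattern!r}: {exc}")
    return errors
```

Note that a shell-style glob such as "*.pdf" is not a valid regular expression and would be rejected by this kind of check.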

Resource Protection

  • Response size limits (10MB max)
  • Redirect loop detection (max 5 redirects)
  • Rate limiting between requests
  • Timeout enforcement
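
One common way to enforce a response size limit is to read the body in chunks and stop as soon as the cap is exceeded, instead of trusting the Content-Length header. The helper below is a sketch under that assumption; read_capped is a hypothetical name.

```python
import io

MAX_RESPONSE_BYTES = 10 * 1024 * 1024  # 10MB, per the list above


def read_capped(stream, limit=MAX_RESPONSE_BYTES, chunk_size=64 * 1024):
    """Hypothetical helper: read at most `limit` bytes, raising if the body is larger."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError(f"response exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# Works with any object exposing read(), such as an HTTP response
body = read_capped(io.BytesIO(b"hello"))
```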

Output Safety

  • HTML escaping in reports
  • Path traversal prevention
  • System directory write protection
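
The first two points can be illustrated with html.escape and pathlib. The function names and the choice to confine output to a base directory are assumptions for this sketch, not a description of GhostLink's exact rules.

```python
import html
from pathlib import Path


def render_row(url, status):
    # Escape scanned values so a hostile URL cannot inject markup into the report
    return f"<tr><td>{html.escape(url)}</td><td>{html.escape(str(status))}</td></tr>"


def safe_output_path(user_path, base_dir="."):
    """Hypothetical check: resolve the path and require it to stay under base_dir."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if base != target and base not in target.parents:
        raise ValueError(f"refusing to write outside {base}")
    return target
```

Resolving the path before comparing it defeats "../" sequences, and an absolute path such as /etc/passwd is rejected because it does not sit under the base directory.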

Scheduling Daily Hunts

Linux/Mac (Cron)

# Run daily at 6 AM
0 6 * * * python /path/to/ghostlink.py https://yoursite.com --slack-webhook https://hooks.slack.com/...

GitHub Actions

name: Daily Ghost Hunt

on:
  schedule:
    - cron: '0 6 * * *'

jobs:
  hunt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          python ghostlink.py https://yoursite.com \
            --output report.html \
            --slack-webhook ${{ secrets.SLACK_WEBHOOK }}
      - uses: actions/upload-artifact@v4
        with:
          name: ghost-report
          path: report.html

Examples

Hunt Internal Ghosts Only

python ghostlink.py https://example.com --no-external

Ignore Certain Paths

# --ignore takes regular expressions, not shell globs
python ghostlink.py https://example.com \
  --ignore "/admin/.*" \
  --ignore "/api/.*" \
  --ignore "\.pdf$"

Full Site Audit

python ghostlink.py https://example.com \
  --depth 10 \
  --max-pages 5000 \
  --output full-audit.html

CI/CD Integration

# Exit code 1 if ghosts found
python ghostlink.py https://staging.example.com || exit 1

Output Formats

HTML Report

Beautiful, shareable report with:

  • Summary statistics with visual indicators
  • Ghost links with error details
  • Fix suggestions for each ghost
  • Pages where ghosts were found

JSON Export

{
  "site_url": "https://example.com",
  "scan_completed": "2024-01-15T10:30:00",
  "total_pages_scanned": 150,
  "broken_links": [
    {
      "url": "https://example.com/missing",
      "status_code": 404,
      "found_on": ["https://example.com/blog"],
      "suggestions": ["Similar: https://example.com/missing-page"]
    }
  ]
}
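
A JSON report in this shape is easy to post-process. As a small example, the sketch below counts broken links per status code and per page on which they were found; the helper name summarize is hypothetical, and the field names are taken from the sample above.

```python
import json
from collections import Counter


def summarize(report):
    """Hypothetical helper: count broken links per status code and per source page."""
    by_status = Counter(link["status_code"] for link in report["broken_links"])
    by_page = Counter(page for link in report["broken_links"] for page in link["found_on"])
    return by_status, by_page


# Sample data mirroring the structure shown above
sample = json.loads("""
{
  "site_url": "https://example.com",
  "total_pages_scanned": 150,
  "broken_links": [
    {"url": "https://example.com/missing", "status_code": 404,
     "found_on": ["https://example.com/blog"], "suggestions": []}
  ]
}
""")
by_status, by_page = summarize(sample)
```

For a real run, the report written with --output and a .json path would be loaded with json.load instead of the inline sample.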

Console Output

============================================================
  GhostLink Scan Report
============================================================
  Site: https://example.com
  Pages Scanned:    150
  Ghosts Found:     3
============================================================

[X] GHOST LINKS:
----------------------------------------
  [404] https://example.com/old-page.html
      Found on: https://example.com/sitemap
      Suggestions:
        - Similar URL found: https://example.com/new-page.html
        - Check Wayback Machine

Contributing

Contributions welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE for details.

Author

Sheraz Khan - @Sheraz-k


If GhostLink helps you hunt down ghosts, please give it a ⭐

Report Bug | Request Feature

No ghosts were harmed in the making of this software 👻

