ASD (AI Spam Detector) is a comprehensive, production-ready spam detection system. It's built with a modern stack including Python, Flask, PyTorch, and a fine-tunable DistilBERT transformer model. This project is designed to be a practical tool for developers and end-users, offering multiple ways to integrate spam detection into your workflow.
- Accurate Spam Detection: Utilizes a DistilBERT model that can be fine-tuned on your own datasets for higher accuracy.
- REST API: A Flask-based API for batch email predictions, allowing integration with any application.
- Interactive Web Dashboard: A user-friendly interface to check individual emails and view real-time statistics.
- IMAP Integration: Automatically scan your email accounts (like Gmail or Outlook) for spam.
- Gmail Browser Extension: A client-side extension that integrates spam detection directly into the Gmail interface.
- Modular Architecture: Use individual components or run the entire system as an integrated application.
- Easy Setup: A simple setup script to get you started in minutes.
The system is modular: each component handles a separate function and can run on its own. Here's a high-level overview of the architecture:
┌─────────────────────────────────────────────────────┐
│                   User Interfaces                   │
├──────────────────┬──────────────────┬───────────────┤
│  Gmail Browser   │  Web Dashboard   │   REST API    │
│    Extension     │   (Port 5001)    │  (Port 5000)  │
│   (JavaScript)   │                  │               │
└────────┬─────────┴────────┬─────────┴────────┬──────┘
         │                  │                  │
         └──────────────────┼──────────────────┘
                            │
              ┌─────────────▼─────────────┐
              │    Flask Applications     │
              │  ├─ email_processor.py    │
              │  ├─ run_api.py            │
              │  └─ email_integration.py  │
              └────────────┬──────────────┘
                           │
              ┌────────────▼─────────────┐
              │      Core ML Engine      │
              │     (spam_detec.py)      │
              │  ├─ DistilBERT Model     │
              │  ├─ Tokenizer            │
              │  └─ Trainer              │
              └────────────┬─────────────┘
                           │
                    ┌──────▼──────┐
                    │ PyTorch GPU │
                    │   or CPU    │
                    └─────────────┘
- spam_detec.py: The core of the system. It contains the EmailSpamDetector class, which handles all the machine learning logic, from loading the model to training and prediction.
- run_training.py: Trains the spam detection model on your own dataset.
- run_api.py: Runs a Flask server that exposes a REST API for batch spam predictions.
- email_processor.py: Runs another Flask server with a web dashboard for checking individual emails.
- email_integration.py: Connects to your email account via IMAP and scans for spam.
- email_extension/: Contains the source code for the Gmail browser extension.
- master_startup.py: An orchestrator script that can start and manage all the services of the system.
- quick_setup.py: Helps you set up the environment and install all the required dependencies.
- Clone the repository:

      git clone <repository-url>
      cd <repository-directory>

- Run the quick setup script. This will create a requirements.txt file and install all the necessary packages:

      python quick_setup.py
You can fine-tune the model on your own dataset. The training data should be a CSV file with 'text' and 'label' columns (0 for legitimate, 1 for spam).
    python run_training.py

The trained model will be saved in the ./trained_spam_model directory. If you skip this step, the system will use the pre-trained DistilBERT model.
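The CSV layout described above can be sketched with the standard library alone. The sample rows and the helper names `write_training_csv` and `validate_training_csv` are illustrative assumptions and are not part of the project; where run_training.py looks for the file is not stated here, so the path is left to the caller.

```python
import csv

# Hypothetical sample rows: label 0 = legitimate, 1 = spam.
SAMPLE_ROWS = [
    {"text": "Meeting moved to 3pm, see updated invite.", "label": 0},
    {"text": "You WON a free cruise! Click here to claim now!!!", "label": 1},
]


def write_training_csv(path, rows):
    """Write rows to a CSV file with 'text' and 'label' columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)


def validate_training_csv(path):
    """Return the number of data rows; raise ValueError if the file is malformed."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = set(reader.fieldnames or [])
        if not {"text", "label"} <= columns:
            raise ValueError("CSV must contain 'text' and 'label' columns")
        count = 0
        for row in reader:
            # Labels are read back as strings, so compare against "0" and "1".
            if row["label"] not in {"0", "1"}:
                raise ValueError(f"line {reader.line_num}: label must be 0 or 1")
            if not (row["text"] or "").strip():
                raise ValueError(f"line {reader.line_num}: text is empty")
            count += 1
    return count
```

Running a check like this before training catches a missing column or a stray label value early, rather than partway through a training run.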
You can start all the services using the master_startup.py script.
    python master_startup.py

This will:
- Check for dependencies.
- Check the model status.
- Start the API server on port 5000.
- Start the email processor and dashboard on port 5001.
- Display a status dashboard with links to the services.
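To confirm independently that both services came up, a small port check along these lines may help. This is a sketch and not how master_startup.py itself checks status; the `SERVICES` mapping and the function names are assumptions, with the ports taken from the list above.

```python
import socket

# Ports as listed above; the service names are only labels for the printout.
SERVICES = {"API server": 5000, "Email processor and dashboard": 5001}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(services=SERVICES, host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name}: {'up' if up else 'down'}")
```

An open port only shows that something is listening; the GET /health endpoint of the API is the better signal that the application itself is responding.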
- REST API: http://localhost:5000
  - GET /health: Health check.
  - POST /predict: Batch email predictions.
    - Request body: {"emails": ["email1_content", "email2_content"]}
- Web Dashboard: http://localhost:5001/dashboard
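A minimal client for the POST /predict endpoint might look like the sketch below, using only the standard library. The function names are illustrative assumptions. The response schema is not documented here, so the sketch returns the decoded JSON as-is instead of assuming particular fields.

```python
import json
import urllib.request

API_BASE = "http://localhost:5000"  # default API address listed above


def build_predict_request(emails, base_url=API_BASE):
    """Build a POST /predict request carrying a batch of email texts."""
    body = json.dumps({"emails": list(emails)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(emails, base_url=API_BASE, timeout=10):
    """Send the batch and return the decoded JSON response unchanged."""
    request = build_predict_request(emails, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running on port 5000.
    print(predict(["Win a free prize now!", "Lunch at noon?"]))
```

Sending several emails in one request keeps the round trips down, which is the point of a batch endpoint.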
You can use email_integration.py to scan your email account. You'll need to provide your email credentials and IMAP server details in the script.
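For orientation, the general shape of an IMAP scan can be sketched with Python's imaplib and email modules. This is not the code in email_integration.py; the function names, the INBOX default, and the choice to fetch only unread messages are assumptions made for illustration.

```python
import email
import imaplib
from email import policy


def extract_text(raw_message):
    """Return the plain-text body of a raw RFC 822 message given as bytes."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    # Messages without a text/plain part (for example HTML-only) yield "".
    return part.get_content() if part is not None else ""


def fetch_unseen_texts(host, user, password, mailbox="INBOX"):
    """Fetch unread messages over IMAP and return their plain-text bodies."""
    texts = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # Read-only select, so fetching does not mark messages as seen.
        conn.select(mailbox, readonly=True)
        status, data = conn.search(None, "UNSEEN")
        if status != "OK":
            return texts
        for num in data[0].split():
            status, msg_data = conn.fetch(num, "(RFC822)")
            if status == "OK" and msg_data and isinstance(msg_data[0], tuple):
                texts.append(extract_text(msg_data[0][1]))
    return texts
```

The returned texts could then be passed to the batch prediction API. Reading credentials from environment variables instead of hard-coding them in a script reduces the risk of committing them to version control.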
The email_extension can be loaded as an unpacked extension in Chrome or Firefox. It will automatically connect to the email processor service running on port 5001.
This project uses the following major libraries:
- transformers
- torch
- pandas
- numpy
- datasets
- scikit-learn
- flask
- tqdm
- accelerate
- requests
A full list of dependencies is available in requirements.txt.
This project is licensed under the MIT License. See the LICENSE file for details.