DannyMPaul/ASD

ASD: AI-Powered Spam Detector

ASD (AI Spam Detector) is a spam detection system built with Python, Flask, and PyTorch around a DistilBERT transformer model that can be fine-tuned on your own data. It is aimed at both developers and end users, and offers several ways to integrate spam detection into a workflow: a REST API, a web dashboard, IMAP scanning, and a Gmail browser extension.

Features

  • Accurate Spam Detection: Utilizes a DistilBERT model that can be fine-tuned on your own datasets for higher accuracy.
  • REST API: A Flask-based API for batch email predictions, allowing integration with any application.
  • Interactive Web Dashboard: A user-friendly interface to check individual emails and view real-time statistics.
  • IMAP Integration: Automatically scan your email accounts (like Gmail or Outlook) for spam.
  • Gmail Browser Extension: A client-side extension that integrates spam detection directly into the Gmail interface.
  • Modular Architecture: Use individual components or run the entire system as an integrated application.
  • Easy Setup: A simple setup script to get you started in minutes.

System Architecture

The system is modular: three user interfaces sit on top of a set of Flask applications, which in turn call a shared core ML engine running on PyTorch (GPU or CPU). Here's a high-level overview of the architecture:

┌─────────────────────────────────────────────────────┐
│              User Interfaces                        │
├──────────────────┬──────────────────┬───────────────┤
│  Gmail Browser   │  Web Dashboard   │  REST API     │
│  Extension       │  (Port 5001)     │  (Port 5000)  │
│  (JavaScript)    │                  │               │
└────────┬─────────┴────────┬─────────┴────────┬──────┘
         │                  │                  │
         └──────────────────┼──────────────────┘
                            │
              ┌─────────────▼─────────────┐
              │  Flask Applications       │
              │ ├─ email_processor.py     │
              │ ├─ run_api.py             │
              │ └─ email_integration.py   │
              └────────────┬──────────────┘
                           │
              ┌────────────▼─────────────┐
              │  Core ML Engine          │
              │  (spam_detec.py)         │
              │ ├─ DistilBERT Model      │
              │ ├─ Tokenizer             │
              │ └─ Trainer               │
              └────────────┬─────────────┘
                           │
                    ┌──────▼──────┐
                    │ PyTorch GPU │
                    │ or CPU      │
                    └─────────────┘

How It Works

  • spam_detec.py: The core of the system. It contains the EmailSpamDetector class, which handles all the machine learning logic: loading the model, training, and prediction.
  • run_training.py: Trains the spam detection model on your own dataset.
  • run_api.py: Runs a Flask server that exposes a REST API for batch spam predictions.
  • email_processor.py: Runs a second Flask server that serves the web dashboard for checking individual emails.
  • email_integration.py: Connects to your email account via IMAP and scans it for spam.
  • email_extension/: Contains the source code for the Gmail browser extension.
  • master_startup.py: An orchestrator script that starts and manages all the services of the system.
  • quick_setup.py: Sets up the environment and installs the required dependencies.

Setup and Installation

  1. Clone the repository:

    git clone <repository-url>
    cd <repository-directory>
  2. Run the quick setup script: This will create a requirements.txt file and install all the necessary packages.

    python quick_setup.py

Usage

1. Train the Model (Optional)

You can fine-tune the model on your own dataset. The training data should be a CSV file with 'text' and 'label' columns (0 for legitimate, 1 for spam).

python run_training.py

The trained model is saved to the ./trained_spam_model directory. If you skip this step, the system falls back to the pre-trained DistilBERT model; without fine-tuning on spam data, its predictions may be unreliable.
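
Before training, it can help to confirm that a dataset matches the format described above. The following is a minimal sketch and not part of the repository: the function name validate_training_csv is hypothetical, and it checks only what this README states (a 'text' column and a 'label' column holding 0 or 1).

```python
import csv
import io

REQUIRED_COLUMNS = {"text", "label"}
VALID_LABELS = {"0", "1"}  # 0 = legitimate, 1 = spam


def validate_training_csv(handle):
    """Return a list of problems found in a training CSV; an empty list means none."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    # Data rows are counted from 2 because the header is row 1.
    for row_no, row in enumerate(reader, start=2):
        if not (row["text"] or "").strip():
            problems.append(f"row {row_no}: empty text")
        if (row["label"] or "").strip() not in VALID_LABELS:
            problems.append(f"row {row_no}: label must be 0 or 1, got {row['label']!r}")
    return problems


# A tiny in-memory example with one legitimate and one spam message.
sample = io.StringIO(
    "text,label\n"
    "\"Meeting moved to 3pm, see updated invite\",0\n"
    "\"You won a free prize, click here now\",1\n"
)
print(validate_training_csv(sample))  # prints []
```

For a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the handle to the same function.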

2. Start the System

You can start all the services using the master_startup.py script.

python master_startup.py

This will:

  • Check for dependencies.
  • Check the model status.
  • Start the API server on port 5000.
  • Start the email processor and dashboard on port 5001.
  • Display a status dashboard with links to the services.
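
To confirm independently that both services came up, you can probe the two ports listed above. This is a small sketch that is separate from master_startup.py; it assumes the services run on the local machine, and the helper name port_is_open is hypothetical.

```python
import socket

# Ports come from the list above; the local host address is an assumption.
SERVICES = {"REST API": 5000, "Email processor and dashboard": 5001}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening; the GET /health endpoint is the better check that the API itself is ready.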

3. Access the Services

  • REST API: http://localhost:5000
    • GET /health: Health check.
    • POST /predict: Batch email predictions.
      • Request body: {"emails": ["email1_content", "email2_content"]}
  • Web Dashboard: http://localhost:5001/dashboard
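
As an illustration of calling the batch endpoint, the sketch below builds the documented request body and posts it using only the Python standard library. The function names are hypothetical, and because the response schema is not described here, the decoded JSON is returned unchanged rather than interpreted.

```python
import json
import urllib.request

API_BASE = "http://localhost:5000"  # port taken from the list above


def build_predict_request(emails, base_url=API_BASE):
    """Build a POST /predict request carrying the documented {"emails": [...]} body."""
    body = json.dumps({"emails": list(emails)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(emails, timeout=30):
    """Send the batch and return the decoded JSON response without interpreting it."""
    request = build_predict_request(emails)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API server running, predict(["Win a free cruise today!", "Lunch at noon tomorrow?"]) sends both messages in a single request.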

4. IMAP Email Integration

You can use email_integration.py to scan your email account. You'll need to provide your email credentials and IMAP server details in the script.
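
For orientation, here is a rough sketch of what an IMAP scan involves, written with Python's standard imaplib and email modules. It is not the code in email_integration.py: the function names, the mailbox default, and the message limit are all assumptions made for illustration.

```python
import email
import imaplib
from email import policy


def extract_text(raw_bytes):
    """Return the subject and plain-text body of a raw RFC 822 message as one string."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return f"{msg['subject'] or ''}\n\n{body}".strip()


def fetch_unseen(host, user, password, mailbox="INBOX", limit=20):
    """Fetch up to `limit` unread messages over IMAP with SSL."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True opens the mailbox without changing message flags,
        # so fetched messages stay unread.
        conn.select(mailbox, readonly=True)
        _, data = conn.search(None, "UNSEEN")
        texts = []
        for num in data[0].split()[:limit]:
            _, parts = conn.fetch(num, "(RFC822)")
            texts.append(extract_text(parts[0][1]))
        return texts
```

The returned strings could then be sent to POST /predict as the "emails" list. Reading the password from an environment variable rather than hard-coding it keeps credentials out of the script, and some providers, Gmail among them, typically require an app-specific password for IMAP access.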

5. Gmail Browser Extension

The extension in email_extension/ can be loaded as an unpacked extension in Chrome or as a temporary add-on in Firefox. It automatically connects to the email processor service on port 5001, so that service must be running.

Dependencies

This project uses the following major libraries:

  • transformers
  • torch
  • pandas
  • numpy
  • datasets
  • scikit-learn
  • flask
  • tqdm
  • accelerate
  • requests

A full list of dependencies is available in requirements.txt.

License

This project is licensed under the MIT License. See the LICENSE file for details.

About

An AI spam detector built with Python, Flask, PyTorch, and the Hugging Face Transformers DistilBERT model. The model can be trained on datasets built from your own email. The application can be hosted locally or in the cloud, with integrated Docker support.
