Skip to content

CSE-5160/classification-algorithms-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project Title

A brief description of your project and its purpose.

Table of Contents

Introduction

This project implements a machine learning model to predict labels based on textual data, utilizing various preprocessing techniques and evaluation metrics, including the classification report.

Installation

To set up the project, ensure you have Python 3.6 or later installed. You can install the necessary libraries using pip:

pip install pandas scikit-learn numpy


# Model Performance Comparison

This document compares the performance of four different machine learning models: Random Forest, Gradient Boosting, Logistic Regression, and Naive Bayes. The comparison is based on evaluation metrics including ROC AUC scores, accuracy, precision, recall, F1 scores, and time taken for evaluation.

## Models Evaluated

- Random Forest
- Gradient Boosting
- Logistic Regression
- Naive Bayes

## Evaluation Metrics

| Metric                       | Random Forest             | Gradient Boosting        | Logistic Regression      | Naive Bayes             |
|------------------------------|---------------------------|--------------------------|--------------------------|--------------------------|
| **Time Taken for Evaluation** | 88.93 seconds            | 315.17 second            | 15.20 seconds            | 34.53 seconds            |

## Summary of Results

- **Naive Bayes**:
  - Best performance across all metrics, including time efficiency, ROC AUC scores, accuracy, precision, recall, and F1 scores.
  
- **Gradient Boosting**:
  - Effective but slower and lower performance compared to Naive Bayes and Logistic Regression.
  
- **Random Forest**:
  - Similar performance to Gradient Boosting, with slightly lower scores across metrics.

## Conclusion

Based on the evaluation, **Naive Bayes** is recommended for its superior performance, efficiency, and excellent generalization across training and test data. For scenarios where interpretability is crucial, **Gradient boosting** is a good alternative.

About

Machine learning algorithm for classification

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors