Welcome to the Mail/SMS Spam Detection project! This project builds a machine learning model that classifies email and SMS messages as either "spam" or "ham" (non-spam). A dataset of labeled messages is used for training and evaluation.
Email and SMS spam is a common problem for anyone who uses the web or a mobile phone. The goal of this project is to develop a robust spam detection model that automatically classifies incoming text messages as spam or not spam. It combines natural language processing (NLP) techniques with machine learning algorithms such as Naive Bayes, K Nearest Neighbours, and Random Forest.
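To make the Naive Bayes idea concrete, here is a minimal, standard-library-only sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the project's implementation (that lives in the notebook); the class name `TinyNaiveBayes`, the `tokenize` helper, and the four training messages are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        # Count messages per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        total_messages = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.class_counts.items():
            # Log prior: fraction of training messages with this label.
            score = math.log(count / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(message):
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                # Log likelihood of the word given the class, smoothed by one.
                seen = self.word_counts[label][word]
                score += math.log((seen + 1) / (total_words + vocab_size))
            scores[label] = score
        # The class with the highest log score wins.
        return max(scores, key=scores.get)


# Toy training data invented for this illustration (not taken from spam.csv).
train_messages = [
    "Win a free prize now",
    "Free cash claim your prize",
    "Are we still meeting for lunch",
    "See you at home tonight",
]
train_labels = ["spam", "spam", "ham", "ham"]

classifier = TinyNaiveBayes().fit(train_messages, train_labels)
print(classifier.predict("claim your free prize"))
print(classifier.predict("lunch at home"))
```

With so little training data the result only shows the mechanics: words that appear mostly in spam pull the score toward "spam", and the smoothing term keeps a word unseen in one class from zeroing out that class.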
The project is organized as follows:
- dataset: This folder contains the dataset used, named "spam.csv."
- resources: This folder includes images and other files related to the project.
- .gitignore: Gitignore file to specify which files or directories should be ignored in version control.
- AboutTheCode.md: Detailed explanation of the code and its components.
- app.py: A Streamlit web app to host the model locally for interactive testing.
- code.txt: A backup text file containing the code used in the project.
- model.pkl: Serialized machine learning model for SMS spam detection.
- vectorizer.pkl: Serialized feature vectorizer (TF-IDF or Count Vectorizer) used for text data.
- requirements.txt: List of Python packages and dependencies required to run the project.
- nltk.txt: Text file listing the NLTK resources the project needs, including stopwords and punkt.
- spam-detection.ipynb: Jupyter Notebook source file containing the code for data preprocessing, model training, and evaluation.
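As a rough sketch of how the two serialized files in this layout (model.pkl and vectorizer.pkl) might be used together outside the app, the snippet below loads pickled objects and classifies one message. It assumes scikit-learn-style objects (`vectorizer.transform` and `model.predict`) and assumes that a prediction of 1 means spam; neither is confirmed here, so check the notebook. The helper names `load_pickle` and `classify_message` are hypothetical, and any text preprocessing done in the notebook would have to be applied first.

```python
import pickle


def load_pickle(path):
    """Load one pickled object from disk. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def classify_message(model, vectorizer, text):
    """Return "spam" or "ham" for a single message.

    Assumes scikit-learn-style objects: vectorizer.transform() accepts a
    list of strings and model.predict() returns one label per input row.
    Treating 1 as spam is an assumption about the label encoding.
    """
    features = vectorizer.transform([text])
    prediction = model.predict(features)[0]
    return "spam" if prediction == 1 else "ham"


# Intended usage from the project root (requires the project's dependencies):
# model = load_pickle("model.pkl")
# vectorizer = load_pickle("vectorizer.pkl")
# print(classify_message(model, vectorizer, "Congratulations, you won a free ticket!"))
```

The model and the vectorizer are stored separately because the vectorizer must turn new text into exactly the same feature columns the model saw during training.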
To get started with this project, follow these steps:
- Clone the repository to your local machine:
git clone https://github.com/arindal1/email-spam-detector.git
- Navigate to the project directory:
cd email-spam-detector
- Create a Python virtual environment (recommended):
python -m venv venv
- Activate the virtual environment:
# On Windows
venv\Scripts\activate
# On macOS and Linux
source venv/bin/activate
- Install the project dependencies:
pip install -r requirements.txt
- Explore the Jupyter Notebook (spam-detection.ipynb) for in-depth details on data preprocessing, model training, and evaluation.
- To run the Streamlit app, execute the following command:
streamlit run app.py
This will launch a local web app for SMS spam detection.
- Python
- Jupyter Notebook
- Streamlit
Note: I tried to host the model publicly on Heroku, but couldn't because of their billing policies. If you have a Heroku account, you can create a new app and host the model there.
The dataset used in this project is available on Kaggle: SMS Spam Collection Dataset. It contains SMS messages labeled as "spam" or "ham." You can download the dataset from the provided link and place it in the "dataset" folder.
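Once the file is in place, a quick way to inspect it is to read the label and text columns with the standard library. The sketch below assumes the label column is named "v1" and the message column "v2", and that the file is latin-1 encoded. Both are assumptions about the Kaggle export, so adjust them if your copy differs. The function name `read_labeled_messages` and the two sample rows are made up for illustration.

```python
import csv
import io


def read_labeled_messages(handle, label_column="v1", text_column="v2"):
    """Yield (label, text) pairs from an open CSV file.

    The default column names "v1" and "v2" are an assumption about the
    Kaggle export; pass different names if your copy differs.
    """
    for row in csv.DictReader(handle):
        yield row[label_column].strip().lower(), row[text_column].strip()


# Two made-up rows standing in for the real file.
sample = io.StringIO('v1,v2\nham,"See you at lunch, ok?"\nspam,WIN a FREE prize now\n')
rows = list(read_labeled_messages(sample))
print(rows)

# With the real file (the latin-1 encoding is also an assumption):
# with open("dataset/spam.csv", newline="", encoding="latin-1") as handle:
#     rows = list(read_labeled_messages(handle))
```

Counting how many rows carry each label is a useful first check, since spam datasets tend to contain far more "ham" than "spam" messages.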
For detailed information about the code and its components, please refer to the AboutTheCode.md file in this repository.
See the open issues for a list of proposed features (and known issues).
Contributions to this project are welcome! If you have any ideas, bug reports, or feature requests, please open an issue or submit a pull request. Contributions are what make the open-source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
If you have any questions or suggestions related to this project, you can reach out to me at:
- GitHub: arindal1
- LinkedIn: arindalchar
Note: This is a personal project created for educational and demonstrative purposes. I made it for fun, to learn more about machine learning, and to record my progress in the field. Feel free to customize the content, links, and images to match your project's specifics.