This project uses the Aquaculture - Water Quality Dataset
(Veeramsetty, Venkataramana; Arabelli, Rajeshwarrao; Bernatin, T., 2024) to train and test three classifiers:
Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The best-performing model is then used to classify real water samples.
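The train-and-compare workflow above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's clf_eval.py: it uses synthetic data from make_classification in place of data/WQD.tsv, default hyperparameters instead of the values in data/params.tsv, and test-set accuracy as the selection criterion (the project's own definition of "best overall performance" may differ).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the water-quality features (assumption: the real
# data in data/WQD.tsv is not loaded here; feature/class counts are illustrative).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Default hyperparameters; not the tuned values from data/params.tsv.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

scores = {}
for name, clf in models.items():
    # Scale inside a pipeline so the scaler is fit on training data only.
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    scores[name] = accuracy_score(y_test, y_pred)
    print(name, round(scores[name], 3))
    print(confusion_matrix(y_test, y_pred))

# Pick the model with the highest test accuracy (assumed criterion).
best_name = max(scores, key=scores.get)
print("Best model:", best_name)
```

Wrapping the scaler and classifier in one pipeline keeps test data out of the scaling step. The repository ships a separate scaler.joblib, so its training script presumably fits and saves the scaler on its own instead.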
You can access the app directly using this Streamlit link.

Clone the Repository
git clone https://github.com/Josemtobon/Water-Quality-Analysis-and-Prediction.git
cd Water-Quality-Analysis-and-Prediction

Install Dependencies
pip install -r requirements.txt

Run the Streamlit App
streamlit run Home.py
├── best_model.joblib # Trained model with the best overall performance
├── clf_eval.py # Script to train and test models
├── data
│ ├── params.tsv # Parameters chosen for the models
│ ├── performance_results.tsv # Performance metrics
│ └── WQD.tsv # Dataset file
├── Home.py # Main Streamlit app
├── images # Confusion matrices and ROC curve visualizations
│ ├── confusion_matrix_K-Nearest Neighbors.png
│ ├── confusion_matrix_Random Forest.png
│ ├── confusion_matrix_SVM.png
│ ├── roc_curve_K-Nearest Neighbors.png
│ ├── roc_curve_Random Forest.png
│ └── roc_curve_SVM.png
├── pages # Streamlit pages
│ ├── 2_📊_Distribution of Parameters.py
│ ├── 3_⚙️-_Analysis_of_Classifiers.py
│ └── 4_🧪_Classify Water Quality.py
├── requirements.txt # Dependencies
└── scaler.joblib # Preprocessing scaler file
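The best_model.joblib and scaler.joblib files point to a save-and-reload workflow for classifying new samples. The sketch below is an assumption-based illustration, not the code in the classification page: it assumes a scikit-learn StandardScaler and a Random Forest as the saved model, trains both on synthetic data, and writes the files to a temporary directory rather than the repository root. A real sample must supply the same features, in the same order, that the saved scaler and model were trained on.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in training step on synthetic data (assumed feature and class counts).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=1)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(random_state=1).fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as out_dir:
    scaler_path = os.path.join(out_dir, "scaler.joblib")
    model_path = os.path.join(out_dir, "best_model.joblib")
    # Persist both artifacts, as the repository's .joblib files suggest.
    joblib.dump(scaler, scaler_path)
    joblib.dump(model, model_path)

    # Reload them, as an app page would at startup.
    loaded_scaler = joblib.load(scaler_path)
    loaded_model = joblib.load(model_path)

# Apply the same scaling to the new sample before predicting.
new_sample = np.asarray(X[:1])  # one row, same feature order as training
prediction = loaded_model.predict(loaded_scaler.transform(new_sample))
print("Predicted class:", prediction[0])
```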
Dataset: Aquaculture - Water Quality Dataset by Veeramsetty, Venkataramana; Arabelli, Rajeshwarrao; Bernatin, T. (2024).