Diabetes Prediction Project

Overview

This supervised machine learning project predicts the likelihood of diabetes in individuals from diagnostic measurements. The dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.

Project Description

We build a binary classification model to predict whether a patient has diabetes. The work covers dataset exploration, data preprocessing, model development, and performance evaluation with several metrics.

Dataset

Source: Kaggle Diabetes Dataset
Description: The dataset contains diagnostic measurements such as glucose level, BMI, and blood pressure, along with a target variable indicating the presence or absence of diabetes.

Exploratory Data Analysis (EDA)

Conducted exploratory data analysis to gain insights into the dataset.
Explored missing values, data distributions, correlations, and potential outliers.
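
The checks listed above could look roughly like the following pandas sketch. It runs on a tiny made-up frame; the column names (Glucose, BMI, Outcome), the values, and the iqr_outliers helper are assumptions for illustration and are not taken from the project's notebook.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0, 116.0],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, np.nan],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Number of missing values in each column.
missing_counts = df.isna().sum()

# Summary statistics give a first look at each column's distribution.
summary = df.describe()

# Pairwise Pearson correlations (missing values are excluded pairwise).
correlations = df.corr()


def iqr_outliers(series: pd.Series) -> pd.Series:
    """Hypothetical helper: flag values more than 1.5 * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)


outlier_mask = iqr_outliers(df["BMI"])

print(missing_counts)
print(correlations.round(2))
print("BMI outliers flagged:", int(outlier_mask.sum()))
```

The 1.5 * IQR rule is only one common convention for flagging outliers; the README does not say which rule the analysis used.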

Data Preprocessing

Handled missing values by imputing with mean values.
Scaled and normalized numeric columns.
Performed feature engineering and addressed imbalanced data.
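
As a rough illustration of the imputation and scaling steps, here is a minimal scikit-learn sketch. The feature matrix is made up, and the use of SimpleImputer and StandardScaler inside a Pipeline is an assumption; the README does not state which classes the notebook used. Feature engineering and class-imbalance handling are left out because their details are not documented here.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix (two numeric columns); NaN marks a missing value.
X = np.array([
    [148.0, 33.6],
    [85.0, np.nan],
    [np.nan, 23.3],
    [89.0, 28.1],
])

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),  # replace NaN with the column mean
    ("scale", StandardScaler()),                 # zero mean, unit variance per column
])

X_clean = preprocess.fit_transform(X)
print(X_clean)
```

In a real workflow the pipeline would be fitted on the training split only and then applied to the test split, so that test data does not leak into the imputation means or scaling statistics.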

Machine Learning Models

Developed two models: Logistic Regression and Random Forest.
Tuned hyperparameters using grid search or randomized search.
Utilized cross-validation to assess model generalization.
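
The tuning and cross-validation steps above can be sketched as follows. This is an illustration under stated assumptions: the data is synthetic (generated with make_classification as a stand-in for the diabetes data), and the parameter grids, the ROC-AUC scoring choice, and the 5-fold setting are hypothetical, since the project's actual search spaces are not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the diabetes data: 8 features, binary target.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Hypothetical search spaces, chosen only for illustration.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Exhaustive grid search with 5-fold cross-validation.
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    # Cross-validated score of the best configuration found.
    scores = cross_val_score(search.best_estimator_, X, y, cv=5, scoring="roc_auc")
    results[name] = (search.best_params_, scores.mean())
    print(name, search.best_params_, round(scores.mean(), 3))
```

Scoring the selected configuration on the same data used for the search gives an optimistic estimate; a held-out test set or nested cross-validation gives a fairer one. RandomizedSearchCV from the same module can replace GridSearchCV when the grid is large.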

Evaluation

Evaluated model performance using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.
Compared the performance of Logistic Regression and Random Forest models.
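
For reference, the label-based metrics named above follow directly from the confusion-matrix counts. The sketch below computes them in plain Python on made-up labels; the function name classification_metrics and the example labels are assumptions for illustration, not the project's code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 2 true negatives, 2 false positives, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

ROC-AUC is not included because it is computed from predicted scores or probabilities rather than hard labels; scikit-learn provides it as sklearn.metrics.roc_auc_score.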

Findings

Logistic Regression generally outperformed Random Forest in terms of accuracy, precision, and F1-score.
Cross-validation results indicated stable performance for both models.
Analyzed feature importance, primarily for the Random Forest model.
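
A fitted Random Forest in scikit-learn exposes impurity-based importances through its feature_importances_ attribute. The sketch below shows how such a ranking can be produced; the data is synthetic and the feature names are placeholders, so the output says nothing about the project's actual findings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature names and synthetic data, for illustration only.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Pair each name with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values; sklearn.inspection.permutation_importance is a common cross-check.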

Next Steps

Further hyperparameter tuning and feature selection.
Consideration of model interpretability and domain-specific requirements.
Deployment of the chosen model for real-world predictions.

Repository: inderpalk/diabetes_data-analysis
