Machine Learning Pipeline for Loan Delinquency Prediction as an Analog of Delinquency in P2P Lending

This is Python code for a complete machine learning pipeline that predicts loan delinquency using Ridge regression. The larger aim of the model is to serve as a proof of concept for peer-to-peer (P2P) lending, built on datasets provided by payment apps and apps like Splitwise.

Dataset

The LendingClub dataset is a publicly available dataset from LendingClub, an American peer-to-peer lending company. It covers loans issued from 2007 to 2018 and includes loan characteristics such as loan amount, interest rate, loan grade, loan purpose, employment length, and borrower credit score. It also contains borrower demographic characteristics such as age, income, and home ownership status. Both the original dataset and a tailored version are uploaded. The dataset was tailored because not every feature relevant to conventional loan borrowing applies in a P2P lending scenario.

Dependencies

This code requires the following libraries:

numpy
pandas
scikit-learn
pickle (part of the Python standard library, no separate installation needed)

Data Cleaning and Preprocessing

The read_data() function reads the CSV file into a pandas DataFrame. The data_clean() function removes rows with missing values. The data_encoding() function encodes categorical features using one-hot encoding. The data_normalization() function scales the data and performs principal component analysis (PCA) for dimensionality reduction.
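The four steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the helper name preprocess(), the column names, the toy values, and the choice of two principal components are all assumptions made for the example.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, target: str, n_components: int = 2):
    """Hypothetical helper: clean -> one-hot encode -> scale -> PCA."""
    # Drop every row that has at least one missing value.
    df = df.dropna()
    y = df[target].to_numpy()
    # One-hot encode the categorical (object-typed) columns as floats.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    # Standardize to zero mean and unit variance, then project with PCA.
    X_scaled = StandardScaler().fit_transform(X)
    X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)
    return X_reduced, y


# Toy data with made-up column names and values, for illustration only.
toy = pd.DataFrame({
    "loan_amount": [1000.0, 2500.0, None, 4000.0, 1500.0],
    "interest_rate": [7.5, 11.2, 9.9, 14.1, 8.3],
    "homeownership": ["RENT", "OWN", "RENT", "MORTGAGE", "OWN"],
    "delinquency_pct": [5.0, 12.0, 8.0, 30.0, 7.0],
})
X_toy, y_toy = preprocess(toy, target="delinquency_pct")
print(X_toy.shape, y_toy.shape)
```

One row of the toy data is dropped because of its missing loan amount, so four rows remain and each is reduced to two components.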

Model Training

The train() function splits the preprocessed data into training and testing sets, expands the features with polynomial terms so that the linear model can capture non-linear relationships, and trains a Ridge regression model, using cross-validation to tune its hyperparameters. The trained model is saved as a pickle file.
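A training step of this kind might look like the sketch below. It runs on synthetic data, and the polynomial degree, the alpha grid, the 5-fold split, and the temporary file location are assumptions chosen for the example rather than the values used in this repository.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the preprocessed features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 20 + 5 * X[:, 0] ** 2 - 3 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Polynomial expansion followed by Ridge; degree 2 is an assumption.
pipeline = make_pipeline(PolynomialFeatures(degree=2), Ridge())

# 5-fold cross-validated search over the regularization strength.
search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Save the best pipeline as a pickle file, then load it back to check it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pickle")
    with open(path, "wb") as fh:
        pickle.dump(search.best_estimator_, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

# R^2 on the held-out split, computed with the restored model.
test_r2 = restored.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Keeping the polynomial expansion inside the pipeline means it is refit on each cross-validation fold and travels with the model when it is pickled.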

Model Prediction

The predict() function loads the trained model and makes predictions on the test set. Predictions are clipped to a maximum value of 100 to represent percentage values.
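The clipping step can be illustrated with NumPy. The raw values below are made up. Only the upper bound of 100 comes from the description above, so values below zero are left untouched in this sketch.

```python
import numpy as np

# Made-up raw model outputs, interpreted as percentages.
raw_predictions = np.array([12.5, 99.0, 140.2, -3.0])

# Cap values at 100; passing None as the lower bound leaves it open.
clipped = np.clip(raw_predictions, None, 100.0)
print(clipped)
```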

API

app.py contains the code for serving the model above through an API, which can then be used to build web apps and similar services.

Hugging Face

The Hugging Face Space contains a visualization of the data and predictions. Here is a link to the Hugging Face Space

Usage

To use this code, run the train() function on your own loan data CSV file and adjust any hyperparameters as desired. Then run the predict() function with the resulting trained model to make predictions on new loan data. To use the API, run app.py and open the specified URL.

Disclaimer

This code is intended for educational purposes only and should not be used for actual loan delinquency prediction without careful consideration and validation.

Folders and files

.idea
__pycache__
P2Pdeliquency.py
README.md
app.py
inputs.py
loans_clean_schema.csv
loans_full_schema.csv
model.pickle

amaysood/Cybersprint
