Housing price prediction using K Nearest Neighbour Regression

The notebook presents a pipeline to predict Housing Prices, given 79 features that might influence the prices. The project utilizes KNN in order to make the predictions and K Means Clustering to analyze the data. Then, the model is deployed via Gradio. The following methodology has been used for the same:

A) Segregation of Features: Features are segregated into three segments: Continuous, Categorical Values, Discrete Numerical Values

B) Exploratory Data Analysis

Metrics/Toos Used: Coefficient of Variation, Box Plots, Correlation Matrix

Outcome: To analyze the outliers, Relevance of certain features, and correlation between features

C) Data Preprocessing

I) Removing features with a huge amount of missing data

II) Imputation of Missing data for remaining columns via KNN Imputation for Numerical Data (continuous and discrete both)

III) Mode-Based Imputation of Object-typed Categorical Values

IV) Winsorizing of data to remove outliers

V) Removing the features which have a Coefficient of Variation of less than or equal to 1

VI) Splitting Object and Num type features and tokenizing Object-typed features, then combining the data

VII) Elimination of features based on a Mutual Info Classification Threshold (heuristically determined threshold)

VIII) Scaling the data via minmax scaler

D) Fitting the data in a KNN Model (n = 19)

E) Evaluation of Root Mean Squared Error of log of the output (RMSE = 0.19838051208068913)

F) Implementing K-Means Clustering for insights into the model (No useful insights obtained)

G) Building a user-friendly interface using Gradio

Dependencies:

pandas, numpy, matplotlib, seaborn, csv, sklearn, scipy, warnings
openai, tiktoken, kaleido, cohere, typing-extensions, jiwer, gradio typing-extensions, cast_control==0.10.11

Important Instructions:

train.csv and test.csv must be in the same directory as that of the .ipynb file to successfully load them.
Kernel Must be restarted post installation of the packages and modules mentioned in point (2) to import gradio successfully.

Submission.csv contains the Predicted Sale Price values for the test dataset

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
HousePrices_KNN_Himanshu.ipynb		HousePrices_KNN_Himanshu.ipynb
README.md		README.md
Submission.csv		Submission.csv
test.csv		test.csv
train.csv		train.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Housing price prediction using K Nearest Neighbour Regression

About

Releases

Packages

Languages

BlakkTyger/HousingPricesPrediction_KNN

Folders and files

Latest commit

History

Repository files navigation

Housing price prediction using K Nearest Neighbour Regression

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages