Expedia-Hotel-Recommendation

About -

Employing various machine learning models like Random Forest Classifier, Keras, and Support Vector Machine, we endeavored to identify the optimal model for accurately categorizing recommendations into the correct cluster among 100 target clusters. In addressing the challenge posed by a large dataset comprising 37 million rows, we adopted three distinct strategies to reduce dataset size while ensuring the retention of crucial features through Principal Component Analysis.

These approaches included: i) utilizing a dataset of randomly selected 50,000 unique users, ii) working with a dataset focusing on the top 10 clusters, and iii) implementing clustering as a pre-processing step, involving the creation of an optimal number of clusters derived from the target clusters. Notably, the third method emerged as the most effective, particularly in conjunction with Keras, showcasing outstanding accuracies of 87.1% and 93% for 10 and 20 clusters, respectively. This successful outcome underscores the importance of strategic dataset reduction and pre-processing techniques in overcoming the challenges posed by large datasets and enhancing the accuracy of machine learning models.

Steps followed -

Filter train data removing is_booking with 0 value.
Filter users removing agents
Update column orig_destination_distance with category column to remove null values
Convert object format to date format-srch_ci,srch_co,date_time
Create new columns -stay_dur,no_of_days_before_booking,current_mon,current_year,srch_ci_day,srch_ci_mon,srch_ci_year,srch_co_mon,srch_co_year
Drop date columns -date_time,srch_ci,srch_co,is_booking
Update orig_destination_distance with label encoding
Perform PCA on destination data set
Merge updated destination set with train data
Check for null values and update empty cells with 0
Understand correlation with Target variable
To deal with the large dataset and limited computational power, 3 major methods of reducing the dataset were used:
- Data of random 50,000 unique users
- Data of top 10 clusters
- Creating clusters of the target variable (merging multiple clusters into 1 to reduce the total number of target clusters)
Train & Testing

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Cluster_10		Cluster_10
Clustering		Clustering
Users50k		Users50k
images		images
Expedia Hotel Recommendation System Report.pdf		Expedia Hotel Recommendation System Report.pdf
Project Final PPT_Codes.ipynb		Project Final PPT_Codes.ipynb
README.md		README.md
clean_dataset.ipynb		clean_dataset.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Expedia-Hotel-Recommendation

About -

Steps followed -

Results

About

Releases

Packages

Languages

ashita03/Expedia-Hotel-Recommendation

Folders and files

Latest commit

History

Repository files navigation

Expedia-Hotel-Recommendation

About -

Steps followed -

Results

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages