
Exploring Machine Learning methods to provide the best recommendation for Expedia


ashita03/Expedia-Hotel-Recommendation

 
 


Expedia-Hotel-Recommendation

About -

We compared several machine learning models (a Random Forest classifier, a Support Vector Machine, and a neural network built with Keras) to find the one that most accurately assigns each recommendation to the correct cluster among 100 target clusters. Because the dataset is large (37 million rows), we used three distinct strategies to reduce its size, and applied Principal Component Analysis to retain the most important features.

These approaches were: i) using the data of 50,000 randomly selected unique users, ii) using only the data for the top 10 clusters, and iii) clustering as a pre-processing step, in which the target clusters are merged into a smaller, optimal number of clusters. The third method was the most effective, particularly with Keras, reaching accuracies of 87.1% and 93% for 10 and 20 clusters, respectively. This result shows how much strategic dataset reduction and pre-processing can help with large datasets and improve model accuracy.
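The first two reduction strategies can be sketched with pandas as below. This is a minimal illustration, not the code from the notebooks: the column names `user_id` and `hotel_cluster`, the helper names, and the reading of "top 10 clusters" as the ten most frequent clusters are all assumptions. The third strategy is not shown because the way the target clusters were grouped is not described here.

```python
import numpy as np
import pandas as pd


def sample_users(df, n_users, user_col="user_id", seed=0):
    """Keep every row that belongs to a random sample of unique users.

    `user_col` is an assumed column name; `sample_users` is a hypothetical helper.
    """
    users = df[user_col].unique()
    rng = np.random.default_rng(seed)
    chosen = rng.choice(users, size=min(n_users, len(users)), replace=False)
    return df[df[user_col].isin(chosen)]


def keep_top_clusters(df, k, target_col="hotel_cluster"):
    """Keep only rows whose target is one of the k most frequent clusters.

    Assumes "top" means "most frequent"; `target_col` is an assumed name.
    """
    top = df[target_col].value_counts().nlargest(k).index
    return df[df[target_col].isin(top)]


# Tiny synthetic frame standing in for the real training data.
toy = pd.DataFrame({
    "user_id":       [1, 1, 2, 3, 3, 4, 5, 5],
    "hotel_cluster": [7, 7, 7, 2, 2, 9, 2, 5],
})

by_user = sample_users(toy, n_users=3)       # strategy i, with 3 users instead of 50,000
by_cluster = keep_top_clusters(toy, k=2)     # strategy ii, with 2 clusters instead of 10
```

Sampling by user rather than by row keeps each selected user's full history together, which is one reason to prefer it over a plain row sample.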

Steps followed -

  • Filter the train data, removing rows where is_booking is 0
  • Filter the users, removing agents
  • Update orig_destination_distance with a category column to remove null values
  • Convert srch_ci, srch_co, and date_time from object format to date format
  • Create new columns: stay_dur, no_of_days_before_booking, current_mon, current_year, srch_ci_day, srch_ci_mon, srch_ci_year, srch_co_mon, srch_co_year
  • Drop the columns date_time, srch_ci, srch_co, and is_booking
  • Label-encode orig_destination_distance
  • Perform PCA on the destination dataset
  • Merge the updated destination dataset with the train data
  • Check for null values and fill empty cells with 0
  • Examine the correlation of each feature with the target variable
  • To deal with the large dataset and limited computational power, reduce the dataset using three methods:
    • Data of 50,000 random unique users
    • Data of the top 10 clusters
    • Creating clusters of the target variable (merging multiple clusters into one to reduce the total number of target clusters)
  • Train and test the models
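Several of the steps above (filtering bookings, deriving date columns, PCA on the destination data, merging, and filling nulls) can be sketched as follows. This is a hedged illustration under stated assumptions rather than the notebook code: `srch_destination_id` as the join key, the `d1`..`d3` feature columns, the `dest_pc*` output names, the number of components, and the helper names are all hypothetical, and the definition of no_of_days_before_booking (check-in date minus search date) is one plausible reading.

```python
import pandas as pd
from sklearn.decomposition import PCA


def add_date_features(train):
    """Keep booking rows, derive the date-based columns, drop the raw dates."""
    df = train[train["is_booking"] == 1].copy()
    for col in ["date_time", "srch_ci", "srch_co"]:
        df[col] = pd.to_datetime(df[col], errors="coerce")

    # Assumed definitions: nights stayed, and whole days from search to check-in.
    df["stay_dur"] = (df["srch_co"] - df["srch_ci"]).dt.days
    searched_on = df["date_time"].dt.normalize()  # drop the time of day
    df["no_of_days_before_booking"] = (df["srch_ci"] - searched_on).dt.days

    df["current_mon"] = df["date_time"].dt.month
    df["current_year"] = df["date_time"].dt.year
    df["srch_ci_day"] = df["srch_ci"].dt.day
    df["srch_ci_mon"] = df["srch_ci"].dt.month
    df["srch_ci_year"] = df["srch_ci"].dt.year
    df["srch_co_mon"] = df["srch_co"].dt.month
    df["srch_co_year"] = df["srch_co"].dt.year
    return df.drop(columns=["date_time", "srch_ci", "srch_co", "is_booking"])


def reduce_destinations(dest, n_components, id_col="srch_destination_id"):
    """Project the destination features onto their first principal components."""
    features = dest.drop(columns=[id_col])
    comps = PCA(n_components=n_components).fit_transform(features)
    names = [f"dest_pc{i + 1}" for i in range(n_components)]
    out = pd.DataFrame(comps, columns=names, index=dest.index)
    out.insert(0, id_col, dest[id_col])
    return out


# Synthetic stand-ins for the train and destination tables.
train = pd.DataFrame({
    "date_time": ["2014-01-01 10:00:00", "2014-02-10 08:30:00", "2014-03-05 12:00:00"],
    "srch_ci": ["2014-01-05", "2014-02-12", "2014-03-20"],
    "srch_co": ["2014-01-08", "2014-02-13", "2014-03-25"],
    "is_booking": [1, 0, 1],
    "srch_destination_id": [10, 11, 99],
})
dest = pd.DataFrame({
    "srch_destination_id": [10, 11, 12, 13],
    "d1": [0.1, 0.4, 0.3, 0.9],
    "d2": [1.0, 0.2, 0.5, 0.7],
    "d3": [0.3, 0.8, 0.6, 0.1],
})

prepared = (
    add_date_features(train)
    .merge(reduce_destinations(dest, n_components=2),
           on="srch_destination_id", how="left")
    .fillna(0)  # destinations missing from `dest` get zeros
)
```

A left merge keeps every booking row even when its destination has no entry in the destination table, which is why the null check and zero fill come after the merge.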

Results

