- Bryan Liauw @bryanliauw
- Dick Jovian @jovian6
Big Data final project: a recommendation system built with Alternating Least Squares (ALS). The recommender takes explicit feedback, such as user ratings, as input. We use PySpark to process the large Netflix Prize dataset.
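As a rough illustration of explicit-feedback ALS in PySpark, here is a minimal sketch. It is not the code in main.ipynb: the column names (`user_id`, `movie_id`, `rating`), the hyperparameter values, and the `train_als` helper are all assumptions made for this example.

```python
# Hypothetical settings: column names and hyperparameters are assumptions,
# not values taken from this project's notebook.
ALS_PARAMS = {
    "userCol": "user_id",
    "itemCol": "movie_id",
    "ratingCol": "rating",
    "rank": 10,                   # number of latent factors
    "maxIter": 10,                # alternating sweeps over user/item factors
    "regParam": 0.1,              # L2 regularization strength
    "implicitPrefs": False,       # explicit feedback: ratings are used as-is
    "coldStartStrategy": "drop",  # drop NaN predictions for unseen users/movies
}


def train_als(ratings_csv_path):
    """Fit ALS on a ratings CSV and return (model, RMSE on a held-out split)."""
    # Imported inside the function so this file can be loaded
    # on a machine where PySpark is not installed yet.
    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS
    from pyspark.ml.evaluation import RegressionEvaluator

    spark = SparkSession.builder.appName("netflix-als-sketch").getOrCreate()
    ratings = spark.read.csv(ratings_csv_path, header=True, inferSchema=True)

    # Hold out 20% of the ratings to estimate prediction error.
    train, test = ratings.randomSplit([0.8, 0.2], seed=42)

    model = ALS(**ALS_PARAMS).fit(train)
    predictions = model.transform(test)

    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    return model, evaluator.evaluate(predictions)
```

Calling `train_als("ratings.csv")` (a hypothetical path) would need a working Java and PySpark installation, and a CSV with a header row containing the three assumed columns.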
- We took the Netflix data in combined_data_1.txt, combined_data_2.txt, combined_data_3.txt, and combined_data_4.txt, reformatted the text, and converted it to .csv files.
- The data is then ready to use.
- Install Java 8 to run PySpark: https://www.oracle.com/java/technologies/downloads/#java8
- Install PySpark
- Install Jupyter Notebook
- Run main.ipynb in Jupyter Notebook
Dataset: https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data
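Each combined_data_*.txt file in the dataset groups ratings under a `MovieID:` header line, followed by `CustomerID,Rating,Date` lines. One possible way to flatten a file into a CSV is sketched below. This is only an assumed approach, not necessarily the one this project used; the function name `convert_to_csv` and the output column names are hypothetical.

```python
import csv


def convert_to_csv(txt_path, csv_path):
    """Flatten one combined_data_*.txt file into movie_id,user_id,rating,date rows.

    Returns the number of rating rows written.
    """
    rows_written = 0
    movie_id = None
    with open(txt_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # Hypothetical header names chosen for this sketch.
        writer.writerow(["movie_id", "user_id", "rating", "date"])
        for line in src:
            line = line.strip()
            if not line:
                continue
            if line.endswith(":"):
                # A "MovieID:" line starts a new group of ratings.
                movie_id = line[:-1]
                continue
            user_id, rating, date = line.split(",")
            writer.writerow([movie_id, user_id, rating, date])
            rows_written += 1
    return rows_written
```

For example, `convert_to_csv("combined_data_1.txt", "combined_data_1.csv")` would produce one CSV per input file; the four outputs can then be loaded together in PySpark.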