A recommender built in PySpark using the LastFM 360k User Dataset. See lastFM.pdf for the slides.

You will need to provide your own copy of the LastFM 360k User Dataset (a web search should turn it up) and a running Spark cluster, in either distributed or standalone mode. For these notebooks, a single machine should be enough.
Download data from: http://mtg.upf.edu/static/datasets/last.fm/lastfm-dataset-360K.tar.gz

Set up a Python virtual environment and install the dependencies:
- python -m venv .als
- source .als/bin/activate
- pip install -r requirements

The work is split across three notebooks:
- LastFM-Munging cleans and structures the data.
- LastFM-Descriptive computes summary statistics and does some additional munging.
- LastFM-Model is the model notebook.
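
As a rough illustration of the kind of cleaning the munging notebook performs, here is a minimal sketch. It is not the notebook's actual code. It assumes the raw file is tab-separated with four columns (user hash, artist MusicBrainz ID, artist name, play count); `parse_play` and `load_plays` are hypothetical names, and dropping rows with a missing user, a missing artist, or a non-positive play count is an assumed cleaning rule.

```python
from typing import Optional, Tuple


def parse_play(line: str) -> Optional[Tuple[str, str, int]]:
    """Turn one raw TSV row into (user, artist, plays), or None if unusable."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: user hash, artist MBID, artist name, play count.
    if len(fields) != 4:
        return None
    user, _mbid, artist, plays = (f.strip() for f in fields)
    try:
        count = int(plays)
    except ValueError:
        return None
    # Assumed rule: keep only rows with a user, an artist, and a positive count.
    if not user or not artist or count <= 0:
        return None
    return (user, artist, count)


def load_plays(spark, path):
    """Hypothetical helper: read the raw file with an existing SparkSession."""
    rows = (
        spark.sparkContext.textFile(path)
        .map(parse_play)
        .filter(lambda row: row is not None)
    )
    return rows.toDF(["user", "artist", "plays"])
```

Keeping the row parser free of Spark calls means it can be checked on a few sample lines before running it over the full file.
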
Note: For the model, you do not have to run the grid search (the large hyperparameter selection loop). It takes some time and is not really necessary.
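
To show why a grid search gets slow, and what skipping it might look like, here is a small sketch. The search space, the chosen values, the column names `user_idx`, `artist_idx`, and `plays`, and the use of implicit-feedback ALS are all assumptions made for illustration; the model notebook may use different settings.

```python
from itertools import product


def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


# Hypothetical search space: three values for each of three hyperparameters.
grid = {
    "rank": [10, 20, 50],
    "regParam": [0.01, 0.1, 1.0],
    "alpha": [1.0, 10.0, 40.0],
}
candidates = expand_grid(grid)
print(len(candidates))  # 27 separate ALS fits, before any cross-validation folds

# Assumed fixed settings to use instead of searching.
fixed_params = {"rank": 20, "regParam": 0.1, "alpha": 10.0}


def fit_als(train_df, params):
    """Fit one ALS model; assumes integer user_idx / artist_idx columns exist."""
    # Imported here so the grid arithmetic above runs without Spark installed.
    from pyspark.ml.recommendation import ALS

    als = ALS(
        userCol="user_idx",
        itemCol="artist_idx",
        ratingCol="plays",
        implicitPrefs=True,        # assumption: play counts as implicit feedback
        coldStartStrategy="drop",
        **params,
    )
    return als.fit(train_df)
```

Calling `fit_als(train_df, fixed_params)` trains a single model, whereas looping over `candidates` would train one model per combination.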