A project for CSC 422 Automated Learning & Data Analysis at NC State.
We hoped to be able to predict an album's popularity on the year-end Billboard top charts from various acoustic features. Our models treated an album as popular if its rank was ≤ 25 and as not popular if its rank was > 25.
To classify albums as popular or not popular, we used several machine learning models:
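The labeling rule above can be written as a small function. This is a minimal sketch, not the project's actual code: the names `POPULARITY_CUTOFF` and `is_popular` and the sample ranks are hypothetical and only illustrate the ≤ 25 threshold.

```python
POPULARITY_CUTOFF = 25  # rank threshold stated in this README


def is_popular(rank: int, cutoff: int = POPULARITY_CUTOFF) -> int:
    """Return 1 if the year-end rank is within the cutoff, otherwise 0."""
    return 1 if rank <= cutoff else 0


# Hypothetical year-end ranks, for illustration only.
ranks = [1, 25, 26, 200]
labels = [is_popular(r) for r in ranks]
print(labels)  # [1, 1, 0, 0]
```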
- Naive-Bayes
- Decision Tree (utilizing Information Gain and Entropy)
- Support Vector Machine
- Deep Neural Networks
- k-Nearest Neighbors (2-NN with 10-fold cross-validation)
- Full Album Data with Acoustic Features (Link to Dataset)
Created using data from:
- Acoustic and meta features of albums and songs on the Billboard 200
- The Billboard Year End Top Albums List
The first dataset supplied the acoustic features, and the Top Albums List was scraped for the album names.
We performed a 70/30 training/testing split and standardized the data.
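As a rough illustration of the split and standardization step, the sketch below uses only the Python standard library. It is an assumption-laden example, not the project's implementation: the helper names (`train_test_split`, `fit_scaler`, `standardize`), the seed, and the toy feature rows are all hypothetical. It also assumes the mean and standard deviation are computed from the training rows only and then applied to both sets, which is a common practice but is not confirmed by this README.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation from the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant columns, whose standard deviation is 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Rescale each column using the training mean and standard deviation."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature rows (two made-up acoustic features per album).
rows = [[0.1 * i, 100.0 + i] for i in range(10)]
train, test = train_test_split(rows)
means, stds = fit_scaler(train)
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)
print(len(train), len(test))  # 7 3
```

Fitting the scaler on the training rows alone keeps information from the test set out of the preprocessing step.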
Model | Accuracy |
---|---|
Naive Bayes Model (Gaussian) | 74.9% |
Decision Tree Model (Gini) | 86.5% |
SVM Model | 85.3% |
2-NN + 10-Fold CV | 85.58% |
Deep Neural Network | 86.00% |
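To make the "2-NN + 10-Fold CV" row more concrete, here is a minimal standard-library sketch of a k-nearest-neighbors classifier evaluated with k-fold cross-validation. It is not the code in `models/knn_model.py`: the function names, the tie-breaking rule (the closest neighbor wins), the seed, and the toy data are all assumptions. The README also does not say whether cross-validation was used to choose k or to estimate accuracy; this sketch does the latter.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=2):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Break ties in favor of the closest neighbor among the tied labels.
    for _, label in neighbors:
        if votes[label] == top:
            return label


def cross_val_accuracy(X, y, k=2, n_folds=10, seed=0):
    """Mean k-NN accuracy over n_folds shuffled, non-overlapping folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Hypothetical, well-separated toy data: two clusters labeled 0 and 1.
rng = random.Random(1)
X = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(20)]
X += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(20)]
y = [0] * 20 + [1] * 20
print(cross_val_accuracy(X, y))
```

With k = 2 a vote can split one to one, so some tie-breaking rule is needed; preferring the nearest neighbor is one simple choice.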
- Make sure virtualenv is installed. If it is not, run
pip3 install virtualenv
- Create the Python 3 virtual environment
virtualenv venv
- Start the environment
source venv/bin/activate
- Install all required dependencies with the following command
pip install -r requirements.txt
Give dataset_download.sh permission to execute by running
$ chmod +x dataset_download.sh
Download the data by running
$ ./dataset_download.sh
The training and testing data should then be available in data/.
In the root folder of the project, run this command to start the virtual environment
$ source venv/bin/activate
After the virtual environment has started, run any of these commands to run the corresponding model
$ python models/decision_tree.py
$ python models/knn_model.py
$ python models/naive_bayes_model.py
$ python models/neural_net.py
$ python models/svm_model.py