MSD split that I've used for my experiments in the papers
Sample-level Deep Convolutional Neural Networks for Music Auto-tagging using Raw Waveforms, SMC(to be published), 2017 and
Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural Networks for Music Auto-tagging, IEEE SPL(to be published), 2017.
The split is based on the split version in https://github.com/keunwoochoi/MSD_split_for_tagging. The difference is that we have only left a list with more than 29.1 seconds of audio.
filtered_list_train.cP: train list (I used first 201680 IDs for training set and rest 11774 IDs for validation)
filtered_list_test.cP: test list (total 28435 MSD IDs)
7D_id_to_path.pkl: dictionary (Keys: 7digital ID, Values: file_path (i.e. "8/9/8947470.clip.mp3") )
MSD_id_to_7D_id.pkl: dictionary (Keys: MSD ID, Values: 7digital ID)
msd_id_to_tag_vector.cP: dictionary (Keys: MSD ID, Values: 50 tag vectors)
50tagList.txt: 50 tag list
In python, by using these commands we can simply use given data.
import cPickle as cP
train_list = cP.load(open('filtered_list_train.cP','r'))
valid_list = train_list[201680:]
train_list = train_list[0:201680]
test_list = cP.load(open('filtered_list_test.cP','r'))
id7d_to_path = cP.load(open('7D_id_to_path.pkl','r'))
idmsd_to_id7d = cP.load(open('MSD_id_to_7D_id.pkl','r'))
idmsd_to_tag = cP.load(open('msd_id_to_tag_vector.cP','r'))
# for loading a file (train_list[0])
dir_path = "YOUR PATH"
file_dir = dir_path + id7d_to_path[idmsd_to_7d7d[train_list[0]]]
# for loading 50 tag vector of (train_list[0])
tag_boolean = idmsd_to_tag[train_list[0]]