Lei Mao
University of Chicago
This is a singing voice separation tool built on a recurrent neural network (RNN). It separates the singing voice and the background music from the original song. It is still under development, since the separation is not yet perfect. Please check the demo for the performance.
- Python 3.5
- Numpy 1.14
- TensorFlow 1.8
- RarFile 3.0
- ProgressBar2 3.37.1
- LibROSA 0.6
- FFmpeg 4.0
- Matplotlib 2.1.1
- MIR_Eval 0.4
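
Before running the scripts, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the repository: the import names in `REQUIRED_MODULES` are assumptions based on each package's usual module name (for example, ProgressBar2 is normally imported as `progressbar`), and the helper names are hypothetical. It only checks presence, not the versions listed above. FFmpeg is a command-line tool rather than a Python package, so it is looked up on the `PATH` instead.

```python
import importlib.util
import shutil

# Assumed import names for the packages listed above (not taken from the repository).
REQUIRED_MODULES = [
    "numpy", "tensorflow", "rarfile", "progressbar",
    "librosa", "matplotlib", "mir_eval",
]


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(REQUIRED_MODULES))
    print("FFmpeg found:", ffmpeg_available())
```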
.
├── demo
├── download.py
├── evaluate.py
├── figures
├── LICENSE.md
├── main.py
├── model
├── model.py
├── preprocess.py
├── README.md
├── songs
├── statistics
├── train.py
└── utils.py
Multimedia Information Retrieval, 1000 song clips (MIR-1K), dataset for singing voice separation.
To download the whole dataset and split it into training, validation, and test sets, in the terminal:
$ python download.py
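
For readers unfamiliar with this kind of split, the idea can be sketched as a seeded shuffle followed by slicing. This is an illustration only: the function name, the 80/10/10 fractions, the seed, and the clip file names are all assumptions, and the actual logic and ratios in `download.py` may differ.

```python
import random


def split_dataset(filenames, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle file names reproducibly and split them into three lists.

    The fractions and seed are illustrative assumptions, not the values
    used by download.py. Whatever is left after the training and
    validation slices becomes the test set.
    """
    files = sorted(filenames)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(files)
    n_train = int(round(len(files) * train_frac))
    n_valid = int(round(len(files) * valid_frac))
    train = files[:n_train]
    valid = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]
    return train, valid, test


# Hypothetical file names standing in for the 1000 MIR-1K clips.
clips = ["clip_{:04d}.wav".format(i) for i in range(1000)]
train, valid, test = split_dataset(clips)
print(len(train), len(valid), len(test))
```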
To train the model, in the terminal:
$ python train.py
Training took roughly 45 minutes for 50,000 iterations on the training set of the MIR-1K dataset using an NVIDIA GTX TITAN X graphics card.
The program loads the entire MIR-1K dataset into memory and keeps all the processed data there to speed up data sampling during training. However, this may consume more than 10 GB of memory.
The trained model is saved to the `model` directory.
To evaluate the model, in the terminal:
$ python evaluate.py
The evaluation took roughly 1 minute on the test set of the MIR-1K dataset using an NVIDIA GTX TITAN X graphics card. The separated sources, together with the monaural source, are saved to the `demo` directory.
| | GNSDR | GSIR | GSAR |
|---|---|---|---|
| Vocal | 7.40 | 12.75 | 9.34 |
| BGM | 7.45 | 13.17 | 9.25 |
To do: save the evaluation statistics.
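
As background on the metrics in the table: in the papers by Huang et al. cited in this README, the normalized SDR (NSDR) of a clip is the SDR of the separated source minus the SDR of the unprocessed mixture, and the global metrics (GNSDR, GSIR, GSAR) are averages over all test clips weighted by clip length. The sketch below shows only that aggregation step. The helper names and the per-clip numbers are hypothetical; in practice the per-clip values would come from a BSS Eval implementation such as `mir_eval.separation.bss_eval_sources`. Whether `evaluate.py` aggregates in exactly this way is not confirmed here.

```python
def nsdr(sdr_estimate, sdr_mixture):
    """Normalized SDR: gain of the separated source over the raw mixture."""
    return sdr_estimate - sdr_mixture


def length_weighted_mean(values, lengths):
    """Average per-clip values, weighting each clip by its length."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("total clip length must be positive")
    return sum(v * w for v, w in zip(values, lengths)) / total


# Hypothetical per-clip results: length in seconds, SDR values in dB.
clips = [
    {"length": 4.0, "sdr_estimate": 8.0, "sdr_mixture": 1.0},
    {"length": 12.0, "sdr_estimate": 6.0, "sdr_mixture": 1.0},
]

gnsdr = length_weighted_mean(
    [nsdr(c["sdr_estimate"], c["sdr_mixture"]) for c in clips],
    [c["length"] for c in clips],
)
print("GNSDR: {:.2f} dB".format(gnsdr))
```

The same `length_weighted_mean` helper applies to per-clip SIR and SAR values to obtain GSIR and GSAR.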
To separate sources for your own songs, put the MP3 files in the `songs` directory, then in the terminal:
$ python main.py
The separated sources, together with the monaural source, are saved to the `demo` directory.
The MP3 of "Backstreet Boys - I want it that way", `backstreet_boys-i_want_it_that_way.mp3`, was put in the `songs` directory. Using the pre-trained model in the `model` directory, in the terminal:
$ python main.py
The separated sources, `backstreet_boys-i_want_it_that_way_src1.mp3` and `backstreet_boys-i_want_it_that_way_src2.mp3`, together with the monaural source, `backstreet_boys-i_want_it_that_way_mono.mp3`, were saved to the `demo` directory.
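
The naming pattern in this example can be sketched as a small helper that maps an input song to the output files to expect. This is an illustration inferred from the file names above, not code from `main.py`; the function name `expected_outputs` is hypothetical, and the README does not state which of `src1` and `src2` is the vocal track.

```python
from pathlib import Path


def expected_outputs(song_path, output_dir="demo"):
    """Map an input MP3 to the output paths suggested by the example above."""
    stem = Path(song_path).stem       # file name without directory or extension
    out = Path(output_dir)
    return {
        suffix: out / "{}_{}.mp3".format(stem, suffix)
        for suffix in ("mono", "src1", "src2")
    }


outputs = expected_outputs("songs/backstreet_boys-i_want_it_that_way.mp3")
for kind, path in sorted(outputs.items()):
    print(kind, path.as_posix())
```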
- Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis, Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks. 2014.
- Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis, Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation. 2015.
- Dabi Ahn's Music Source Separation Repository
- Evaluation metrics
- Hyper parameter tuning
- Argparse