Authors: Da-Yi Wu*, Wen-Yi Hsiao*, Fu-Rong Yang*, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang
*equal contribution
Official PyTorch implementation of the ISMIR 2022 paper "DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation".
In this repository:
- We propose a novel singing vocoder based on a subtractive synthesizer: SawSing.
- We present a collection of different DDSP singing vocoders.
- We demonstrate that DDSP singing vocoders have relatively small model sizes but can generate satisfying results with limited resources (1 GPU, 3 hours of training data). We also report results for an even more stringent case: training the vocoders on only 3 minutes of recordings for only 3 hours of training time.
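SawSing builds on subtractive synthesis: a harmonically rich sawtooth source that follows the f0 contour is shaped by a filter. The NumPy sketch below illustrates only that general idea and is not the repository's implementation. The sample rate (24 kHz), hop size (240 samples) and the fixed 5-tap moving-average filters are arbitrary assumptions; in a DDSP vocoder the per-frame filter coefficients would come from the network. The sawtooth is naive, so it aliases at high f0.

```python
import numpy as np

def sawtooth_from_f0(f0, sr):
    """Naive (aliasing) sawtooth in [-1, 1) driven by a per-sample f0 contour in Hz."""
    # Integrate instantaneous frequency (cycles per sample) to get the phase in cycles.
    phase = np.cumsum(np.asarray(f0, dtype=np.float64) / sr)
    # Map the fractional part of the phase, in [0, 1), to a ramp in [-1, 1).
    return 2.0 * (phase - np.floor(phase)) - 1.0

def time_varying_fir(signal, firs, hop):
    """Filter each hop-sized segment with its own FIR and overlap-add the tails.

    Assumes len(signal) == firs.shape[0] * hop.
    """
    n_frames, taps = firs.shape
    out = np.zeros(n_frames * hop + taps - 1)
    for i in range(n_frames):
        seg = signal[i * hop:(i + 1) * hop]
        y = np.convolve(seg, firs[i])  # length hop + taps - 1
        out[i * hop:i * hop + len(y)] += y
    return out[:len(signal)]

if __name__ == "__main__":
    sr, hop, n_frames = 24000, 240, 100  # assumed values: 1 second of audio
    source = sawtooth_from_f0(np.full(n_frames * hop, 220.0), sr)
    # Hypothetical filters: the same 5-tap moving average (a crude low-pass) in every frame.
    firs = np.full((n_frames, 5), 1.0 / 5)
    audio = time_varying_fir(source, firs, hop)
    print(audio.shape)
```

Because each segment's convolution tail is added into the following segment, using the same filter in every frame reproduces ordinary convolution; swapping in different per-frame filters makes the filter time-varying.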
pip install -r requirements.txt
For dataset preparation, please refer to dataset.md.
Train vocoders from scratch.
- Modify the configuration file ./configs/<model_name>.yaml
- Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
--stage training \
--model SawSinSub
- Change the --model argument to try different vocoders. Currently, we have 5 models: SawSinSub (SawSing), Sins (DDSP-Add), DWS (DWTS), Full, and SawSub. For more details, please refer to our documentation: DDSP Vocoders.
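To train each of the five vocoders in turn, the training command can be generated per model. This is a minimal Python sketch that only builds and prints the commands. It assumes each model's config is named after the lower-cased model name, which holds for SawSinSub (./configs/sawsinsub.yaml) but is a guess for the other models; check the configs folder before running.

```python
import shlex

# Model names accepted by --model, as listed above.
MODELS = ["SawSinSub", "Sins", "DWS", "Full", "SawSub"]

def training_command(model, config_dir="./configs"):
    # ASSUMPTION: config file = <config_dir>/<model name in lower case>.yaml.
    # Confirmed only for SawSinSub -> ./configs/sawsinsub.yaml.
    config = f"{config_dir}/{model.lower()}.yaml"
    return ["python", "main.py", "--config", config,
            "--stage", "training", "--model", model]

if __name__ == "__main__":
    for m in MODELS:
        # Print instead of running; pass the list to subprocess.run(...) to execute it.
        print(shlex.join(training_command(m)))
```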
Our training resources: a single NVIDIA RTX 3090 Ti GPU.
Run validation: compute loss and real-time factor (RTF).
- Modify the configuration file ./configs/<model_name>.yaml
- Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
--stage validation \
--model SawSinSub \
--model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
--output_dir ./test_gen
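The real-time factor is the wall-clock time spent synthesizing divided by the duration of the audio produced; values below 1 mean faster than real time. A minimal sketch of that definition follows. The synthesize callable is a hypothetical stand-in for a vocoder, and this is not the repository's validation code.

```python
import time

def real_time_factor(synthesize, mel, sample_rate):
    """RTF = synthesis wall-clock time / duration of the generated audio."""
    start = time.perf_counter()
    audio = synthesize(mel)
    elapsed = time.perf_counter() - start
    if len(audio) == 0:
        raise ValueError("synthesize() returned no samples")
    duration = len(audio) / sample_rate  # seconds of audio produced
    return elapsed / duration

if __name__ == "__main__":
    # Hypothetical vocoder: 240 output samples per mel frame, at an assumed 24 kHz.
    dummy_vocoder = lambda mel: [0.0] * (len(mel) * 240)
    mel = [[0.0] * 80 for _ in range(100)]  # 100 frames -> 1 second of audio
    print(real_time_factor(dummy_vocoder, mel, 24000))
```

When timing a model on a GPU, call torch.cuda.synchronize() before reading the clock, because CUDA kernels run asynchronously and the measured time would otherwise be too small.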
Synthesize audio files from existing mel-spectrograms. The code and specification for extracting mel-spectrograms can be found in preprocess.py.
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
--stage inference \
--model SawSinSub \
--model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
--input_dir ./path/to/mel \
--output_dir ./test_gen
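The input mel-spectrograms must be computed exactly as in preprocess.py. For orientation only, here is a NumPy sketch of a generic log-mel pipeline: a reflect-padded, Hann-windowed STFT, triangular filters on the HTK mel scale, and a natural log. The sample rate, FFT size, hop size, number of mel bands and mel-scale variant are assumptions, not the values in preprocess.py, and features computed with different settings will not match a trained checkpoint.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)  # HTK mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular filters spaced evenly on the HTK mel scale, shape (n_mels, n_fft//2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(audio, sr=24000, n_fft=1024, hop=240, n_mels=80, fmin=0.0, fmax=None):
    """Log-mel features of shape (n_frames, n_mels); all defaults are assumptions."""
    fmax = sr / 2 if fmax is None else fmax
    # Reflect-pad so that frame t is centred on sample t * hop.
    padded = np.pad(np.asarray(audio, dtype=np.float64), n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([padded[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels, fmin, fmax).T
    return np.log(np.clip(mel, 1e-5, None))  # clip avoids log(0)
```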
In SawSing, we found buzzing artifacts in the harmonic part of the signals, so we developed post-processing code to remove them. The method is simple yet effective: applying a voiced/unvoiced mask. For more details, please refer to here.
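A minimal sketch of the voiced/unvoiced masking idea, assuming a frame-level f0 in which 0 marks unvoiced frames and that the harmonic and noise branches are available as separate signals, each of length n_frames * hop. The function names are hypothetical, and the repository's actual post-processing may differ, for example in how voicing is decided or whether the mask is smoothed.

```python
import numpy as np

def voiced_mask_from_f0(f0_frames, hop):
    """Frame-level f0 (0 = unvoiced) -> sample-level 0/1 mask."""
    voiced = (np.asarray(f0_frames) > 0).astype(np.float64)
    return np.repeat(voiced, hop)  # hold each frame's decision for hop samples

def apply_uv_mask(harmonic, noise, f0_frames, hop):
    """Silence the harmonic branch in unvoiced frames; keep the noise branch everywhere.

    Assumes len(harmonic) == len(noise) == len(f0_frames) * hop.
    """
    mask = voiced_mask_from_f0(f0_frames, hop)
    return np.asarray(harmonic) * mask + np.asarray(noise)
```

A hard 0/1 mask can click at voicing boundaries; smoothing the mask with a short ramp before multiplying is a common refinement.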
- Checkpoints
  - Sins (DDSP-Add): ./exp/f1-full/sins/ckpts/
  - SawSinSub (SawSing): ./exp/f1-full/sawsinsub-256/ckpts/
- The full experimental records, reports and checkpoints can be found under the exp folder.
- Documentation
@article{sawsing,
title={DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation},
author={Da-Yi Wu and Wen-Yi Hsiao and Fu-Rong Yang and Oscar Friedman and Warren Jackson and Scott Bruzenak and Yi-Wen Liu and Yi-Hsuan Yang},
journal = {Proc. International Society for Music Information Retrieval},
year = {2022},
}