VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis [paper]
This repository contains a PyTorch implementation of the Korean VISinger architecture, along with examples. Feel free to use/modify the code.
Architecture of VISinger
```bash
## We tested on Linux/Ubuntu 20.04.
## Install Python 3.8+ first (Anaconda recommended).

export PYTHONPATH=.
# build a virtual env (recommended).
conda create -n venv python=3.8
conda activate venv
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
```
The supported datasets are:
- CSD: a single-singer Korean dataset containing 2.12 hours of audio in total.
Run `base_preprocess.py` for preprocessing.
```bash
python preprocessor/runs/base_preprocess.py --config config/datasets/svs/csd/preprocess.yaml
```
After that, run `base_binarize.py` to binarize the data for training.
```bash
python preprocessor/runs/base_binarize.py --config config/datasets/svs/csd/preprocess.yaml
```
Train the model with
```bash
CUDA_VISIBLE_DEVICES=0 python tasks/runs/run.py --config config/models/visinger.yaml --exp_name "[dir]/[folder_name]"
```
You have to download the pretrained models (will be uploaded) and put them in `./checkpoints/svs/visinger`. You also have to prepare MIDI data that contains lyrics with the same number of notes. A sample file is provided in `./data/source/svs/new_midi/` (will be uploaded).
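As a sanity check, a minimal sketch of one way to verify such a MIDI file before inference, assuming the third-party `mido` package; the file name is a placeholder and this snippet is not part of the repository's code:
```python
# Sketch: count note-on events and lyric meta events in a MIDI file to make
# sure every note carries exactly one lyric syllable (assumption: lyrics are
# stored as standard MIDI "lyrics" meta messages). Requires `pip install mido`.
import mido

midi_path = "data/source/svs/new_midi/sample.mid"  # hypothetical file name
mid = mido.MidiFile(midi_path)

n_notes = sum(
    1
    for track in mid.tracks
    for msg in track
    if msg.type == "note_on" and msg.velocity > 0
)
lyrics = [msg.text for track in mid.tracks for msg in track if msg.type == "lyrics"]

print(f"note-on events: {n_notes}, lyric events: {len(lyrics)}")
assert n_notes == len(lyrics), "each note should carry exactly one lyric syllable"
```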
You can synthesize a new singing voice with
```bash
python inference/visinger.py
```
Please set the file path of the MIDI data in `./inference/visinger.py`.
- Korean singing voice synthesis (SVS) does not require duration prediction. We simply split each syllable into three components: `onset`, `nucleus`, and `coda`. Sung vowels have long durations, and the `nucleus` of a Korean syllable is equivalent to its vowel. In this repository, we assign at most three frames each to the `onset` and `coda` and assign the remaining frames to the `nucleus` (see the sketch after this list).
- We will upload the checkpoints of VISinger trained on the CSD dataset (will be uploaded after March 2023).
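A minimal sketch of the onset/nucleus/coda frame split described above, using the standard Unicode decomposition of precomposed Hangul syllables; the function names and the `max_edge` parameter are illustrative assumptions, not this repository's actual API:
```python
# Sketch: decompose a Hangul syllable into onset/nucleus/coda jamo and assign
# the frames of one note: onset and coda get at most 3 frames each, and the
# nucleus (the vowel, which carries the sustained pitch) takes the rest.

ONSETS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
NUCLEI = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(syllable):
    """Split a precomposed Hangul syllable (U+AC00..U+D7A3) into jamo."""
    code = ord(syllable) - 0xAC00
    assert 0 <= code < 11172, "not a precomposed Hangul syllable"
    return ONSETS[code // 588], NUCLEI[(code % 588) // 28], CODAS[code % 28]


def split_frames(syllable, n_frames, max_edge=3):
    """Distribute a note's frames over onset/nucleus/coda."""
    onset, nucleus, coda = decompose(syllable)
    onset_frames = min(max_edge, n_frames)
    coda_frames = min(max_edge, max(n_frames - onset_frames, 0)) if coda else 0
    nucleus_frames = n_frames - onset_frames - coda_frames
    return [(onset, onset_frames), (nucleus, nucleus_frames), (coda, coda_frames)]


# e.g. "달" sung over 40 frames -> [('ㄷ', 3), ('ㅏ', 34), ('ㄹ', 3)]
print(split_frames("달", 40))
```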
Our code is influenced by the following repos: