- This is a PyTorch implementation of Microsoft's FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
- Now supporting about 900 speakers in 🔥 LibriTTS for multi-speaker text-to-speech.
This project supports 3 datasets (LibriTTS and VCTK are multi-speaker):
- LJSpeech
- LibriTTS
- VCTK
Configurations are in:
- config/dataset.yaml
- config/hparams.py
Please modify dataset and mfa_path in config/hparams.py.
This repo uses Montreal Forced Aligner (MFA) v1. Migrating to MFA v2 is a TODO item.
- preprocess.py
- train.py
- synthesize.py
[DATASET]/wavs/speaker/wav_files
[DATASET]/txts/speaker/txt_files
- wav_dir: the folder containing speaker dirs ([DATASET]/wavs)
- txt_dir: the folder containing speaker dirs ([DATASET]/txts)
- save_dir: the output directory (e.g. "./processed")
- --prepare_mfa: create mfa_data
- --mfa: create TextGrid files
- --create_dataset: generate mel, phone, f0, ..., and metadata.json
- LJSpeech:
# Run the script for organizing LJSpeech first
python ./script/organizeLJ.py
python preprocess.py /storage/tts2021/LJSpeech-organized/wavs /storage/tts2021/LJSpeech-organized/txts ./processed/LJSpeech --prepare_mfa --mfa --create_dataset
- LibriTTS:
python preprocess.py /storage/tts2021/LibriTTS/train-clean-360 /storage/tts2021/LibriTTS/train-clean-360 ./processed/LibriTTS --prepare_mfa --mfa --create_dataset
- VCTK:
python preprocess.py /storage/tts2021/VCTK-Corpus/wav48/ /storage/tts2021/VCTK-Corpus/txt ./processed/VCTK --prepare_mfa --mfa --create_dataset
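Before running preprocess.py on a new corpus, it can help to check that every wav file has a transcript. The sketch below is not part of this repo: it assumes the simple [DATASET]/wavs/speaker and [DATASET]/txts/speaker layout described above, that a wav and its transcript share a base name, and that the extensions are .wav and .txt. The function name find_unpaired is hypothetical.

```python
from pathlib import Path


def find_unpaired(wav_dir, txt_dir, wav_ext=".wav", txt_ext=".txt"):
    """Return (speaker, name) pairs that have a wav file but no transcript.

    Assumed layout (extensions and shared base names are assumptions):
        wav_dir/<speaker>/<name>.wav
        txt_dir/<speaker>/<name>.txt
    """
    wav_root, txt_root = Path(wav_dir), Path(txt_dir)
    missing = []
    # Each sub-directory of wav_dir is treated as one speaker.
    for speaker_dir in sorted(p for p in wav_root.iterdir() if p.is_dir()):
        for wav in sorted(speaker_dir.glob(f"*{wav_ext}")):
            txt = txt_root / speaker_dir.name / (wav.stem + txt_ext)
            if not txt.is_file():
                missing.append((speaker_dir.name, wav.stem))
    return missing
```

For example, find_unpaired("/storage/tts2021/LJSpeech-organized/wavs", "/storage/tts2021/LJSpeech-organized/txts") should return an empty list when the corpus is complete. Corpora with deeper nesting or other transcript extensions would need a different glob.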
- speaker table
- training data
- validation data
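To illustrate what a speaker table and a train/validation split can look like, here is a minimal sketch. It does not reproduce the file formats this repo writes: the function names, the 5% default validation ratio, and the fixed seed are all assumptions made for the example.

```python
import json
import random


def build_speaker_table(speakers):
    """Map each speaker name to an integer id, assigned in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(set(speakers)))}


def split_train_val(items, val_ratio=0.05, seed=0):
    """Shuffle deterministically and hold out a fraction for validation."""
    items = sorted(items)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_ratio))  # keep at least one item
    return items[n_val:], items[:n_val]


# Illustrative (speaker, utterance) pairs; the names are placeholders.
utterances = [("p225", "p225_001"), ("p226", "p226_001"), ("p225", "p225_002")]
table = build_speaker_table(speaker for speaker, _ in utterances)
train, val = split_train_val(utterances, val_ratio=0.34)
print(json.dumps(table))  # prints {"p225": 0, "p226": 1}
```

Sorting before assigning ids keeps the table stable across runs, so a checkpoint trained with one table can be reused with data preprocessed later from the same speakers.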
- data_dir: the preprocessed data directory
- --comment: a comment describing the training run
- LJSpeech:
python train.py ./processed/LJSpeech --comment "Hello LJSpeech"
- LibriTTS:
python train.py ./processed/LibriTTS --comment "Hello LibriTTS"
- VCTK:
python train.py ./processed/VCTK --comment "Hello VCTK"
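When several datasets are processed in a row, the preprocessing and training commands above can be assembled from Python. This is a sketch only: the helper names are hypothetical, it uses just the arguments and flags listed above, and it assumes that the stage flags can be passed individually and that --comment is optional.

```python
import shlex


def preprocess_cmd(wav_dir, txt_dir, save_dir,
                   stages=("prepare_mfa", "mfa", "create_dataset")):
    """Build the preprocess.py command line as an argument list."""
    cmd = ["python", "preprocess.py", wav_dir, txt_dir, save_dir]
    cmd += [f"--{stage}" for stage in stages]
    return cmd


def train_cmd(data_dir, comment=None):
    """Build the train.py command line (assumes --comment is optional)."""
    cmd = ["python", "train.py", data_dir]
    if comment is not None:
        cmd += ["--comment", comment]
    return cmd


vctk = "./processed/VCTK"
commands = [
    preprocess_cmd("/storage/tts2021/VCTK-Corpus/wav48/",
                   "/storage/tts2021/VCTK-Corpus/txt", vctk),
    train_cmd(vctk, comment="Hello VCTK"),
]
for cmd in commands:
    # Print the shell form; use subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Keeping each command as a list avoids quoting problems with comments that contain spaces, and shlex.join (Python 3.8+) is used only for display.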
- --ckpt_path: the checkpoint path
- --output_dir: the directory in which to save the synthesized audio
python synthesize.py --ckpt_path ./records/LJSpeech_2021-11-22-22:42/ckpt/checkpoint_125000.pth.tar --output_dir ./output
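To avoid typing the checkpoint path by hand, the newest checkpoint in a run's ckpt directory can be picked by step number. The sketch below assumes only the checkpoint_<step>.pth.tar naming seen in the command above; the function name latest_checkpoint is hypothetical and not part of this repo.

```python
import re
from pathlib import Path

# Matches file names such as checkpoint_125000.pth.tar
CKPT_PATTERN = re.compile(r"checkpoint_(\d+)\.pth\.tar")


def latest_checkpoint(ckpt_dir):
    """Return the checkpoint path with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(ckpt_dir).iterdir():
        match = CKPT_PATTERN.fullmatch(path.name)
        if match is None:
            continue  # skip files that are not checkpoints
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

Steps are compared as integers because a plain string sort would rank checkpoint_90000 above checkpoint_125000. The returned path can then be passed to --ckpt_path.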
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, Y. Ren, et al.
- FastSpeech: Fast, Robust and Controllable Text to Speech, Y. Ren, et al.
- xcmyz's FastSpeech implementation
- rishikksh20's FastSpeech2 implementation
- TensorSpeech's FastSpeech2 implementation
- NVIDIA's WaveGlow implementation
- seungwonpark's MelGAN implementation