This is a TensorFlow implementation of the multispeaker TTS network introduced in the paper *From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint*. The repository also contains the deep speaker verification model that serves as the feedback network in the multispeaker TTS model. Synthesized samples are available online.
```
@inproceedings{Cai2020,
  author={Zexin Cai and Chuxiong Zhang and Ming Li},
  title={{From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint}},
  year=2020,
  booktitle={Proc. Interspeech 2020}
}
```
The speaker verification model is located in the `deep_speaker` directory. By default, it is trained on the VoxCeleb1 and VoxCeleb2 datasets. You can find the file lists in the directory. Hyperparameters are set in `vox12_hparams.py`.

To train the speaker verification model from scratch, prepare the data as listed in the file list and run:

```
CUDA_VISIBLE_DEVICES=0 python train.py
```
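Once trained, the model maps an utterance to a fixed-dimensional speaker embedding (g-vector). As a rough illustration of how such embeddings are used for verification, here is a minimal numpy sketch; `extract_embedding` and `model.predict` are hypothetical stand-ins, not this repository's API:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def extract_embedding(features, model):
    """Hypothetical stand-in: map utterance features to a fixed-size
    speaker embedding (g-vector) with a trained verification model."""
    return model.predict(features)

# Same-speaker pairs should score near 1.0, different speakers lower:
# score = cosine_similarity(extract_embedding(feat_a, model),
#                           extract_embedding(feat_b, model))
# accept = score > threshold  # threshold tuned on a development set
```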
By default, the synthesizer is trained on the VCTK dataset:
- Extract audio features using `process_audio.ipynb`.
- Extract speaker embeddings using the IPython notebook `deep_speaker/get_gvector.ipynb`.
- Train a baseline multispeaker TTS system (see the conditioning sketch after this list):

  ```
  CUDA_VISIBLE_DEVICES=0 python synthesizer_train.py vctk datasets/vctk/synthesizer
  ```

- Feel free to evaluate and synthesize samples with `syn.ipynb` during training.
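For intuition, the sketch below shows the common SV2TTS-style way a fixed speaker embedding conditions a Tacotron-like synthesizer: the g-vector is tiled across time and concatenated to the text-encoder outputs. All shapes and dimensions here are illustrative assumptions, not taken from this repository:

```python
import numpy as np

def condition_on_speaker(encoder_outputs, speaker_embedding):
    """Tile the speaker embedding across time and concatenate it to the
    text-encoder outputs so every decoder step sees the target voice."""
    timesteps = encoder_outputs.shape[0]                # (T, enc_dim)
    tiled = np.tile(speaker_embedding, (timesteps, 1))  # (T, emb_dim)
    return np.concatenate([encoder_outputs, tiled], axis=-1)

enc = np.random.randn(50, 256)   # 50 encoder steps, 256-dim states (assumed)
gvec = np.random.randn(128)      # 128-dim g-vector (dimension assumed)
cond = condition_on_speaker(enc, gvec)
assert cond.shape == (50, 256 + 128)
```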
By default, the vocoder is also trained on the VCTK dataset. This step is straightforward once you have extracted the acoustic features in the previous section (TTS synthesizer). For better performance, use the GTA mel-spectrograms produced by `vocoder_preprocess.py` after synthesizer training has finished.

```
CUDA_VISIBLE_DEVICES=0 python vocoder_train.py -g --syn_dir datasets/vctk/synthesizer vctk datasets/vctk
```
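For context, GTA ("ground-truth aligned") mels come from running the trained synthesizer with teacher forcing, so each predicted frame is conditioned on the previous ground-truth frame and stays time-aligned with the real audio. A minimal sketch of that idea, with a hypothetical single-step decoder `synthesizer_step`:

```python
import numpy as np

def gta_mels(synthesizer_step, text_states, gt_mels):
    """Ground-truth-aligned synthesis: feed the previous *ground-truth*
    frame back at every decoder step (teacher forcing), so the predicted
    mels stay time-aligned with the real waveform the vocoder will see.
    `synthesizer_step` is a hypothetical single-step decoder."""
    prev = np.zeros_like(gt_mels[0])
    frames = []
    for t in range(len(gt_mels)):
        frames.append(synthesizer_step(text_states, prev))
        prev = gt_mels[t]  # real frame, not the model's own prediction
    return np.stack(frames)
```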
- Set the paths to the two pretrained models (the speaker verification model and the multispeaker synthesizer) by changing the corresponding keys in `hparams.py`.
- Train the model; you can evaluate at any time with `feedback_syn.ipynb` (a sketch of the feedback loss follows this list):

  ```
  CUDA_VISIBLE_DEVICES=0 python fc_synthesizer_train.py
  ```
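As a rough sketch of the feedback constraint itself: the frozen speaker verification network embeds the synthesized spectrogram, and the synthesizer is penalized when that embedding drifts from the target speaker's. The cosine-based penalty and the names below are illustrative assumptions; consult the paper and code for the exact loss:

```python
import numpy as np

def feedback_loss(syn_embedding, ref_embedding):
    """One minus cosine similarity between the embedding of the
    synthesized mel (from the frozen verification network) and the
    target speaker's reference embedding."""
    cos = np.dot(syn_embedding, ref_embedding) / (
        np.linalg.norm(syn_embedding) * np.linalg.norm(ref_embedding))
    return 1.0 - cos

# Combined objective (names illustrative):
#   total_loss = reconstruction_loss + lambda_fb * feedback_loss(e_syn, e_ref)
# where e_syn = SV(synthesized mel), e_ref = SV(reference mel); the SV
# network's weights stay frozen while its gradient flows to the synthesizer.
```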
- Speaker embedding network
- Baseline synthesizer 1 (used as the pretrained model for the feedback training)
- Baseline synthesizer 2
- TTS synthesizer with feedback constraint
- WaveRNN vocoder