StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Unofficial PyTorch implementation of the paper. Most of the code is based on Link.

The LibriTTS dataset (train-clean-100 and train-clean-360) is used.
The sampling rate is set to 22050 Hz by default.

Prerequisites

Clone this repository.
Install the Python requirements listed in requirements.txt.

Preparing

Run

python prepare_align.py --data_path [LibriTTS DATAPATH]

to prepare the data for alignment. (You can change the sampling rate by adding --resample_rate [SR].)
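For orientation, the sketch below shows one way to enumerate the utterances that a preparation step like this has to visit. It assumes the standard LibriTTS layout, where each utterance `X.wav` sits next to a transcript `X.normalized.txt` under `<subset>/<speaker>/<chapter>/`. The function name `collect_utterances` is hypothetical and is not taken from prepare_align.py.

```python
from pathlib import Path


def collect_utterances(data_path):
    """Return sorted (wav_path, transcript) pairs found under data_path.

    Assumption: every utterance `X.wav` has a sibling `X.normalized.txt`.
    Utterances without a transcript are skipped.
    """
    pairs = []
    for wav_path in sorted(Path(data_path).rglob("*.wav")):
        txt_path = wav_path.with_name(wav_path.stem + ".normalized.txt")
        if not txt_path.is_file():
            continue  # no transcript, so nothing to align
        transcript = txt_path.read_text(encoding="utf-8").strip()
        pairs.append((wav_path, transcript))
    return pairs
```

Running it on the dataset root before the real script is a quick way to confirm that [LibriTTS DATAPATH] points at the right directory.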

Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.

1-1. Download MFA by following the instructions on its website.

1-2. Run the commands below:

$ conda activate aligner
$ mfa model download acoustic english_mfa
$ mfa align ......LibriTTS/wav22 lexicon.txt english_us_arpa .........LibriTTS/Textgrid
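Acoustic models of this kind usually consume the alignment as a per-phoneme duration in mel frames rather than as start and end times in seconds. The sketch below illustrates that conversion under stated assumptions: the intervals have already been read from a TextGrid as (start, end, phone) tuples, and the hop length of 256 samples is an illustrative value that is not confirmed by this repository (check the files under configs). `intervals_to_durations` is a hypothetical helper, not a function from preprocess.py.

```python
def intervals_to_durations(intervals, sampling_rate=22050, hop_length=256):
    """Convert (start_sec, end_sec, phone) intervals to (phones, frame_counts).

    Each boundary is rounded to a frame index before subtracting, so for
    contiguous intervals the durations sum to the last boundary's frame
    index minus the first one, with no accumulated rounding drift.
    """
    phones, durations = [], []
    for start, end, phone in intervals:
        start_frame = int(round(start * sampling_rate / hop_length))
        end_frame = int(round(end * sampling_rate / hop_length))
        phones.append(phone)
        durations.append(end_frame - start_frame)
    return phones, durations


# Hypothetical alignment for a half-second clip.
example = [(0.0, 0.10, "HH"), (0.10, 0.25, "AH"), (0.25, 0.50, "sil")]
phones, durations = intervals_to_durations(example)
```

Rounding the boundaries rather than the individual durations keeps the total frame count consistent with the length of the mel-spectrogram.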

Run

python preprocess.py

(Check the input and output data paths before running.)

Training

python train.py --data_path [Preprocessed LibriTTS DATAPATH]

Inference

Mel generation

python synthesize.py --checkpoint_path [CKPT PATH] --ref_audio [REF AUDIO PATH]

Waveform generation (using HiFi-GAN)

cd hifi-gan
python inference_e2e.py --checkpoint_file [VOCODER CKPT PATH]
