StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Unofficial PyTorch implementation of the paper. Most of the code is based on Link.

  1. The LibriTTS dataset (train-clean-100 and train-clean-360) is used.
  2. The sampling rate is set to 22050 Hz by default.
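
LibriTTS audio is distributed at 24 kHz, so the preparation step resamples it. A minimal sketch of that resampling with SciPy (the function name and the use of `resample_poly` are illustrative assumptions, not the repository's actual implementation):

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def resample_to_22050(wav: np.ndarray, orig_sr: int = 24000) -> np.ndarray:
    """Resample a mono waveform to 22050 Hz using polyphase filtering."""
    # Reduce the ratio to its lowest terms; 22050/24000 -> 147/160.
    g = gcd(22050, orig_sr)
    return resample_poly(wav, 22050 // g, orig_sr // g)

# One second of 24 kHz audio becomes exactly 22050 samples.
one_sec = np.zeros(24000)
print(resample_to_22050(one_sec).shape)  # (22050,)
```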

Prerequisites

  • Clone this repository.
  • Install Python requirements; refer to requirements.txt.

Preparing

  1. Run
python prepare_align.py --data_path [LibriTTS DATAPATH]

for data preparation. (You can change the sampling rate by adding --resample_rate [SR].)

  2. Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.
     2-1. Download MFA following the instructions on its website.
     2-2. Run the commands below:
$ conda activate aligner
$ mfa model download acoustic english_mfa
$ mfa align ......LibriTTS/wav22 lexicon.txt english_us_arpa .........LibriTTS/Textgrid
  3. Run
python preprocess.py

(Check the input and output data paths.)
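Preprocessing typically converts the MFA phone intervals (in seconds) into per-phoneme mel-frame durations. A sketch of that conversion, assuming a 22050 Hz sampling rate and a hop size of 256 (both assumed here; check the repository's config for the actual values):

```python
# Convert phone intervals (start, end) in seconds to mel-frame durations.
# sr=22050 and hop_length=256 are assumed values, not taken from the repo config.
def seconds_to_frames(intervals, sr=22050, hop_length=256):
    frames = []
    for start, end in intervals:
        # Round interval boundaries to frame indices so that adjacent
        # phones never overlap and the total frame count stays consistent.
        s = int(round(start * sr / hop_length))
        e = int(round(end * sr / hop_length))
        frames.append(e - s)
    return frames

# Two phones covering 0.5 s in total.
print(seconds_to_frames([(0.0, 0.25), (0.25, 0.5)]))  # [22, 21]
```

Rounding boundaries (rather than rounding each duration independently) keeps the summed frame count equal to the frame count of the whole utterance.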

Training

python train.py --data_path [Preprocessed LibriTTS DATAPATH]

Inference

  1. Mel generation
python synthesize.py --checkpoint_path [CKPT PATH] --ref_audio [REF AUDIO PATH]
  2. Waveform generation (using HiFi-GAN)
cd hifi-gan
python inference_e2e.py --checkpoint_file [VOCODER CKPT PATH]
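
HiFi-GAN's inference_e2e.py reads mel spectrograms as .npy files from an input directory (see the --input_mels_dir flag in the hifi-gan repository; verify against your checkout). A sketch of producing a mel file in the expected (n_mels, frames) layout; the filename and the 80-bin/100-frame shape are illustrative assumptions:

```python
import numpy as np

# Mels are saved as (n_mels, frames) float32 arrays; 80 mel bins is the
# usual HiFi-GAN config. All values below are for illustration only.
n_mels, frames = 80, 100
mel = np.random.randn(n_mels, frames).astype(np.float32)
np.save("example_mel.npy", mel)   # hypothetical filename

loaded = np.load("example_mel.npy")
print(loaded.shape)  # (80, 100)
```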
