Releases: keonlee9420/Comprehensive-Transformer-TTS
v0.2.1
Fix and update codebase & pre-trained models with demo samples
- Fix the variance adaptor so it works with all combinations of building block and variance type/level
- Update pre-trained models with demo samples of LJSpeech and VCTK under the "transformer_fs2" building block and "cwt" pitch conditioning
- Share the results of ablation studies comparing "transformer" vs. "transformer_fs2", each paired with the three types of pitch conditioning ("frame", "ph", and "cwt")
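The ablation grid described above (two building blocks crossed with three pitch-conditioning types) can be sketched as follows; the key names are illustrative assumptions, not the repository's actual config schema.

```python
from itertools import product

# Hypothetical enumeration of the ablation grid: two building blocks
# paired with three pitch-conditioning types. The keys "block" and
# "pitch_type" are assumptions for illustration only.
building_blocks = ["transformer", "transformer_fs2"]
pitch_types = ["frame", "ph", "cwt"]

ablation_configs = [
    {"block": b, "pitch_type": p}
    for b, p in product(building_blocks, pitch_types)
]
# 2 building blocks x 3 pitch types -> 6 ablation runs
```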
v0.2.0
A lot of improvements with new features!
- Prepare two different types of data pipelines in the preprocessor to support both unsupervised and supervised duration modeling
- Adopt wavelet decomposition for pitch modeling & loss
- Add fine-grained duration loss
- Apply `var_start_steps` for better model convergence, especially under unsupervised duration modeling
- Remove the dependency of energy modeling on pitch variance
- Add the "transformer_fs2" building block, which is closer to the original FastSpeech 2 paper
- Add two types of prosody modeling methods
- Loss comparison on the validation set:
  - LJSpeech - blue: v0.1.1 / green: v0.2.0
  - VCTK - skyblue: v0.1.1 / orange: v0.2.0
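A minimal sketch of how a `var_start_steps` threshold could gate the variance losses early in training, so the model first converges on mel reconstruction before variance modeling kicks in. The function and parameter names here are assumptions for illustration, not the repository's actual API.

```python
# Sketch (assumed names): before `var_start_steps`, the variance-related
# losses (pitch, energy, duration) are zeroed out; afterwards they
# contribute fully to the total training loss.
def variance_loss_weight(step: int, var_start_steps: int) -> float:
    """Return 0.0 before variance modeling starts, 1.0 afterwards."""
    return 1.0 if step >= var_start_steps else 0.0

def total_loss(step, mel_loss, pitch_loss, energy_loss, duration_loss,
               var_start_steps=4000):
    # Gate every variance loss by the same step-dependent weight.
    w = variance_loss_weight(step, var_start_steps)
    return mel_loss + w * (pitch_loss + energy_loss + duration_loss)
```

Before the threshold only the mel loss drives the gradients; afterwards the variance losses are added at full weight.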
v0.1.1
Add multi-speaker aligner
v0.1.0
Initial commit