update pre-trained models with audio demos
keonlee9420 committed Feb 17, 2022
1 parent 814cdda commit 6c4fb0e
Showing 156 changed files with 15 additions and 6 deletions.
21 changes: 15 additions & 6 deletions README.md
@@ -116,9 +116,13 @@ tensorboard --logdir output/log
to serve TensorBoard on your localhost.
The loss curves, synthesized mel-spectrograms, and audios are shown.

<!-- ![](./img/tensorboard_loss.png)
## Normal Model
![](./img/tensorboard_loss.png)
![](./img/tensorboard_spec.png)
![](./img/tensorboard_audio.png) -->
![](./img/tensorboard_audio.png)

## Small Model Loss
![](./img/tensorboard_loss_small.png)

# Notes

@@ -129,12 +133,17 @@ The loss curves, synthesized mel-spectrograms, and audios are shown.
```yaml
# In the train.yaml
aligner:
helper_type: "ctc" # ["ctc", "dga", "none"]
helper_type: "dga" # ["dga", "ctc", "none"]
```
- "ctc": [Connectionist Temporal Classification (CTC)](https://dl.acm.org/doi/pdf/10.1145/1143844.1143891) Loss with forward-sum algorithm
- "dga": [Diagonal Guided Attention (DGA)](https://arxiv.org/abs/1710.08969) Loss
- The default setting is "ctc". If you set "none", no helper loss will be applied during training.
- "ctc": [Connectionist Temporal Classification (CTC)](https://dl.acm.org/doi/pdf/10.1145/1143844.1143891) Loss with forward-sum algorithm
- If you set "none", no helper loss will be applied during training.
- Comparison of the alignments produced by the three methods ("dga", "ctc", and "none", from top to bottom):
![](./img/val_attn_step_125000_LJ040-0055_dga.png)
![](./img/val_attn_step_125000_LJ040-0055_ctc.png)
![](./img/val_attn_step_125000_LJ040-0055_none.png)
- The default setting is "dga". Although "ctc" produces the strongest alignment, its output quality and accuracy are worse than with "dga".
- There is still room to improve the output quality. Audio quality and alignment (accuracy) appear to trade off against each other.
- Will be extended to a **multi-speaker TTS**.
<!-- - Two options for embedding for the **multi-speaker TTS** setting: training speaker embedder from scratch or using a pre-trained [philipperemy's DeepSpeaker](https://github.com/philipperemy/deep-speaker) model (as [STYLER](https://github.com/keonlee9420/STYLER) did). You can toggle it by setting the config (between `'none'` and `'DeepSpeaker'`).
- DeepSpeaker on VCTK dataset shows clear identification among speakers. The following figure shows the T-SNE plot of extracted speaker embedding.
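The "dga" helper loss named in the README notes above can be illustrated with a small, dependency-free sketch of the guided attention penalty from the linked DGA paper. This is not the repository's implementation, which the commit does not show: the function names, the list-of-lists attention layout (rows are mel frames, columns are text tokens), and the default `g=0.2` are all assumptions made for illustration.

```python
import math


def guided_attention_weights(n_text, n_mel, g=0.2):
    """Penalty matrix W[t][n] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    W is close to 0 near the diagonal and approaches 1 far away from it.
    `g` controls how wide the unpenalized diagonal band is (assumed default).
    """
    return [
        [
            1.0 - math.exp(-((n / n_text - t / n_mel) ** 2) / (2.0 * g * g))
            for n in range(n_text)
        ]
        for t in range(n_mel)
    ]


def guided_attention_loss(attn, g=0.2):
    """Mean of attn[t][n] * W[t][n] over all (mel frame, text token) cells."""
    n_mel = len(attn)
    n_text = len(attn[0])
    weights = guided_attention_weights(n_text, n_mel, g)
    total = sum(
        attn[t][n] * weights[t][n]
        for t in range(n_mel)
        for n in range(n_text)
    )
    return total / (n_mel * n_text)


# Toy 4x4 attention maps: one perfectly diagonal, one anti-diagonal.
diagonal = [[1.0 if n == t else 0.0 for n in range(4)] for t in range(4)]
anti_diagonal = [[1.0 if n == 3 - t else 0.0 for n in range(4)] for t in range(4)]

loss_diagonal = guided_attention_loss(diagonal)        # evaluates to 0.0
loss_anti_diagonal = guided_attention_loss(anti_diagonal)  # strictly positive
```

In this toy case the diagonal map incurs no penalty, while attention mass placed far from the diagonal is penalized, which is the pressure the "dga" option adds during training. Setting `helper_type` to "none" would correspond to dropping this term entirely.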
Binary file removed demo/LJSpeech/normal/100k/LJ001-0165.png
Binary file removed demo/LJSpeech/normal/100k/LJ001-0165.wav
Binary file removed demo/LJSpeech/normal/100k/LJ001-0174.png
Binary file removed demo/LJSpeech/normal/100k/LJ001-0174.wav
Binary file removed demo/LJSpeech/normal/100k/LJ002-0002.png
Binary file removed demo/LJSpeech/normal/100k/LJ002-0002.wav
Binary file removed demo/LJSpeech/normal/100k/LJ011-0269.png
Binary file removed demo/LJSpeech/normal/100k/LJ011-0269.wav
Binary file removed demo/LJSpeech/normal/100k/LJ017-0025.png
Binary file removed demo/LJSpeech/normal/100k/LJ017-0025.wav
Binary file removed demo/LJSpeech/normal/100k/LJ028-0403.png
Binary file removed demo/LJSpeech/normal/100k/LJ028-0403.wav
Binary file removed demo/LJSpeech/normal/100k/LJ045-0141.png
Binary file removed demo/LJSpeech/normal/100k/LJ045-0141.wav
Binary file removed demo/LJSpeech/normal/125k/LJ001-0165.png
Binary file removed demo/LJSpeech/normal/125k/LJ001-0165.wav
Binary file removed demo/LJSpeech/normal/125k/LJ001-0174.png
Binary file removed demo/LJSpeech/normal/125k/LJ001-0174.wav
Binary file removed demo/LJSpeech/normal/125k/LJ002-0002.png
Binary file removed demo/LJSpeech/normal/125k/LJ002-0002.wav
Binary file removed demo/LJSpeech/normal/125k/LJ011-0269.png
Binary file removed demo/LJSpeech/normal/125k/LJ011-0269.wav
Binary file removed demo/LJSpeech/normal/125k/LJ017-0025.png
Binary file removed demo/LJSpeech/normal/125k/LJ017-0025.wav
Binary file removed demo/LJSpeech/normal/125k/LJ028-0403.png
Binary file removed demo/LJSpeech/normal/125k/LJ028-0403.wav
Binary file removed demo/LJSpeech/normal/125k/LJ045-0141.png
Binary file removed demo/LJSpeech/normal/125k/LJ045-0141.wav
Binary file added demo/LJSpeech/normal/200k/Hello!.png
Binary file added demo/LJSpeech/normal/200k/Hello!.wav
Binary file added demo/LJSpeech/normal/200k/LJ001-0165.png
Binary file added demo/LJSpeech/normal/200k/LJ001-0165.wav
Binary file added demo/LJSpeech/normal/200k/LJ001-0174.png
Binary file added demo/LJSpeech/normal/200k/LJ001-0174.wav
Binary file added demo/LJSpeech/normal/200k/LJ002-0002.png
Binary file added demo/LJSpeech/normal/200k/LJ002-0002.wav
Binary file added demo/LJSpeech/normal/200k/LJ011-0269.png
Binary file added demo/LJSpeech/normal/200k/LJ011-0269.wav
Binary file added demo/LJSpeech/normal/200k/LJ017-0025.png
Binary file added demo/LJSpeech/normal/200k/LJ017-0025.wav
Binary file added demo/LJSpeech/normal/200k/LJ028-0403.png
Binary file added demo/LJSpeech/normal/200k/LJ028-0403.wav
Binary file added demo/LJSpeech/normal/200k/LJ045-0141.png
Binary file added demo/LJSpeech/normal/200k/LJ045-0141.wav
Binary file removed demo/LJSpeech/small/150k/LJ001-0165.png
Binary file removed demo/LJSpeech/small/150k/LJ001-0165.wav
Binary file removed demo/LJSpeech/small/150k/LJ001-0174.png
Binary file removed demo/LJSpeech/small/150k/LJ001-0174.wav
Binary file removed demo/LJSpeech/small/150k/LJ002-0002.png
Binary file removed demo/LJSpeech/small/150k/LJ002-0002.wav
Binary file removed demo/LJSpeech/small/150k/LJ011-0269.png
Binary file removed demo/LJSpeech/small/150k/LJ011-0269.wav
Binary file removed demo/LJSpeech/small/150k/LJ017-0025.png
Binary file removed demo/LJSpeech/small/150k/LJ017-0025.wav
Binary file removed demo/LJSpeech/small/150k/LJ028-0403.png
Binary file removed demo/LJSpeech/small/150k/LJ028-0403.wav
Binary file removed demo/LJSpeech/small/150k/LJ045-0141.png
Binary file removed demo/LJSpeech/small/150k/LJ045-0141.wav
Binary file removed demo/LJSpeech/small/175k/LJ001-0165.png
Binary file removed demo/LJSpeech/small/175k/LJ001-0165.wav
Binary file removed demo/LJSpeech/small/175k/LJ001-0174.png
Binary file removed demo/LJSpeech/small/175k/LJ001-0174.wav
Binary file removed demo/LJSpeech/small/175k/LJ002-0002.png
Binary file removed demo/LJSpeech/small/175k/LJ002-0002.wav
Binary file removed demo/LJSpeech/small/175k/LJ011-0269.png
Binary file removed demo/LJSpeech/small/175k/LJ011-0269.wav
Binary file removed demo/LJSpeech/small/175k/LJ017-0025.png
Binary file removed demo/LJSpeech/small/175k/LJ017-0025.wav
Binary file removed demo/LJSpeech/small/175k/LJ028-0403.png
Binary file removed demo/LJSpeech/small/175k/LJ028-0403.wav
Binary file removed demo/LJSpeech/small/175k/LJ045-0141.png
Binary file removed demo/LJSpeech/small/175k/LJ045-0141.wav
Binary file removed demo/LJSpeech/small/200k/LJ001-0165.png
Binary file removed demo/LJSpeech/small/200k/LJ001-0165.wav
Binary file removed demo/LJSpeech/small/200k/LJ001-0174.png
Binary file removed demo/LJSpeech/small/200k/LJ001-0174.wav
Binary file removed demo/LJSpeech/small/200k/LJ002-0002.png
Binary file removed demo/LJSpeech/small/200k/LJ002-0002.wav
Binary file removed demo/LJSpeech/small/200k/LJ011-0269.png
Binary file removed demo/LJSpeech/small/200k/LJ011-0269.wav
Binary file removed demo/LJSpeech/small/200k/LJ017-0025.png
Binary file removed demo/LJSpeech/small/200k/LJ017-0025.wav
Binary file removed demo/LJSpeech/small/200k/LJ028-0403.png
Binary file removed demo/LJSpeech/small/200k/LJ028-0403.wav
Binary file removed demo/LJSpeech/small/200k/LJ045-0141.png
Binary file removed demo/LJSpeech/small/200k/LJ045-0141.wav
Binary file added demo/LJSpeech/small/320k/Hello!.png
Binary file added demo/LJSpeech/small/320k/Hello!.wav
Binary file added demo/LJSpeech/small/320k/LJ001-0165.png
Binary file added demo/LJSpeech/small/320k/LJ001-0165.wav
Binary file added demo/LJSpeech/small/320k/LJ001-0174.png
Binary file added demo/LJSpeech/small/320k/LJ001-0174.wav
Binary file added demo/LJSpeech/small/320k/LJ002-0002.png
Binary file added demo/LJSpeech/small/320k/LJ002-0002.wav
Binary file added demo/LJSpeech/small/320k/LJ011-0269.png
Binary file added demo/LJSpeech/small/320k/LJ011-0269.wav
Binary file added demo/LJSpeech/small/320k/LJ017-0025.png
Binary file added demo/LJSpeech/small/320k/LJ017-0025.wav
Binary file added demo/LJSpeech/small/320k/LJ028-0403.png
Binary file added demo/LJSpeech/small/320k/LJ028-0403.wav
Binary file added demo/LJSpeech/small/320k/LJ045-0141.png
Binary file added demo/LJSpeech/small/320k/LJ045-0141.wav
Binary file modified img/tensorboard_audio.png
Binary file modified img/tensorboard_loss.png
Binary file added img/tensorboard_loss_small.png
Binary file modified img/tensorboard_spec.png
Binary file added img/val_attn_step_125000_LJ040-0055_ctc.png
Binary file added img/val_attn_step_125000_LJ040-0055_dga.png
Binary file added img/val_attn_step_125000_LJ040-0055_none.png
