From 81be46d594709e35bbd730c01dc5890e1bd7e2f0 Mon Sep 17 00:00:00 2001
From: keonlee9420
Date: Fri, 8 Oct 2021 23:18:45 +0900
Subject: [PATCH] add Relative Multi-Head Attention and unify masking

---
 README.md                      |   3 +-
 config/LJSpeech/train.yaml     |   4 +-
 model/PortaSpeech.py           |   6 +-
 model/blocks.py                | 280 ++++++++++++---------------------
 model/linguistic_encoder.py    | 101 ++++--------
 model/loss.py                  |   3 -
 model/variational_generator.py |  41 ++---
 utils/tools.py                 |   8 +-
 8 files changed, 156 insertions(+), 290 deletions(-)

diff --git a/README.md b/README.md
index 2088da0..e218c96 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ PyTorch Implementation of [PortaSpeech: Portable and High-Quality Generative Tex
 | Module | Normal | Small | Normal (paper) | Small (paper) |
 | :----- | :-----: | :-----: | :-----: | :-----: |
 | *Total* | 34.3M | 9.6M | 21.8M | 6.7M
-| *LinguisticEncoder* | 14M | 3.4M | - | -
+| *LinguisticEncoder* | 14M | 3.5M | - | -
 | *VariationalGenerator* | 11M | 2.8M | - | -
 | *FlowPostNet* | 9.3M | 3.4M | - | -
 
@@ -122,7 +122,6 @@ to serve TensorBoard on your localhost.
 - For vocoder, **HiFi-GAN** and **MelGAN** are supported.
 - Add convolution layer and residual layer in **VariationalGenerator** to match the shape of conditioner and output.
 - No ReLU activation and LayerNorm in **VariationalGenerator** for convergence of word-to-phoneme alignment of **LinguisticEncoder**.
-- Use absolute positional encoding in **LinguisticEncoder** instead of relative positional encoding.
 - Will be extended to a **multi-speaker TTS**.