
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)




NATSpeech: A Non-Autoregressive Text-to-Speech Framework

This repo contains the official PyTorch implementations of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022).

Key Features

We implement the following features in this framework:

  • Data processing for non-autoregressive Text-to-Speech using Montreal Forced Aligner.
  • Convenient and scalable framework for training and inference.
  • Simple but efficient random-access dataset implementation.
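
As an illustration of what a random-access dataset can look like, here is a minimal sketch, not the framework's actual implementation. It assumes items are pickled into a single data file and that byte offsets are stored in a separate index file, so any item can be read with one seek. The class names `OffsetIndexedWriter` and `OffsetIndexedReader`, the `.data`/`.idx` suffixes, and the item fields are all hypothetical.

```python
import json
import os
import pickle
import tempfile


class OffsetIndexedWriter:
    """Append pickled items to one data file and record where each one starts."""

    def __init__(self, path):
        self.path = path
        self.data_file = open(path + ".data", "wb")
        self.offsets = [0]  # offsets[i] is the byte position where item i starts

    def add(self, item):
        payload = pickle.dumps(item)
        self.data_file.write(payload)
        self.offsets.append(self.offsets[-1] + len(payload))

    def close(self):
        self.data_file.close()
        # The index is tiny compared with the data, so plain JSON is enough here.
        with open(self.path + ".idx", "w") as f:
            json.dump(self.offsets, f)


class OffsetIndexedReader:
    """Read item i with a single seek, without loading the whole file."""

    def __init__(self, path):
        with open(path + ".idx") as f:
            self.offsets = json.load(f)
        self.data_file = open(path + ".data", "rb")

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        self.data_file.seek(start)
        return pickle.loads(self.data_file.read(end - start))

    def close(self):
        self.data_file.close()


def demo():
    """Write five hypothetical items, then fetch one by index."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "train")
        writer = OffsetIndexedWriter(prefix)
        for i in range(5):
            writer.add({"item_name": f"utt_{i}", "mel_len": 100 + i})
        writer.close()

        reader = OffsetIndexedReader(prefix)
        n, picked = len(reader), reader[3]
        reader.close()
    return n, picked
```

A reader like this can be wrapped in a map-style `torch.utils.data.Dataset` by forwarding `__len__` and `__getitem__`. Only load pickled files you created yourself, since unpickling untrusted data can execute arbitrary code.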

Install Dependencies

## We tested on Linux/Ubuntu 18.04. 
## Install Python 3.6+ first (Anaconda recommended).

export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool
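
After running the steps above, a quick preflight check can confirm that the main pieces are visible from the active environment. The script below is a small sketch, not part of the repository. The function name `check_environment` and the choice of which modules and tools to probe are assumptions based on the install commands listed here.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 6), modules=("numpy", "torch"), tools=("sox",)):
    """Return a dict mapping each checked requirement to True/False."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report["module:" + name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        report["tool:" + name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

This only checks that the modules and tools can be found. It does not verify versions or that the forced aligner installed correctly.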

Documents

Citation

If you find this useful for your research, please cite the following papers:

  • PortaSpeech
@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
  • DiffSpeech
@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
}

Acknowledgments

Our code is influenced by the following repos:

License and Agreement

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
