
đŸ” Matcha-TTS: A fast TTS architecture with conditional flow matching


rmcpantoja/Matcha-TTS


Built with Python, PyTorch, Lightning and Hydra; code formatted with black and isort.

This is the official code implementation of 🍵 Matcha-TTS.

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:

  • Is probabilistic
  • Has a compact memory footprint
  • Sounds highly natural
  • Is very fast to synthesise from
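To make the flow-matching idea concrete, here is a minimal NumPy sketch of the optimal-transport conditional flow matching (OT-CFM) objective. It is an illustration under assumptions, not the repository's code: the helper names, plain arrays in place of mel-spectrogram tensors, and `sigma_min=1e-4` are our choices, and the text-derived conditioning that the real decoder receives is left out.

```python
import numpy as np


def ot_cfm_target(x1, x0, t, sigma_min=1e-4):
    """Return the point x_t on the noise-to-data path and the target field u.

    x1: data sample (e.g. a mel-spectrogram), x0: Gaussian noise of the same
    shape, t: time in [0, 1]. The path is a straight line from x0 towards x1.
    """
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # u is the time derivative of x_t, constant along the straight path.
    u = x1 - (1.0 - sigma_min) * x0
    return x_t, u


def cfm_loss(predicted_u, u):
    """Mean-squared error between a model's predicted field and the target."""
    return float(np.mean((predicted_u - u) ** 2))


rng = np.random.default_rng(0)
x1 = rng.normal(size=(80, 100))  # stand-in for an 80-bin mel-spectrogram
x0 = rng.normal(size=x1.shape)   # noise sample
t = rng.uniform()                # random time, as drawn during training
x_t, u = ot_cfm_target(x1, x0, t)
print(x_t.shape, cfm_loss(np.zeros_like(u), u))
```

A network would be trained to predict `u` from `x_t` and `t`; because the regression target is a simple straight-line velocity, no ODE has to be simulated during training.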

Check out our demo page and read our arXiv preprint for more details.

Pre-trained models will be automatically downloaded with the CLI or gradio interface.

Try đŸ” Matcha-TTS on HuggingFace đŸ€— spaces!

Installation

  1. Create an environment (suggested but optional)
conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts
  2. Install Matcha-TTS using pip or from source
pip install matcha-tts

or, from source:

pip install git+https://github.com/shivammehta25/Matcha-TTS.git
  3. Run the CLI, the Gradio app, or the Jupyter notebook
# This will download the required models
matcha-tts --text "<INPUT TEXT>"

or

matcha-tts-app

or open synthesis.ipynb in Jupyter Notebook

CLI Arguments

  • To synthesise from given text, run:
matcha-tts --text "<INPUT TEXT>"
  • To synthesise from a file, run:
matcha-tts --file <PATH TO FILE>
  • To batch synthesise from a file, run:
matcha-tts --file <PATH TO FILE> --batched
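To drive the CLI from a Python script, one option is to assemble the argument list and hand it to `subprocess.run`. The helper `build_matcha_command` below is hypothetical (it is not part of the package) and uses only flags documented in this README, including `--checkpoint_path` from the training section.

```python
import shlex


def build_matcha_command(text=None, file=None, batched=False, checkpoint_path=None):
    """Return an argument list for the matcha-tts CLI (hypothetical helper).

    Exactly one of `text` or `file` must be given; `batched` only applies
    to file input.
    """
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of `text` or `file`")
    if batched and file is None:
        raise ValueError("`batched` requires `file`")
    cmd = ["matcha-tts"]
    if text is not None:
        cmd += ["--text", text]
    else:
        cmd += ["--file", str(file)]
        if batched:
            cmd.append("--batched")
    if checkpoint_path is not None:
        cmd += ["--checkpoint_path", str(checkpoint_path)]
    return cmd


# Print a shell-safe rendering; pass the list itself to
# subprocess.run(cmd, check=True) to actually invoke the installed CLI.
print(shlex.join(build_matcha_command(text="Hello, world!")))
```

Passing a list rather than a single shell string avoids quoting problems when the input text contains spaces or punctuation.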

Additional arguments

  • Speaking rate
matcha-tts --text "<INPUT TEXT>" --speaking_rate 1.0
  • Sampling temperature
matcha-tts --text "<INPUT TEXT>" --temperature 0.667
  • Euler ODE solver steps
matcha-tts --text "<INPUT TEXT>" --steps 10
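`--steps` sets how many ODE solver steps are taken, and `--temperature` controls the spread of the starting noise. The NumPy sketch below illustrates both under simplifying assumptions: `euler_sample` is a hypothetical helper, the initial Gaussian noise is assumed to be multiplied by the temperature, and a toy straight-line vector field stands in for the trained decoder.

```python
import numpy as np


def euler_sample(vector_field, shape, n_steps=10, temperature=0.667, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with fixed-step Euler.

    n_steps plays the role of --steps; temperature scales the initial noise
    (an assumption about how --temperature is applied).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape) * temperature  # starting point at t=0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * vector_field(x, t)  # one explicit Euler step
    return x


# Toy field: every point moves in a straight line towards `target`,
# arriving exactly at t=1. The solver never evaluates it at t=1.
target = np.linspace(-1.0, 1.0, 5)


def straight_line_field(x, t):
    return (target - x) / (1.0 - t)


for steps in (1, 2, 10):
    sample = euler_sample(straight_line_field, target.shape, n_steps=steps)
    print(steps, np.abs(sample - target).max())
```

Because the toy field is perfectly straight, even a single Euler step lands on the target. A trained decoder's field is only approximately straight, so more steps generally trade synthesis speed for accuracy.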

Train with your own dataset

Let's assume we are training with LJ Speech:

  1. Download the dataset from here, extract it to data/LJSpeech-1.1, and prepare the file lists so that they point to the extracted data, as in item 5 of the setup instructions in the NVIDIA Tacotron 2 repo.
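The Tacotron 2 file lists store one `audio_path|transcript` pair per line, with audio paths prefixed by a `DUMMY` placeholder that its setup replaces (using `sed`) with the real wav folder. A Python equivalent might look like the sketch below; the helper name is hypothetical, and the default `wav_dir` assumes the extraction path given in this step.

```python
from pathlib import Path


def point_filelist_to_wavs(filelist, wav_dir="data/LJSpeech-1.1/wavs", placeholder="DUMMY"):
    """Rewrite 'DUMMY/<id>.wav|<text>' lines in place so paths point at wav_dir."""
    path = Path(filelist)
    fixed = []
    for line in path.read_text(encoding="utf-8").splitlines():
        audio, sep, text = line.partition("|")
        # Only the audio column is touched; transcripts are kept verbatim.
        if audio.startswith(placeholder + "/"):
            audio = wav_dir + audio[len(placeholder):]
        fixed.append(audio + sep + text)
    path.write_text("".join(entry + "\n" for entry in fixed), encoding="utf-8")


# Example: fix both LJ Speech lists (paths as used in configs/data/ljspeech.yaml).
# for name in ("ljs_audio_text_train_filelist.txt", "ljs_audio_text_val_filelist.txt"):
#     point_filelist_to_wavs(Path("data/filelists") / name)
```

Restricting the substitution to the audio column means a transcript that happens to contain the word "DUMMY" is left untouched, which a global `sed` replacement would not guarantee.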

  2. Clone and enter the Matcha-TTS repository

git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
  3. Install the package from source
pip install -e .
  4. Go to configs/data/ljspeech.yaml and change the following to the paths of your train and validation filelists:
train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt
  5. Generate normalisation statistics with the dataset configuration YAML file
matcha-data-stats -i ljspeech.yaml
# Output:
#{'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}

Update these values in configs/data/ljspeech.yaml under the data_statistics key:

data_statistics:  # Computed for ljspeech dataset
  mel_mean: -5.536622
  mel_std: 2.116101
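`matcha-data-stats` reduces the training mel-spectrograms to one scalar mean and one scalar standard deviation. As a rough sketch of that idea, assuming a single global statistic over all frames and mel bins, accumulated as running sums so the dataset never has to sit in memory at once (the helpers `mel_statistics` and `normalise` are hypothetical, not the tool's code):

```python
import numpy as np


def mel_statistics(mels):
    """Global mean/std over every value in a list of (n_mels, frames) arrays."""
    total_sum = 0.0
    total_sq_sum = 0.0
    total_count = 0
    for mel in mels:  # in practice, iterate over a data loader
        total_sum += float(np.sum(mel))
        total_sq_sum += float(np.sum(np.square(mel)))
        total_count += mel.size
    mean = total_sum / total_count
    # Var = E[x^2] - E[x]^2; clamp tiny negative round-off before the sqrt.
    var = max(total_sq_sum / total_count - mean**2, 0.0)
    return {"mel_mean": mean, "mel_std": float(np.sqrt(var))}


def normalise(mel, stats):
    """Standardise a mel-spectrogram with the dataset-level statistics."""
    return (mel - stats["mel_mean"]) / stats["mel_std"]


rng = np.random.default_rng(0)
mels = [rng.normal(-5.5, 2.1, size=(80, n)) for n in (120, 300, 75)]
stats = mel_statistics(mels)
print(stats)
```

Using scalar statistics (rather than per-bin ones) matches the two values stored under data_statistics in the YAML file.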

  6. Run the training script
make train-ljspeech

or

python matcha/train.py experiment=ljspeech
  • For a minimum-memory run:
python matcha/train.py experiment=ljspeech_min_memory
  • For multi-GPU training, run:
python matcha/train.py experiment=ljspeech trainer.devices=[0,1]
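Each `key=value` argument above is a Hydra override of the composed training config. When launching runs from a script, a small helper that assembles these overrides keeps them consistent; `build_train_command` below is a hypothetical example and assumes it is run from the repository root.

```python
import shlex
import sys


def build_train_command(experiment="ljspeech", devices=None, extra_overrides=()):
    """Assemble a matcha/train.py invocation with Hydra key=value overrides."""
    cmd = [sys.executable, "matcha/train.py", f"experiment={experiment}"]
    if devices:
        # Same form as the README example, e.g. trainer.devices=[0,1]
        cmd.append("trainer.devices=[" + ",".join(str(d) for d in devices) + "]")
    cmd.extend(extra_overrides)
    return cmd


# shlex.join quotes the bracketed override, which shells such as zsh
# would otherwise try to expand as a glob pattern.
print(shlex.join(build_train_command(devices=[0, 1])))
```

Passing the list to `subprocess.run(..., check=True)` sidesteps shell quoting altogether.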
  7. Synthesise from the custom-trained model
matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>

Citation information

If you use our code or otherwise find this work useful, please cite our paper:

@article{mehta2023matcha,
  title={Matcha-TTS: A fast TTS architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2309.03199},
  year={2023}
}

Acknowledgements

Since this code uses Lightning-Hydra-Template, you have all the powers that come with it.

Other source code I would like to acknowledge:

  • Coqui-TTS: For helping me figure out how to make Cython binaries pip-installable, and for encouragement
  • Hugging Face Diffusers: For their awesome diffusers library and its components
  • Grad-TTS: For the monotonic alignment search source code
  • torchdyn: Useful for trying other ODE solvers during research and development
  • labml.ai: For the RoPE implementation
