🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter

This is the official code implementation of 🍵 Matcha-TTS.

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:

Is probabilistic
Has a compact memory footprint
Sounds highly natural
Is very fast to synthesise from

Check out our demo page and read our arXiv preprint for more details.

Pre-trained models will be automatically downloaded with the CLI or gradio interface.

Try 🍵 Matcha-TTS on HuggingFace 🤗 spaces!

Installation

Create an environment (suggested but optional)

conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts

Install Matcha TTS using pip or from source

pip install matcha-tts

or from source

pip install git+https://github.com/shivammehta25/Matcha-TTS.git

Run CLI / gradio app / jupyter notebook

# This will download the required models
matcha-tts --text "<INPUT TEXT>"

or

matcha-tts-app

or open synthesis.ipynb in Jupyter Notebook

CLI Arguments

To synthesise from given text, run:

matcha-tts --text "<INPUT TEXT>"

To synthesise from a file, run:

matcha-tts --file <PATH TO FILE>

To batch synthesise from a file, run:

matcha-tts --file <PATH TO FILE> --batched

Additional arguments

Speaking rate

matcha-tts --text "<INPUT TEXT>" --speaking_rate 1.0

Sampling temperature

matcha-tts --text "<INPUT TEXT>" --temperature 0.667

Euler ODE solver steps

matcha-tts --text "<INPUT TEXT>" --steps 10
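
To give a rough sense of what the temperature and step-count options control, the sketch below integrates a toy ODE with a fixed-step Euler solver. It is not the Matcha-TTS implementation: toy_vector_field, euler_sample, and the target mu are hypothetical names chosen for illustration, and the toy vector field stands in for the learned network. The sketch assumes that the temperature scales the initial Gaussian noise and that the step count sets how many Euler updates are taken from t = 0 to t = 1.

```python
import random


def toy_vector_field(x, t, mu):
    """Hypothetical stand-in for a learned vector field.

    For straight-line paths x_t = (1 - t) * x_0 + t * mu, the velocity
    that carries x_t to mu at t = 1 is (mu - x_t) / (1 - t).
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, mu)]


def euler_sample(mu, steps=10, temperature=0.667, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    rng = random.Random(seed)
    # Assumption: temperature scales the standard-normal starting noise.
    x = [rng.gauss(0.0, 1.0) * temperature for _ in mu]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1, so the division in the field is safe
        v = toy_vector_field(x, t, mu)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


sample = euler_sample([1.0, -2.0, 0.5], steps=10, temperature=0.667)
```

Because the toy paths are exactly straight, any step count lands on mu. A learned vector field is only approximately straight, so the step count becomes a trade-off between synthesis speed and integration accuracy.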

Train with your own dataset

Let's assume we are training with LJ Speech

Download the LJ Speech dataset, extract it to data/LJSpeech-1.1, and prepare the file lists so that they point to the extracted data, as in item 5 of the setup in the NVIDIA Tacotron 2 repo.

Clone and enter the Matcha-TTS repository

git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS

Install the package from source

pip install -e .

Go to configs/data/ljspeech.yaml and change the following entries to the paths of your train and validation filelists:

train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt

Generate normalisation statistics using the dataset configuration YAML file

matcha-data-stats -i ljspeech.yaml
# Output:
#{'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}

Update these values in configs/data/ljspeech.yaml under the data_statistics key.

data_statistics:  # Computed for ljspeech dataset
  mel_mean: -5.536622
  mel_std: 2.116101
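
As a rough illustration of what the mel_mean and mel_std statistics represent, the sketch below computes a global mean and standard deviation over a few toy mel-spectrogram values and uses them to standardise and then restore a frame. It assumes the statistics are applied as a plain (x - mean) / std standardisation; compute_stats, normalise, and denormalise are hypothetical helper names, not functions from the Matcha-TTS package, and the toy values are made up.

```python
import math


def compute_stats(frames):
    """Return the global mean and population standard deviation of all values."""
    values = [v for frame in frames for v in frame]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mel_mean": mean, "mel_std": math.sqrt(var)}


def normalise(frame, stats):
    """Standardise a frame to roughly zero mean and unit variance."""
    return [(v - stats["mel_mean"]) / stats["mel_std"] for v in frame]


def denormalise(frame, stats):
    """Undo normalise(), mapping model-scale values back to mel scale."""
    return [v * stats["mel_std"] + stats["mel_mean"] for v in frame]


# Toy data standing in for log-mel values; real statistics come from the dataset.
toy_frames = [[-7.0, -5.0], [-6.0, -4.0]]
stats = compute_stats(toy_frames)
restored = denormalise(normalise(toy_frames[0], stats), stats)
```

Statistics computed on one dataset generally do not transfer to another, which is why they are regenerated for each new dataset configuration.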

Run the training script

make train-ljspeech

or

python matcha/train.py experiment=ljspeech

for a minimum memory run

python matcha/train.py experiment=ljspeech_min_memory

for multi-gpu training, run

python matcha/train.py experiment=ljspeech trainer.devices=[0,1]

Synthesise from the custom trained model

matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>
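
When synthesis is driven from a script, it can help to assemble the command line programmatically. The sketch below builds an argument list using only the flags shown in this README. The helper build_matcha_command is hypothetical and not part of the package. Requiring exactly one of text or file, and allowing batching only with a file, are assumptions that mirror the usage examples above rather than documented CLI behaviour.

```python
import shlex


def build_matcha_command(text=None, file=None, batched=False,
                         speaking_rate=None, temperature=None,
                         steps=None, checkpoint_path=None):
    """Assemble an argument list for the matcha-tts CLI (illustrative helper)."""
    # Assumption: the input is either literal text or a file, never both.
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of text or file")
    # Assumption: batching is only shown together with --file in the README.
    if batched and file is None:
        raise ValueError("batched synthesis expects a file")

    cmd = ["matcha-tts"]
    cmd += ["--text", text] if text is not None else ["--file", str(file)]
    if batched:
        cmd.append("--batched")

    optional = {
        "--speaking_rate": speaking_rate,
        "--temperature": temperature,
        "--steps": steps,
        "--checkpoint_path": checkpoint_path,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_matcha_command(text="Hello world", temperature=0.667, steps=10)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Once the package is installed, the resulting list can be passed to subprocess.run(cmd, check=True) to launch synthesis.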

Citation information

If you use our code or otherwise find this work useful, please cite our paper:

@article{mehta2023matcha,
  title={Matcha-TTS: A fast TTS architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2309.03199},
  year={2023}
}

Acknowledgements

Since this code uses Lightning-Hydra-Template, you have all the powers that come with it.

Other source code I would like to acknowledge:

Coqui-TTS: For helping me figure out how to make Cython binaries pip-installable, and for their encouragement
Hugging Face Diffusers: For their awesome diffusers library and its components
Grad-TTS: For the monotonic alignment search source code
torchdyn: Useful for trying other ODE solvers during research and development
labml.ai: For the RoPE implementation
