A modified VITS that uses ground-truth phoneme durations for better robustness

NTT123/light-speed

Light Speed ⚡

Light Speed ⚡ is a modified VITS model that uses aligned phoneme durations.
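To illustrate what using aligned phoneme durations can look like, here is a minimal sketch that expands a phoneme sequence to frame level by repeating each phoneme for its aligned number of frames. The function name `expand_by_duration` and the example values are hypothetical, and this is not taken from the repository's code; it only shows the general idea under the assumption that durations are given as integer frame counts.

```python
from typing import List, Sequence


def expand_by_duration(phonemes: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each phoneme by its duration (in frames).

    Hypothetical helper for illustration only; assumes one non-negative
    integer frame count per phoneme.
    """
    if len(phonemes) != len(durations):
        raise ValueError("phonemes and durations must have the same length")
    frames: List[str] = []
    for phoneme, duration in zip(phonemes, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # A phoneme aligned to `duration` frames occupies that many frame slots.
        frames.extend([phoneme] * duration)
    return frames


# Example with made-up durations: 2 + 3 + 1 = 6 frames in total.
frame_level = expand_by_duration(["HH", "AH", "L"], [2, 3, 1])
```

With durations fixed in advance, the frame-level sequence length is simply the sum of the durations, so the model would not need to infer the alignment during training.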

[Figure: network diagram]

FAQ

Q: How do I create training data?
A: See the ./prepare_ljs_tfdata.ipynb notebook for instructions on preparing the training data.

Q: How can I train the model with 1 GPU?
A: Run: python train.py

Q: How can I train the model with 4 GPUs?
A: Run: torchrun --standalone --nnodes=1 --nproc-per-node=4 train.py
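For context, torchrun starts one process per GPU and passes each process its position through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows one way a script could read them, falling back to single-process defaults when launched with plain `python`. The names `DistInfo` and `read_dist_info` are hypothetical, and whether train.py handles this the same way is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistInfo(NamedTuple):
    rank: int        # global index of this process
    local_rank: int  # index of this process on the current node
    world_size: int  # total number of processes


def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the variables torchrun sets; default to a single process.

    Hypothetical helper for illustration, not the repository's code.
    """
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


# Under the 4-GPU command above, each process would see WORLD_SIZE=4
# and a distinct LOCAL_RANK in 0..3.
info = read_dist_info()
```

A script would typically use `local_rank` to pick its GPU and `world_size` to decide whether to set up distributed training at all.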

Q: How can I train a model to predict phoneme durations?
A: See the ./train_duration_model.ipynb notebook.

Q: How can I generate speech with a trained model?
A: See the ./inference.ipynb notebook.
