Light Speed ⚡

Light Speed ⚡ is a modified VITS model that uses aligned phoneme durations.
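
To illustrate what "aligned phoneme durations" can mean in practice, here is a minimal sketch. It assumes each phoneme comes with a duration expressed as a whole number of spectrogram frames, and it expands those durations into a frame-to-phoneme alignment. The function name `durations_to_frame_alignment` and the example values are hypothetical and are not taken from this repository's code.

```python
def durations_to_frame_alignment(durations):
    """Expand per-phoneme durations (in frames) into a frame-level alignment.

    Returns a list whose i-th entry is the index of the phoneme that
    frame i is assigned to. Illustrative only; not the repository's code.
    """
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    alignment = []
    for phoneme_index, num_frames in enumerate(durations):
        # Repeat the phoneme index once for every frame it spans.
        alignment.extend([phoneme_index] * num_frames)
    return alignment


# Hypothetical durations for three phonemes: 2, 3, and 1 frames.
example_durations = [2, 3, 1]
frame_alignment = durations_to_frame_alignment(example_durations)
print(frame_alignment)
```

The alignment is monotonic by construction and its length equals the sum of the durations, which is the total number of frames.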

FAQ

Q: How do I create training data?
A: See the ./prepare_ljs_tfdata.ipynb notebook for instructions on preparing the training data.

Q: How can I train the model with 1 GPU?
A: Run: python train.py

Q: How can I train the model with 4 GPUs?
A: Run: torchrun --standalone --nnodes=1 --nproc-per-node=4 train.py
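
The two commands above differ only in the launcher and the process count. As a sketch, a small helper could pick the right command for a given number of GPUs. `build_train_command` is a hypothetical name, not part of this repository, and the helper assumes one process per GPU on a single node, as in the commands above.

```python
import shlex


def build_train_command(num_gpus, script="train.py"):
    """Return the launch command (as an argument list) for num_gpus GPUs.

    Hypothetical helper: mirrors the single-GPU and multi-GPU commands
    from the FAQ, assuming a single node and one process per GPU.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        # Single GPU: plain Python launch.
        return ["python", script]
    # Multiple GPUs: torchrun starts one process per GPU on this node.
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc-per-node={num_gpus}",
        script,
    ]


single_gpu_command = shlex.join(build_train_command(1))
four_gpu_command = shlex.join(build_train_command(4))
print(single_gpu_command)
print(four_gpu_command)
```

The argument list can be passed to `subprocess.run` directly, or printed and pasted into a shell.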

Q: How can I train a model to predict phoneme durations?
A: See the ./train_duration_model.ipynb notebook.

Q: How can I generate speech with a trained model?
A: See the ./inference.ipynb notebook.
