wav2letter++ is a fast, open-source speech processing toolkit from the Speech team at Facebook AI Research, built to facilitate research in end-to-end models for speech recognition. It is written entirely in C++ and, for maximum efficiency, uses the ArrayFire tensor library and the flashlight machine learning library (use the v0.2 branch of flashlight). Our approach is detailed in our arXiv paper (https://arxiv.org/abs/1812.07625).
This repository also contains pre-trained models and implementations for various ASR results including:
- [NEW] Pratap et al. (2020): Scaling Online Speech Recognition Using ConvNets
- [NEW SOTA] Synnaeve et al. (2019): End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
- Kahn et al. (2019): Self-Training for End-to-End Speech Recognition
- Likhomanenko et al. (2019): Who Needs Words? Lexicon-free Speech Recognition
- Hannun et al. (2019): Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions
The previous iteration of wav2letter (written in Lua) can be found in the wav2letter-lua branch.
All details and documentation can be found on the wiki.
To get started with wav2letter++, check out the tutorials section.
We also provide complete recipes for WSJ, TIMIT, and LibriSpeech, which can be found in the recipes folder.
Finally, we provide Python bindings for a subset of wav2letter++ (featurization, decoder, and ASG criterion) and a standalone inference framework for running online ASR.
If you use the code in your paper, then please cite it as:
@article{pratap2018w2l,
author = {Vineel Pratap and Awni Hannun and Qiantong Xu and Jeff Cai and Jacob Kahn and Gabriel Synnaeve and Vitaliy Liptchinsky and Ronan Collobert},
title = {wav2letter++: The Fastest Open-source Speech Recognition System},
journal = {CoRR},
volume = {abs/1812.07625},
year = {2018},
url = {https://arxiv.org/abs/1812.07625},
}
- Facebook group: https://www.facebook.com/groups/717232008481207/
- Google group: https://groups.google.com/forum/#!forum/wav2letter-users
- Contact: vineelkpratap@fb.com, awni@fb.com, qiantong@fb.com, jcai@fb.com, jacobkahn@fb.com, gab@fb.com, vitaliy888@fb.com, locronan@fb.com
See the CONTRIBUTING file for how to help out.
wav2letter++ is BSD-licensed, as found in the LICENSE file.