A mini, simple, and fast end-to-end automatic speech recognition toolkit.
- Minimal Training ⏱
  Self-supervised pre-trained models + minimal fine-tuning.
- Simple and Flexible ⚙️
  Easy to understand and customize.
- Colab Compatible 🧪
  Train your model directly on Google Colab.
- Preprocessing (run_preprocess.py)
  - Find all audio files and transcriptions.
  - Generate vocabularies (character/word/subword/code-switched).
- Training (run_asr.py)
  - Dataset (miniasr/data/dataset.py)
    - Tokenizer for text data (miniasr/data/text.py)
  - DataLoader (miniasr/data/dataloader.py)
  - Model (miniasr/model/base_asr.py)
    - Feature extractor
    - Data augmentation
    - End-to-end CTC ASR
- Testing (run_asr.py)
  - CTC greedy/beam decoding
  - Performance measures: error rates, RTF, latency
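To make the "CTC greedy decoding" step concrete, here is a minimal sketch in plain Python. It is not MiniASR's own implementation: the function name `ctc_greedy_decode`, the choice of index 0 as the blank token, and the toy scores are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumption: index 0 is the CTC blank token


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = BLANK_ID
) -> List[int]:
    """Collapse per-frame argmax predictions into a label sequence."""
    # 1. Pick the highest-scoring token at every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # 2. Merge consecutive repeats, then 3. drop blank tokens.
    return [token for token, _ in groupby(best_path) if token != blank_id]


# Toy example: 6 frames over a vocabulary {0: blank, 1: 'a', 2: 'b'}.
scores = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # a (a new token, because a blank separates it)
    [0.10, 0.20, 0.70],  # b
    [0.80, 0.10, 0.10],  # blank
]
decoded = ctc_greedy_decode(scores)
print(decoded)  # [1, 1, 2]
```

Beam decoding keeps several candidate paths per frame instead of only the best one, which is where an external language model can be applied.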
- Python 3.6+
- Install sox on your OS
- Install latest s3prl (at least v0.4):
git clone https://github.com/s3prl/s3prl.git
cd s3prl
pip install -e ./
cd ..
- Install via pip:
pip install -e ./
Additional libraries:
- flashlight: to decode with LM and beam search.
You can directly use pre-trained ASR models in any application (under construction 🚧).
from miniasr.utils import load_from_checkpoint
from miniasr.data.audio import load_waveform

# Option 1: Load from a checkpoint
model, args, tokenizer = load_from_checkpoint('path/to/ckpt', 'cuda')

# Option 2: Load from torch.hub (TODO: not available yet; use instead of Option 1)
# import torch
# model = torch.hub.load('vectominist/MiniASR', 'ctc_eng').to('cuda')

# Load waveforms and recognize!
waves = [load_waveform('path/to/waveform').to('cuda')]
hyps = model.recognize(waves)
- For already implemented corpora, please see egs/.
- To customize your own dataset, please see miniasr/preprocess.
miniasr-preprocess
Options:
  --corpus                      Corpus name.
  --path                        Path to dataset.
  --set                         Which subsets to be processed.
  --out                         Output directory.
  --gen-vocab                   Specify whether to generate vocabulary files.
  --char-vocab-size             Character vocabulary size.
  --word-vocab-size             Word vocabulary size.
  --subword-vocab-size          Subword vocabulary size.
  --gen-subword                 Specify whether to generate subword vocabulary.
  --subword-mode {unigram,bpe}  Subword training mode.
  --char-coverage               Character coverage.
  --seed                        Set random seed.
  --njobs                       Number of workers.
  --log-file                    Logging file.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                                Logging level.
See examples in egs/.
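As a rough illustration of how a character vocabulary size and a character coverage can interact, the sketch below keeps the most frequent characters until either limit is reached. It is an assumption-laden example, not the code in miniasr/preprocess: `build_char_vocab` and its parameters are hypothetical names, and the toolkit's actual behaviour may differ.

```python
from collections import Counter
from typing import Iterable, List


def build_char_vocab(
    transcripts: Iterable[str], vocab_size: int, coverage: float = 1.0
) -> List[str]:
    """Return the most frequent characters, limited by size and coverage."""
    counts = Counter(ch for text in transcripts for ch in text)
    total = sum(counts.values())
    vocab: List[str] = []
    covered = 0  # number of character occurrences covered so far
    for ch, freq in counts.most_common():
        # Stop once the size budget is used or the coverage target is met.
        if len(vocab) >= vocab_size or covered / total >= coverage:
            break
        vocab.append(ch)
        covered += freq
    return vocab


transcripts = ["aaab", "aabc"]  # counts: a=5, b=2, c=1
vocab = build_char_vocab(transcripts, vocab_size=10, coverage=0.8)
print(vocab)  # ['a', 'b'] (7 of 8 characters covered)
```

Characters that fall outside the vocabulary would typically be mapped to an unknown token at tokenization time.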
miniasr-asr
Options:
  --config                      Training configuration file (.yaml).
  --test                        Specify testing mode.
  --ckpt                        Checkpoint for testing.
  --test-name                   Specify testing results' name.
  --cpu                         Using CPU only.
  --seed                        Set random seed.
  --njobs                       Number of workers.
  --log-file                    Logging file.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                                Logging level.
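Testing reports error rates and the real-time factor (RTF). The sketch below shows how such measures are commonly computed: the error rate as edit distance divided by reference length, and RTF as processing time divided by audio duration. The function names are hypothetical and this is not necessarily how MiniASR computes them.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Total edit errors divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1 means recognition runs faster than real time."""
    return processing_seconds / audio_seconds


refs = ["the cat sat".split()]
hyps = ["the cat sit down".split()]
wer = error_rate(refs, hyps)      # 1 substitution + 1 insertion over 3 words
rtf = real_time_factor(0.5, 2.0)  # 0.25
```

Passing lists of words gives a word error rate; passing the raw strings (sequences of characters) gives a character error rate.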
- torch.hub support
- Releasing pre-trained ASR models
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, Graves et al.
- Neural Machine Translation of Rare Words with Subword Units, Sennrich et al.
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, Hsu et al.
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, Park et al.
@misc{chang2021miniasr,
title={{MiniASR}},
author={Chang, Heng-Jui},
year={2021},
url={https://github.com/vectominist/MiniASR}
}