Extract framewise alignment information using CTC decoding (k2-fsa#39)

* Use new APIs with k2.RaggedTensor
* Fix style issues.
* Update the installation doc, saying it requires at least k2 v1.7
* Extract framewise alignment information using CTC decoding.
* Print environment information. Print information about k2, lhotse, PyTorch, and icefall.
* Fix CI.
* Fix CI.
* Compute framewise alignment information of the LibriSpeech dataset.
* Update comments for the time to compute alignments of train-960.
* Preserve cut id in mix cut transformer.
* Minor fixes.
* Add doc about how to extract framewise alignments.
yaozengwei · Oct 18, 2021 · 4890e27
1 parent bd7c2f7
Showing 18 changed files with 582 additions and 38 deletions.
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -46,10 +46,18 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
+      - name: Install libsndfile and libsox
+        if: startsWith(matrix.os, 'ubuntu')
+        run: |
+          sudo apt update
+          sudo apt install -q -y libsndfile1-dev libsndfile1 ffmpeg
+          sudo apt install -q -y --fix-missing sox libsox-dev libsox-fmt-all
+
       - name: Install Python dependencies
         run: |
           python3 -m pip install --upgrade pip pytest
           pip install k2==${{ matrix.k2-version }}+cpu.torch${{ matrix.torch }} -f https://k2-fsa.org/nightly/
+          pip install git+https://github.com/lhotse-speech/lhotse
           # icefall requirements
           pip install -r requirements.txt
 
@@ -88,4 +96,3 @@ jobs:
           # run tests for conformer ctc
           cd egs/librispeech/ASR/conformer_ctc
           pytest
-
diff --git a/egs/librispeech/ASR/RESULTS.md b/egs/librispeech/ASR/RESULTS.md
@@ -38,14 +38,16 @@ python conformer_ctc/train.py --bucketing-sampler True \
                               --concatenate-cuts False \
                               --max-duration 200 \
                               --full-libri True \
-                              --world-size 4
+                              --world-size 4 \
+                              --lang-dir data/lang_bpe_5000
 
 python conformer_ctc/decode.py --nbest-scale 0.5 \
                                --epoch 34 \
                                --avg 20 \
                                --method attention-decoder \
                                --max-duration 20 \
-                               --num-paths 100
+                               --num-paths 100 \
+                               --lang-dir data/lang_bpe_5000
 ```
 
 ### LibriSpeech training results (Tdnn-Lstm)

diff --git a/egs/librispeech/ASR/conformer_ctc/README.md b/egs/librispeech/ASR/conformer_ctc/README.md
@@ -1,3 +1,53 @@
+## Introduction
+
 Please visit
 <https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html>
 for how to run this recipe.
+
+## How to compute framewise alignment information
+
+### Step 1: Train a model
+
+Please use `conformer_ctc/train.py` to train a model.
+See <https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html>
+for how to do it.
+
+### Step 2: Compute framewise alignment
+
+Run
+
+```
+# Choose a checkpoint and determine the number of checkpoints to average
+epoch=30
+avg=15
+./conformer_ctc/ali.py \
+  --epoch $epoch \
+  --avg $avg \
+  --max-duration 500 \
+  --bucketing-sampler 0 \
+  --full-libri 1 \
+  --exp-dir conformer_ctc/exp \
+  --lang-dir data/lang_bpe_5000 \
+  --ali-dir data/ali_5000
+```
+and you will get four files inside the folder `data/ali_5000`:
+
+```
+$ ls -lh data/ali_5000
+total 546M
+-rw-r--r-- 1 kuangfangjun root 1.1M Sep 28 08:06 test_clean.pt
+-rw-r--r-- 1 kuangfangjun root 1.1M Sep 28 08:07 test_other.pt
+-rw-r--r-- 1 kuangfangjun root 542M Sep 28 11:36 train-960.pt
+-rw-r--r-- 1 kuangfangjun root 2.1M Sep 28 11:38 valid.pt
+```
+
+**Note**: It can take more than 3 hours to compute the alignment
+for the training dataset, which contains 960 * 3 = 2880 hours of data
+(the 960-hour training set plus its two speed-perturbed copies).
+
+**Caution**: The model parameters in `conformer_ctc/ali.py` have to match those
+in `conformer_ctc/train.py`.
+
+**Caution**: You have to set the parameter `preserve_id` to `True` for `CutMix`.
+Search `./conformer_ctc/asr_datamodule.py` for `preserve_id`.
+
+**TODO:** Add doc about how to use the extracted alignment in the other pull-request.
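
As an illustration of what a framewise alignment encodes, here is a minimal sketch that turns one per-frame token ID sequence into timed token segments. It does not read the `.pt` files above, whose layout is not described in this commit. The function name `frames_to_segments`, the blank ID of 0, the 10 ms frame shift, and the subsampling factor of 4 are all assumptions made for the example and should be checked against the actual model and feature configuration.

```python
from itertools import groupby
from typing import List, Tuple

# Assumptions for illustration only; verify against your own setup.
FRAME_SHIFT_S = 0.01      # assumed 10 ms shift between feature frames
SUBSAMPLING_FACTOR = 4    # assumed subsampling factor of the model output
BLANK_ID = 0              # assumed ID of the CTC blank token


def frames_to_segments(
    ali: List[int],
    blank_id: int = BLANK_ID,
    frame_shift_s: float = FRAME_SHIFT_S,
    subsampling_factor: int = SUBSAMPLING_FACTOR,
) -> List[Tuple[int, float, float]]:
    """Convert a framewise alignment into (token_id, start_s, end_s) segments.

    `ali` holds one token ID per output frame. Consecutive identical IDs
    are merged into a single segment, and blank frames are skipped.
    """
    step = frame_shift_s * subsampling_factor  # seconds per output frame
    segments = []
    frame = 0
    for token, run in groupby(ali):
        n = len(list(run))
        if token != blank_id:
            start = round(frame * step, 3)
            end = round((frame + n) * step, 3)
            segments.append((token, start, end))
        frame += n
    return segments


if __name__ == "__main__":
    # Hypothetical alignment: blank, blank, 5, 5, blank, 7, 7, 7, blank
    print(frames_to_segments([0, 0, 5, 5, 0, 7, 7, 7, 0]))
    # [(5, 0.08, 0.16), (7, 0.2, 0.32)]
```

With the assumed values, each output frame covers 0.04 s, so token 5 spans frames 2 to 3 and token 7 spans frames 5 to 7.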