evalutate_wav2vec2

Evaluating fairseq's wav2vec2.0

Using the simple CTC decoder, we can reproduce the number in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.

Prerequisite

fairseq
SCTK
A finetuned model file and a dictionary file downloaded from the example page.
LibriSpeech test-clean dataset from OpenSLR.

Steps

The following command will run the inference on the LibriSpeech test-clean dataset and generate ref.trn and hyp.trn file.

output_dir=<path to your output directory>
python evaluate_wav2vec2_librispeech.py \
    --root-dir <path to /LibriSpeech/test-clean/> \
    --output-dir "${output_dir}" \
    --model-file <path to model file> \
    --dict-file <path to `dict.ltr.txt` file> \
    --num-threads 8

Then you can use sclite to get the WER.

sclite -r "${output_dir}/ref.trn" -h "${output_dir}/hyp.trn" -i wsj -o pralign -o sum

The SYS file contains the WER.

                     SYSTEM SUMMARY PERCENTAGES by SPEAKER

      ,-----------------------------------------------------------------.
      |                            /hyp.trn                             |
      |-----------------------------------------------------------------|
      | SPKR   | # Snt  # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
      |--------+--------------+-----------------------------------------|
      ...
      
      |=================================================================|
      | Sum/Avg| 2620   52576 | 96.9    2.8    0.3    0.3    3.4   38.2 |
      |=================================================================|
      |  Mean  | 67.2  1348.1 | 96.9    2.8    0.3    0.3    3.4   39.6 |
      |  S.D.  | 20.7  291.9  |  1.2    1.1    0.2    0.2    1.3   13.9 |
      | Median | 63.0  1296.0 | 96.9    2.8    0.2    0.3    3.2   37.5 |
      `-----------------------------------------------------------------'

The PRA file contains the comparison between hypothesis and refrences.

id: (237-134500-0005)
Scores: (#C #S #D #I) 8 1 0 0
REF:  oh but i'm glad to get this place MOWED
HYP:  oh but i'm glad to get this place MOULD
Eval:                                   S

Using wav2vec_small_960h.pt (Wav2Vec 2.0 Base finetuned with 960 hours Librispeech), we get WER 3.1, which is close to 3.4 in the paper.

Name		Name	Last commit message	Last commit date
parent directory ..
.gitignore		.gitignore
README.md		README.md
evaluate_wav2vec2_librispeech.py		evaluate_wav2vec2_librispeech.py
run.sh		run.sh
sbatch.sh		sbatch.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

evalutate_wav2vec2

evalutate_wav2vec2

README.md

Evaluating fairseq's wav2vec2.0

Prerequisite

Steps

Files

evalutate_wav2vec2

Directory actions

More options

Directory actions

More options

Latest commit

History

evalutate_wav2vec2

Folders and files

parent directory

README.md

Evaluating fairseq's wav2vec2.0

Prerequisite

Steps