Merge branch 'develop' into mimic-updates
Peter Plantinga committed Apr 13, 2022
2 parents 2359725 + 507144a commit d1fc97f
Showing 49 changed files with 5,544 additions and 380 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -21,6 +21,7 @@ SpeechBrain provides various useful tools to speed up and facilitate research on
- Multi-GPU training and inference with PyTorch Data-Parallel or Distributed Data-Parallel.
- Mixed-precision for faster training.
- A transparent and entirely customizable data input and output pipeline. SpeechBrain follows the PyTorch data loader and dataset style and enables users to customize the I/O pipelines (e.g., adding on-the-fly downsampling, BPE tokenization, sorting, thresholding, ...).
- A nice integration of sharded data with WebDataset, optimized for very large datasets on Network File Systems (NFS).
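The customizable pipeline bullet above can be sketched as a chain of per-item transforms. This is a stdlib-only toy, not SpeechBrain's actual `DynamicItemDataset` API, and the field names (`samples`, `signal`, `transcript`, `tokens`) are hypothetical:

```python
# Toy sketch of an on-the-fly data pipeline: each function consumes fields
# produced by earlier ones (hypothetical field names, not SpeechBrain's API).
def load_audio(item):
    item["signal"] = item["samples"]  # stand-in for reading a wav file
    return item

def downsample(item, factor=2):
    item["signal"] = item["signal"][::factor]  # naive on-the-fly decimation
    return item

def tokenize(item):
    item["tokens"] = item["transcript"].split()  # stand-in for BPE tokenization
    return item

pipeline = [load_audio, downsample, tokenize]

example = {"samples": [0.1, 0.2, 0.3, 0.4], "transcript": "hello world"}
for fn in pipeline:
    example = fn(example)
print(example["signal"], example["tokens"])  # → [0.1, 0.3] ['hello', 'world']
```

The real pipeline additionally declares which fields each function takes and provides, so items can be computed lazily and in the right order.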


### Speech recognition
@@ -70,7 +71,7 @@ The recipes released with speechbrain implement speech processing systems with c
| TIMIT | Speech Recognition | wav2vec2 + CTC/Att. | PER=8.04% (test) |
| CommonVoice (French) | Speech Recognition | wav2vec2 + CTC/Att. | WER=13.7% (test) |
| VoxCeleb2 | Speaker Verification | ECAPA-TDNN | EER=0.69% (vox1-test) |
-| AMI | Speaker Diarization | ECAPA-TDNN | DER=2.13% (lapel-mix)|
+| AMI | Speaker Diarization | ECAPA-TDNN | DER=3.01% (eval)|
| VoiceBank | Speech Enhancement | MetricGAN+| PESQ=3.08 (test)|
| WSJ2MIX | Speech Separation | SepFormer| SDRi=22.6 dB (test)|
| WSJ3MIX | Speech Separation | SepFormer| SDRi=20.0 dB (test)|
8 changes: 4 additions & 4 deletions recipes/AISHELL-1/ASR/transformer/train.py
@@ -41,17 +41,17 @@ def compute_forward(self, batch, stage):
feats = self.hparams.augmentation(feats)

# forward modules
-src = self.hparams.CNN(feats)
-enc_out, pred = self.hparams.Transformer(
+src = self.modules.CNN(feats)
+enc_out, pred = self.modules.Transformer(
src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index
)

# output layer for ctc log-probabilities
-logits = self.hparams.ctc_lin(enc_out)
+logits = self.modules.ctc_lin(enc_out)
p_ctc = self.hparams.log_softmax(logits)

# output layer for seq2seq log-probabilities
-pred = self.hparams.seq_lin(pred)
+pred = self.modules.seq_lin(pred)
p_seq = self.hparams.log_softmax(pred)

# Compute outputs
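The diff above consistently moves trainable layers from `self.hparams` to `self.modules`: in SpeechBrain, only objects registered under `modules` are moved to the right device and wrapped for (distributed) training, while `hparams` also carries plain YAML values such as `pad_index`. A minimal toy sketch of that split, with stand-in names rather than the real `Brain` class:

```python
# Toy illustration of the hparams/modules split (hypothetical stand-ins,
# not SpeechBrain's real Brain class or YAML loader).
class ToyBrain:
    def __init__(self, modules, hparams):
        self.modules = modules  # trainable components: wrapped/moved by the framework
        self.hparams = hparams  # everything from the YAML, incl. plain scalars

    def compute_forward(self, feats):
        src = self.modules["CNN"](feats)       # trainable layer -> self.modules
        return src, self.hparams["pad_index"]  # plain value -> self.hparams

brain = ToyBrain(
    modules={"CNN": lambda x: [2 * v for v in x]},  # hypothetical stand-in layer
    hparams={"pad_index": 0},
)
print(brain.compute_forward([1, 2, 3]))  # → ([2, 4, 6], 0)
```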
78 changes: 39 additions & 39 deletions recipes/AMI/Diarization/README.md
@@ -1,42 +1,42 @@
# Speaker Diarization on AMI corpus
This directory contains the scripts for speaker diarization on the AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/).

# Extra requirements
The code requires scikit-learn as an additional dependency. To install it, type:
pip install scikit-learn

# How to run
python experiment.py hparams/ecapa_tdnn.yaml

# Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models, which can easily be downloaded.
You can also train the speaker embedding model from scratch using the instructions in the same file. Use one of the following commands to run diarization on the AMI corpus.

`python experiment.py hparams/xvectors.yaml`
`python experiment.py hparams/ecapa_tdnn.yaml`

# Performance Summary using Xvector model trained on VoxCeleb1+VoxCeleb2 dataset
Xvectors : Dev = 4.34 % | Eval = 4.45 %
ECAPA : Dev = 2.19 % | Eval = 2.74 %
ECAPA_big: Dev = 2.16 % | Eval = 2.72 %

# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```
## Extra requirements
The code requires scikit-learn as an additional dependency.
To install it, type: `pip install scikit-learn`

## How to run
Use the following command to run diarization on AMI corpus.
`python experiment.py hparams/ecapa_tdnn.yaml` or `python experiment.py hparams/xvectors.yaml`, depending on the model used.


## Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models, which can easily be downloaded. You can also train the speaker embedding model from scratch using the instructions in the same file.
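As a rough sketch of the clustering step, spectral clustering over segment embeddings can be illustrated with scikit-learn on synthetic data. The embedding values and dimensions below are made up; the recipe uses real ECAPA-TDNN embeddings extracted from the audio:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical stand-ins for per-segment speaker embeddings: two synthetic
# speakers, each a noisy cloud around a distinct direction in a 16-dim space.
rng = np.random.default_rng(0)
mu_a = np.zeros(16); mu_a[0] = 1.0
mu_b = np.zeros(16); mu_b[1] = 1.0
emb = np.vstack([
    rng.normal(loc=mu_a, scale=0.05, size=(20, 16)),
    rng.normal(loc=mu_b, scale=0.05, size=(20, 16)),
])

# Cosine-similarity affinity matrix between segment embeddings
# (clipped to be non-negative for the graph Laplacian).
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
affinity = np.clip(unit @ unit.T, 0.0, None)

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(labels)  # one cluster id per segment
```

In the actual system the number of speakers can also be estimated from the affinity matrix rather than fixed in advance.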


## Best performance in terms of Diarization Error Rate (DER)
A forgiveness collar of 0.25 seconds is used, and overlapping speech is ignored during evaluation.

| System | Mic. | Orcl. (Dev) | Orcl. (Eval) | Est. (Dev) | Est. (Eval) |
|----------- | ------------ | ------ |------| ------| ------ |
| ECAPA-TDNN + SC | HeadsetMix | 2.02% | 1.78% | 2.43% | 4.03% |
| ECAPA-TDNN + SC | LapelMix | 2.17% | 2.36% | 2.34% | 2.57% |
| ECAPA-TDNN + SC | Array-1 | 2.95% | 2.75% | 3.07% | 3.30% |
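As a reminder of what these numbers mean: DER is normally computed by a scoring tool (e.g. md-eval.pl) as the fraction of scored speech time that is falsely alarmed, missed, or attributed to the wrong speaker. A toy illustration with made-up durations:

```python
# Toy DER computation (not the official scoring tool; all durations made up).
false_alarm = 1.2     # seconds of non-speech labeled as speech
missed = 0.8          # seconds of speech not detected
confusion = 2.0       # seconds attributed to the wrong speaker
total_speech = 180.0  # total scored speech time in the reference

der = (false_alarm + missed + confusion) / total_speech
print(f"DER = {der:.2%}")  # → DER = 2.22%
```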

For the complete set of analyses, please refer to our paper given below.

## Citation

Paper Link: [ECAPA-TDNN Embeddings for Speaker Diarization](https://arxiv.org/pdf/2104.01466.pdf)

If you find the code useful in your work, please cite:

```bibtex
@misc{dawalatabad2021ecapatdnn,
title={ECAPA-TDNN Embeddings for Speaker Diarization},
author={Nauman Dawalatabad and Mirco Ravanelli and Francois Grondin and Jenthe Thienpondt and Brecht Desplanques and Hwidong Na},
year={2021},
eprint={2104.01466},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2104.01466}
}
```