Merge branch 'develop' into mimic-updates
Peter Plantinga committed Apr 13, 2022
2 parents 2359725 + 507144a commit d1fc97f
Showing 49 changed files with 5,544 additions and 380 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -21,6 +21,7 @@ SpeechBrain provides various useful tools to speed up and facilitate research on
- Multi-GPU training and inference with PyTorch Data-Parallel or Distributed Data-Parallel.
- Mixed-precision for faster training.
- A transparent and entirely customizable data input and output pipeline. SpeechBrain follows the PyTorch data loader and dataset style and enables users to customize the I/O pipelines (e.g., adding on-the-fly downsampling, BPE tokenization, sorting, thresholding, ...).
- A nice integration of sharded data with WebDataset, optimized for very large datasets on Network File Systems (NFS).
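The customizable pipeline bullet above can be sketched as a chain of per-item transforms. This is a stdlib-only toy, not SpeechBrain's actual `DynamicItemDataset` API, and the field names (`samples`, `signal`, `transcript`, `tokens`) are hypothetical:

```python
# Toy sketch of an on-the-fly data pipeline: each function consumes fields
# produced by earlier ones (hypothetical field names, not SpeechBrain's API).
def load_audio(item):
    item["signal"] = item["samples"]  # stand-in for reading a wav file
    return item

def downsample(item, factor=2):
    item["signal"] = item["signal"][::factor]  # naive on-the-fly decimation
    return item

def tokenize(item):
    item["tokens"] = item["transcript"].split()  # stand-in for BPE tokenization
    return item

pipeline = [load_audio, downsample, tokenize]

example = {"samples": [0.1, 0.2, 0.3, 0.4], "transcript": "hello world"}
for fn in pipeline:
    example = fn(example)
print(example["signal"], example["tokens"])  # → [0.1, 0.3] ['hello', 'world']
```

The real pipeline additionally declares which fields each function takes and provides, so items can be computed lazily and in the right order.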


### Speech recognition
@@ -70,7 +71,7 @@ The recipes released with speechbrain implement speech processing systems with c
| TIMIT | Speech Recognition | wav2vec2 + CTC/Att. | PER=8.04% (test) |
| CommonVoice (French) | Speech Recognition | wav2vec2 + CTC/Att. | WER=13.7% (test) |
| VoxCeleb2 | Speaker Verification | ECAPA-TDNN | EER=0.69% (vox1-test) |
-| AMI | Speaker Diarization | ECAPA-TDNN | DER=2.13% (lapel-mix)|
+| AMI | Speaker Diarization | ECAPA-TDNN | DER=3.01% (eval)|
| VoiceBank | Speech Enhancement | MetricGAN+| PESQ=3.08 (test)|
| WSJ2MIX | Speech Separation | SepFormer| SDRi=22.6 dB (test)|
| WSJ3MIX | Speech Separation | SepFormer| SDRi=20.0 dB (test)|
8 changes: 4 additions & 4 deletions recipes/AISHELL-1/ASR/transformer/train.py
@@ -41,17 +41,17 @@ def compute_forward(self, batch, stage):
feats = self.hparams.augmentation(feats)

# forward modules
-src = self.hparams.CNN(feats)
-enc_out, pred = self.hparams.Transformer(
+src = self.modules.CNN(feats)
+enc_out, pred = self.modules.Transformer(
src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index
)

# output layer for ctc log-probabilities
-logits = self.hparams.ctc_lin(enc_out)
+logits = self.modules.ctc_lin(enc_out)
p_ctc = self.hparams.log_softmax(logits)

# output layer for seq2seq log-probabilities
-pred = self.hparams.seq_lin(pred)
+pred = self.modules.seq_lin(pred)
p_seq = self.hparams.log_softmax(pred)

# Compute outputs
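The diff above consistently moves trainable layers from `self.hparams` to `self.modules`: in SpeechBrain, only objects registered under `modules` are moved to the right device and wrapped for (distributed) training, while `hparams` also carries plain YAML values such as `pad_index`. A minimal toy sketch of that split, with stand-in names rather than the real `Brain` class:

```python
# Toy illustration of the hparams/modules split (hypothetical stand-ins,
# not SpeechBrain's real Brain class or YAML loader).
class ToyBrain:
    def __init__(self, modules, hparams):
        self.modules = modules  # trainable components: wrapped/moved by the framework
        self.hparams = hparams  # everything from the YAML, incl. plain scalars

    def compute_forward(self, feats):
        src = self.modules["CNN"](feats)       # trainable layer -> self.modules
        return src, self.hparams["pad_index"]  # plain value -> self.hparams

brain = ToyBrain(
    modules={"CNN": lambda x: [2 * v for v in x]},  # hypothetical stand-in layer
    hparams={"pad_index": 0},
)
print(brain.compute_forward([1, 2, 3]))  # → ([2, 4, 6], 0)
```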
78 changes: 39 additions & 39 deletions recipes/AMI/Diarization/README.md
@@ -1,42 +1,42 @@
# Speaker Diarization on AMI corpus
This directory contains the scripts for speaker diarization on the AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/).

# Extra requirements
The code requires scikit-learn as an additional dependency. To install it, type:
pip install scikit-learn

# How to run
python experiment.py hparams/ecapa_tdnn.yaml

# Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models, which can easily be downloaded.
You can also train the speaker embedding model from scratch using the instructions in the same file. Use one of the following commands to run diarization on the AMI corpus.

`python experiment.py hparams/xvectors.yaml`
`python experiment.py hparams/ecapa_tdnn.yaml`

# Performance Summary using Xvector model trained on VoxCeleb1+VoxCeleb2 dataset
Xvectors : Dev = 4.34 % | Eval = 4.45 %
ECAPA : Dev = 2.19 % | Eval = 2.74 %
ECAPA_big: Dev = 2.16 % | Eval = 2.72 %

# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```
## Extra requirements
The code requires scikit-learn as an additional dependency.
To install it, type: `pip install scikit-learn`

## How to run
Use the following command to run diarization on AMI corpus.
`python experiment.py hparams/ecapa_tdnn.yaml` or `python experiment.py hparams/xvectors.yaml`, depending on the model used.


## Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models, which can easily be downloaded. You can also train the speaker embedding model from scratch using the instructions in the same file.
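As a rough sketch of the clustering step, spectral clustering over segment embeddings can be illustrated with scikit-learn on synthetic data. The embedding values and dimensions below are made up; the recipe uses real ECAPA-TDNN embeddings extracted from the audio:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical stand-ins for per-segment speaker embeddings: two synthetic
# speakers, each a noisy cloud around a distinct direction in a 16-dim space.
rng = np.random.default_rng(0)
mu_a = np.zeros(16); mu_a[0] = 1.0
mu_b = np.zeros(16); mu_b[1] = 1.0
emb = np.vstack([
    rng.normal(loc=mu_a, scale=0.05, size=(20, 16)),
    rng.normal(loc=mu_b, scale=0.05, size=(20, 16)),
])

# Cosine-similarity affinity matrix between segment embeddings
# (clipped to be non-negative for the graph Laplacian).
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
affinity = np.clip(unit @ unit.T, 0.0, None)

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(labels)  # one cluster id per segment
```

In the actual system the number of speakers can also be estimated from the affinity matrix rather than fixed in advance.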


## Best performance in terms of Diarization Error Rate (DER)
A forgiveness collar of 0.25 seconds is used, and overlapping speech is ignored during evaluation.

| System | Mic. | Orcl. (Dev) | Orcl. (Eval) | Est. (Dev) | Est. (Eval) |
|----------- | ------------ | ------ |------| ------| ------ |
| ECAPA-TDNN + SC | HeadsetMix | 2.02% | 1.78% | 2.43% | 4.03% |
| ECAPA-TDNN + SC | LapelMix | 2.17% | 2.36% | 2.34% | 2.57% |
| ECAPA-TDNN + SC | Array-1 | 2.95% | 2.75% | 3.07% | 3.30% |
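As a reminder of what these numbers mean: DER is normally computed by a scoring tool (e.g. md-eval.pl) as the fraction of scored speech time that is falsely alarmed, missed, or attributed to the wrong speaker. A toy illustration with made-up durations:

```python
# Toy DER computation (not the official scoring tool; all durations made up).
false_alarm = 1.2     # seconds of non-speech labeled as speech
missed = 0.8          # seconds of speech not detected
confusion = 2.0       # seconds attributed to the wrong speaker
total_speech = 180.0  # total scored speech time in the reference

der = (false_alarm + missed + confusion) / total_speech
print(f"DER = {der:.2%}")  # → DER = 2.22%
```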

For the complete set of analyses, please refer to our paper given below.

## Citation

Paper Link: [ECAPA-TDNN Embeddings for Speaker Diarization](https://arxiv.org/pdf/2104.01466.pdf)

If you find the code useful in your work, please cite:

```bibtex
@misc{dawalatabad2021ecapatdnn,
title={ECAPA-TDNN Embeddings for Speaker Diarization},
author={Nauman Dawalatabad and Mirco Ravanelli and Francois Grondin and Jenthe Thienpondt and Brecht Desplanques and Hwidong Na},
year={2021},
eprint={2104.01466},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2104.01466}
}
```