# Speaker Diarization on AMI corpus
This directory contains the scripts for speaker diarization on the AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/).

# Extra requirements
The code requires sklearn as an additional dependency. To install it, type:

`pip install sklearn`

# How to run

`python experiment.py hparams/ecapa_tdnn.yaml`
# Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models, which can easily be downloaded. You can also train the speaker embedding model from scratch using the instructions in the same file. Use one of the following commands to run diarization on the AMI corpus, depending on the model used:

`python experiment.py hparams/xvectors.yaml`
`python experiment.py hparams/ecapa_tdnn.yaml`
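The embedding-plus-spectral-clustering idea behind the recipe can be sketched as follows. This is a minimal, hypothetical illustration with toy random embeddings, not the recipe's actual pipeline (which includes affinity refinement and tuning): each speech segment yields one embedding, embeddings are L2-normalized, a cosine affinity matrix is built, and sklearn's `SpectralClustering` groups the segments into speakers.

```python
# Minimal sketch of embedding-based spectral clustering for diarization.
# The embeddings here are synthetic stand-ins, not real x-vector/ECAPA outputs.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Toy embeddings: two synthetic "speakers" around well-separated centroids,
# five segments each (rows = segments, columns = embedding dimensions).
spk_a = rng.normal(scale=0.1, size=(5, 16)) + 1.0
spk_b = rng.normal(scale=0.1, size=(5, 16)) - 1.0
embeddings = np.vstack([spk_a, spk_b])

# Cosine affinity matrix, shifted from [-1, 1] into [0, 1] so it can be
# used as a precomputed affinity for spectral clustering.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
affinity = (unit @ unit.T + 1.0) / 2.0

# Cluster segments into speakers; each label is a speaker id per segment.
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(labels)
```

In practice the number of speakers is either taken from oracle information or estimated (e.g. from the affinity matrix's eigenvalue gap), which is what the "Orcl." and "Est." columns in the results below refer to.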
# Performance Summary
Models trained on the VoxCeleb1+VoxCeleb2 dataset:

Xvectors : Dev = 4.34 % | Eval = 4.45 %
ECAPA    : Dev = 2.19 % | Eval = 2.74 %
ECAPA_big: Dev = 2.16 % | Eval = 2.72 %
# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/
# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.
```bibtex
@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
```
# Best performance in terms of Diarization Error Rate (DER)
A forgiveness collar of 0.25 sec is used, and overlapped speech is ignored during evaluation.
| System          | Mic.       | Orcl. (Dev) | Orcl. (Eval) | Est. (Dev) | Est. (Eval) |
|-----------------|------------|-------------|--------------|------------|-------------|
| ECAPA-TDNN + SC | HeadsetMix | 2.02%       | 1.78%        | 2.43%      | 4.03%       |
| ECAPA-TDNN + SC | LapelMix   | 2.17%       | 2.36%        | 2.34%      | 2.57%       |
| ECAPA-TDNN + SC | Array-1    | 2.95%       | 2.75%        | 3.07%      | 3.30%       |
For the complete set of analyses, please refer to our paper given below.
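As a toy illustration of how the forgiveness collar affects scoring, the sketch below computes a simplified frame-level error rate that excludes frames near reference speaker boundaries. This is a hypothetical helper for intuition only, not the standard scoring tool (official DER numbers come from NIST's md-eval script, which works on time intervals rather than frames).

```python
# Toy frame-level DER-style scoring with a forgiveness collar.
# All names here are hypothetical; this is not the recipe's evaluation code.
import numpy as np

def der_with_collar(ref, hyp, frame_s=0.01, collar_s=0.25):
    """ref, hyp: per-frame speaker ids (0 = silence), one entry per 10 ms.
    Frames within `collar_s` of a reference speaker change are not scored."""
    ref, hyp = np.asarray(ref), np.asarray(hyp)
    collar_frames = int(round(collar_s / frame_s))
    scored = np.ones(len(ref), dtype=bool)
    # Exclude a collar around every reference boundary (speaker change).
    boundaries = np.flatnonzero(np.diff(ref) != 0) + 1
    for b in boundaries:
        scored[max(0, b - collar_frames): b + collar_frames] = False
    ref_speech = ref[scored] != 0
    errors = ref[scored] != hyp[scored]
    # Erroneous scored frames over scored reference speech frames.
    return errors.sum() / max(ref_speech.sum(), 1)

ref = np.array([1] * 50 + [2] * 50)  # two speakers, change at frame 50
hyp = np.array([1] * 60 + [2] * 40)  # change misplaced by 10 frames (0.1 s)
print(der_with_collar(ref, hyp))     # → 0.0: the error falls inside the collar
```

Because the misplaced boundary lies within the 0.25 s collar, those frames are excluded and the hypothesis is not penalized; a grossly wrong hypothesis (e.g. one speaker throughout) would still be penalized on the scored frames.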
# Citation

Paper link: [ECAPA-TDNN Embeddings for Speaker Diarization](https://arxiv.org/pdf/2104.01466.pdf)

If you find the code useful in your work, please cite:
```bibtex
@misc{dawalatabad2021ecapatdnn,
  title={ECAPA-TDNN Embeddings for Speaker Diarization},
  author={Nauman Dawalatabad and Mirco Ravanelli and Francois Grondin and Jenthe Thienpondt and Brecht Desplanques and Hwidong Na},
  year={2021},
  eprint={2104.01466},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2104.01466}
}
```