Apply pre-commit all
kimdwkimdw committed Nov 22, 2021
1 parent 5c8ff86 commit 0570ead
Showing 29 changed files with 75 additions and 75 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -105,7 +105,7 @@ For more details, take a look into the corresponding implementation in recipes/d
Beyond providing recipes for training the models from scratch, SpeechBrain shares several pre-trained models (coupled with easy-inference functions) on [HuggingFace](https://huggingface.co/speechbrain). In the following, we report some of them:

| Task | Dataset | Model |
| ------------- |:-------------:| -----:|
| Speech Recognition | LibriSpeech | [CNN + Transformer](https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech) |
| Speech Recognition | LibriSpeech | [CRDNN](https://huggingface.co/speechbrain/asr-crdnn-transformerlm-librispeech) |
| Speech Recognition | CommonVoice(English) | [wav2vec + CTC](https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-en) |
2 changes: 1 addition & 1 deletion docs/docs-requirements.txt
@@ -1,7 +1,7 @@
better-apidoc>=0.3.1
ctc-segmentation>=1.7.0
numba>=0.54.1
recommonmark>=0.7.1
six
sphinx-rtd-theme>=0.4.3
Sphinx>=3.4.3
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -32,7 +32,7 @@ Referencing SpeechBrain
.. code-block:: txt
@misc{speechbrain,
title={SpeechBrain: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
8 changes: 4 additions & 4 deletions docs/multigpu.md
@@ -19,7 +19,7 @@ Important: the batch size for each GPU process will be: `batch_size / Number of

## Multi-GPU training using Distributed Data Parallel (DDP)

DDP implements data parallelism across different processes. This way, the GPUs do not necessarily have to be in the same server, which makes this solution much more flexible. However, the training routines must be written with this multi-process execution in mind.

With SpeechBrain, we have put a lot of effort into making sure the code is compliant with DDP. For instance, to avoid conflicts across processes, we developed the `run_on_main` function. It is called when critical operations, such as writing a file to disk, are performed. It ensures that these operations run in a single process only; the other processes wait until the operation is completed.
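
A minimal sketch of this pattern, assuming the `speechbrain.utils.distributed` module and a hypothetical `prepare_data` helper:

```python
from speechbrain.utils.distributed import run_on_main

def prepare_data(data_folder, save_json):
    # Hypothetical preparation step that writes a manifest file to disk.
    with open(save_json, "w") as f:
        f.write("{}")

# Only the main process (rank 0) executes prepare_data;
# the other processes wait at a barrier until it has finished.
run_on_main(prepare_data, kwargs={"data_folder": "/data", "save_json": "train.json"})
```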

@@ -46,7 +46,7 @@ cd recipes/<dataset>/<task>/
python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr machine_1_address --master_port 5555 experiment.py hyperparams.yaml --distributed_launch --distributed_backend='nccl'
```

In this case, Machine 1 will have 2 subprocesses (subprocess1 with local_rank=0, rank=0; subprocess2 with local_rank=1, rank=1). Machine 2 will have 2 subprocesses (subprocess1 with local_rank=0, rank=2; subprocess2 with local_rank=1, rank=3).

In practice, using `torch.distributed.launch` ensures that the right environment variables are set (`local_rank` and `rank`), so you don't have to worry about setting them yourself.
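
For illustration only, the global rank follows from the node rank and the local rank (a sketch; `torch.distributed.launch` computes this for you):

```python
# Sketch of how global ranks are derived in the two-machine example above.
NPROC_PER_NODE = 2  # subprocesses (GPUs) per machine

def global_rank(node_rank: int, local_rank: int) -> int:
    # rank = node_rank * processes_per_node + local_rank
    return node_rank * NPROC_PER_NODE + local_rank

assert global_rank(node_rank=0, local_rank=1) == 1  # Machine 1, subprocess 2
assert global_rank(node_rank=1, local_rank=0) == 2  # Machine 2, subprocess 1
```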

@@ -68,7 +68,7 @@ Now, let's try to scale this up a bit with a resource manager like SLURM. Here,
cd ${SLURM_SUBMIT_DIR}

# And we call the srun that will run --ntasks-per-node times (once here) per node
srun srun_script.sh
```

```shell
MASTER=`echo $LISTNODES | cut -d" " -f1`
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${SLURM_JOB_NUM_NODES} --node_rank=${SLURM_NODEID} --master_addr=${MASTER} --master_port=5555 train.py hparams/myrecipe.yaml
```

Note that using DDP on different machines introduces a **communication overhead** that might slow down training (depending on how fast the connection across the different machines is).

We would like to advise our users that, despite being more efficient, DDP is also more prone to unexpected bugs. Indeed, DDP is quite server-dependent, and some setups might generate errors with the PyTorch implementation of DDP. Future versions of PyTorch will improve the stability of DDP.
4 changes: 2 additions & 2 deletions recipes/AISHELL-1/ASR/seq2seq/README.md
@@ -12,8 +12,8 @@ cd ../../Tokenizer
python train.py hparams/tokenizer_bpe5000.yaml --data_folder=/localscratch/aishell/
```
If not present in the specified data_folder, the dataset will be automatically downloaded there.
This step is not mandatory. We will use the official tokenizer downloaded from the web if you do not
specify a different tokenizer in the speech recognition recipe.

2- Train the speech recognizer
6 changes: 3 additions & 3 deletions recipes/AISHELL-1/ASR/transformer/README.md
@@ -9,8 +9,8 @@ cd ../../Tokenizer
python train.py hparams/train_transformer_tokenizer_bpe5000.yaml --data_folder=/localscratch/aishell/
```
If not present in the specified data_folder, the dataset will be automatically downloaded there.
This step is not mandatory. We will use the official tokenizer downloaded from the web if you do not
specify a different tokenizer in the speech recognition recipe.

2- Train the speech recognizer
@@ -39,7 +39,7 @@ and about 5 hours on an NVIDIA V100 (32GB) for train_ASR_transformer_with_
You can find the pre-trained model with an easy-inference function on HuggingFace
- https://huggingface.co/speechbrain/asr-transformer-aishell
- https://huggingface.co/speechbrain/asr-wav2vec2-transformer-aishell


# **About SpeechBrain**
- Website: https://speechbrain.github.io/
4 changes: 2 additions & 2 deletions recipes/AMI/Diarization/README.md
@@ -2,7 +2,7 @@
This directory contains the scripts for speaker diarization on the AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/).

## Extra requirements
The code requires sklearn as an additional dependency.
To install it, type: `pip install sklearn`

## How to run
@@ -11,7 +11,7 @@ Use the following command to run diarization on the AMI corpus.


## Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models that can easily be downloaded. You can also train the speaker embedding model from scratch using the instructions in the same file.
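
As an illustrative sketch, such a model can be loaded through the easy-inference interface (the model source and audio file below are assumptions, not the recipe's fixed defaults):

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load a pre-trained speaker embedding model from HuggingFace
# (the ECAPA VoxCeleb model is used here as an assumed example).
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

signal, fs = torchaudio.load("segment.wav")  # hypothetical audio segment
embedding = classifier.encode_batch(signal)  # embedding later used for clustering
```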


## Best performance in terms of Diarization Error Rate (DER).
2 changes: 1 addition & 1 deletion recipes/CommonLanguage/lang_id/README.md
@@ -40,7 +40,7 @@ print(text_lab)


**Web Demo** Integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See the Audio Classification demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/Speechbrain-audio-classification)

# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
4 changes: 2 additions & 2 deletions recipes/DNS/enhance/spectral_map/README.md
@@ -3,8 +3,8 @@ This folder contains the scripts to train a speech enhancement system with spect
You can download the dataset from here: https://github.com/microsoft/DNS-Challenge

# How to run
python train.py train/params_CNNTransformer.yaml
python train.py train/params_CNN.yaml

# Results
| Release | hyperparams file | STOI | PESQ | Model link | GPUs |
18 changes: 9 additions & 9 deletions recipes/ERPCore/P3_decoding/README.md
@@ -1,24 +1,24 @@
# P300 decoding from single EEG trials using ERP CORE dataset
# Task description
The P300 is an attention-dependent response occurring when infrequent stimuli are presented to a user immersed in a sequence of more frequent background stimuli.
This response peaks between 300-500 ms after the infrequent stimulus onset and is mostly distributed over the parietal area. Due to the low signal-to-noise ratio of the electroencephalogram (EEG), the P300 only emerges after averaging EEG signals across several responses to stimuli (i.e., EEG trials) and across subjects.
Therefore, decoding the P300 event at the level of every single trial is a very challenging task.
The P300 is of particular relevance not only as a control signal to guide Brain-Computer Interfaces (e.g., P300 spellers) but also as a biomarker in psychiatric disorders (e.g., schizophrenia, depression, etc.).

This folder contains the scripts to train a P300 decoder with EEG signals collected in the ERP CORE dataset using a compact convolutional neural network based on EEGNet.

ERP CORE is an open collection of event-related potentials available at: https://osf.io/thsqg/

The objective decoding task is the classification of the absence vs. presence of the P300 event (binary classification) from single EEG trials (i.e., single-trial P300 decoding) using signals from each subject separately.
This is necessary due to the high subject-to-subject variability in the EEG.
Therefore, subject-specific decoders are trained and, due to the resulting compact dataset (consisting of 200 EEG trials per subject), a 10-fold cross-validation scheme is adopted.

# How to run
Before running an experiment, make sure the extra dependencies reported in the file `extra_requirements.txt` are installed in your environment.
Note that this code requires mne==0.22.1.

Download the dataset with: \
\>>> python download_required_data.py --data_folder /path/to/ERPCore_P3

Perform training on a subject (e.g., subject ID 4='sub-004'): \
\>>> python train.py train.yaml --sbj_id 'sub-004' --data_folder '/path/to/ERPCore_P3'
@@ -38,10 +38,10 @@ done


# Results
For each subject-specific decoder and within each fold, AUROCs and F1 scores were computed on the test set.
These metrics are stored in the pickle file "metrics.pkl" (containing an ndarray with one row per fold, holding the loss, F1 score, and AUROC, in this order).

Performance metrics were averaged across folds (subject-level metrics).
In the following table, the subject-level metrics are reported (mean ± standard error of the mean across subjects).
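
For illustration, the subject-level aggregation can be reproduced like this (a sketch assuming the row layout described above):

```python
import pickle
import numpy as np

# Load the per-fold metrics saved by the recipe
# (assumed layout: one row per fold with [loss, F1, AUROC]).
with open("metrics.pkl", "rb") as f:
    metrics = np.asarray(pickle.load(f))

loss, f1, auroc = metrics.mean(axis=0)  # average across folds
print(f"Test F1: {f1:.3f}, Test AUROC: {auroc:.3f}")
```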

| Release | Hyperparams file | Test F1 score | Test AUROCs | GPUs |
2 changes: 1 addition & 1 deletion recipes/ERPCore/P3_decoding/extra_requirements.txt
@@ -1,2 +1,2 @@
mne==0.22.1
sklearn
2 changes: 1 addition & 1 deletion recipes/LibriSpeech/G2P/README.md
@@ -1,6 +1,6 @@
# Grapheme-to-phoneme (G2P).
This folder contains the scripts to train a grapheme-to-phoneme system
that converts input characters to output phonemes. It uses the
lexicon of the LibriSpeech dataset.

You can download LibriSpeech at http://www.openslr.org/12
4 changes: 2 additions & 2 deletions recipes/LibriSpeech/LM/README.md
@@ -1,8 +1,8 @@
# Language Model with LibriSpeech
This folder contains recipes for training language models for the LibriSpeech Dataset.
It supports both an RNN-based LM and a Transformer-based LM.
The scripts rely on the HuggingFace `datasets` library, which manages data reading and loading from large text corpora.
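
A minimal sketch of this kind of corpus loading (the file name is hypothetical):

```python
from datasets import load_dataset

# Load a large text corpus; `datasets` memory-maps the data via Apache Arrow
# rather than reading everything into RAM.
corpus = load_dataset("text", data_files={"train": "lm_corpus.txt"})
print(corpus["train"][0]["text"])  # first line of the training text
```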

You can download LibriSpeech at http://www.openslr.org/12

2 changes: 1 addition & 1 deletion recipes/LibriSpeech/Tokenizer/README.md
@@ -6,7 +6,7 @@ You can download LibriSpeech at http://www.openslr.org/12


# How to run
python train.py train/1K_unigram_subword_bpe.yaml
python train.py train/5K_unigram_subword_bpe.yaml


2 changes: 1 addition & 1 deletion recipes/TIMIT/Alignment/README.md
@@ -9,7 +9,7 @@ python train.py train/train.yaml
# Results

| Release | hyperparams file | Test Accuracy | Model link | GPUs |
|:-------------:|:---------------------------:| -----:| -----:| --------:|
| 20-05-22 | train.yaml | 79.55 | [model](https://drive.google.com/drive/folders/1fXu7JAVUYxZLosH05iBTEPrJyVSCjNRi?usp=sharing) | 1xV100 32GB |


14 changes: 7 additions & 7 deletions recipes/UrbanSound8k/README.md
@@ -1,8 +1,8 @@
## UrbanSound8k multi-class audio classification

[This recipe and description have been adapted from the SpeechBrain "VoxCeleb" recipe example]

This recipe contains scripts for multi-class audio classification experiments with the UrbanSound8k dataset (https://urbansounddataset.weebly.com/urbansound8k.html). The dataset is publicly available, but a request must be made before the authors provide a download link (https://urbansounddataset.weebly.com/download-urbansound8k.html).

UrbanSound8k is divided into 10 classes, one of which (engine_idling) receives special attention in our experiments below.

@@ -59,7 +59,7 @@ Note that the results for 10-fold must be compiled from the output folders and a
# Performance (single fold)
test loss: 4.15, test acc: 7.55e-01, test error: 2.46e-01

Per Class Accuracy:
0: 0.850
1: 0.670
2: 0.600
@@ -69,9 +69,9 @@
6: 0.753
7: 0.906
8: 0.790
9: 0.939

Confusion Matrix:
[[85 1 2 3 0 1 1 0 1 6]
[ 2 67 5 9 0 3 5 2 6 1]
[ 0 3 60 1 1 0 16 16 3 0]
@@ -131,9 +131,9 @@ Again, your results will NOT be comparable to previous results in the literature





While all of the hyperparameter files listed above (except the 10-fold-cv) accept as lists the train, valid and test