Apply pre-commit all
kimdwkimdw committed Nov 22, 2021
1 parent 5c8ff86 commit 0570ead
Showing 29 changed files with 75 additions and 75 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -105,7 +105,7 @@ For more details, take a look into the corresponding implementation in recipes/d
Beyond providing recipes for training the models from scratch, SpeechBrain shares several pre-trained models (coupled with easy-inference functions) on [HuggingFace](https://huggingface.co/speechbrain). In the following, we report some of them:

| Task | Dataset | Model |
| ------------- |:-------------:| -----:|
| Speech Recognition | LibriSpeech | [CNN + Transformer](https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech) |
| Speech Recognition | LibriSpeech | [CRDNN](https://huggingface.co/speechbrain/asr-crdnn-transformerlm-librispeech) |
| Speech Recognition | CommonVoice(English) | [wav2vec + CTC](https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-en) |
2 changes: 1 addition & 1 deletion docs/docs-requirements.txt
@@ -1,7 +1,7 @@
better-apidoc>=0.3.1
ctc-segmentation>=1.7.0
numba>=0.54.1
recommonmark>=0.7.1
six
sphinx-rtd-theme>=0.4.3
Sphinx>=3.4.3
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -32,7 +32,7 @@ Referencing SpeechBrain
.. code-block:: txt
@misc{speechbrain,
title={SpeechBrain: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
8 changes: 4 additions & 4 deletions docs/multigpu.md
@@ -19,7 +19,7 @@ Important: the batch size for each GPU process will be: `batch_size / Number of

## Multi-GPU training using Distributed Data Parallel (DDP)

DDP implements data parallelism across different processes. This way, the GPUs do not necessarily have to be in the same server, which makes this solution much more flexible. However, the training routines must be written with this multi-process execution in mind.

With SpeechBrain, we have put a lot of effort into making sure the code is compliant with DDP. For instance, to avoid conflicts across processes, we developed the `run_on_main` function. It is called when critical operations, such as writing a file to disk, are performed. It ensures that these operations run in a single process only; the other processes wait until the operation is completed.
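
A minimal sketch of this pattern, assuming the `speechbrain.utils.distributed` module and a hypothetical `prepare_data` helper:

```python
from speechbrain.utils.distributed import run_on_main

def prepare_data(data_folder, save_json):
    # Hypothetical preparation step that writes a manifest file to disk.
    with open(save_json, "w") as f:
        f.write("{}")

# Only the main process (rank 0) executes prepare_data;
# the other processes wait at a barrier until it has finished.
run_on_main(prepare_data, kwargs={"data_folder": "/data", "save_json": "train.json"})
```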

@@ -46,7 +46,7 @@ cd recipes/<dataset>/<task>/
python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr machine_1_address --master_port 5555 experiment.py hyperparams.yaml --distributed_launch --distributed_backend='nccl'
```

In this case, Machine 1 will have 2 subprocesses (subprocess1 with local_rank=0, rank=0; subprocess2 with local_rank=1, rank=1). Machine 2 will have 2 subprocesses (subprocess1 with local_rank=0, rank=2; subprocess2 with local_rank=1, rank=3).

In practice, using `torch.distributed.launch` ensures that the right environment variables are set (`local_rank` and `rank`), so you don't have to worry about setting them yourself.
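
For illustration only, the global rank follows from the node rank and the local rank (a sketch; `torch.distributed.launch` computes this for you):

```python
# Sketch of how global ranks are derived in the two-machine example above.
NPROC_PER_NODE = 2  # subprocesses (GPUs) per machine

def global_rank(node_rank: int, local_rank: int) -> int:
    # rank = node_rank * processes_per_node + local_rank
    return node_rank * NPROC_PER_NODE + local_rank

assert global_rank(node_rank=0, local_rank=1) == 1  # Machine 1, subprocess 2
assert global_rank(node_rank=1, local_rank=0) == 2  # Machine 2, subprocess 1
```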

@@ -68,7 +68,7 @@ Now, let's try to scale this up a bit with a resource manager like SLURM. Here,
cd ${SLURM_SUBMIT_DIR}

# And we call the srun that will run --ntasks-per-node times (once here) per node
srun srun_script.sh
```

```shell
MASTER=`echo $LISTNODES | cut -d" " -f1`
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${SLURM_JOB_NUM_NODES} --node_rank=${SLURM_NODEID} --master_addr=${MASTER} --master_port=5555 train.py hparams/myrecipe.yaml
```

Note that using DDP on different machines introduces a **communication overhead** that might slow down training (depending on how fast the connection across the different machines is).

We would like to advise our users that, despite being more efficient, DDP is also more prone to unexpected bugs. Indeed, DDP is quite server-dependent, and some setups might generate errors with the PyTorch implementation of DDP. Future versions of PyTorch will improve the stability of DDP.
4 changes: 2 additions & 2 deletions recipes/AISHELL-1/ASR/seq2seq/README.md
@@ -12,8 +12,8 @@ cd ../../Tokenizer
python train.py hparams/tokenizer_bpe5000.yaml --data_folder=/localscratch/aishell/
```
If not present in the specified data_folder, the dataset will be automatically downloaded there.
This step is not mandatory. We will use the official tokenizer downloaded from the web if you do not
specify a different tokenizer in the speech recognition recipe.

2- Train the speech recognizer
6 changes: 3 additions & 3 deletions recipes/AISHELL-1/ASR/transformer/README.md
@@ -9,8 +9,8 @@ cd ../../Tokenizer
python train.py hparams/train_transformer_tokenizer_bpe5000.yaml --data_folder=/localscratch/aishell/
```
If not present in the specified data_folder, the dataset will be automatically downloaded there.
This step is not mandatory. We will use the official tokenizer downloaded from the web if you do not
specify a different tokenizer in the speech recognition recipe.

2- Train the speech recognizer
@@ -39,7 +39,7 @@ and about 5 hours on an NVIDIA V100 (32GB) for train_ASR_transformer_with_
You can find the pre-trained model with an easy-inference function on HuggingFace
- https://huggingface.co/speechbrain/asr-transformer-aishell
- https://huggingface.co/speechbrain/asr-wav2vec2-transformer-aishell


# **About SpeechBrain**
- Website: https://speechbrain.github.io/
4 changes: 2 additions & 2 deletions recipes/AMI/Diarization/README.md
@@ -2,7 +2,7 @@
This directory contains the scripts for speaker diarization on the AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/).

## Extra requirements
The code requires sklearn as an additional dependency.
To install it, type: `pip install sklearn`

## How to run
@@ -11,7 +11,7 @@ Use the following command to run diarization on the AMI corpus.


## Speaker Diarization using Deep Embedding and Spectral Clustering
The script assumes a pre-trained speaker embedding model. Please refer to speechbrain/recipes/VoxCeleb/SpeakerRec/README.md to learn more about the available pre-trained models that can easily be downloaded. You can also train the speaker embedding model from scratch using the instructions in the same file.
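
As an illustrative sketch, such a model can be loaded through the easy-inference interface (the model source and audio file below are assumptions, not the recipe's fixed defaults):

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load a pre-trained speaker embedding model from HuggingFace
# (the ECAPA VoxCeleb model is used here as an assumed example).
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

signal, fs = torchaudio.load("segment.wav")  # hypothetical audio segment
embedding = classifier.encode_batch(signal)  # embedding later used for clustering
```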


## Best performance in terms of Diarization Error Rate (DER).
2 changes: 1 addition & 1 deletion recipes/CommonLanguage/lang_id/README.md
@@ -40,7 +40,7 @@ print(text_lab)


**Web Demo** Integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See the Audio Classification demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/Speechbrain-audio-classification)

# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
4 changes: 2 additions & 2 deletions recipes/DNS/enhance/spectral_map/README.md
@@ -3,8 +3,8 @@ This folder contains the scripts to train a speech enhancement system with spect
You can download the dataset from here: https://github.com/microsoft/DNS-Challenge

# How to run
python train.py train/params_CNNTransformer.yaml
python train.py train/params_CNN.yaml

# Results
| Release | hyperparams file | STOI | PESQ | Model link | GPUs |
18 changes: 9 additions & 9 deletions recipes/ERPCore/P3_decoding/README.md
@@ -1,24 +1,24 @@
# P300 decoding from single EEG trials using ERP CORE dataset
# Task description
The P300 is an attention-dependent response occurring when infrequent stimuli are presented to a user immersed in a sequence of more frequent background stimuli.
This response peaks between 300-500 ms after the infrequent stimulus onset and is mostly distributed over the parietal area. Due to the low signal-to-noise ratio of the electroencephalogram (EEG), the P300 only emerges after averaging EEG signals across several responses to stimuli (i.e., EEG trials) and across subjects.
Therefore, decoding the P300 event at the level of every single trial is a very challenging task.
The P300 is of particular relevance not only as a control signal to guide Brain-Computer Interfaces (e.g., P300 spellers) but also as a biomarker in psychiatric disorders (e.g., schizophrenia, depression, etc.).

This folder contains the scripts to train a P300 decoder with EEG signals collected in the ERP CORE dataset using a compact convolutional neural network based on EEGNet.

ERP CORE is an open collection of event-related potentials available at: https://osf.io/thsqg/

The objective decoding task is the classification of the absence vs. presence of the P300 event (binary classification) from single EEG trials (i.e., single-trial P300 decoding) using signals from each subject separately.
This is necessary due to the high subject-to-subject variability in the EEG.
Therefore, subject-specific decoders are trained and, due to the resulting compact dataset (consisting of 200 EEG trials per subject), a 10-fold cross-validation scheme is adopted.

# How to run
Before running an experiment, make sure the extra dependencies reported in the file `extra_requirements.txt` are installed in your environment.
Note that this code requires mne==0.22.1.

Download the dataset with: \
\>>> python download_required_data.py --data_folder /path/to/ERPCore_P3

Perform training on a subject (e.g., subject ID 4='sub-004'): \
\>>> python train.py train.yaml --sbj_id 'sub-004' --data_folder '/path/to/ERPCore_P3'
@@ -38,10 +38,10 @@ done


# Results
For each subject-specific decoder and within each fold, AUROCs and F1 scores were computed on the test set.
These metrics are stored in the pickle file "metrics.pkl" (containing an ndarray with one row per fold, holding the loss, F1 score, and AUROC, in this order).

Performance metrics were averaged across folds (subject-level metrics).
In the following table, the subject-level metrics are reported (mean ± standard error of the mean across subjects).
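
For illustration, the subject-level aggregation can be reproduced like this (a sketch assuming the row layout described above):

```python
import pickle
import numpy as np

# Load the per-fold metrics saved by the recipe
# (assumed layout: one row per fold with [loss, F1, AUROC]).
with open("metrics.pkl", "rb") as f:
    metrics = np.asarray(pickle.load(f))

loss, f1, auroc = metrics.mean(axis=0)  # average across folds
print(f"Test F1: {f1:.3f}, Test AUROC: {auroc:.3f}")
```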

| Release | Hyperparams file | Test F1 score | Test AUROCs | GPUs |
2 changes: 1 addition & 1 deletion recipes/ERPCore/P3_decoding/extra_requirements.txt
@@ -1,2 +1,2 @@
mne==0.22.1
sklearn
2 changes: 1 addition & 1 deletion recipes/LibriSpeech/G2P/README.md
@@ -1,6 +1,6 @@
# Grapheme-to-phoneme (G2P).
This folder contains the scripts to train a grapheme-to-phoneme system
that converts input characters to output phonemes. It uses the
lexicon of the LibriSpeech dataset.

You can download LibriSpeech at http://www.openslr.org/12
4 changes: 2 additions & 2 deletions recipes/LibriSpeech/LM/README.md
@@ -1,8 +1,8 @@
# Language Model with LibriSpeech
This folder contains recipes for training language models for the LibriSpeech Dataset.
It supports both an RNN-based LM and a Transformer-based LM.
The scripts rely on the HuggingFace `datasets` library, which manages data reading and loading from large text corpora.
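
A minimal sketch of this kind of corpus loading (the file name is hypothetical):

```python
from datasets import load_dataset

# Load a large text corpus; `datasets` memory-maps the data via Apache Arrow
# rather than reading everything into RAM.
corpus = load_dataset("text", data_files={"train": "lm_corpus.txt"})
print(corpus["train"][0]["text"])  # first line of the training text
```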

You can download LibriSpeech at http://www.openslr.org/12

2 changes: 1 addition & 1 deletion recipes/LibriSpeech/Tokenizer/README.md
@@ -6,7 +6,7 @@ You can download LibriSpeech at http://www.openslr.org/12


# How to run
python train.py train/1K_unigram_subword_bpe.yaml
python train.py train/5K_unigram_subword_bpe.yaml


2 changes: 1 addition & 1 deletion recipes/TIMIT/Alignment/README.md
@@ -9,7 +9,7 @@ python train.py train/train.yaml
# Results

| Release | hyperparams file | Test Accuracy | Model link | GPUs |
|:-------------:|:---------------------------:| -----:| -----:| --------:|
| 20-05-22 | train.yaml | 79.55 | [model](https://drive.google.com/drive/folders/1fXu7JAVUYxZLosH05iBTEPrJyVSCjNRi?usp=sharing) | 1xV100 32GB |


14 changes: 7 additions & 7 deletions recipes/UrbanSound8k/README.md
@@ -1,8 +1,8 @@
## UrbanSound8k multi-class audio classification

[This recipe and description have been adapted from the SpeechBrain "VoxCeleb" recipe example]

This recipe contains scripts for multi-class audio classification experiments with the UrbanSound8k dataset (https://urbansounddataset.weebly.com/urbansound8k.html). The dataset is publicly available, but a request must be made before the authors provide a download link (https://urbansounddataset.weebly.com/download-urbansound8k.html).

UrbanSound8k is divided into 10 classes, one of which (engine_idling) receives special attention in our experiments below.

@@ -59,7 +59,7 @@ Note that the results for 10-fold must be compiled from the output folders and a
# Performance (single fold)
test loss: 4.15, test acc: 7.55e-01, test error: 2.46e-01

Per Class Accuracy:
0: 0.850
1: 0.670
2: 0.600
@@ -69,9 +69,9 @@
6: 0.753
7: 0.906
8: 0.790
9: 0.939

Confusion Matrix:
[[85 1 2 3 0 1 1 0 1 6]
[ 2 67 5 9 0 3 5 2 6 1]
[ 0 3 60 1 1 0 16 16 3 0]
@@ -131,9 +131,9 @@ Again, your results will NOT be comparable to previous results in the literature





While all of the hyperparameter files listed above (except the 10-fold-cv) accept as lists the train, valid and test