Commit 65a196c: Update doc
kamo-naoyuki committed Apr 6, 2020 (1 parent: 20b1317)
Showing 5 changed files with 335 additions and 234 deletions.

doc/espnet2_distributed.md (4 changes: 3 additions & 1 deletion)

# Distributed training

ESPnet2 provides some kinds of data-parallel distributed training.

1. Single node with Multi GPUs
    1. Using multi-processing: `torch.nn.DistributedDataParallel`
1. Multiple nodes with Multi GPUs
    1. Launch `N-HOST` jobs with `N-GPU` for each node
        - `--dist_world_size N-HOST --ngpu N-GPU --multiprocessing_distributed false`
    1. Launch `N-NODE` jobs with `1-GPU` for each node
        - `--dist_world_size N-NODE --ngpu 1`



## Examples

doc/espnet2_task.md (253 changes: 148 additions & 105 deletions)

# Task class and data input system for training
## Task class

In ESPnet1, we have too many duplicated python modules.
One of the big purposes of ESPnet2 is to provide a common interface and
enable us to focus more on the unique parts of each task.

The `Task` class is a common system for building the training tools of each task
(ASR, TTS, LM, etc.), inspired by the `Fairseq Task` idea.
To build your task in ESPnet2, all you have to do is inherit the `AbsTask` class:

```python
import argparse

from espnet2.tasks.abs_task import AbsTask
from espnet2.train.abs_espnet_model import AbsESPnetModel

class NewModel(AbsESPnetModel):
    def forward(self, input, target):
        (...)
        # loss: The loss of the task. Must be a scalar value.
        # stats: A dict object, used for logging and as the validation criterion.
        # weight: A scalar value that is used for normalization of the loss and stats
        #     values among mini-batches. In many cases, this value should be equal
        #     to the mini-batch size.
        return loss, stats, weight

class NewTask(AbsTask):
    @classmethod
    def add_task_arguments(cls, parser):
        parser.add_argument(...)
        (...)

    @classmethod
    def build_collate_fn(cls, args: argparse.Namespace):
        (...)

    @classmethod
    def build_preprocess_fn(cls, args, train):
        (...)

    @classmethod
    def required_data_names(cls, inference: bool = False):
        (...)

    @classmethod
    def optional_data_names(cls, inference: bool = False):
        (...)

    @classmethod
    def build_model(cls, args):
        return NewModel(...)

if __name__ == "__main__":
    # Start training
    NewTask.main()
```
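
For concreteness, the following is a minimal sketch of a model obeying this `(loss, stats, weight)` contract. The linear layer, the cross-entropy loss, and the tensor shapes are illustrative assumptions, not part of the `AbsTask` API:

```python
import torch
import torch.nn.functional as F

# A toy model; a real ESPnet2 task model would inherit AbsESPnetModel.
class ToyModel(torch.nn.Module):
    def __init__(self, input_dim: int = 40, num_classes: int = 10):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, num_classes)

    def forward(self, input, target):
        # Assumed shapes: input (Batch, Length, Dim), target (Batch, Length)
        logits = self.linear(input)
        # cross_entropy expects (Batch, Classes, Length) for sequence targets
        loss = F.cross_entropy(logits.transpose(1, 2), target)
        # stats holds detached scalars used only for logging/validation
        stats = {"loss": loss.detach(), "acc": (logits.argmax(-1) == target).float().mean()}
        # weight normalizes loss/stats among mini-batches; here, the mini-batch size
        weight = input.size(0)
        return loss, stats, weight
```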

## Data input system
ESPnet2 also provides a command line interface for specifying and
loading the data used in training. Unlike `fairseq` or
training systems such as `pytorch-lightning`,
our `Task` class doesn't have an interface for building the dataset explicitly.
This is because we aim only at tasks related to speech/text,
so we don't need such a general system so far.

The following is an example of the command line arguments:

```bash
python -m espnet2.bin.asr_train \
  --train_data_path_and_name_and_type=/some/path/tr/wav.scp,speech,sound \
  --train_data_path_and_name_and_type=/some/path/tr/token_int,text,text_int \
  --valid_data_path_and_name_and_type=/some/path/dev/wav.scp,speech,sound \
  --valid_data_path_and_name_and_type=/some/path/dev/token_int,text,text_int
```

First of all, our mini-batch is always a `dict` object:

```python
# In a training iteration
for batch in iterator:
    # e.g. batch = {"speech": ..., "text": ...}
    # Forward
    model(**batch)
```

Here, `model` is the same as the model built by `Task.build_model()`.

You can flexibly construct this mini-batch object
using `--*_data_path_and_name_and_type`.
The option can be repeated as needed, and
each occurrence corresponds to an element in the mini-batch.
Also, keep in mind that **there is no distinction between input and target data**.

The argument of `--train_data_path_and_name_and_type`
should be given as three values separated by commas,
like `<file-path>,<key-name>,<file-format>`.

- `key-name` specifies the key of the dict
- `file-path` is a file/directory path for the data source
- `file-format` indicates the format of the file specified by `file-path`

"key-name" can be specified as any named and it indicates mini-batch which has each keys as dictionary:

```python
# In training iteration
batch = {"speech": ..., "text": ...}
loss, stats, weight = model(**batch)
### `scp` file
You can see the supported file formats with the `--help` option:

```bash
python -m espnet2.bin.asr_train --help
```

Almost all formats are referred to as `scp` files, following the Kaldi-ASR convention.
An `scp` file is just a text file which has two columns in each line:
the first column is the sample id and the second is some value,
e.g. a file path, a transcription, or a sequence of numbers.
The supported formats are listed below, followed by a minimal reader sketch.


- format=npy
  ```
  sample_id_a /some/path/a.npy
  sample_id_b /some/path/b.npy
  ```
- format=sound
  ```
  sample_id_a /some/path/a.flac
  sample_id_b /some/path/a.wav
  ```
- format=kaldi_ark
  ```
  sample_id_a /some/path/a.ark:1234
  sample_id_b /some/path/a.ark:5678
  ```
- format=text_int
  ```
  sample_id_a 10 2 4 4
  sample_id_b 3 2 0 1 6 2
  ```
- format=text
  ```
  sample_id_a hello
  sample_id_b world
  ```
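
As a rough illustration of how an `scp` file maps sample ids to values, here is a minimal reader sketch, assuming plain two-column lines; this is only an example, not ESPnet2's actual loading code:

```python
def read_scp(path: str) -> dict:
    """Parse an scp file into {sample_id: value}; values are kept as raw strings."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            sample_id, value = line.rstrip("\n").split(maxsplit=1)
            table[sample_id] = value
    return table

# e.g. read_scp("wav.scp") -> {"sample_id_a": "/some/path/a.flac", ...}
```
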
### `required_data_names()` and `optional_data_names()`
Though an arbitrary dictionary can be created by this system,
each task assumes that a specific key is given for a specific purpose.
e.g. the ASR task requires the `speech` and `text` keys, and
their values are used as the input data and the target data respectively.
See again these methods of the `Task` class:
`required_data_names()` and `optional_data_names()`.

```python
class NewTask(AbsTask):
    @classmethod
    def required_data_names(cls, inference: bool = False):
        if not inference:
            retval = ("input", "target")
        else:
            retval = ("input",)
        return retval

    @classmethod
    def optional_data_names(cls, inference: bool = False):
        retval = ("auxiliary_feature",)
        return retval
```


`required_data_names()` determines the mandatory data names and `optional_data_names()` gives the optional data names. It means that other names are not allowed to be given by the command line arguments.

```bash
# Not allowed, because "unknown" is neither a required nor an optional data name
python -m new_task \
  --train_data_path_and_name_and_type=filepath,unknown,sometype
```

The intention of this system is just an assertion check, so if you feel it is unnecessary,
you can turn off this checking with `--allow_variable_data_keys true`.

```bash
# Ignore assertion checking for data names
python -m new_task \
  --train_data_path_and_name_and_type=filepath,unknown,sometype \
  --allow_variable_data_keys true
```

### `build_collate_fn()`

```python
class NewTask(AbsTask):
    @classmethod
    def build_collate_fn(cls, args: argparse.Namespace):
        ...
```

`collate_fn` is an argument of `torch.utils.data.DataLoader`, and
it can modify the data received from the data loader. e.g.:

```python
def collate_fn(data):
    # data is a list of the return values of the Dataset class:
    modified_data = ...  # touch the data here
    return modified_data

from torch.utils.data import DataLoader
data_loader = DataLoader(dataset, collate_fn=collate_fn)
for modified_data in data_loader:
    ...
```

The type of the argument is determined by the input `dataset` class, and
our dataset is always `espnet2.train.dataset.ESPnetDataset`,
whose return value is a tuple of a sample id and a dict of tensors:

```python
batch = ("sample_id", {"speech": tensor, "text": tensor})
```

and the `collate_fn` receives a list of them:

```python
data = [
    ("sample_id1", {"speech": tensor, "text": tensor}),
    ("sample_id2", {"speech": tensor, "text": tensor}),
]
```

The return type of the collate_fn is supposed to be a tuple of a list and a dict of tensors in ESPnet2,
so the collate_fn for `Task` must transform the data into that type:

```python
for ids, batch in data_loader:
    model(**batch)
```

We provide a common collate_fn and this function can support many cases,
so you might not need to customize it.
This collate_fn is aware of variable-length sequence features for seq2seq tasks:

- The first axis of each sequence tensor from the dataset must be the length axis: e.g. (Length, Dim), (Length, Dim, Dim2), or (Length, ...).
- It is not necessary to unify the length of each sample; the samples are stacked with zero-padding (a stripped-down sketch of this behavior follows this list).
- The padding value can be changed:
  ```python
  from espnet2.train.collate_fn import CommonCollateFn

  @classmethod
  def build_collate_fn(cls, args):
      # float_pad_value is used for float tensors and int_pad_value for int tensors
      return CommonCollateFn(float_pad_value=0.0, int_pad_value=-1)
  ```
- Tensors which represent the length of each sample are also appended:
  ```python
  batch = {"speech": ..., "speech_lengths": ..., "text": ..., "text_lengths": ...}
  ```
- If a feature is not sequential data, this behavior can be disabled:
  ```bash
  python -m new_task --train_data_path_and_name_and_type=filepath,foo,npy
  ```
  ```python
  @classmethod
  def build_collate_fn(cls, args):
      return CommonCollateFn(not_sequence=["foo"])
  ```
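
To make the zero-padding behavior concrete, here is a stripped-down sketch of such a collate_fn, assuming each sample is a `(sample_id, {name: tensor})` tuple as above. ESPnet2's actual `CommonCollateFn` handles more cases (e.g. integer padding values and non-sequence data):

```python
import torch

def toy_collate_fn(data, float_pad_value=0.0):
    # data: a list of (sample_id, {name: tensor}) pairs from the dataset
    ids = [sample_id for sample_id, _ in data]
    batch = {}
    for name in data[0][1]:
        tensors = [d[name] for _, d in data]
        # Stack along a new batch axis, zero-padding the first (length) axis
        batch[name] = torch.nn.utils.rnn.pad_sequence(
            tensors, batch_first=True, padding_value=float_pad_value
        )
        # Append a tensor holding the original length of each sample
        batch[name + "_lengths"] = torch.tensor([t.size(0) for t in tensors])
    return ids, batch

# Two speech samples of lengths 5 and 3 are stacked into shape (2, 5, Dim):
# ids, batch = toy_collate_fn([("a", {"speech": torch.randn(5, 2)}),
#                              ("b", {"speech": torch.randn(3, 2)})])
# batch["speech"].shape == (2, 5, 2); batch["speech_lengths"] == tensor([5, 3])
```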