Commit 65a196c: Update doc
kamo-naoyuki committed Apr 6, 2020 (1 parent: 20b1317)
Showing 5 changed files with 335 additions and 234 deletions.

doc/espnet2_distributed.md (4 changes: 3 additions & 1 deletion)

# Distributed training

ESPnet2 provides some kinds of data-parallel distributed training.

1. Single node with Multi GPUs
    1. Using multi-processing: `torch.nn.DistributedDataParallel`
1. Multiple nodes with Multi GPUs
    1. Launch `N-HOST` jobs with `N-GPU` for each node
        - `--dist_world_size N-HOST --ngpu N-GPU --multiprocessing_distributed false`
    1. Launch `N-NODE` jobs with `1-GPU` for each node
        - `--dist_world_size N-NODE --ngpu 1`



## Examples

doc/espnet2_task.md (253 changes: 148 additions & 105 deletions)

# Task class and data input system for training
## Task class

In ESPnet1, we have too many duplicated python modules.
One of the big purposes of ESPnet2 is to provide a common interface and
enable us to focus more on the unique parts of each task.

The `Task` class is a common system for building the training tools of each task
(ASR, TTS, LM, etc.), inspired by the `Fairseq Task` idea.
To build your task in ESPnet2, all you have to do is inherit the `AbsTask` class:

```python
import argparse

from espnet2.tasks.abs_task import AbsTask
from espnet2.train.abs_espnet_model import AbsESPnetModel

class NewModel(AbsESPnetModel):
    def forward(self, input, target):
        (...)
        # loss: The loss of the task. Must be a scalar value.
        # stats: A dict object, used for logging and as the validation criterion.
        # weight: A scalar value that is used for normalization of the loss and stats
        #     values among mini-batches. In many cases, this value should be equal
        #     to the mini-batch size.
        return loss, stats, weight

class NewTask(AbsTask):
    @classmethod
    def add_task_arguments(cls, parser):
        parser.add_argument(...)
        (...)

    @classmethod
    def build_collate_fn(cls, args: argparse.Namespace):
        (...)

    @classmethod
    def build_preprocess_fn(cls, args, train):
        (...)

    @classmethod
    def required_data_names(cls, inference: bool = False):
        (...)

    @classmethod
    def optional_data_names(cls, inference: bool = False):
        (...)

    @classmethod
    def build_model(cls, args):
        return NewModel(...)

if __name__ == "__main__":
    # Start training
    NewTask.main()
```
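
For concreteness, the following is a minimal sketch of a model obeying this `(loss, stats, weight)` contract. The linear layer, the cross-entropy loss, and the tensor shapes are illustrative assumptions, not part of the `AbsTask` API:

```python
import torch
import torch.nn.functional as F

# A toy model; a real ESPnet2 task model would inherit AbsESPnetModel.
class ToyModel(torch.nn.Module):
    def __init__(self, input_dim: int = 40, num_classes: int = 10):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, num_classes)

    def forward(self, input, target):
        # Assumed shapes: input (Batch, Length, Dim), target (Batch, Length)
        logits = self.linear(input)
        # cross_entropy expects (Batch, Classes, Length) for sequence targets
        loss = F.cross_entropy(logits.transpose(1, 2), target)
        # stats holds detached scalars used only for logging/validation
        stats = {"loss": loss.detach(), "acc": (logits.argmax(-1) == target).float().mean()}
        # weight normalizes loss/stats among mini-batches; here, the mini-batch size
        weight = input.size(0)
        return loss, stats, weight
```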

## Data input system
ESPnet2 also provides a command line interface for specifying and
loading the data used in training. Unlike `fairseq` or
training systems such as `pytorch-lightning`,
our `Task` class doesn't have an interface for building the dataset explicitly.
This is because we aim only at tasks related to speech/text,
so we don't need such a general system so far.

The following is an example of the command line arguments:

```bash
python -m espnet2.bin.asr_train \
  --train_data_path_and_name_and_type=/some/path/tr/wav.scp,speech,sound \
  --train_data_path_and_name_and_type=/some/path/tr/token_int,text,text_int \
  --valid_data_path_and_name_and_type=/some/path/dev/wav.scp,speech,sound \
  --valid_data_path_and_name_and_type=/some/path/dev/token_int,text,text_int
```

First of all, our mini-batch is always a `dict` object:

```python
# In a training iteration
for batch in iterator:
    # e.g. batch = {"speech": ..., "text": ...}
    # Forward
    model(**batch)
```

Here, `model` is the same as the model built by `Task.build_model()`.

You can flexibly construct this mini-batch object
using `--*_data_path_and_name_and_type`.
The option can be repeated as needed, and
each occurrence corresponds to an element in the mini-batch.
Also, keep in mind that **there is no distinction between input and target data**.

The argument of `--train_data_path_and_name_and_type`
should be given as three values separated by commas,
like `<file-path>,<key-name>,<file-format>`.

- `key-name` specifies the key of the dict
- `file-path` is a file/directory path for the data source
- `file-format` indicates the format of the file specified by `file-path`

"key-name" can be specified as any named and it indicates mini-batch which has each keys as dictionary:

```python
# In training iteration
batch = {"speech": ..., "text": ...}
loss, stats, weight = model(**batch)
### `scp` file
You can see the supported file formats with the `--help` option:

```bash
python -m espnet2.bin.asr_train --help
```

Almost all formats are referred to as `scp` files, following the Kaldi-ASR convention.
An `scp` file is just a text file which has two columns in each line:
the first column is the sample id and the second is some value,
e.g. a file path, a transcription, or a sequence of numbers.
The supported formats are listed below, followed by a minimal reader sketch.


- format=npy
  ```
  sample_id_a /some/path/a.npy
  sample_id_b /some/path/b.npy
  ```
- format=sound
  ```
  sample_id_a /some/path/a.flac
  sample_id_b /some/path/a.wav
  ```
- format=kaldi_ark
  ```
  sample_id_a /some/path/a.ark:1234
  sample_id_b /some/path/a.ark:5678
  ```
- format=text_int
  ```
  sample_id_a 10 2 4 4
  sample_id_b 3 2 0 1 6 2
  ```
- format=text
  ```
  sample_id_a hello
  sample_id_b world
  ```
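
As a rough illustration of how an `scp` file maps sample ids to values, here is a minimal reader sketch, assuming plain two-column lines; this is only an example, not ESPnet2's actual loading code:

```python
def read_scp(path: str) -> dict:
    """Parse an scp file into {sample_id: value}; values are kept as raw strings."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            sample_id, value = line.rstrip("\n").split(maxsplit=1)
            table[sample_id] = value
    return table

# e.g. read_scp("wav.scp") -> {"sample_id_a": "/some/path/a.flac", ...}
```
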
### `required_data_names()` and `optional_data_names()`
Though an arbitrary dictionary can be created by this system,
each task assumes that a specific key is given for a specific purpose.
e.g. the ASR task requires the `speech` and `text` keys, and
their values are used as the input data and the target data respectively.
See again these methods of the `Task` class:
`required_data_names()` and `optional_data_names()`.

```python
class NewTask(AbsTask):
    @classmethod
    def required_data_names(cls, inference: bool = False):
        if not inference:
            retval = ("input", "target")
        else:
            retval = ("input",)
        return retval

    @classmethod
    def optional_data_names(cls, inference: bool = False):
        retval = ("auxiliary_feature",)
        return retval
```


`required_data_names()` determines the mandatory data names and `optional_data_names()` gives the optional data names. It means that other names are not allowed to be given by the command line arguments.

```bash
# Not allowed, because "unknown" is neither a required nor an optional data name
python -m new_task \
  --train_data_path_and_name_and_type=filepath,unknown,sometype
```

The intention of this system is just an assertion check, so if you feel it is unnecessary,
you can turn off this checking with `--allow_variable_data_keys true`.

```bash
# Ignore assertion checking for data names
python -m new_task \
  --train_data_path_and_name_and_type=filepath,unknown,sometype \
  --allow_variable_data_keys true
```

### `build_collate_fn()`

```python
class NewTask(AbsTask):
    @classmethod
    def build_collate_fn(cls, args: argparse.Namespace):
        ...
```

`collate_fn` is an argument of `torch.utils.data.DataLoader`, and
it can modify the data received from the data loader. e.g.:

```python
def collate_fn(data):
    # data is a list of the return values of the Dataset class:
    modified_data = ...  # touch the data here
    return modified_data

from torch.utils.data import DataLoader
data_loader = DataLoader(dataset, collate_fn=collate_fn)
for modified_data in data_loader:
    ...
```

The type of the argument is determined by the input `dataset` class, and
our dataset is always `espnet2.train.dataset.ESPnetDataset`,
whose return value is a tuple of a sample id and a dict of tensors:

```python
batch = ("sample_id", {"speech": tensor, "text": tensor})
```

and the `collate_fn` receives a list of them:

```python
data = [
    ("sample_id1", {"speech": tensor, "text": tensor}),
    ("sample_id2", {"speech": tensor, "text": tensor}),
]
```

The return type of the collate_fn is supposed to be a tuple of a list and a dict of tensors in ESPnet2,
so the collate_fn for `Task` must transform the data into that type:

```python
for ids, batch in data_loader:
    model(**batch)
```

We provide a common collate_fn and this function can support many cases,
so you might not need to customize it.
This collate_fn is aware of variable-length sequence features for seq2seq tasks:

- The first axis of each sequence tensor from the dataset must be the length axis: e.g. (Length, Dim), (Length, Dim, Dim2), or (Length, ...).
- It is not necessary to unify the length of each sample; the samples are stacked with zero-padding (a stripped-down sketch of this behavior follows this list).
- The padding value can be changed:
  ```python
  from espnet2.train.collate_fn import CommonCollateFn

  @classmethod
  def build_collate_fn(cls, args):
      # float_pad_value is used for float tensors and int_pad_value for int tensors
      return CommonCollateFn(float_pad_value=0.0, int_pad_value=-1)
  ```
- Tensors which represent the length of each sample are also appended:
  ```python
  batch = {"speech": ..., "speech_lengths": ..., "text": ..., "text_lengths": ...}
  ```
- If a feature is not sequential data, this behavior can be disabled:
  ```bash
  python -m new_task --train_data_path_and_name_and_type=filepath,foo,npy
  ```
  ```python
  @classmethod
  def build_collate_fn(cls, args):
      return CommonCollateFn(not_sequence=["foo"])
  ```
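
To make the zero-padding behavior concrete, here is a stripped-down sketch of such a collate_fn, assuming each sample is a `(sample_id, {name: tensor})` tuple as above. ESPnet2's actual `CommonCollateFn` handles more cases (e.g. integer padding values and non-sequence data):

```python
import torch

def toy_collate_fn(data, float_pad_value=0.0):
    # data: a list of (sample_id, {name: tensor}) pairs from the dataset
    ids = [sample_id for sample_id, _ in data]
    batch = {}
    for name in data[0][1]:
        tensors = [d[name] for _, d in data]
        # Stack along a new batch axis, zero-padding the first (length) axis
        batch[name] = torch.nn.utils.rnn.pad_sequence(
            tensors, batch_first=True, padding_value=float_pad_value
        )
        # Append a tensor holding the original length of each sample
        batch[name + "_lengths"] = torch.tensor([t.size(0) for t in tensors])
    return ids, batch

# Two speech samples of lengths 5 and 3 are stacked into shape (2, 5, Dim):
# ids, batch = toy_collate_fn([("a", {"speech": torch.randn(5, 2)}),
#                              ("b", {"speech": torch.randn(3, 2)})])
# batch["speech"].shape == (2, 5, 2); batch["speech_lengths"] == tensor([5, 3])
```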