added Kaldi stuff
vrenkens committed Jul 28, 2017
1 parent a32302d commit e926522
Showing 89 changed files with 2,624 additions and 633 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -5,3 +5,7 @@ cluster
nabu/computing/condor/create_environment.sh
config/recipes/LAS/GP
config/recipes/DBLSTM/GP
config/recipes/phonology
.gitignore
config/recipes/ali_phonology
sweeps
199 changes: 179 additions & 20 deletions README.md
@@ -12,33 +12,41 @@ adjust everything from the model structure to the way it is trained.

Nabu works in several stages: data preparation, training and finally testing and
decoding. Each of these stages uses a recipe for a specific model and database.
The recipe contains configuration files for all the components and defines all
the necessary parameters for the database and the model. You can find more
information on the components in a recipe [here](config/recipes/README.md).

### Data preparation

In the data preparation stage all the data is prepared (feature computation,
target normalization etc.) for training and testing. Before running the data
preparation you should create a database.conf file in the recipe directory
based on the database.cfg that should already be there, and fill in all the
paths. Should you want to modify parameters in the processors, you can modify
the config files that are pointed to in the database config. You can find more
information about processors [here](nabu/processing/processors/README.md).

You can run the data preparation with:

```
run data --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>
```

- recipe: points to the directory containing the recipe you want to prepare the
data for.
- expdir: the path to a directory that you can write to. In this directory all
files will be stored, like the configurations and logs.
- computing [default: standard]: the distributed computing software you want to
use, one of standard or condor. standard means that no distributed computing
software is used and the job will run on the machine where Nabu is called from.
The condor option uses HTCondor. More information can be found
[here](nabu/computing/README.md).
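
For illustration, a sketch of a data preparation run for the DBLSTM TIMIT recipe
using HTCondor; the expdir path is a placeholder, not part of the repository:

```
# prepare the data; configurations and logs are written to the expdir
run data \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --expdir=/path/to/exp/timit_data \
  --computing=condor
```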

### Training

In the training stage the model will be trained to minimize a loss function.
During training the model can be evaluated to adjust the learning rate if
necessary. Multiple configuration files in the recipe are used during training:

- model.cfg: model parameters
- trainer.cfg: training parameters
@@ -55,19 +63,10 @@ You can run the training with:

```
run train --recipe=/path/to/recipe --expdir=/path/to/expdir --mode=<mode> --computing=<computing>
```

The parameters are the same as for the data preparation script (see above), with
one extra parameter: mode (default: non_distributed). Mode is the distribution
mode and should be one of non_distributed, single_machine or multi_machine.
You can find more information about this [here](nabu/computing/README.md).
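
As a sketch, training the same recipe on a single machine with HTCondor
(the expdir is again a placeholder):

```
# train the model; configurations, intermediate models and logs go to the expdir
run train \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --expdir=/path/to/exp/timit_train \
  --mode=single_machine \
  --computing=condor
```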

### Testing

@@ -101,6 +100,166 @@ run decode --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>
The parameters for this script are similar to the training script (see above).
You should use the same expdir that you used for training the model.
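
For example, a hypothetical decoding run that reuses the training expdir from the
sketch above:

```
# decode with the trained model; the expdir is the one used for training
run decode \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --expdir=/path/to/exp/timit_train \
  --computing=standard
```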

### Parameter search

You can automatically do a parameter search using Nabu. To do this you should
create a sweep file. A sweep file contains blocks of parameters; each block
changes the parameters in the recipe and runs a script. A sweep file looks
like this:

```
experiment name 1
confile1 section option value
confile2 section option value
...

experiment name 2
confile1 section option value
confile2 section option value
...

...
```

For example, if you want to try several numbers of layers and numbers of units:

```
4layers_1024units
model.cfg encoder num_layers 4
model.cfg encoder num_units 1024

4layers_2048units
model.cfg encoder num_layers 4
model.cfg encoder num_units 2048

5layers_1024units
model.cfg encoder num_layers 5
model.cfg encoder num_units 1024

5layers_2048units
model.cfg encoder num_layers 5
model.cfg encoder num_units 2048
```

The parameter sweep can then be executed as follows:

```
run sweep --command=<command> --sweep=/path/to/sweepfile --expdir=/path/to/expdir <command option>
```

where command can be any of the commands discussed above.
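
For instance, assuming the sweep file above is stored as sweeps/encoder.sweep (a
hypothetical path) and the remaining training options are passed through as the
command options, a training sweep could be launched with:

```
# run the training command once for every block in the sweep file
run sweep \
  --command=train \
  --sweep=sweeps/encoder.sweep \
  --expdir=/path/to/exp/encoder_sweep \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --mode=non_distributed \
  --computing=standard
```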

### Kaldi and Nabu

There are some scripts available to use a Nabu neural network in the Kaldi
framework. Kaldi is an ASR toolkit. You can find more information
[here](http://www.kaldi-asr.org/).

Using Kaldi with Nabu happens in several steps:
1) Data preparation
2) GMM-HMM training
3) Aligning the data
4) Computing the prior
5) Training the neural network
6) Decoding and scoring

#### Data preparation

The data preparation is database dependent. Kaldi has many scripts for data
preparation and you should use them.

#### GMM-HMM training

You can train the GMM-HMM model as follows:

```
nabu/scripts/kaldi/train_gmm.sh <datadir> <langdir> <langdir-test> <traindir> <kaldi>
```

With the following arguments:
- datadir: the directory containing the training data (created in data prep)
- langdir: the directory containing the language model for training
(created in data prep)
- langdir-test: the directory containing the language model that should be used
for decoding (created in data prep)
- traindir: The directory where the training files (logs, models, ...) will be
written
- kaldi: the location of your kaldi installation

The script will compute the features, train the GMM-HMM models and align the
training data, so you do not have to do this again in the coming step.
The alignments for the training set can be found in <traindir>/pdfs.
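
As an illustration, for a hypothetical WSJ setup prepared with the standard Kaldi
scripts, and assuming the script lives under nabu/scripts/kaldi as referenced
above, the call might look like this (all directories are placeholders):

```
# train the GMM-HMM models and align the training data
nabu/scripts/kaldi/train_gmm.sh \
  data/train_si284 \
  data/lang \
  data/lang_test \
  exp/gmm \
  /opt/kaldi
```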

#### Aligning the data

The training data has already been aligned in the previous step, but if you want
to align e.g. the validation set you can do that as follows:

```
nabu/scripts/kaldi/align_data.sh <datadir> <langdir> <traindir> <targetdir> <kaldi>
```

The datadir should point to the data you want to align, the traindir should be
the traindir you used in the previous step and the targetdir is the directory
where the alignments will be written. The alignments can be found in
<targetdir>/pdfs.
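
A sketch of aligning a hypothetical validation set with the models trained above
(again with placeholder directories):

```
# align the validation data; the alignments end up in exp/gmm_dev/pdfs
nabu/scripts/kaldi/align_data.sh \
  data/dev \
  data/lang \
  exp/gmm \
  exp/gmm_dev \
  /opt/kaldi
```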

#### Computing the prior

The prior is needed to convert the pdf posteriors to pdf pseudo-likelihoods.
The prior can be computed with:

```
nabu/scripts/kaldi/compute_prior.sh <traindir>
```

The traindir should be the same as the traindir in the previous step. The prior
can then be found in numpy format in <traindir>/prior.npy.
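
A minimal sketch, assuming the same hypothetical traindir as above; the second
command only inspects the resulting prior with numpy:

```
# compute the pdf prior and check its shape
nabu/scripts/kaldi/compute_prior.sh exp/gmm
python -c "import numpy as np; print(np.load('exp/gmm/prior.npy').shape)"
```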

#### Training the neural net

Training the neural network happens using the Nabu framework. In order to do
this you should create a recipe for doing so (see the section on training).
You can find an example recipe for this in config/recipes/DNN/WSJ. You can
use this recipe, but you should still create the database configuration.
In your database configuration you should create sections for the features,
which are the same as you would create for a normal Nabu neural network, and
sections for the alignments. The alignment sections should get the special
alignment type. A section should look something like this:

```
[trainalignments]
type = alignment
datafiles = <traindir>/pdfs
dir = /path/to/dir
processor_config = path/to/alignment_processor.cfg
```

dir is just the directory where the processed alignments will be written.

The rest of the training procedure is the same as the normal procedure, so
follow the instructions in the sections above.

#### Decoding and scoring

To decode using the trained system you should first compute the
pseudo-likelihoods as follows:

```
run decode --expdir=<expdir> --recipe=<recipe> ...
```

The pseudo-likelihoods can then be found in <expdir>/decode/decoded/alignments.

You can then do the Kaldi decoding and scoring with:

```
nabu/scripts/kaldi/decode.sh <datadir> <traindir> <expdir>/decode/decoded/alignments/feats.scp <outputs> <kaldi>
```

The arguments are similar to the arguments in the scripts above. The outputs
will be written to the <outputs> folder.
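
Putting it together with the hypothetical directories used above, the final
decoding and scoring step could look like:

```
# decode and score with Kaldi using the pseudo-likelihoods computed by Nabu
nabu/scripts/kaldi/decode.sh \
  data/test \
  exp/gmm \
  /path/to/exp/wsj_dnn/decode/decoded/alignments/feats.scp \
  exp/nabu_decode \
  /opt/kaldi
```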

## Designing in Nabu

As mentioned in the beginning Nabu focuses on adaptability. You can easily
6 changes: 5 additions & 1 deletion config/recipes/DBLSTM/TIMIT/model.cfg
@@ -15,7 +15,11 @@ num_layers = 3
input_noise = 0.6
#dropout rate
dropout = 0.5
#whether layer normalization should be applied
layer_norm = True

[decoder]
#type of decoder
decoder = linear_decoder
decoder = dnn_decoder
num_layers = 0
output_dims = 39
6 changes: 3 additions & 3 deletions config/recipes/DBLSTM/TIMIT/trainer.cfg
@@ -29,10 +29,10 @@ numbuckets = 16
#frequency of evaluating the validation set.
valid_frequency = 500
#if you want to adapt the learning rate based on the validation set, set to True
valid_adapt = True
valid_adapt = False
#if you want to go back in training if validation performance is worse set to True
go_back = True
go_back = False
#the number of times validation performance can be worse before terminating training, set to None to disable early stopping
num_tries = 3
num_tries = 5
#set to True if you want to reset the number of tries if the validation performance is better
reset_tries = True
2 changes: 1 addition & 1 deletion config/recipes/DBLSTM/TIMIT/validation_evaluator.cfg
@@ -1,6 +1,6 @@
[evaluator]
#name of the evaluator that should be used
evaluator = ctc_evaluator
evaluator = loss_evaluator
#the number of utterances that are processed simultaneously
batch_size = 8
#link the input names defined in the classifier config to sections defined in
3 changes: 3 additions & 0 deletions config/recipes/DNN/WSJ/alignment_processor.cfg
@@ -0,0 +1,3 @@
[processor]
#type of processor
processor = alignment_processor
24 changes: 24 additions & 0 deletions config/recipes/DNN/WSJ/feature_processor.cfg
@@ -0,0 +1,24 @@
[processor]
#type of processor
processor = audio_processor
#feature type
feature = fbank
#the dynamic information that is added to the features, options are nodelta,
#delta and ddelta
dynamic = ddelta
#length of the sliding window (seconds)
winlen = 0.025
#step of the sliding window (seconds)
winstep = 0.01
#number of fbank filters
nfilt = 40
#number of fft bins
nfft = 512
#low cutoff frequency
lowfreq = 0
#high cutoff frequency, if -1 set to None
highfreq = -1
#preemphasis
preemph = 0.97
#include energy in features
include_energy = True
35 changes: 35 additions & 0 deletions config/recipes/DNN/WSJ/model.cfg
@@ -0,0 +1,35 @@
[io]
#a space separated list of input names
inputs = features
#a space separated list of output names
outputs = alignments

[encoder]
#type of encoder
encoder = dnn
#number of neurons in the hidden layers
num_units = 2048
#number of hidden layers
num_layers = 5
#input noise standard deviation
input_noise = 0
#dropout rate
dropout = 0.5
#number of left and right context windows to take into account
context = 5
#whether layer normalization should be applied
layer_norm = True

[decoder]
#type of decoder
decoder = dnn_decoder
#the output dimensions
output_dims = 3100
#the number of layers in each detector
num_layers = 0
#the number of units in each detector
num_units = 2024
#whether layer normalization should be applied
layer_norm = True
#dropout rate
dropout = 1
11 changes: 11 additions & 0 deletions config/recipes/DNN/WSJ/recognizer.cfg
@@ -0,0 +1,11 @@
[recognizer]
#the number of utterances that are processed simultaneously
batch_size = 8
#link the input names defined in the classifier config to sections defined in
#the database config
features = test93fbank

[decoder]
#name of the decoder that should be used
decoder = alignment_decoder
prior = /users/spraak/vrenkens/spchtemp/Nabu/data/wsj/kaldi_alignments/train_si284/prior.npy
17 changes: 17 additions & 0 deletions config/recipes/DNN/WSJ/test_evaluator.cfg
@@ -0,0 +1,17 @@
[evaluator]
#name of the evaluator that should be used
evaluator = decoder_evaluator
#the number of utterances that are processed simultaneously
batch_size = 8
#link the input names defined in the classifier config to sections defined in
#the database config
features = test93fbank
#a space separated list of target names used by the evaluator
targets = event
#a mapping between the target names and database sections
event = test93Dental

[decoder]
#name of the decoder that should be used
decoder = max_decoder
event_alphabet = 0 1
