added Kaldi stuff
vrenkens committed Jul 28, 2017
1 parent a32302d commit e926522
Showing 89 changed files with 2,624 additions and 633 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -5,3 +5,7 @@ cluster
nabu/computing/condor/create_environment.sh
config/recipes/LAS/GP
config/recipes/DBLSTM/GP
config/recipes/phonology
.gitignore
config/recipes/ali_phonology
sweeps
199 changes: 179 additions & 20 deletions README.md
@@ -12,33 +12,41 @@ adjust everything from the model structure to the way it is trained.

Nabu works in several stages: data preparation, training and finally testing and
decoding. Each of these stages uses a recipe for a specific model and database.
The recipe contains configuration files for all the components and defines all
the necessary parameters for the database and the model. You can find more
information on the components in a recipe [here](config/recipes/README.md).

### Data preparation

In the data preparation stage all the data is prepared (feature computation,
target normalization etc.) for training and testing. Before running the data
preparation you should create a database.conf file in the recipe directory
based on the database.cfg that should already be there, and fill in all the
paths. Should you want to modify parameters in the processors, you can modify
the config files that are pointed to in the database config. You can find more
information about processors [here](nabu/processing/processors/README.md).

You can run the data preparation with:

```
run data --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>
```

- recipe: points to the directory containing the recipe you want to prepare the
data for.
- expdir: the path to a directory that you can write to. In this directory all
files will be stored, like the configurations and logs.
- computing [default: standard]: the distributed computing software you want to
use, one of standard or condor. standard means that no distributed computing
software is used and the job will run on the machine where Nabu is called from.
The condor option uses HTCondor. More information can be found
[here](nabu/computing/README.md).
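
For illustration, a sketch of a data preparation run for the DBLSTM TIMIT recipe
using HTCondor; the expdir path is a placeholder, not part of the repository:

```
# prepare the data; configurations and logs are written to the expdir
run data \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --expdir=/path/to/exp/timit_data \
  --computing=condor
```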

### Training

In the training stage the model will be trained to minimize a loss function.
During training the model can be evaluated to adjust the learning rate if
necessary. Multiple configuration files in the recipe are used during training:

- model.cfg: model parameters
- trainer.cfg: training parameters
@@ -55,19 +63,10 @@ You can run the training with:

```
run train --recipe=/path/to/recipe --expdir=/path/to/expdir --mode=<mode> --computing=<computing>
```

The parameters are the same as for the data preparation script (see above), with
one extra parameter: mode (default: non_distributed). Mode is the distribution
mode and should be one of non_distributed, single_machine or multi_machine.
You can find more information about this [here](nabu/computing/README.md).
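
As a sketch, training the same recipe on a single machine with HTCondor
(the expdir is again a placeholder):

```
# train the model; configurations, intermediate models and logs go to the expdir
run train \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --expdir=/path/to/exp/timit_train \
  --mode=single_machine \
  --computing=condor
```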

### Testing

@@ -101,6 +100,166 @@ run decode --recipe=/path/to/recipe --expdir=/path/to/expdir --computing=<computing>
The parameters for this script are similar to the training script (see above).
You should use the same expdir that you used for training the model.
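
For example, a hypothetical decoding run that reuses the training expdir from the
sketch above:

```
# decode with the trained model; the expdir is the one used for training
run decode \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --expdir=/path/to/exp/timit_train \
  --computing=standard
```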

### Parameter search

You can automatically do a parameter search using Nabu. To do this you should
create a sweep file. A sweep file contains blocks of parameters; each block
changes the parameters in the recipe and runs a script. A sweep file looks
like this:

```
experiment name 1
confile1 section option value
confile2 section option value
...

experiment name 2
confile1 section option value
confile2 section option value
...

...
```

For example, if you want to try several numbers of layers and numbers of units:

```
4layers_1024units
model.cfg encoder num_layers 4
model.cfg encoder num_units 1024

4layers_2048units
model.cfg encoder num_layers 4
model.cfg encoder num_units 2048

5layers_1024units
model.cfg encoder num_layers 5
model.cfg encoder num_units 1024

5layers_2048units
model.cfg encoder num_layers 5
model.cfg encoder num_units 2048
```

The parameter sweep can then be executed as follows:

```
run sweep --command=<command> --sweep=/path/to/sweepfile --expdir=/path/to/expdir <command option>
```

where command can be any of the commands discussed above.
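
For instance, assuming the sweep file above is stored as sweeps/encoder.sweep (a
hypothetical path) and the remaining training options are passed through as the
command options, a training sweep could be launched with:

```
# run the training command once for every block in the sweep file
run sweep \
  --command=train \
  --sweep=sweeps/encoder.sweep \
  --expdir=/path/to/exp/encoder_sweep \
  --recipe=config/recipes/DBLSTM/TIMIT \
  --mode=non_distributed \
  --computing=standard
```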

### Kaldi and Nabu

There are some scripts available to use a Nabu neural network in the Kaldi
framework. Kaldi is an ASR toolkit. You can find more information
[here](http://www.kaldi-asr.org/).

Using Kaldi with Nabu happens in several steps:
1) Data preparation
2) GMM-HMM training
3) Aligning the data
4) Computing the prior
5) Training the neural network
6) Decoding and scoring

#### Data preparation

The data preparation is database dependent. Kaldi has many scripts for data
preparation and you should use them.

#### GMM-HMM training

You can train the GMM-HMM model as follows:

```
nabu/scripts/kaldi/train_gmm.sh <datadir> <langdir> <langdir-test> <traindir> <kaldi>
```

With the following arguments:
- datadir: the directory containing the training data (created in data prep)
- langdir: the directory containing the language model for training
(created in data prep)
- langdir-test: the directory containing the language model that should be used
for decoding (created in data prep)
- traindir: The directory where the training files (logs, models, ...) will be
written
- kaldi: the location of your kaldi installation

The script will compute the features, train the GMM-HMM models and align the
training data, so you do not have to do this again in the coming step.
The alignments for the training set can be found in <traindir>/pdfs.
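
As an illustration, for a hypothetical WSJ setup prepared with the standard Kaldi
scripts, and assuming the script lives under nabu/scripts/kaldi as referenced
above, the call might look like this (all directories are placeholders):

```
# train the GMM-HMM models and align the training data
nabu/scripts/kaldi/train_gmm.sh \
  data/train_si284 \
  data/lang \
  data/lang_test \
  exp/gmm \
  /opt/kaldi
```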

#### Aligning the data

The training data has already been aligned in the previous step, but if you want
to align e.g. the validation set you can do that as follows:

```
nabu/scripts/kaldi/align_data.sh <datadir> <langdir> <traindir> <targetdir> <kaldi>
```

The datadir should point to the data you want to align, the traindir should be
the traindir you used in the previous step and the targetdir is the directory
where the alignments will be written. The alignments can be found in
<targetdir>/pdfs.
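
A sketch of aligning a hypothetical validation set with the models trained above
(again with placeholder directories):

```
# align the validation data; the alignments end up in exp/gmm_dev/pdfs
nabu/scripts/kaldi/align_data.sh \
  data/dev \
  data/lang \
  exp/gmm \
  exp/gmm_dev \
  /opt/kaldi
```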

#### Computing the prior

The prior is needed to convert the pdf posteriors to pdf pseudo-likelihoods.
The prior can be computed with:

```
nabu/scripts/kaldi/compute_prior.sh <traindir>
```

The traindir should be the same as the traindir in the previous step. The prior
can then be found in numpy format in <traindir>/prior.npy.
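
A minimal sketch, assuming the same hypothetical traindir as above; the second
command only inspects the resulting prior with numpy:

```
# compute the pdf prior and check its shape
nabu/scripts/kaldi/compute_prior.sh exp/gmm
python -c "import numpy as np; print(np.load('exp/gmm/prior.npy').shape)"
```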

#### Training the neural net

Training the neural network happens using the Nabu framework. In order to do
this you should create a recipe for doing so (see the section on training).
You can find an example recipe for this in config/recipes/DNN/WSJ. You can
use this recipe, but you should still create the database configuration.
In your database configuration you should create sections for the features,
which are the same as you would create for a normal Nabu neural network, and
sections for the alignments. The alignment sections should get the special
alignment type. A section should look something like this:

```
[trainalignments]
type = alignment
datafiles = <traindir>/pdfs
dir = /path/to/dir
processor_config = path/to/alignment_processor.cfg
```

dir is just the directory where the processed alignments will be written.

The rest of the training procedure is the same as the normal procedure, so
follow the instructions in the sections above.

#### Decoding and scoring

To decode using the trained system you should first compute the
pseudo-likelihoods as follows:

```
run decode --expdir=<expdir> --recipe=<recipe> ...
```

The pseudo-likelihoods can then be found in <expdir>/decode/decoded/alignments.

You can then do the Kaldi decoding and scoring with:

```
nabu/scripts/kaldi/decode.sh <datadir> <traindir> <expdir>/decode/decoded/alignments/feats.scp <outputs> <kaldi>
```

The arguments are similar to the arguments in the scripts above. The outputs
will be written to the <outputs> folder.
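
Putting it together with the hypothetical directories used above, the final
decoding and scoring step could look like:

```
# decode and score with Kaldi using the pseudo-likelihoods computed by Nabu
nabu/scripts/kaldi/decode.sh \
  data/test \
  exp/gmm \
  /path/to/exp/wsj_dnn/decode/decoded/alignments/feats.scp \
  exp/nabu_decode \
  /opt/kaldi
```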

## Designing in Nabu

As mentioned in the beginning Nabu focuses on adaptability. You can easily
6 changes: 5 additions & 1 deletion config/recipes/DBLSTM/TIMIT/model.cfg
@@ -15,7 +15,11 @@ num_layers = 3
input_noise = 0.6
#dropout rate
dropout = 0.5
#whether layer normalization should be applied
layer_norm = True

[decoder]
#type of decoder
decoder = linear_decoder
decoder = dnn_decoder
num_layers = 0
output_dims = 39
6 changes: 3 additions & 3 deletions config/recipes/DBLSTM/TIMIT/trainer.cfg
@@ -29,10 +29,10 @@ numbuckets = 16
#frequency of evaluating the validation set.
valid_frequency = 500
#if you want to adapt the learning rate based on the validation set, set to True
valid_adapt = True
valid_adapt = False
#if you want to go back in training if validation performance is worse set to True
go_back = True
go_back = False
#the number of times validation performance can be worse before terminating training, set to None to disable early stopping
num_tries = 3
num_tries = 5
#set to True if you want to reset the number of tries if the validation performance is better
reset_tries = True
2 changes: 1 addition & 1 deletion config/recipes/DBLSTM/TIMIT/validation_evaluator.cfg
@@ -1,6 +1,6 @@
[evaluator]
#name of the evaluator that should be used
evaluator = ctc_evaluator
evaluator = loss_evaluator
#the number of utterances that are processed simultaneously
batch_size = 8
#link the input names defined in the classifier config to sections defined in
3 changes: 3 additions & 0 deletions config/recipes/DNN/WSJ/alignment_processor.cfg
@@ -0,0 +1,3 @@
[processor]
#type of processor
processor = alignment_processor
24 changes: 24 additions & 0 deletions config/recipes/DNN/WSJ/feature_processor.cfg
@@ -0,0 +1,24 @@
[processor]
#type of processor
processor = audio_processor
#feature type
feature = fbank
#the dynamic information that is added to the features, options are nodelta,
#delta and ddelta
dynamic = ddelta
#length of the sliding window (seconds)
winlen = 0.025
#step of the sliding window (seconds)
winstep = 0.01
#number of fbank filters
nfilt = 40
#number of fft bins
nfft = 512
#low cutoff frequency
lowfreq = 0
#high cutoff frequency, if -1 set to None
highfreq = -1
#preemphasis
preemph = 0.97
#include energy in features
include_energy = True
35 changes: 35 additions & 0 deletions config/recipes/DNN/WSJ/model.cfg
@@ -0,0 +1,35 @@
[io]
#a space separated list of input names
inputs = features
#a space separated list of output names
outputs = alignments

[encoder]
#type of encoder
encoder = dnn
#number of neurons in the hidden layers
num_units = 2048
#number of hidden layers
num_layers = 5
#input noise standard deviation
input_noise = 0
#dropout rate
dropout = 0.5
#number of left and right context windows to take into account
context = 5
#whether layer normalization should be applied
layer_norm = True

[decoder]
#type of decoder
decoder = dnn_decoder
#the output dimensions
output_dims = 3100
#the number of layers in each detector
num_layers = 0
#the number of units in each detector
num_units = 2024
#whether layer normalization should be applied
layer_norm = True
#dropout rate
dropout = 1
11 changes: 11 additions & 0 deletions config/recipes/DNN/WSJ/recognizer.cfg
@@ -0,0 +1,11 @@
[recognizer]
#the number of utterances that are processed simultaneously
batch_size = 8
#link the input names defined in the classifier config to sections defined in
#the database config
features = test93fbank

[decoder]
#name of the decoder that should be used
decoder = alignment_decoder
prior = /users/spraak/vrenkens/spchtemp/Nabu/data/wsj/kaldi_alignments/train_si284/prior.npy
17 changes: 17 additions & 0 deletions config/recipes/DNN/WSJ/test_evaluator.cfg
@@ -0,0 +1,17 @@
[evaluator]
#name of the evaluator that should be used
evaluator = decoder_evaluator
#the number of utterances that are processed simultaneously
batch_size = 8
#link the input names defined in the classifier config to sections defined in
#the database config
features = test93fbank
#a space separated list of target names used by the evaluator
targets = event
#a mapping between the target names and database sections
event = test93Dental

[decoder]
#name of the decoder that should be used
decoder = max_decoder
event_alphabet = 0 1
