Add zipformer from Dan using multi-dataset setup #675

Merged

Changes from all commits (665 commits)
14a2603
Bug fix
danpovey Sep 28, 2022
d6ef1be
Change subsampling factor from 1 to 2
danpovey Sep 28, 2022
461ad36
Implement AttentionCombine as replacement for RandomCombine
danpovey Sep 29, 2022
d398f0e
Decrease random_prob from 0.5 to 0.333
danpovey Sep 29, 2022
d8f7310
Add print statement
danpovey Sep 29, 2022
056b9a4
Apply single_prob mask, so sometimes we just get one layer as output.
danpovey Sep 29, 2022
38f8905
Introduce feature mask per frame
danpovey Sep 29, 2022
ab7c940
Include changes from Liyong about padding conformer module.
danpovey Sep 30, 2022
1eb603f
Reduce single_prob from 0.5 to 0.25
danpovey Sep 30, 2022
cc64f2f
Reduce feature_mask_dropout_prob from 0.25 to 0.15.
danpovey Oct 1, 2022
e9326a7
Remove dropout from inside ConformerEncoderLayer, for adding to resid…
danpovey Oct 1, 2022
8d517a6
Increase feature_mask_dropout_prob from 0.15 to 0.2.
danpovey Oct 1, 2022
cf5f7e5
Swap random_prob and single_prob, to reduce prob of being randomized.
danpovey Oct 1, 2022
1be4554
Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert…
danpovey Oct 2, 2022
c20fc3b
Randomize order of some modules
danpovey Oct 3, 2022
a0a1874
Bug fix
danpovey Oct 3, 2022
5a89953
Stop backprop bug
danpovey Oct 3, 2022
93dff29
Introduce a scale dependent on the masking value
danpovey Oct 3, 2022
b3af9f6
Implement efficient layer dropout
danpovey Oct 3, 2022
88d0da7
Simplify the learned scaling factor on the modules
danpovey Oct 3, 2022
96e0d92
Compute valid loss on batch 0.
danpovey Oct 3, 2022
a9f950a
Make the scaling factors more global and the randomness of dropout mo…
danpovey Oct 3, 2022
33c24e4
Bug fix
danpovey Oct 3, 2022
006fcc1
Introduce offset in layerdrop_scaleS
danpovey Oct 4, 2022
5fe8cb1
Remove final combination; implement layer drop that drops the final l…
danpovey Oct 4, 2022
8154283
Bug fixes
danpovey Oct 4, 2022
61f6283
Fix bug RE self.training
danpovey Oct 5, 2022
1cd7e93
Fix bug setting layerdrop mask
danpovey Oct 5, 2022
040592a
Fix eigs call
danpovey Oct 5, 2022
bb233d3
Add debug info
danpovey Oct 5, 2022
537c353
Remove warmup
danpovey Oct 6, 2022
0685ac7
Remove layer dropout and model-level warmup
danpovey Oct 6, 2022
02eb7af
Don't always apply the frame mask
danpovey Oct 6, 2022
99d17d1
Merge branch 'scaled_adam_exp58' into scaled_adam_exp67
danpovey Oct 6, 2022
e1d741a
Slight code cleanup/simplification
danpovey Oct 6, 2022
e4c9786
Merge branch 'scaled_adam_exp27' into scaled_adam_exp69
danpovey Oct 6, 2022
a3179c3
Various fixes, finish implementing frame masking
danpovey Oct 6, 2022
bd325e8
Remove debug info
danpovey Oct 6, 2022
314f238
Don't compute validation if printing diagnostics.
danpovey Oct 7, 2022
ebf8aa1
Apply layer bypass during warmup in a new way, including 2s and 4s of…
danpovey Oct 7, 2022
28e5f46
Update checkpoint.py to deal with int params
danpovey Oct 7, 2022
ff4028d
Revert initial_scale to previous values.
danpovey Oct 7, 2022
b9a95af
Remove the feature where it was bypassing groups of layers.
danpovey Oct 7, 2022
97bc894
Implement layer dropout with probability 0.075
danpovey Oct 7, 2022
9c1a239
Fix issue with warmup in test time
danpovey Oct 8, 2022
300da13
Add warmup schedule where dropout disappears from earlier layers first.
danpovey Oct 8, 2022
fe4a7e9
Have warmup that gradually removes dropout from layers; multiply init…
danpovey Oct 8, 2022
606d3bd
Do dropout a different way
danpovey Oct 8, 2022
71b8bfe
Fix bug in warmup
danpovey Oct 8, 2022
6dc449d
Remove debug print
danpovey Oct 8, 2022
af545e0
Make the warmup mask per frame.
danpovey Oct 8, 2022
b1fa3d5
Implement layer dropout (in a relatively efficient way)
danpovey Oct 8, 2022
5c99e97
Decrease initial keep_prob to 0.25.
danpovey Oct 8, 2022
2631f05
Make it start warming up from the very start, and increase warmup_bat…
danpovey Oct 8, 2022
86845bd
Change warmup schedule and increase warmup_batches from 4k to 6k
danpovey Oct 8, 2022
97a0fbe
Make the bypass scale trainable.
danpovey Oct 8, 2022
9023fe7
Change the initial keep-prob back from 0.25 to 0.5
danpovey Oct 8, 2022
bc9fbe2
Bug fix
danpovey Oct 8, 2022
d467338
Limit bypass scale to >= 0.1
danpovey Oct 8, 2022
5255969
Revert "Change warmup schedule and increase warmup_batches from 4k to…
danpovey Oct 9, 2022
e654086
Do warmup by dropping out whole layers.
danpovey Oct 9, 2022
3e137dd
Decrease frequency of logging variance_proportion
danpovey Oct 9, 2022
f8f200e
Make layerdrop different in different processes.
danpovey Oct 9, 2022
44ad73c
For speed, drop the same num layers per job.
danpovey Oct 9, 2022
40fa33d
Decrease initial_layerdrop_prob from 0.75 to 0.5
danpovey Oct 9, 2022
cf45090
Revert also the changes in scaled_adam_exp85 regarding warmup schedule
danpovey Oct 9, 2022
00841f0
Remove unused code LearnedScale.
danpovey Oct 9, 2022
bd7dce4
Reintroduce batching to the optimizer
danpovey Oct 9, 2022
dece8ad
Various fixes from debugging with nvtx, but removed the NVTX annotati…
danpovey Oct 9, 2022
d7f6e8e
Only apply ActivationBalancer with prob 0.25.
danpovey Oct 9, 2022
9f059f7
Fix s -> scaling for import.
danpovey Oct 10, 2022
09c9b02
Increase final layerdrop prob from 0.05 to 0.075
danpovey Oct 10, 2022
857b373
Fix bug where fewer layers were dropped than should be; remove unnece…
danpovey Oct 10, 2022
f941991
Fix bug in choosing layers to drop
danpovey Oct 10, 2022
12323f2
Refactor RelPosMultiheadAttention to have 2nd forward function and in…
danpovey Oct 10, 2022
5697623
Reduce final layerdrop_prob from 0.075 to 0.05.
danpovey Oct 10, 2022
1825336
Fix issue with diagnostics if stats is None
danpovey Oct 11, 2022
eb58e6d
Remove persistent attention scores.
danpovey Oct 12, 2022
1232302
Make ActivationBalancer and MaxEig more efficient.
danpovey Oct 12, 2022
b736bb4
Cosmetic improvements
danpovey Oct 12, 2022
49c6b69
Change scale_factor_scale from 0.5 to 0.8
danpovey Oct 12, 2022
9e30f2b
Make the ActivationBalancer regress to the data mean, not zero, when …
danpovey Oct 13, 2022
6333413
Merge branch 'scaled_adam_exp106' into scaled_adam_exp108
danpovey Oct 13, 2022
9270e32
Remove unused config value
danpovey Oct 13, 2022
b09a1b2
Fix bug when channel_dim < 0
danpovey Oct 13, 2022
23d6bf7
Fix bug when channel_dim < 0
danpovey Oct 13, 2022
2a50def
Simplify how the positional-embedding scores work in attention (thank…
danpovey Oct 13, 2022
7d8e460
Revert dropout on attention scores to 0.0.
danpovey Oct 13, 2022
ae6478c
This should just be a cosmetic change, regularizing how we get the wa…
danpovey Oct 13, 2022
db8b991
Reduce beta from 0.75 to 0.0.
danpovey Oct 14, 2022
15b91c1
Reduce stats period from 10 to 4.
danpovey Oct 14, 2022
5f375be
Merge branch 'scaled_adam_exp103b2' into scaled_adam_exp103b4
danpovey Oct 14, 2022
9602341
Reworking of ActivationBalancer code to hopefully balance speed and e…
danpovey Oct 14, 2022
18ff1de
Add debug code for attention weights and eigs
danpovey Oct 14, 2022
9095353
Remove debug statement
danpovey Oct 14, 2022
1812f6c
Add different debug info.
danpovey Oct 14, 2022
a780984
Penalize attention-weight entropies above a limit.
danpovey Oct 14, 2022
394d4c9
Remove debug statements
danpovey Oct 14, 2022
0557dbb
use larger delta but only penalize if small grad norm
danpovey Oct 14, 2022
822465f
Bug fixes; change debug freq
danpovey Oct 14, 2022
80d51ef
Change cutoff for small_grad_norm
danpovey Oct 14, 2022
0d452b5
Merge exp106 (remove persistent attention scores)
danpovey Oct 15, 2022
a0ef291
Merging 109: linear positional encoding
danpovey Oct 15, 2022
125e1b1
Merge branch 'scaled_adam_exp117' into scaled_adam_exp119
danpovey Oct 15, 2022
91840fa
Implement whitening of values in conformer.
danpovey Oct 15, 2022
fcbb960
Also whiten the keys in conformer.
danpovey Oct 15, 2022
593a6e9
Fix an issue with scaling of grad.
danpovey Oct 15, 2022
252798b
Decrease whitening limit from 2.0 to 1.1.
danpovey Oct 15, 2022
9919a05
Fix debug stats.
danpovey Oct 15, 2022
fc728f2
Reorganize Whiten() code; configs are not the same as before. Also r…
danpovey Oct 15, 2022
1135669
Bug fix RE float16
danpovey Oct 16, 2022
ef4650b
Revert whitening_limit from 1.1 to 2.2.
danpovey Oct 16, 2022
29d4e8e
Replace MaxEig with Whiten with limit=5.0, and move it to end of Conf…
danpovey Oct 16, 2022
ae0067c
Change LR schedule to start off higher
danpovey Oct 16, 2022
325f553
Simplify the dropout mask, no non-dropped-out sequences
danpovey Oct 16, 2022
03fe1ed
Make attention dims configurable, not embed_dim//2, trying 256.
danpovey Oct 17, 2022
3f495cd
Reduce attention_dim to 192; cherry-pick scaled_adam_exp130 which is …
danpovey Oct 17, 2022
2675944
Use half the dim for values, vs. keys and queries.
danpovey Oct 17, 2022
b988bc0
Increase initial-lr from 0.04 to 0.05, plus changes for diagnostics
danpovey Oct 18, 2022
b37564c
Cosmetic changes
danpovey Oct 18, 2022
6b3f9e5
Changes to avoid bug in backward hooks, affecting diagnostics.
danpovey Oct 19, 2022
c3c655d
Random clip attention scores to -5..5.
danpovey Oct 19, 2022
8e15d43
Add some random clamping in model.py
danpovey Oct 19, 2022
f4442de
Add reflect=0.1 to invocations of random_clamp()
danpovey Oct 19, 2022
45c38de
Remove in_balancer.
danpovey Oct 19, 2022
d37c159
Revert model.py so there are no constraints on the output.
danpovey Oct 19, 2022
9c54906
Implement randomized backprop for softmax.
danpovey Oct 19, 2022
ef5a273
Merge branch 'scaled_adam_exp146' into scaled_adam_exp149
danpovey Oct 19, 2022
0ad4462
Reduce min_abs from 1e-03 to 1e-04
danpovey Oct 19, 2022
a4443ef
Add RandomGrad with min_abs=1.0e-04
danpovey Oct 19, 2022
cc15552
Use full precision to do softmax and store ans.
danpovey Oct 19, 2022
f08a869
Merge branch 'scaled_adam_exp151' into scaled_adam_exp150
danpovey Oct 19, 2022
f6b8f0f
Fix bug in backprop of random_clamp()
danpovey Oct 20, 2022
d75d646
Merge branch 'scaled_adam_exp147' into scaled_adam_exp149
danpovey Oct 20, 2022
d137118
Get the randomized backprop for softmax in autocast mode working.
danpovey Oct 20, 2022
610281e
Keep just the RandomGrad changes, vs. 149. Git history may not refl…
danpovey Oct 20, 2022
679ba2e
Remove debug print
danpovey Oct 20, 2022
5a0914f
Merge branch 'scaled_adam_exp149' into scaled_adam_exp150
danpovey Oct 20, 2022
6601035
Reduce min_abs from 1.0e-04 to 5.0e-06
danpovey Oct 20, 2022
4565d43
Add hard limit of attention weights to +- 50
danpovey Oct 20, 2022
6e62094
Merge branch 'scaled_adam_exp150' into scaled_adam_exp155
danpovey Oct 20, 2022
1018a77
Use normal implementation of softmax.
danpovey Oct 20, 2022
dccff6b
Remove use of RandomGrad
danpovey Oct 20, 2022
c5cb52f
Remove the use of random_clamp in conformer.py.
danpovey Oct 20, 2022
9f68b57
Reduce the limit on attention weights from 50 to 25.
danpovey Oct 21, 2022
476fb9e
Reduce min_prob of ActivationBalancer from 0.1 to 0.05.
danpovey Oct 21, 2022
bdbd2cf
Penalize too large weights in softmax of AttentionDownsample()
danpovey Oct 21, 2022
e5fe3de
Also apply limit on logit in SimpleCombiner
danpovey Oct 21, 2022
3298e18
Increase limit on logit for SimpleCombiner to 25.0
danpovey Oct 21, 2022
1d2fe8e
Add more diagnostics to debug gradient scale problems
danpovey Oct 22, 2022
fd3f21f
Changes to grad scale logging; increase grad scale more frequently if…
danpovey Oct 22, 2022
2e93e5d
Add logging
danpovey Oct 22, 2022
8d1021d
Remove comparison diagnostics, which were not that useful.
danpovey Oct 22, 2022
9672dff
Merge branch 'scaled_adam_exp168' into scaled_adam_exp169
danpovey Oct 22, 2022
84580ec
Configuration changes: scores limit 5->10, min_prob 0.05->0.1, cur_gr…
danpovey Oct 22, 2022
efde375
Reset optimizer state when we change loss function definition.
danpovey Oct 22, 2022
1ec9fe5
Make warmup period decrease scale on simple loss, leaving pruned loss…
danpovey Oct 22, 2022
aa5f34a
Cosmetic change
danpovey Oct 22, 2022
74d7750
Increase initial-lr from 0.05 to 0.06.
danpovey Oct 22, 2022
1d43825
Increase initial-lr from 0.06 to 0.075 and decrease lr-epochs from 3.…
danpovey Oct 22, 2022
0691256
Fixes to logging statements.
danpovey Oct 22, 2022
af0fc31
Introduce warmup schedule in optimizer
danpovey Oct 22, 2022
9919fb3
Increase grad_scale to Whiten module
danpovey Oct 22, 2022
e8066b5
Merge branch 'scaled_adam_exp172' into scaled_adam_exp174
danpovey Oct 22, 2022
525e87a
Add inf check hooks
danpovey Oct 22, 2022
146626b
Renaming in optim.py; remove step() from scan_pessimistic_batches_for…
danpovey Oct 22, 2022
11886dc
Change base lr to 0.1, also rename from initial lr in train.py
danpovey Oct 22, 2022
1908123
Adding activation balancers after simple_am_prob and simple_lm_prob
danpovey Oct 22, 2022
8b3bba9
Reduce max_abs on am_balancer
danpovey Oct 22, 2022
7a55cac
Increase max_factor in final lm_balancer and am_balancer
danpovey Oct 22, 2022
466176e
Use penalize_abs_values_gt, not ActivationBalancer.
danpovey Oct 22, 2022
13ffd8e
Trying to reduce grad_scale of Whiten() from 0.02 to 0.01.
danpovey Oct 22, 2022
269b701
Add hooks.py, had neglected to git add it.
danpovey Oct 22, 2022
2964628
don't do penalize_values_gt on simple_lm_proj and simple_am_proj; red…
danpovey Oct 22, 2022
e0c1dc6
Increase probs of activation balancer and make it decay slower.
danpovey Oct 22, 2022
ad2d3c2
Dont print out full non-finite tensor
danpovey Oct 22, 2022
b7083e7
Increase default max_factor for ActivationBalancer from 0.02 to 0.04;…
danpovey Oct 22, 2022
9e86d1f
reduce initial scale in GradScaler
danpovey Oct 22, 2022
0406d0b
Increase max_abs in ActivationBalancer of conv module from 20 to 50
danpovey Oct 23, 2022
5b9d166
--base-lr 0.075->0.5; --lr-epochs 3->3.5
danpovey Oct 23, 2022
40588d3
Revert 179->180 change, i.e. change max_abs for deriv_balancer2 back …
danpovey Oct 23, 2022
09cbc9f
Save some memory in the autograd of DoubleSwish.
danpovey Oct 23, 2022
e586cc3
Change the discretization of the sigmoid to be expectation preserving.
danpovey Oct 23, 2022
d6aa386
Fix randn to rand
danpovey Oct 23, 2022
8565794
Try a more exact way to round to uint8 that should prevent ever wrapp…
danpovey Oct 23, 2022
d3876e3
Make it use float16 if in amp but use clamp to avoid wrapping error
danpovey Oct 23, 2022
95aaa4a
Store only half precision output for softmax.
danpovey Oct 23, 2022
36cb279
More memory efficient backprop for DoubleSwish.
danpovey Oct 25, 2022
1e89841
Change to warmup schedule.
danpovey Oct 25, 2022
9da5526
Changes to more accurately estimate OOM conditions
danpovey Oct 25, 2022
6ebff23
Reduce cutoff from 100 to 5 for estimating OOM with warmup
danpovey Oct 25, 2022
3159b09
Make 20 the limit for warmup_count
danpovey Oct 25, 2022
dbfbd80
Cast to float16 in DoubleSwish forward
danpovey Oct 25, 2022
6a6df19
Hopefully make penalize_abs_values_gt more memory efficient.
danpovey Oct 25, 2022
78f3cba
Add logging about memory used.
danpovey Oct 25, 2022
a0507a8
Change scalar_max in optim.py from 2.0 to 5.0
danpovey Oct 25, 2022
bf37c7c
Regularize how we apply the min and max to the eps of BasicNorm
danpovey Oct 26, 2022
938510a
Fix clamping of bypass scale; remove a couple unused variables.
danpovey Oct 27, 2022
a7fc6ae
Increase floor on bypass_scale from 0.1 to 0.2.
danpovey Oct 27, 2022
2c40011
Increase bypass_scale from 0.2 to 0.4.
danpovey Oct 27, 2022
f8c531c
Increase bypass_scale min from 0.4 to 0.5
danpovey Oct 27, 2022
be5c687
Merging upstream/master
danpovey Oct 27, 2022
3f05e47
Rename conformer.py to zipformer.py
danpovey Oct 27, 2022
5dfa141
Rename Conformer to Zipformer
danpovey Oct 27, 2022
c8abba7
Update decode.py by copying from pruned_transducer_stateless5 and cha…
danpovey Oct 28, 2022
b9f6ba1
Remove some unused variables.
danpovey Oct 28, 2022
7b8a010
Merge branch 'scaled_adam_exp188' into scaled_adam_exp198b
danpovey Oct 28, 2022
a067fe8
Fix clamping of epsilon
danpovey Oct 28, 2022
e592a92
Merge branch 'scaled_adam_exp198b' into scaled_adam_exp202
danpovey Oct 28, 2022
ed1b4d5
Refactor zipformer for more flexibility so we can change number of en…
danpovey Oct 28, 2022
0a89f51
Have a 3rd encoder, at downsampling factor of 8.
danpovey Oct 28, 2022
d7d5188
Refactor how the downsampling is done so that it happens later, but t…
danpovey Oct 28, 2022
de9a6eb
Fix bug RE seq lengths
danpovey Oct 28, 2022
7b57a34
Have 4 encoder stacks
danpovey Oct 28, 2022
96ea4cf
Have 6 different encoder stacks, U-shaped network.
danpovey Oct 28, 2022
435d0de
Reduce dim of linear positional encoding in attention layers.
danpovey Oct 29, 2022
f995426
Reduce min of bypass_scale from 0.5 to 0.3, and make it not applied i…
danpovey Oct 29, 2022
ff03ec8
Tuning change to num encoder layers, inspired by relative param impor…
danpovey Oct 29, 2022
bba454a
Make decoder group size equal to 4.
danpovey Oct 29, 2022
05689f6
Add skip connections as in normal U-net
danpovey Oct 29, 2022
9a7979d
Avoid falling off the loop for weird inputs
danpovey Oct 29, 2022
072776b
Apply layer-skip dropout prob
danpovey Oct 29, 2022
a3561c8
Have warmup schedule for layer-skipping
danpovey Oct 29, 2022
6b6143f
Merge branch 'scaled_adam_exp218' into scaled_adam_exp221
danpovey Oct 30, 2022
8b0722e
Rework how warmup count is produced; should not affect results.
danpovey Oct 30, 2022
e9c69d8
Add warmup schedule for zipformer encoder layer, from 1.0 -> 0.2.
danpovey Oct 30, 2022
e4a22bb
Reduce initial clamp_min for bypass_scale from 1.0 to 0.5.
danpovey Oct 30, 2022
efbb1d2
Restore the changes from scaled_adam_219 and scaled_adam_exp220, acc…
danpovey Oct 30, 2022
b8db0f5
Change to schedule of bypass_scale min: make it larger, decrease slower.
danpovey Oct 31, 2022
730e6c8
Change schedule after initial loss not promising
danpovey Oct 31, 2022
5fda800
Implement pooling module, add it after initial feedforward.
danpovey Oct 31, 2022
3de8a5a
Bug fix
danpovey Oct 31, 2022
12f17f5
Introduce dropout rate to dynamic submodules of conformer.
danpovey Oct 31, 2022
5e51534
Introduce minimum probs in the SimpleCombiner
danpovey Oct 31, 2022
b091ae5
Add bias in weight module
danpovey Oct 31, 2022
b7876ba
Remove dynamic weights in SimpleCombine
danpovey Oct 31, 2022
4da4a3a
Merge branch 'scaled_adam_exp236' into scaled_adam_exp242
danpovey Oct 31, 2022
b806a21
Remove the 5th of 6 encoder stacks
danpovey Oct 31, 2022
e1a87e9
Fix some typos
csukuangfj Nov 6, 2022
366f0bc
small fixes
csukuangfj Nov 6, 2022
ed9d754
small fixes
csukuangfj Nov 7, 2022
f8b231b
Copy files
csukuangfj Nov 7, 2022
21390ea
Update decode.py
csukuangfj Nov 12, 2022
0302ed1
Add changes from the master
csukuangfj Nov 14, 2022
46e4230
Add changes from the master
csukuangfj Nov 14, 2022
71cc8ea
Merge remote-tracking branch 'dan/master' into from-dan-scaled-adam-e…
csukuangfj Nov 14, 2022
0a7be72
update results
csukuangfj Nov 15, 2022
c49ef78
Add CI
csukuangfj Nov 15, 2022
60f4480
Small fixes
csukuangfj Nov 15, 2022
8d0bdd3
Small fixes
csukuangfj Nov 15, 2022
@@ -33,6 +33,7 @@ popd
log "Export to torchscript model"
./pruned_transducer_stateless7/export.py \
  --exp-dir $repo/exp \
  --use-averaged-model false \
  --bpe-model $repo/data/lang_bpe_500/bpe.model \
  --epoch 99 \
  --avg 1 \
@@ -0,0 +1,116 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/librispeech/ASR

repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14

log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
soxi $repo/test_wavs/*.wav
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/cpu_jit.pt"
git lfs pull --include "exp/pretrained.pt"
ln -s pretrained.pt epoch-99.pt
ls -lh *.pt
popd

log "Decode with models exported by torch.jit.script()"

./pruned_transducer_stateless8/jit_pretrained.py \
  --bpe-model $repo/data/lang_bpe_500/bpe.model \
  --nn-model-filename $repo/exp/cpu_jit.pt \
  $repo/test_wavs/1089-134686-0001.wav \
  $repo/test_wavs/1221-135766-0001.wav \
  $repo/test_wavs/1221-135766-0002.wav

log "Export to torchscript model"
./pruned_transducer_stateless8/export.py \
  --exp-dir $repo/exp \
  --bpe-model $repo/data/lang_bpe_500/bpe.model \
  --use-averaged-model false \
  --epoch 99 \
  --avg 1 \
  --jit 1

ls -lh $repo/exp/*.pt

log "Decode with models exported by torch.jit.script()"

./pruned_transducer_stateless8/jit_pretrained.py \
  --bpe-model $repo/data/lang_bpe_500/bpe.model \
  --nn-model-filename $repo/exp/cpu_jit.pt \
  $repo/test_wavs/1089-134686-0001.wav \
  $repo/test_wavs/1221-135766-0001.wav \
  $repo/test_wavs/1221-135766-0002.wav

for sym in 1 2 3; do
  log "Greedy search with --max-sym-per-frame $sym"

  ./pruned_transducer_stateless8/pretrained.py \
    --method greedy_search \
    --max-sym-per-frame $sym \
    --checkpoint $repo/exp/pretrained.pt \
    --bpe-model $repo/data/lang_bpe_500/bpe.model \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done

for method in modified_beam_search beam_search fast_beam_search; do
  log "$method"

  ./pruned_transducer_stateless8/pretrained.py \
    --method $method \
    --beam-size 4 \
    --checkpoint $repo/exp/pretrained.pt \
    --bpe-model $repo/data/lang_bpe_500/bpe.model \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done

echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_LABEL_NAME}" == x"run-decode" ]]; then
mkdir -p pruned_transducer_stateless8/exp
ln -s $PWD/$repo/exp/pretrained.pt pruned_transducer_stateless8/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/

ls -lh data
ls -lh pruned_transducer_stateless8/exp

log "Decoding test-clean and test-other"

# use a small value for decoding with CPU
max_duration=100

for method in greedy_search fast_beam_search modified_beam_search; do
log "Decoding with $method"

./pruned_transducer_stateless8/decode.py \
--decoding-method $method \
--epoch 999 \
--avg 1 \
--use-averaged-model 0 \
--max-duration $max_duration \
--exp-dir pruned_transducer_stateless8/exp
done

rm pruned_transducer_stateless8/exp/*.pt
fi
155 changes: 155 additions & 0 deletions .github/workflows/run-librispeech-2022-11-14-stateless8.yml
@@ -0,0 +1,155 @@
# Copyright 2022 Fangjun Kuang (csukuangfj@gmail.com)

# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: run-librispeech-2022-11-14-stateless8
# zipformer

on:
  push:
    branches:
      - master
  pull_request:
    types: [labeled]

  schedule:
    # minute (0-59)
    # hour (0-23)
    # day of the month (1-31)
    # month (1-12)
    # day of the week (0-6)
    # nightly build at 15:50 UTC time every day
    - cron: "50 15 * * *"

jobs:
  run_librispeech_2022_11_14_zipformer_stateless8:
    if: github.event.label.name == 'ready' || github.event.label.name == 'run-decode' || github.event_name == 'push' || github.event_name == 'schedule'
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: [3.8]

      fail-fast: false

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
          cache-dependency-path: '**/requirements-ci.txt'

      - name: Install Python dependencies
        run: |
          grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
          pip uninstall -y protobuf
          pip install --no-binary protobuf protobuf

      - name: Cache kaldifeat
        id: my-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/kaldifeat
          key: cache-tmp-${{ matrix.python-version }}-2022-09-25

      - name: Install kaldifeat
        if: steps.my-cache.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/install-kaldifeat.sh

      - name: Cache LibriSpeech test-clean and test-other datasets
        id: libri-test-clean-and-test-other-data
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/download
          key: cache-libri-test-clean-and-test-other

      - name: Download LibriSpeech test-clean and test-other
        if: steps.libri-test-clean-and-test-other-data.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/download-librispeech-test-clean-and-test-other-dataset.sh

      - name: Prepare manifests for LibriSpeech test-clean and test-other
        shell: bash
        run: |
          .github/scripts/prepare-librispeech-test-clean-and-test-other-manifests.sh

      - name: Cache LibriSpeech test-clean and test-other fbank features
        id: libri-test-clean-and-test-other-fbank
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/fbank-libri
          key: cache-libri-fbank-test-clean-and-test-other-v2

      - name: Compute fbank for LibriSpeech test-clean and test-other
        if: steps.libri-test-clean-and-test-other-fbank.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/compute-fbank-librispeech-test-clean-and-test-other.sh

      - name: Inference with pre-trained model
        shell: bash
        env:
          GITHUB_EVENT_NAME: ${{ github.event_name }}
          GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
        run: |
          mkdir -p egs/librispeech/ASR/data
          ln -sfv ~/tmp/fbank-libri egs/librispeech/ASR/data/fbank
          ls -lh egs/librispeech/ASR/data/*

          sudo apt-get -qq install git-lfs tree sox
          export PYTHONPATH=$PWD:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

          .github/scripts/run-librispeech-pruned-transducer-stateless8-2022-11-14.sh

      - name: Display decoding results for librispeech pruned_transducer_stateless8
        if: github.event_name == 'schedule' || github.event.label.name == 'run-decode'
        shell: bash
        run: |
          cd egs/librispeech/ASR/
          tree ./pruned_transducer_stateless8/exp

          cd pruned_transducer_stateless8
          echo "results for pruned_transducer_stateless8"
          echo "===greedy search==="
          find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
          find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

          echo "===fast_beam_search==="
          find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
          find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

          echo "===modified beam search==="
          find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
          find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

      - name: Upload decoding results for librispeech pruned_transducer_stateless8
        uses: actions/upload-artifact@v2
        if: github.event_name == 'schedule' || github.event.label.name == 'run-decode'
        with:
          name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-18.04-cpu-pruned_transducer_stateless8-2022-11-14
          path: egs/librispeech/ASR/pruned_transducer_stateless8/exp/
1 change: 1 addition & 0 deletions egs/librispeech/ASR/README.md
@@ -23,6 +23,7 @@ The following table lists the differences among them.
| `pruned_transducer_stateless5` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + more layers + random combiner|
| `pruned_transducer_stateless6` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + distillation with hubert|
| `pruned_transducer_stateless7` | Zipformer | Embedding + Conv1d | First experiment with Zipformer from Dan|
| `pruned_transducer_stateless8` | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but using extra data from GigaSpeech|
| `pruned_stateless_emformer_rnnt2` | Emformer(from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR|
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `conv_emformer_transducer_stateless2` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |
58 changes: 58 additions & 0 deletions egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,63 @@
## Results

### pruned_transducer_stateless8 (zipformer + multidataset)

See <https://github.com/k2-fsa/icefall/pull/675> for more details.

[pruned_transducer_stateless8](./pruned_transducer_stateless8)

The tensorboard log can be found at
<https://tensorboard.dev/experiment/y6kAPnN3S3OwvQxQqKQzsQ>

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Number of model parameters: 70369391, i.e., 70.37 M
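
If you want to sanity-check this number, the minimal sketch below (not part of this PR) loads the exported torchscript model and sums the sizes of its parameters; it assumes you have cloned the Hugging Face repository above and fetched `exp/cpu_jit.pt` with `git lfs pull`, and the local path is only an example.

```python
# Minimal sketch: count parameters of the exported torchscript model.
# The path below is an assumption -- adjust it to wherever you cloned
# the icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14 repo.
import torch

model = torch.jit.load(
    "icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/exp/cpu_jit.pt",
    map_location="cpu",
)
num_params = sum(p.numel() for p in model.parameters())
print(f"Number of model parameters: {num_params} ({num_params / 1e6:.2f} M)")
```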

| | test-clean | test-other | comment |
|----------------------|------------|-------------|----------------------------------------|
| greedy search | 1.87 | 4.38 | --epoch 16 --avg 2 --max-duration 600 |
| modified beam search | 1.81 | 4.34 | --epoch 16 --avg 2 --max-duration 600 |
| fast beam search | 1.91 | 4.33 | --epoch 16 --avg 2 --max-duration 600 |

The training commands are:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless8/train.py \
  --world-size 8 \
  --num-epochs 20 \
  --full-libri 1 \
  --use-fp16 1 \
  --max-duration 750 \
  --exp-dir pruned_transducer_stateless8/exp \
  --feedforward-dims "1024,1024,2048,2048,1024" \
  --master-port 12535 \
  --giga-prob 0.9
```

The decoding commands are:
```bash
for m in greedy_search fast_beam_search modified_beam_search ; do
  for epoch in 16; do
    for avg in 2; do
      ./pruned_transducer_stateless8/decode.py \
        --epoch $epoch \
        --avg $avg \
        --use-averaged-model 1 \
        --exp-dir ./pruned_transducer_stateless8/exp \
        --feedforward-dims "1024,1024,2048,2048,1024" \
        --max-duration 600 \
        --decoding-method $m
    done
  done
done
```


### pruned_transducer_stateless7 (zipformer)

See <https://github.com/k2-fsa/icefall/pull/672> for more details.
@@ -30,6 +30,7 @@

./pruned_transducer_stateless7/jit_pretrained.py \
  --nn-model-filename ./pruned_transducer_stateless7/exp/cpu_jit.pt \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  /path/to/foo.wav \
  /path/to/bar.wav
"""
Empty file.