Add fast_beam_search_nbest. #420
Conversation
```python
# at this point, nbest.fsa.scores are all zeros.

nbest = nbest.intersect(lattice)
```
I remembered there is a line in nbest.intersect using remove_epsilon_and_add_self_loops; it would blow up the GPU memory, so you implemented linear_fsa_with_self_loops. I am not sure whether linear_fsa_with_self_loops is also suitable for nbest.from_lattice; if not, I think we'd better use a modified version of nbest.intersect here.

[Edit]: I saw you have already modified that line in nbest.from_lattice.
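For reference, here is a minimal sketch of the idea behind linear_fsa_with_self_loops; this is an illustration under stated assumptions, not the actual k2/icefall implementation. The point is that a linear FSA is already epsilon-free, so the memory-hungry epsilon-removal step can be skipped and only epsilon self-loops need to be added before intersecting with the lattice.

```python
# Illustrative sketch only; not the actual k2/icefall implementation.
import k2


def linear_fsa_with_self_loops_sketch(labels, device="cpu"):
    """Build a vector of linear FSAs and add epsilon self-loops.

    A linear FSA has no epsilon arcs, so the expensive
    remove-epsilon step (which can blow up GPU memory) is not
    needed; adding epsilon self-loops alone suffices before
    intersecting with a lattice.
    """
    fsa = k2.linear_fsa(labels, device=device)
    return k2.add_epsilon_self_loops(fsa)


# Example with two hypothetical token sequences from an n-best list.
fsas = linear_fsa_with_self_loops_sketch([[5, 9, 3], [2, 7]])
```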
```diff
@@ -528,7 +631,7 @@ def main():
     model.eval()
     model.device = device

-    if params.decoding_method == "fast_beam_search":
+    if "fast_beam_search" in params.decoding_method:
         decoding_graph = k2.trivial_graph(params.vocab_size - 1, device=device)
     else:
         decoding_graph = None
```
I think we should add a flag like use-lg telling when to load an LG graph, just like pruned_transducer_stateless/decode.py.
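A hypothetical sketch of what such a flag could look like, following the argparse style used in icefall decode scripts; the flag name, default, and help text here are assumptions, not code from this PR:

```python
# Hypothetical sketch; flag name and default are assumptions.
import argparse

from icefall.utils import str2bool

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use-LG",
    type=str2bool,
    default=False,
    help="""If True, load an LG graph as the decoding graph for
    fast_beam_search; otherwise use a trivial graph. Only used when
    --decoding-method contains fast_beam_search.""",
)
```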
Here are some results. Compared to #277, the WERs are a little better; part of the reason is perhaps that the baseline in this PR is stronger. Different from #277, we give a more detailed analysis below.

baseline

I use the following command to find the deletion errors at the end of an utterance (lines ending in `*)` mark utterances whose last reference word was deleted):

```
grep "*)$" errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-False-beam-8.0-max-contexts-8-max-states-64-use-max-False.txt
```

Its output is

Note that it has only a single line. You will see more for LG-based decoding below.

LG (log_add)

The output is

It outputs 20 lines! One thing to note is that the above errors are dominated by words like

Also, LG (log_add) tends to delete contiguous words once it encounters OOV words. The word table is attached below if you want to check it.

LG (max)

It outputs 881 lines. The first 10 lines are given below:

The complete output is given in the attached file below. It tends to have more deletion errors at the end of an utterance. So I would recommend
The decoding command is

You can refer to https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12 to download the pretrained model.
What exactly is the role of the LG here? At what stage do you apply it?
During RNN-T decoding, instead of using a trivial graph that contains only two states (state 0 and a final state), we use an LG graph. The LG graph is obtained by executing https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/compile_lg.py

LG constrains the search space (paths) of RNN-T decoding.
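To make the two cases concrete, here is a minimal sketch of how the decoding graph could be selected, assuming an LG.pt produced by compile_lg.py exists under lang_dir; all variable names and values below are illustrative:

```python
# Sketch only: choose between an LG graph and a trivial graph.
import torch
import k2

use_LG = True                    # hypothetical flag from the discussion above
lang_dir = "data/lang_bpe_500"   # illustrative path
vocab_size = 500                 # illustrative BPE vocab size
device = torch.device("cuda", 0)

if use_LG:
    # LG.pt is produced by egs/librispeech/ASR/local/compile_lg.py
    decoding_graph = k2.Fsa.from_dict(
        torch.load(f"{lang_dir}/LG.pt", map_location=device)
    )
else:
    # The trivial graph accepts any token sequence; it imposes
    # no lexical/LM constraints on the search.
    decoding_graph = k2.trivial_graph(vocab_size - 1, device=device)
```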
I suspect the issue with the end-of-utterance deletions has to do with the beam: after the decoder encounters something it doesn't expect, I suspect only epsilon arcs are within the beam. (I am talking about the log-likelihood beam, of course, not the top-k constraint, which should not be called a beam.)
```diff
@@ -53,6 +53,32 @@
     --beam 4 \
```
If this is a floating-point beam, we should probably specify it as 4.0. This might be a bit too low, leading to deletions at the end of an utterance. You could try 8.
Yes, I agree.

The above results are obtained using the command posted in #420 (comment), which indeed uses 8 (i.e., 8.0).
OK, if 8.0 is not working well, then try a much larger number, like 32.0. We can even make it super large and just rely on the max-states constraint.
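For context, the beam and max-states constraints enter fast beam search through k2's RNN-T decoding config. A sketch with the values suggested above (the vocab size and context length are illustrative, not from this PR):

```python
# Sketch: with a very large beam, the score-based pruning is effectively
# disabled and max_states/max_contexts become the active constraints.
import k2

config = k2.RnntDecodingConfig(
    vocab_size=500,          # illustrative BPE vocab size
    decoder_history_len=2,   # context size of the stateless decoder
    beam=32.0,               # the larger log-likelihood beam suggested above
    max_contexts=8,
    max_states=64,
)
```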
32.0 indeed helps: it reduces the WER from 3.43 to 2.82.

Beam 64.0 produces the same results as beam 32.0.
Yes, I agree. I am using
I have tried fast_beam_search + LG for pruned_transducer_stateless{,2,3,5} and the results are given below. LG decoding tends to reduce deletion errors; however, other kinds of errors increase and the overall WER becomes worse.

pruned_transducer_stateless

pruned_transducer_stateless2

pruned_transducer_stateless3

pruned_transducer_stateless5