Add fast_beam_search_nbest. #420
Conversation
```python
# at this point, nbest.fsa.scores are all zeros.

nbest = nbest.intersect(lattice)
```
I remembered there is a line in nbest.intersect using remove_epsilon_and_add_self_loops; it would blow up the GPU memory, so you implemented linear_fsa_with_self_loops. I am not sure whether linear_fsa_with_self_loops is also suitable for nbest.from_lattice; if not, I think we'd better use a modified version of nbest.intersect here.

[Edit]: I saw you have already modified that line in nbest.from_lattice.
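For reference, here is a minimal sketch of the idea behind linear_fsa_with_self_loops; this is an illustration under stated assumptions, not the actual k2/icefall implementation. The point is that a linear FSA is already epsilon-free, so the memory-hungry epsilon-removal step can be skipped and only epsilon self-loops need to be added before intersecting with the lattice.

```python
# Illustrative sketch only; not the actual k2/icefall implementation.
import k2


def linear_fsa_with_self_loops_sketch(labels, device="cpu"):
    """Build a vector of linear FSAs and add epsilon self-loops.

    A linear FSA has no epsilon arcs, so the expensive
    remove-epsilon step (which can blow up GPU memory) is not
    needed; adding epsilon self-loops alone suffices before
    intersecting with a lattice.
    """
    fsa = k2.linear_fsa(labels, device=device)
    return k2.add_epsilon_self_loops(fsa)


# Example with two hypothetical token sequences from an n-best list.
fsas = linear_fsa_with_self_loops_sketch([[5, 9, 3], [2, 7]])
```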
```diff
@@ -528,7 +631,7 @@ def main():
     model.eval()
     model.device = device

-    if params.decoding_method == "fast_beam_search":
+    if "fast_beam_search" in params.decoding_method:
         decoding_graph = k2.trivial_graph(params.vocab_size - 1, device=device)
     else:
         decoding_graph = None
```
I think we should add a flag like use-lg telling when to load an LG graph, just like pruned_transducer_stateless/decode.py.
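A hypothetical sketch of what such a flag could look like, following the argparse style used in icefall decode scripts; the flag name, default, and help text here are assumptions, not code from this PR:

```python
# Hypothetical sketch; flag name and default are assumptions.
import argparse

from icefall.utils import str2bool

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use-LG",
    type=str2bool,
    default=False,
    help="""If True, load an LG graph as the decoding graph for
    fast_beam_search; otherwise use a trivial graph. Only used when
    --decoding-method contains fast_beam_search.""",
)
```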
Here are some results. Compared to #277, the WERs are a little better; part of the reason is perhaps that the baseline in this PR is stronger. Different from #277, we give a more detailed analysis below.

baseline

I use the following command to find the deletion errors at the end of an utterance (lines ending in `*)` mark utterances whose last reference word was deleted):

```
grep "*)$" errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-False-beam-8.0-max-contexts-8-max-states-64-use-max-False.txt
```

Its output is

Note that it has only a single line. You will see more for LG-based decoding below.

LG (log_add)

The output is

It outputs 20 lines! One thing to note is that the above errors are dominated by words like

Also, LG (log_add) tends to delete contiguous words once it encounters OOV words. The word table is attached below if you want to check it.

LG (max)

It outputs 881 lines. The first 10 lines are given below:

The complete output is given in the attached file below. It tends to have more deletion errors at the end of an utterance. So I would recommend
The decoding command is

You can refer to https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12 to download the pretrained model.
What exactly is the role of the LG here? At what stage do you apply it?
During RNN-T decoding, instead of using a trivial graph that contains only two states (state 0 and a final state), we use an LG graph. The LG graph is obtained by executing https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/compile_lg.py

LG constrains the search space (paths) of RNN-T decoding.
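To make the two cases concrete, here is a minimal sketch of how the decoding graph could be selected, assuming an LG.pt produced by compile_lg.py exists under lang_dir; all variable names and values below are illustrative:

```python
# Sketch only: choose between an LG graph and a trivial graph.
import torch
import k2

use_LG = True                    # hypothetical flag from the discussion above
lang_dir = "data/lang_bpe_500"   # illustrative path
vocab_size = 500                 # illustrative BPE vocab size
device = torch.device("cuda", 0)

if use_LG:
    # LG.pt is produced by egs/librispeech/ASR/local/compile_lg.py
    decoding_graph = k2.Fsa.from_dict(
        torch.load(f"{lang_dir}/LG.pt", map_location=device)
    )
else:
    # The trivial graph accepts any token sequence; it imposes
    # no lexical/LM constraints on the search.
    decoding_graph = k2.trivial_graph(vocab_size - 1, device=device)
```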
I suspect the issue with the end-of-utterance deletions has to do with the beam: after the decoder encounters something it doesn't expect, I suspect only epsilon arcs are within the beam. (I am talking about the log-likelihood beam, of course, not the top-k constraint, which should not be called a beam.)
```diff
@@ -53,6 +53,32 @@
     --beam 4 \
```
If this is a floating-point beam, we should probably specify it as 4.0. This might be a bit too low, leading to deletions at the end of an utterance. You could try 8.
Yes, I agree.

The above results are obtained using the command posted in #420 (comment), which indeed uses 8 (i.e., 8.0).
OK, if 8.0 is not working well, then try a much larger number, like 32.0. We can even make it super large and just rely on the max-states constraint.
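For context, the beam and max-states constraints enter fast beam search through k2's RNN-T decoding config. A sketch with the values suggested above (the vocab size and context length are illustrative, not from this PR):

```python
# Sketch: with a very large beam, the score-based pruning is effectively
# disabled and max_states/max_contexts become the active constraints.
import k2

config = k2.RnntDecodingConfig(
    vocab_size=500,          # illustrative BPE vocab size
    decoder_history_len=2,   # context size of the stateless decoder
    beam=32.0,               # the larger log-likelihood beam suggested above
    max_contexts=8,
    max_states=64,
)
```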
32.0 indeed helps: it reduces the WER from 3.43 to 2.82.

Beam 64.0 produces the same results as beam 32.0.
Yes, I agree. I am using
I have tried fast_beam_search + LG for pruned_transducer_stateless{,2,3,5} and the results are given below. LG decoding tends to reduce deletion errors; however, other kinds of errors increase and the overall WER becomes worse.

pruned_transducer_stateless

pruned_transducer_stateless2

pruned_transducer_stateless3

pruned_transducer_stateless5