
Add fast_beam_search_nbest. #420

Merged
merged 10 commits into k2-fsa:master from fast-beam-search-nbest on Jun 21, 2022

Conversation

csukuangfj
Collaborator

No description provided.

@csukuangfj csukuangfj added ready and removed ready labels Jun 15, 2022

# at this point, nbest.fsa.scores are all zeros.

nbest = nbest.intersect(lattice)
Collaborator
@pkufool pkufool Jun 15, 2022

I remember there is a line in nbest.intersect that uses remove_epsilon_and_add_self_loops; it would blow up GPU memory, so you implemented linear_fsa_with_self_loops. I am not sure whether linear_fsa_with_self_loops is suitable for nbest.from_lattice; if not, I think we'd better use a modified version of nbest.intersect here.

[Edit]: I see you have already modified that line in nbest.from_lattice.
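
For context, a minimal sketch of the idea (illustrative values, not the exact icefall code): the sampled n-best paths are plain token sequences, so they can be built directly as linear FSAs and given epsilon self-loops, which avoids calling the much more expensive k2.remove_epsilon_and_add_self_loops on a large FSA.

```python
import k2

# Two hypothetical n-best paths as token sequences (illustrative values).
token_seqs = [[5, 9, 2], [5, 7, 2]]

# Build one linear FSA per path, then add epsilon self-loops so that the
# paths can be intersected with a lattice containing epsilon arcs.
path_fsas = k2.linear_fsa(token_seqs)
path_fsas_with_self_loops = k2.add_epsilon_self_loops(path_fsas)
```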

@@ -528,7 +631,7 @@ def main():
     model.eval()
     model.device = device
 
-    if params.decoding_method == "fast_beam_search":
+    if "fast_beam_search" in params.decoding_method:
         decoding_graph = k2.trivial_graph(params.vocab_size - 1, device=device)
     else:
         decoding_graph = None
Collaborator

I think we should add a flag like --use-LG to tell it when to load an LG graph, just like in pruned_transducer_stateless/decode.py.
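
A rough sketch of how such a flag might be wired up; `params.use_LG`, `params.lang_dir`, and the `LG.pt` path are assumed names for illustration, not necessarily what this PR ends up using:

```python
import k2
import torch

if "fast_beam_search" in params.decoding_method:
    if params.use_LG:
        # Load a precompiled LG graph and scale its LM scores
        # (cf. --ngram-lm-scale).
        decoding_graph = k2.Fsa.from_dict(
            torch.load(f"{params.lang_dir}/LG.pt", map_location=device)
        )
        decoding_graph.scores *= params.ngram_lm_scale
    else:
        decoding_graph = k2.trivial_graph(params.vocab_size - 1, device=device)
else:
    decoding_graph = None
```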

@csukuangfj
Collaborator Author

csukuangfj commented Jun 21, 2022

Here are some results for fast_beam_search with LG.

Compared to #277, the WERs are a little better. Part of the reason is perhaps that the baseline is stronger in this PR.

Unlike #277, we give a more detailed analysis below.

| decoding | test-clean WER | insertions | deletions | substitutions | reference words | comment | result file |
| --- | --- | --- | --- | --- | --- | --- | --- |
| baseline (trivial graph, no LG) | 2.61 | 150 | 98 | 1123 | 52576 | | errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-False-beam-8.0-max-contexts-8-max-states-64-use-max-False.txt |
| LG (log_add) | 3.43 | 210 | 455 | 1138 | 52576 | more deletion errors (see below) | errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-True-beam-8.0-max-contexts-8-max-states-64-use-max-False.txt |
| LG (max) | 5.06 | 171 | 1378 | 1110 | 52576 | significantly more deletion errors (see below) | errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-True-beam-8.0-max-contexts-8-max-states-64-use-max-True.txt |

baseline

I used the following command to find the deletion errors at the end of an utterance.

grep "*)$" errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-False-beam-8.0-max-contexts-8-max-states-64-use-max-False.txt

Its output is

P S PRAY SIR EXCUSE ME FOR WRITING TO YOU A SECOND TIME I COULD NOT HELP WRITING PARTLY TO TELL YOU HOW THANKFUL I AM FOR YOUR KINDNESS AND PARTLY TO LET YOU KNOW THAT YOUR ADVICE SHALL NOT BE WASTED HOWEVER SORROWFULLY AND RELUCTANTLY IT MAY BE AT FIRST FOLLOWED C (B->*)

Note that it has only a single line. You will see more for LG-based decoding below.

LG (log_add)

grep "*)$" errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-True-beam-8.0-max-contexts-8-max-states-64-use-max-False.txt

The output is

THE TOP FLOOR BELONGS TO MILES MC (LAREN->*)
SINCE LAST THURSDAY I (GHIP GHISIZZLE HAVE BEEN THE LAWFUL BOOLOOROO OF THE BLUE COUNTRY BUT NOW THAT YOU ARE CONQUERED BY QUEEN TROT I SUPPOSE I AM CONQUERED TOO AND YOU HAVE NO BOOLOOROO AT ALL->*)
IT COULD NOT BE USED FOR (ELECTROPLATING OR DEPOSITION NOR COULD IT CHARGE STORAGE BATTERIES ALL OF WHICH ARE EASILY WITHIN THE ABILITY OF THE DIRECT CURRENT->*)
AT ONCE THE GOAT GAVE A LEAP ESCAPED FROM THE SOLDIERS AND WITH BOWED HEAD RUSHED UPON THE (BOOLOOROO->*)
MISTRESS (FITZOOTH HAD BEEN CARRIED OFF BY THE SHERIFF'S DAUGHTER AND HER MAIDS AS SOON AS THEY HAD ENTERED THE HOUSE SO THAT ROBIN ALONE HAD THE CARE OF MONTFICHET->*)
HE GAVE WAY TO THE OTHERS VERY READILY AND RETREATED UNPERCEIVED BY THE SQUIRE AND MISTRESS (FITZOOTH TO THE REAR OF THE TENT->*)
THERE BEFELL AN ANXIOUS INTERVIEW MISTRESS (FITZOOTH ARGUING FOR AND AGAINST THE SQUIRE'S PROJECT IN A BREATH->*)
LET US BEGIN WITH THAT HIS COMMENTARY (ON GALATIANS->*)
MOST OF ALL ROBIN THOUGHT OF HIS FATHER WHAT WOULD HE (COUNSEL->*)
HE KNEW IT WOULD TAKE THEM TO THE HOUSE OF THE CROOKED MAGICIAN WHOM HE HAD NEVER SEEN BUT WHO WAS THEIR NEAREST (NEIGHBOR->*)
(ANGOR PAIN PAINFUL TO HEAR->*)
AT THE HEAD OF THE (PINKIES->PINKS) WERE (GHIP GHISIZZLE AND BUTTON BRIGHT WHO HAD THE PARROT ON HIS SHOULDER AND THEY WERE SUPPORTED BY CAPTAIN CORALIE AND CAPTAIN TINTINT AND ROSALIE THE WITCH->*)
P S PRAY SIR EXCUSE ME FOR WRITING TO YOU A SECOND TIME I COULD NOT HELP WRITING PARTLY TO TELL YOU HOW THANKFUL I AM FOR YOUR KINDNESS AND PARTLY TO LET YOU KNOW THAT YOUR ADVICE SHALL NOT BE WASTED HOWEVER SORROWFULLY AND RELUCTANTLY IT MAY BE AT FIRST FOLLOWED C (B->*)
WHEN THE (BLUESKINS->BLUESKIN) SAW (GHIP GHISIZZLE THEY RAISED ANOTHER GREAT SHOUT FOR HE WAS THE FAVORITE OF THE SOLDIERS AND VERY POPULAR WITH ALL THE PEOPLE->*)
ROBIN (FITZOOTH->*)
DESCEND (O->A) LITTLE CLOUD AND HOVER BEFORE THE EYES OF (THEL->*)
FOR IT IS A SOLID HEAVY HANDSOME DOOR AND MUST ONCE HAVE BEEN IN THE HABIT OF SHUTTING WITH A SONOROUS BANG BEHIND (A->THE) LIVERIED LACKEY WHO HAD JUST SEEN HIS MASTER AND MISTRESS OFF THE GROUNDS IN A CARRIAGE AND (PAIR->*)
SO (GHIP GHISIZZLE ORDERED THE CAPTAIN TO TAKE A FILE OF SOLDIERS AND ESCORT THE RAVING BEAUTIES TO THEIR NEW HOME->*)
(FITZOOTH'S HAND RESTED AT LAST UPON THE TOP RUNG OF A LADDER AND SLOWLY THE TRUTH CAME TO HIM->*)
BE NOT SO FOOLISH FRIEND SAID (FITZOOTH CROSSLY->*)

It outputs 20 lines!

One thing to note is that the above errors are dominated by words like

  • GHIP GHISIZZLE (several lines contain this pattern)
  • FITZOOTH (several lines contain this pattern)

Also, LG (log_add) tends to delete a contiguous run of words once it encounters an OOV word.
Almost all of the above deletion errors are caused by OOVs.

The word table is attached below if you want to check it.
words.txt

LG (max)

grep "*)$" errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-True-beam-8.0-max-contexts-8-max-states-64-use-max-True.txt | wc -l

It outputs 881 lines.

The first 10 lines are given below:

grep "*)$" errs-test-clean-beam_8.0_max_contexts_8_max_states_64-epoch-99-avg-1-use-LG-True-beam-8.0-max-contexts-8-max-states-64-use-max-True.txt | head
THE GOOD NATURED AUDIENCE IN PITY TO FALLEN MAJESTY SHOWED FOR ONCE GREATER DEFERENCE TO THE KING THAN TO THE MINISTER AND SUNG THE PSALM WHICH THE FORMER HAD CALLED (FOR->*)
SOME OF THE PENAL REGULATIONS WERE COPIED FROM THE EDICTS OF DIOCLETIAN AND THIS METHOD OF CONVERSION WAS APPLAUDED BY THE SAME BISHOPS WHO HAD FELT THE HAND OF OPPRESSION AND PLEADED FOR THE RIGHTS OF (HUMANITY->*)
THERE CERTAINLY WAS NO END TO IT AND EVEN RUTH WAS PHILADELPHIAN ENOUGH TO BELIEVE THAT A STREET OUGHT NOT TO HAVE ANY END OR ARCHITECTURAL POINT UPON WHICH THE WEARY EYE COULD (REST->*)
AND SO THE STORY OF MORMONISM RUNS ON ITS FINALE HAS NOT YET BEEN WRITTEN THE CURRENT PRESS PRESENTS CONTINUOUSLY NEW STAGES OF ITS PROGRESS NEW DEVELOPMENTS OF ITS (PLAN->*)
THE LADIES IN COMPLIANCE WITH THAT SOFTNESS OF HEART WHICH IS THEIR CHARACTERISTIC ARE ON ONE SIDE AND THE MEN BY WHOM THE WORLD HAS TO BE MANAGED (ARE->OR) ON THE (OTHER->*)
(O->OH) VERY WELL SAID GRINGO TURNING AWAY WITH A SHADE OF CONTEMPT YOU'LL FIND IF YOU ARE GOING INTO LITERATURE AND NEWSPAPER WORK THAT YOU CAN'T AFFORD A CONSCIENCE LIKE (THAT->*)
FROM THE SAME MEN NEW REGIMENTS AND NEW COMPANIES WERE FORMED DIFFERENT OFFICERS APPOINTED AND THE WHOLE MILITARY FORCE PUT INTO SUCH HANDS AS THE INDEPENDENTS COULD RELY (ON->*)
FOR IN THE TIMES BEFORE THE GREAT FLOOD ATHENS WAS THE GREATEST AND BEST OF CITIES AND (DID->DEAD) THE NOBLEST DEEDS AND HAD THE BEST CONSTITUTION OF ANY UNDER THE FACE OF (HEAVEN->*)
MISTER NEVERBEND BEGAN THE CAPTAIN AND I (OBSERVED->OBSERVE) THAT UP TO THAT MOMENT HE HAD GENERALLY ADDRESSED ME AS PRESIDENT IT CANNOT BE DENIED THAT WE HAVE COME HERE ON AN UNPLEASANT (MISSION->*)
HOW STRANGE IT SEEMED TO THE SAD WOMAN AS SHE WATCHED THE GROWTH AND THE BEAUTY THAT BECAME EVERY DAY MORE BRILLIANT AND THE INTELLIGENCE THAT THREW ITS QUIVERING SUNSHINE OVER THE TINY FEATURES OF THIS (CHILD->*)

The complete output is given in the following attached file.
LG-max.txt

It tends to have more deletion errors at the end of an utterance.


So I would recommend

  • Using a G that covers as many words as possible so that we have fewer OOV words.
  • Always using log_add.

@csukuangfj
Collaborator Author

The decoding command is

./pruned_transducer_stateless/decode.py \
  --epoch 99 \
  --avg 1 \
  --exp-dir pruned_transducer_stateless/exp \
  --max-duration 300 \
  --decoding-method fast_beam_search \
  --use-LG 1 \
  --use-max 0 \
  --beam 8 \
  --max-contexts 8 \
  --max-states 64 \
  --ngram-lm-scale 0.01 \
  --num-paths 200 \
  --nbest-scale 0.5

where pruned_transducer_stateless/exp/epoch-99.pt is a symlink to
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/blob/main/exp/pretrained.pt

You can refer to https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12 to download it.

@danpovey
Collaborator

What exactly is the role of the LG here, at what stage do you apply it?

@csukuangfj
Collaborator Author

> What exactly is the role of the LG here, at what stage do you apply it?

During RNN-T decoding, instead of using a trivial graph that contains only two states (state 0 and a final state), we use an LG graph. The LG graph is obtained by running https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/compile_lg.py
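
Roughly, that script composes the lexicon FST L with an n-gram LM G. A simplified sketch is below; the file paths are the usual icefall ones, and the real script also maps disambiguation symbols to epsilon before removing epsilons:

```python
import k2
import torch

# Simplified LG compilation (see compile_lg.py for the exact steps).
L = k2.Fsa.from_dict(torch.load("data/lang_bpe_500/L_disambig.pt"))
G = k2.Fsa.from_dict(torch.load("data/lm/G_3_gram.pt"))

LG = k2.compose(k2.arc_sort(L), k2.arc_sort(G))
LG = k2.connect(LG)
LG = k2.determinize(LG)
LG = k2.remove_epsilon(LG)
LG = k2.connect(LG)
LG = k2.arc_sort(LG)
```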

@csukuangfj
Collaborator Author

LG constrains the search space (paths) of RNN-T decoding.
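
Concretely, the graph enters fast beam search through k2's RNN-T decoding streams, roughly as in the following sketch (based on how icefall's fast_beam_search uses the k2 API; the values are illustrative, and `decoding_graph` / `batch_size` come from the surrounding code):

```python
import k2

# The decoding graph (trivial graph or LG) constrains which label sequences
# fast beam search may explore; one stream is created per utterance.
config = k2.RnntDecodingConfig(
    vocab_size=500,         # illustrative
    decoder_history_len=2,  # context size of the stateless decoder
    beam=8.0,
    max_states=64,
    max_contexts=8,
)
streams = [k2.RnntDecodingStream(decoding_graph) for _ in range(batch_size)]
decoding_streams = k2.RnntDecodingStreams(streams, config)
```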

@danpovey
Collaborator

I suspect the issue with the end-of-utterance deletions has to do with the beam: after it encounters something it doesn't expect, only epsilon arcs are within the beam. (I am talking about the log-likelihood beam, of course, not the top-k constraint, which should not be called a beam.)

@@ -53,6 +53,32 @@
--beam 4 \
Collaborator

If this is a floating-point beam, we should probably specify it as 4.0. This might be a bit too low, leading to deletions at the end of utterances. You could try 8.

Collaborator Author

Yes, I agree.

The above results are obtained using the command posted in #420 (comment), which indeed uses 8 (i.e., 8.0).

Collaborator

OK, if 8.0 is not working well, then try a much larger number, like 32.0. We can even make it super large and just rely on the max-states constraint.

Collaborator Author

32.0 indeed helps. It reduces the WER to 2.82 from 3.43.

64.0 produces the same results as beam 32.0.

@csukuangfj
Collaborator Author

> I am talking about the log-likelihood beam, of course, not the top-k constraint which should not be called a beam

Yes, I agree. I am using num_active_paths, not beam_size, in sherpa for modified beam search.

@csukuangfj
Collaborator Author

I have tried to use fast_beam_search + LG for pruned_transducer_stateless{,2,3,5} and the results are given below.

LG decoding tends to reduce deletion errors. However, other kinds of errors increase and the overall WER becomes worse.

pruned_transducer_stateless

fast_beam_search$ head -n 2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64-epoch-99-avg-1-beam-20.0-max-contexts-8-max-states-64.txt
%WER = 2.61
Errors: 150 insertions, 97 deletions, 1124 substitutions, over 52576 reference words (51355 correct)

fast_beam_search_nbest_LG$ head -n2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64num_paths_200_nbest_scale_0.5_ngram_lm_scale_0.01-epoch-99-avg-1-beam-20.0-max-contexts-8-max-states-64-nbest-scale-0.5-num-paths-200-ngram-lm-scale-0.01.txt
%WER = 2.82
Errors: 243 insertions, 78 deletions, 1162 substitutions, over 52576 reference words (51336 correct)

pruned_transducer_stateless2

fast_beam_search$ head -n 2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64-epoch-38-avg-1-beam-20.0-max-contexts-8-max-states-64.txt
%WER = 2.57
Errors: 145 insertions, 102 deletions, 1103 substitutions, over 52576 reference words (51371 correct)

fast_beam_search_nbest_LG$ head -n2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64num_paths_200_nbest_scale_0.5_ngram_lm_scale_0.01-epoch-38-avg-1-beam-20.0-max-contexts-8-max-states-64-nbest-scale-0.5-num-paths-200-ngram-lm-scale-0.01.txt
%WER = 2.84
Errors: 229 insertions, 83 deletions, 1181 substitutions, over 52576 reference words (51312 correct)

pruned_transducer_stateless3

fast_beam_search$ head -n 2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64-epoch-12-avg-1-beam-20.0-max-contexts-8-max-states-64.txt
%WER = 2.06
Errors: 99 insertions, 111 deletions, 755 substitutions, over 46885 reference words (46019 correct)

fast_beam_search_nbest_LG$ head -n2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64num_paths_200_nbest_scale_0.5_ngram_lm_scale_0.01-epoch-12-avg-1-beam-20.0-max-contexts-8-max-states-64-nbest-scale-0.5-num-paths-200-ngram-lm-scale-0.01.txt
%WER = 2.29
Errors: 202 insertions, 55 deletions, 817 substitutions, over 46885 reference words (46013 correct)

pruned_transducer_stateless5

fast_beam_search$ head -n 2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64-epoch-99-avg-1-beam-20.0-max-contexts-8-max-states-64.txt
%WER = 2.44
Errors: 136 insertions, 113 deletions, 1033 substitutions, over 52576 reference words (51430 correct)

fast_beam_search_nbest_LG$ head -n2 errs-test-clean-beam_20.0_max_contexts_8_max_states_64num_paths_200_nbest_scale_0.5_ngram_lm_scale_0.01-epoch-99-avg-1-beam-20.0-max-contexts-8-max-states-64-nbest-scale-0.5-num-paths-200-ngram-lm-scale-0.01.txt
%WER = 2.65
Errors: 234 insertions, 70 deletions, 1087 substitutions, over 52576 reference words (51419 correct)

errs-test-clean-beam_20.0_max_contexts_8_max_states_64-epoch-99-avg-1-beam-20.0-max-contexts-8-max-states-64.txt

errs-test-clean-beam_20.0_max_contexts_8_max_states_64num_paths_200_nbest_scale_0.5_ngram_lm_scale_0.01-epoch-99-avg-1-beam-20.0-max-contexts-8-max-states-64-nbest-scale-0.5-num-paths-200-ngram-lm-scale-0.01.txt

@csukuangfj csukuangfj merged commit dc89b61 into k2-fsa:master Jun 21, 2022
@csukuangfj csukuangfj deleted the fast-beam-search-nbest branch June 21, 2022 16:09
@pkufool pkufool mentioned this pull request Oct 17, 2022