
Add CTC recipe to AISHELL-1 #1576

Merged · 21 commits · Oct 7, 2022

Conversation

@BenoitWang (Collaborator)

Hi @mravanelli @TParcollet, this PR adds a typical CTC-wav2vec recipe to AISHELL-1.
Test CER: 5.06%
Dev CER: 4.52%

Some points:

  1. chinese-wav2vec2-large (from Tencent), pretrained on 10k hours of Chinese data, is used as the speech encoder
  2. bert-base-chinese is used as the tokenizer; CTC is trained on characters (see the sketch after this list)
  3. In prepare.py, pandas is not needed to generate the CSV files, so it is removed together with some unused variables
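
For illustration, here is a minimal sketch of how the two pretrained models can be loaded via the HuggingFace transformers API. The hub id TencentGameMate/chinese-wav2vec2-large and the char-level handling are assumptions for this sketch, not a copy of the recipe's code:

```python
# Minimal sketch (assumed HF hub ids; not the exact recipe code).
from transformers import AutoTokenizer, Wav2Vec2Model

# Assumed hub id for Tencent's 10k-hour Chinese wav2vec2 model.
wav2vec2 = Wav2Vec2Model.from_pretrained("TencentGameMate/chinese-wav2vec2-large")

# bert-base-chinese tokenizes Chinese text character by character,
# so its vocabulary doubles as a char-level CTC label set.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

# Encode a transcript to char ids without the [CLS]/[SEP] specials,
# which have no place in a CTC target sequence.
ids = tokenizer.encode("今天天气很好", add_special_tokens=False)
print(ids)  # one id per Chinese character
```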

@TParcollet (Collaborator)

Huge! Is this comparable to the current SOTA?

@BenoitWang (Collaborator, Author)

Hi @TParcollet, I think it's good for a pure-CTC system with greedy decoding and no LM.
Hybrid models from ESPnet get better CERs:

| Model | Test CER | Dev CER | LM |
| --- | --- | --- | --- |
| our ctc-wav2vec | 5.06% | 4.52% | No |
| espnet: branchformer-beam10-ctc0.4 | 4.4% | 4.1% | No |
| espnet: conformer-beam20-ctc0.3 | 4.9% | 4.5% | No |

@TParcollet (Collaborator)

I see, not bad, but we use extra pre-training while they don't, correct?

@BenoitWang (Collaborator, Author)

Yes, exactly. Fair enough, but their Branchformer is quite something according to the results.

@anautsch (Collaborator) left a comment

Hi @BenoitWang, minor details only.

The YAML file combines the related hparam files well, as does the train script.

Is the AISHELL-1 prepare script completely stripped of the extra pandas dependency for all its recipes?
(Reducing dependencies is not a bad thing, although pandas is neat; just asking. pandas is not in the SB requirements, nor is it explicitly stated for AISHELL-1, so it's a good catch.)

Review comments on recipes/AISHELL-1/ASR/CTC/README.md and recipes/AISHELL-1/ASR/CTC/train_with_wav2vec.py (outdated, resolved).
@BenoitWang (Collaborator, Author)

Hi @anautsch, thanks for the review; the fix is done. And yes, that's why I wanted to remove pandas: across the AISHELL-1 recipes it is only used to generate the CSV files. A pandas-free sketch is below.
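
For illustration, a minimal sketch of writing a SpeechBrain-style data CSV with the standard library instead of pandas. The column names and the example row are assumptions for this sketch, not the actual prepare.py code:

```python
# Sketch only: assumed columns and row content, not the actual prepare.py logic.
import csv

# Each row: (utterance id, duration in seconds, wav path, transcript).
rows = [
    ("BAC009S0002W0122", 4.21, "/data/aishell/wav/BAC009S0002W0122.wav", "今天天气很好"),
]

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "duration", "wav", "transcript"])  # header row
    writer.writerows(rows)
```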

@BenoitWang (Collaborator, Author)

Hi @TParcollet @anautsch @Adel-Moumen,

Thank you all for the reviews and tests! The HF link is added; here's a brief summary of the PR:

  1. add a CTC recipe
  2. fix naming problems
  3. fix dynamic batching conflicts for seq2seq & transformer recipes

@anautsch (Collaborator) commented Oct 7, 2022

lgtm.

Tested the recipes in --debug mode, and the wav2vec2 recipe with DDP.


Side note: we have an internal issue with --debug and eval checkpointing; it becomes apparent when running this transformer wav2vec2 recipe. Here's the relevant log:

```
  asr_brain.evaluate(
  File "speechbrain/core.py", line 1260, in evaluate
    self.on_evaluate_start(max_key=max_key, min_key=min_key)
  File "train_with_wav2vect.py", line 272, in on_evaluate_start
    ckpt = sb.utils.checkpoints.average_checkpoints(
  File "speechbrain/utils/checkpoints.py", line 1174, in average_checkpoints
    return averager(parameter_iterator)
  File "speechbrain/utils/checkpoints.py", line 1080, in average_state_dicts
    raise ValueError("No state dicts to average.")
ValueError: No state dicts to average.
```
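
For context: the error means average_checkpoints received an empty set of checkpoints, which can happen in --debug runs where little or nothing gets checkpointed. Below is a hedged sketch of one possible guard in the recipe's on_evaluate_start; the method body, the recoverable_name="model" value, and the hparams attribute names are assumptions about the recipe, not the actual fix:

```python
# Sketch of a possible guard (assumed recipe code, not the actual fix).
import speechbrain as sb

class ASR(sb.Brain):
    def on_evaluate_start(self, max_key=None, min_key=None):
        super().on_evaluate_start(max_key=max_key, min_key=min_key)
        # In --debug runs the checkpointer may have saved nothing,
        # so averaging would raise "No state dicts to average."
        ckpts = self.checkpointer.find_checkpoints(
            max_key=max_key, min_key=min_key
        )
        if not ckpts:
            return  # fall back to the current (unaveraged) parameters
        ckpt = sb.utils.checkpoints.average_checkpoints(
            ckpts, recoverable_name="model"  # assumed recoverable name
        )
        self.hparams.model.load_state_dict(ckpt, strict=True)
        self.hparams.model.eval()
```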

@anautsch merged commit 39f9f39 into speechbrain:develop on Oct 7, 2022