Add small.en-tdrz checkpoint and initial support for speakerturn in decoding results #4

Merged: 4 commits, Mar 30, 2023

@akashmjn (Owner) commented Mar 30, 2023

This PR adds support for a new model, selected with --model small.en-tdrz:

  • This model was finetuned to output <|speakerturn|> tokens.
  • As a quick hack, it simply reuses the slot of the <|startoflm|> token, which was unused in the original whisper models (see "how to use <|startoflm|> in whisper", openai/whisper#414). This keeps the model structure exactly the same.
  • Small changes are made to the tokenizer, transcribe, and decoding code so that this special token is not suppressed during decoding.
  • A follow-up PR will control this via a flag (for now these tokens are always decoded) and will force sampling of a timestamp token after a speaker turn token.
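
To illustrate the idea behind the bullets above, here is a minimal, self-contained sketch. It is not the code in this PR: the toy vocabulary, the token IDs, and the helper names (`build_special_tokens`, `suppressed_ids`, `apply_suppression`) are all hypothetical and chosen only for illustration. It shows that renaming the <|startoflm|> slot leaves the vocabulary size unchanged, and that leaving <|speakerturn|> out of the suppression list lets the decoder emit it.

```python
import math

# Hypothetical toy list of special tokens. The real tokenizer has many more
# entries and different IDs; only the relative layout matters here.
ORIGINAL_SPECIALS = [
    "<|endoftext|>",
    "<|startoftranscript|>",
    "<|startoflm|>",
    "<|startofprev|>",
    "<|nospeech|>",
    "<|notimestamps|>",
]


def build_special_tokens(specials, first_id=0, tdrz=False):
    """Map special-token strings to IDs, optionally renaming <|startoflm|>."""
    table = {}
    for offset, name in enumerate(specials):
        if tdrz and name == "<|startoflm|>":
            name = "<|speakerturn|>"  # same slot, new meaning
        table[name] = first_id + offset
    return table


def suppressed_ids(special_tokens, keep=()):
    """IDs to mask at every decoding step, minus any tokens listed in `keep`."""
    candidates = [
        "<|startoftranscript|>",
        "<|startoflm|>",
        "<|speakerturn|>",
        "<|startofprev|>",
        "<|nospeech|>",
    ]
    return sorted(
        special_tokens[name]
        for name in candidates
        if name in special_tokens and name not in keep
    )


def apply_suppression(logits, ids):
    """Return a copy of `logits` with the given IDs set to -inf."""
    masked = list(logits)
    for i in ids:
        masked[i] = -math.inf
    return masked


base = build_special_tokens(ORIGINAL_SPECIALS)
tdrz = build_special_tokens(ORIGINAL_SPECIALS, tdrz=True)

# The speaker turn token occupies the slot <|startoflm|> used to have,
# so the vocabulary size (and hence the model structure) is unchanged.
same_slot = base["<|startoflm|>"] == tdrz["<|speakerturn|>"]
same_size = len(base) == len(tdrz)

# Keep <|speakerturn|> out of the suppression list so it can be decoded.
ids = suppressed_ids(tdrz, keep=("<|speakerturn|>",))
masked = apply_suppression([0.5] * len(tdrz), ids)
```

With this setup, the logit at the <|speakerturn|> slot is left untouched while the other listed special tokens are masked to -inf. A flag like the one planned for the follow-up PR could be modelled by passing an empty `keep`.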
