Add `small.en-tdrz` checkpoint and initial support for `speakerturn` in decoding results #4

akashmjn · 2023-03-30T22:15:27Z

This PR adds support for a new model, selected with --model small.en-tdrz

This model was finetuned to output <|speakerturn|> tokens.
As a quick hack, it simply replaces the <|startoflm|> token, which was unused in the original Whisper models (see "how to use <|startoflm|> in whisper", openai/whisper#414). This keeps the model structure exactly the same.
Small changes are made to the tokenizer, transcribe, and decoding code so that this special token is not suppressed during decoding.
A follow-up PR will control this via a flag (for now these tokens are always decoded) and will force sampling of a timestamp token after a speaker turn token.
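
To illustrate the idea, here is a minimal, self-contained sketch of reusing an existing special-token slot under a new name and leaving it out of the suppression list. It is not the code in this PR: the helper names `build_special_tokens` and `suppressed_ids`, the shortened token list, and the ids starting at 1000 are all hypothetical placeholders chosen for the example.

```python
# Toy special-token list. The names follow Whisper's conventions, but the ids
# produced below are placeholders, not the real vocabulary ids.
SPECIALS = [
    "<|startoftranscript|>",
    "<|translate|>",
    "<|transcribe|>",
    "<|startoflm|>",
    "<|startofprev|>",
    "<|nospeech|>",
    "<|notimestamps|>",
]


def build_special_tokens(names, base_id, rename=None):
    """Assign consecutive ids to special tokens, optionally renaming some slots.

    Renaming changes only the label of a slot; its id (and therefore the
    model's embedding and output dimensions) stays the same.
    """
    rename = rename or {}
    return {rename.get(name, name): base_id + i for i, name in enumerate(names)}


def suppressed_ids(special_tokens, keep):
    """Return the ids to mask during decoding: every special token not in `keep`."""
    return sorted(tid for name, tid in special_tokens.items() if name not in keep)


# Hypothetical vocabularies: the original one, and one with the slot renamed.
original = build_special_tokens(SPECIALS, base_id=1000)
tdrz = build_special_tokens(
    SPECIALS, base_id=1000, rename={"<|startoflm|>": "<|speakerturn|>"}
)

# The speaker-turn token is left out of the suppression list so it can be decoded.
suppress = suppressed_ids(tdrz, keep={"<|speakerturn|>"})
```

Because the renamed token keeps the id of the slot it replaces, the vocabulary size is unchanged, which is why the checkpoint can load into the same model structure.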

akashmjn added 4 commits March 30, 2023 14:50

tmp commit hacking in spkturn decode support

d3844a2

rename sot_lm -> speaker_turn

3ab9f69

add pretrained small.en-tdrz checkpoint

d48293a

update readme with run info

281cdca

akashmjn merged commit 7cfca7e into main Mar 30, 2023

akashmjn mentioned this pull request May 27, 2023

whisper : mark speakers/voices (diarization) ggerganov/whisper.cpp#64

Open

akashmjn mentioned this pull request Aug 16, 2023

HF Transformers Weights #15

Open
