Commit

Merge branch 'master' of github.com:kanjieater/AudiobookTextSync
kanjieater committed Feb 24, 2023
2 parents bdf1412 + 32851a7 commit 9fa7f5f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion readme.md
@@ -114,7 +114,7 @@ You might see various issues while trying this out in the early state. Here are
## Stages
1. (not pushed yet) Filter down audio to improve future results - slow & probably not heavy cpu or gpu usage. Heavier on cpu
2. split_run & stable-ts: Starts off heavy on CPU & RAM to identify the audio spectrum
- 3. stable-ts: GPU heavy & requires lots of vRAM depending on the model. This is the part with the long taskbar, where it tries to transcribe a text from the audio. Currently the default is [large-v2](https://github.com/openai/whisper#available-models-and-languages)
+ 3. stable-ts: GPU heavy & requires lots of vRAM depending on the model. This is the part with the long taskbar, where it tries to transcribe a text from the audio. Currently the default is [tiny](https://github.com/openai/whisper#available-models-and-languages). Ironically, tiny does a better job of keeping the phrases short, at the cost of transcription accuracy, which doesn't matter since we are matching a script. It also runs 32x faster than large.
4. Merge vtt's for split subs
5. Split the script
6. match the script to the generated transcription to get good timestamps