-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow generating subtitles in the background #6247
Conversation
@@ -4,5 +4,5 @@ | |||
|
|||
# Select STT engine. | |||
# Available engines: vosk, whisper, whispercpp | |||
# Default: (enginetype=whisper) | |||
#SpeechToTextEngine.target=(enginetype=whisper) | |||
# Default: (enginetype=whispercpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we provide whisper.cpp in the Opencast repositories while not providing the other tools, I think it makes sense to make it the default.
# Default: whisper | ||
#whispercpp.root.path=whispercpp | ||
# Configuration for setting a custom path to the Whisper.cpp command line tool | ||
# Default: whisper.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I install Whisper.cpp from any of the Opencast repositories, the binary is actually named whisper.cpp
. So, let's make that the actual default.
@@ -192,33 +201,29 @@ public WorkflowOperationResult start(WorkflowInstance workflowInstance, JobConte | |||
|
|||
// Use the selection strategy from the workflow config to get the tracks we want to transcribe | |||
List<Track> tracksToTranscribe = filterTracksByStrategy(tracksWithAudio, trackSelectionStrategy); | |||
if (tracksToTranscribe.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no tracks we can skip right here. No need to run through everything else and then still skip at the end.
|
||
// Load the 'limit-to-one' configuration from the workflow operation. | ||
// This configuration sets the limit of generated subtitle files to one | ||
boolean limitToOne = BooleanUtils.toBoolean(workflowInstance.getCurrentOperation().getConfiguration(LIMIT_TO_ONE)); | ||
if (limitToOne) { | ||
tracksToTranscribe = List.of(tracksToTranscribe.get(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This replaces the very complex block below. This actually does the same since createSubtitle(…)
always returned true
, meaning it would always stop after the first track had been processed. That means we can just reduce the number of tracks to one right here.
ConfiguredTagsAndFlavors tagsAndFlavors, AppendSubtitleAs appendSubtitleAs, Boolean translate) | ||
throws WorkflowOperationException { | ||
|
||
// Start the transcription job, create subtitles file | ||
URI trackURI = track.getURI(); | ||
|
||
if (!track.hasAudio()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually checked that before. It's one of the track filters.
"Speech-to-Text job for media package '%s' failed, because of wrong workflow configuration. " | ||
+ "track-selection-strategy of type '%s' does not exist.", mediaPackage, strategyCfg)); | ||
} | ||
return TrackSelectionStrategy.EVERYTHING; // "transcribe everything" is the default/fallback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just return if we already know the result. No need to continue. This makes the structure simpler.
} | ||
} | ||
return translateMode; | ||
return BooleanUtils.toBoolean(StringUtils.trimToEmpty(operation.getConfiguration(TRANSLATE_MODE))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one line basically replaces the whole function :D
We don't have to re-invent parsing booleans.
.../org/opencastproject/workflow/handler/speechtotext/SpeechToTextWorkflowOperationHandler.java
Outdated
Show resolved
Hide resolved
1138200
to
53c86c3
Compare
FYI: Branch cut for OC 17 is on November 6! |
This pull request has conflicts ☹ |
This patch allows generating subtitles in the backgrould, starting the jobs first and attaching the result later in the workflow. Unfortunately, looking into the code, I also found a lot of minor issues which I fixed as well, but which were hard to separate from this patch: - `language-fallback` didn't work - Defaults for generator enging and Whisper.cpp binary fixed - Defaults for several operations - Simplified code
53c86c3
to
6da9996
Compare
Fixed merge conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks way better as before. I couldn't find anything wrong with it.
I tested the following things (which worked):
- async on
- case 1: the transcription is faster then the attach operation
- case 2: the transcription is slower then the workflow reaching the attach operation (in this case, the process waits for the transcription job to finish)
- async off
- everything works as before
- limit to one
- set to false and async to true -> multiple speech-to-text-job IDs stored in workflow variables, everything works as expected
- set to false and async to false -> works
- set to true with async to true -> works
- set to true with async to false -> works
- attach operation:
- tags and flavors
The following things didn't work:
- language fallback
- removes the possibility to let whisper auto detect the language
- attach operation
- I don't know if this should be labeled as "not working", but the defined target-tags of the attach-operation are overriding the tags from the speechtotext operation. Don't know what a good solution would look here: Maybe we should add the attach-operation tags to the speechtotext-operation ones? Maybe we let it that way but add a note to the Docs?
- metadata: Wrong language codes. Metadata provides 3 digits and whisper expects 2 digit codes i guess
- metadata provides "deu" for example.. (but whisper works with language code "de"). Thats not like a Bug that was introduced. We could create another Issue for this or maybe this can be fixed easily?
.../org/opencastproject/workflow/handler/speechtotext/SpeechToTextWorkflowOperationHandler.java
Outdated
Show resolved
Hide resolved
docs/guides/admin/docs/workflowoperationhandlers/speechtotext-attach-woh.md
Show resolved
Hide resolved
That may be why someone broke the fallback before. But I'll change it so that this value has no default. That way, we can skip it if not present.
I think this is a misconception. Target tags apply to media package elements. Since the
That I didn't touch and won't fix in here. If I remember correctly, the Vosk code does some mapping internally. We could do something similary if we want to? But that's a separate issue. |
The language issue was fixed for |
Since some TTS engines can auto-detect languages, we don't always want to have a fallback language. That is why a default value of `en` doesn't make sense. This patch removes the default, truly making this an optional argument.
This patch clearly documents that setting target tags in the `specchtotext` operation has no effect if the transcription is run asynchronously.
Pull request has been updated.
|
The changes are looking good. I tested the fallback again -> It works as expected. The PR can be merged if you like. |
I created an Issue for the language Code problem: #6293 |
This patch allows generating subtitles in the backgrould, starting the jobs first and attaching the result later in the workflow.
Unfortunately, looking into the code, I also found a lot of minor issues which I fixed as well, but which were hard to separate from this patch:
language-fallback
didn't workHow to test this patch
etc/org.opencastproject.speechtotext.impl.engine.WhisperCppEngine.cfg
. Likely you just have to configure something like:whispercpp.root.path=/home/lars/dev/whisper.cpp/main
whispercpp.model=/home/lars/dev/whisper.cpp/models/ggml-tiny.bin
fast.yaml
workflow:https://github.com/user-attachments/assets/932ef91b-580a-4d11-9717-011b7bbdb142
Your pull request should…