Make subtitle language selection more powerful #6264
Closed
Description of your pull request and other information
`--sub-langs` should allow selecting one or more subtitle languages of the following types: normal, auto, both, normal if available else auto. Currently `--sub-langs` does not support all such combinations (see #2262). For example, the current behavior is to merge: a normal sub is selected if present, otherwise the auto sub. Let's call this a simple merge. There is currently no way to download both the normal and the auto sub for the same lang.

When `--sub-langs` is not specified, the current default behavior (see my PR #6240) is to select the first matching sub in the following order. Let's call it a special merge.

1. `en`
2. `en`. This is to handle cases where subs are named as `en-US` or `en-qlPKC2UN_YU`.
3. `en`
4. `en`
5. `en` subs were found.

Note that this does not make it possible to download only `en` subs: if `en` subs are missing, it will download an arbitrary (first) non-`en` lang.

This PR makes language selection much more flexible by making the following major changes.
1. Auto sub languages are prefixed with `auto-` (e.g., `en` -> `auto-en`). The main reason is to enable selecting auto subs separately from the normal subs (e.g., `en.*` vs `auto-en.*`). A secondary benefit is that auto subs are named differently on disk, which makes it easy to distinguish them from normal subs just from the file name (also see "handle subs and auto-subs separately" #2262 (comment)).
2. Removes the implicit merge from `--sub-langs` and introduces a new special merging operator `#<lang>` to selectively enable merging as per the default behavior described above, with the change that if `<lang>` is not found, no other lang will be arbitrarily selected.
3. Adds a `/` fallback operator: `en/fr` will download `en` if available, else `fr` if available, else nothing.

I'm giving a comprehensive set of examples to show the new behavior of `--sub-langs`. The table assumes that subs are written to disk only. See #630 (comment) for corresponding embedding options.

| Options | Subs selected |
| --- | --- |
| `--write-subs` | |
| `--write-auto-subs` | |
| `--write-subs --write-auto-subs` | |
| `--write-subs --write-auto-subs --sub-langs 'all'` | `en` and `auto-en` |
| `--write-subs --write-auto-subs --sub-langs '#all'` | `en` if present else `auto-en`, same as existing `all` |
| `--write-subs --write-auto-subs --sub-langs 'allnorm'` | |
| `--write-subs --write-auto-subs --sub-langs 'allauto'` | |
| `--write-subs --write-auto-subs --sub-langs 'en'` | `en` normal subs |
| `--write-subs --write-auto-subs --sub-langs 'auto-en'` | `en` auto subs |
| `--write-subs --write-auto-subs --sub-langs 'en,auto-en'` | `en` normal and auto subs |
| `--write-subs --write-auto-subs --sub-langs 'en/auto-en'` | `en` normal sub if present, else `en` auto sub |
| `--write-subs --write-auto-subs --sub-langs '#en'` | `en` sub after special merge |
| `--write-subs --write-auto-subs --sub-langs 'en,ja,auto-ja,es'` | |
| `--write-subs --write-auto-subs --sub-langs 'en.*,auto-en.*'` | |
| `--write-subs --write-auto-subs --sub-langs 'allnorm,auto-en-orig/auto-en'` | `en-orig` auto sub if present, else `en` auto sub |
| `--write-subs --write-auto-subs --sub-langs 'allnorm,auto-.*-orig'` | `*-orig` auto sub |
| `--write-subs --write-auto-subs --sub-langs 'en/auto-en/fr/auto-fr'` | `fr` will only be selected if `en` is not present |
| `--write-subs --write-auto-subs --sub-langs '#en/fr'` | |
The above design was chosen to make the code and options simpler, but it breaks existing behavior. Backward compatibility can be maintained at the cost of more complex code and likely needs 2 operators instead of 1. To summarize the break in behavior:
| Options | Old behavior | New behavior |
| --- | --- | --- |
| `--write-subs --write-auto-subs` | | |
| `--write-subs --write-auto-subs --sub-langs 'all'` | | Use `--sub-langs '#all'` for old behavior. |
| `--write-subs --write-auto-subs --sub-langs 'en'` | `en` is selected | `en` is selected. Use `--sub-langs 'en/auto-en'` for old behavior. |

TODOs
Let me know if this is an acceptable way to proceed. Once we come to a decision on the main design (renaming auto subs, the `#` and `/` operators, and the general code), I can work on the TODOs.

Fixes #2262 #4178
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?