handle subs and auto-subs separately #2262
Comments
Right. You'll have to run yt-dlp twice, once with |
Thanks for the feedback. If you don't mind, I will file a feature request, because
|
I have changed this to a feature request. But most people would expect There are also some other issues with the current subtitle selection options:
If possible, I want to have a single solution to all these issues. I was thinking of allowing |
I updated the OP. Maybe this is the kind of "single solution" you are looking for?
Couldn't it be handled the same way? So 2 subs have the same language code, both are downloaded like this:
This is also handled by the proposal in my OP. |
probably using |
Another, simpler way that doesn't break compatibility is to ignore all auto-translated subtitles (since they can't be translated correctly at all) unless users explicitly choose them. |
see |
Cool. I didn't know about that option since I hadn't seen it in the help. The main issue is with |
In practice, users should always prefer 'auto-generated closed captions + well-translated subtitles in their language' by default, and the former one can provide like I guess nobody wants auto-translated subtitles because their content is inaccurate or completely wrong. So skip=translated_subs could probably be independent of |
I agree with this. I think it's better if I think |
Would be great if there was some way of differentiating automatically generated subtitles from manually created ones in the final file. |
Just another +1 on this. Would be nice if there was a way to make it automatically do the right thing. Even better if it did the right thing by default. ... and by "right thing", I mean embedding every subtitle plus the original language's automatic caption(s). For the automatic captions, it'd also be nice if it could convert json3 captions to a format supported by ffmpeg, so the highest-quality subs would be used. That's a separate issue though. |
Chipping in: I'd like to set precedence by making sure the "orig" subs get downloaded first, and when they're missing, then go for auto-subs. Is that how it works? Will the options set out below act as I've described?
Is there a difference between en-orig and en auto-generated? How do I identify manually written subs? |
Answer: No, all subs you selected will be downloaded at once. From the users' point of view, YouTube has 2 types of subs: (1) auto-generated + uploader-uploaded; (2) auto-translated. So if you want to achieve that, you need to download them in two runs. I think it's better to switch to the users' view; then it would also be easy to filter out auto-generated subs when only the uploader-uploaded subs are wanted. |
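On the question of telling manually written subs apart from automatic ones: a minimal sketch, assuming you work with the info dict returned by YoutubeDL.extract_info() and that it keeps manual tracks under 'subtitles' and automatic ones under 'automatic_captions'. The helper names and the sample data below are hypothetical and only illustrate the "manual first, auto as fallback" precedence asked about above.

```python
def classify_sub_langs(info):
    """Split language codes by origin, given a yt-dlp style info dict."""
    manual = set(info.get('subtitles') or {})
    auto = set(info.get('automatic_captions') or {})
    return {
        'manual': sorted(manual),
        # Languages that only exist as automatic captions
        'auto_only': sorted(auto - manual),
    }


def pick_track(info, lang):
    """Prefer a manually uploaded track for `lang`; fall back to an automatic one."""
    for kind in ('subtitles', 'automatic_captions'):
        tracks = (info.get(kind) or {}).get(lang)
        if tracks:
            return kind, tracks
    return None, []


# Hand-written stand-in for an extracted info dict (not real data)
sample_info = {
    'subtitles': {
        'en': [{'ext': 'vtt', 'url': 'https://example.invalid/en.vtt'}],
    },
    'automatic_captions': {
        'en': [{'ext': 'vtt', 'url': 'https://example.invalid/en.auto.vtt'}],
        'de': [{'ext': 'vtt', 'url': 'https://example.invalid/de.auto.vtt'}],
    },
}

print(classify_sub_langs(sample_info))
print(pick_track(sample_info, 'de')[0])
```

This only inspects metadata; it does not download anything or change which files yt-dlp writes.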
Quite often, the auto-generated subtitles on youtube are better than the manually-uploaded ones. It depends on the channel though, and how much effort they put into subtitles. So it'd be nice if yt-dlp would grab and embed both in a single run. |
@pukkandan : Have you seen the table in my OP ? |
Can I get clarification on what the API for this looks like on the Python scripting side? I would like to default to downloading all manually uploaded subtitles, then allow them to download AI-generated, both, or none (doing that atm). Where is this accessed in the Python API? |
@skyler14 e.g. you can run

    ydl_opts = {
        'writesubtitles': True,
        'writeautomaticsub': True,
        'subtitleslangs': ['all', '-live_chat'],
    }

which is how to download all manually uploaded subs and fall back to all automatic captions. To download both types of subs, you would need to run two separate instances of yt-dlp, but you could extract the video info only once and reuse it in the 2nd invocation:

    import yt_dlp

    URL = 'test:youtube'

    # First run: manually uploaded subtitles (plus the video itself)
    manual_subs_opts = {
        'writesubtitles': True,
        'subtitleslangs': ['all', '-live_chat'],
    }
    with yt_dlp.YoutubeDL(manual_subs_opts) as ydl:
        info = ydl.extract_info(URL)

    # Second run: automatic captions only, reusing the extracted info
    auto_subs_opts = {
        'writeautomaticsub': True,
        'subtitleslangs': ['all', '-live_chat'],
        'skip_download': True,
    }
    with yt_dlp.YoutubeDL(auto_subs_opts) as ydl:
        ydl.process_ie_result(info)
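For the "manual, auto, both, or none" choice asked about above, one way to wrap the two-run approach is a small helper that returns one options dict per run. This is a sketch only: build_sub_runs and its mode names are hypothetical, and the only yt-dlp parameters used are the ones already shown in this thread ('writesubtitles', 'writeautomaticsub', 'subtitleslangs', 'skip_download').

```python
def build_sub_runs(mode, langs=('all', '-live_chat')):
    """Return a list of YoutubeDL option dicts, one per run, for a subtitle mode."""
    langs = list(langs)
    if mode == 'none':
        # Single run without any subtitle options
        return [{}]
    if mode == 'manual':
        return [{'writesubtitles': True, 'subtitleslangs': langs}]
    if mode == 'auto':
        return [{'writeautomaticsub': True, 'subtitleslangs': langs}]
    if mode == 'both':
        # Two runs: the second only writes automatic captions and skips the video
        return [
            {'writesubtitles': True, 'subtitleslangs': langs},
            {'writeautomaticsub': True, 'subtitleslangs': langs, 'skip_download': True},
        ]
    raise ValueError(f'unknown subtitle mode: {mode!r}')


for opts in build_sub_runs('both'):
    print(opts)
```

The first dict would be passed to a YoutubeDL instance used with extract_info(), and any further dict to a second instance used with process_ie_result(info), as in the comment above.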
problem
Currently, with --sub-langs all, users cannot download YouTube auto-subs in the same run, or they end up with 100+ auto-subs plus several hundred auto-sub translation combinations like "yo-ro | Yoruba from Romanian" etc.
proposal
It would be helpful to control subs and auto-subs independently:
This is highly intuitive and makes the parameters of the different multimedia files consistent with each other.
It also solves the problem below elegantly:
--sub-langs all --auto-sub-langs en
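To make the proposed semantics concrete, here is a rough sketch of selecting subs and auto-subs from two independent pattern lists. It is not yt-dlp's implementation, and --auto-sub-langs does not exist at the time of this request. The matching rules (the keyword 'all', a leading '-' for exclusion, case-insensitive full regex match) are a simplified imitation of how --sub-langs patterns are described, and all function names are hypothetical.

```python
import re


def select_langs(available, patterns):
    """Apply patterns in order: 'all', a regex, or '-' followed by either to exclude."""
    selected = []
    for pattern in patterns:
        exclude = pattern.startswith('-')
        if exclude:
            pattern = pattern[1:]
        if pattern == 'all':
            matched = list(available)
        else:
            rx = re.compile(pattern, re.IGNORECASE)
            matched = [lang for lang in available if rx.fullmatch(lang)]
        if exclude:
            selected = [lang for lang in selected if lang not in matched]
        else:
            selected += [lang for lang in matched if lang not in selected]
    return selected


def select_independently(subs, auto_subs, sub_patterns, auto_sub_patterns):
    """Hypothetical equivalent of '--sub-langs X --auto-sub-langs Y'."""
    return {
        'subtitles': select_langs(subs, sub_patterns),
        'automatic_captions': select_langs(auto_subs, auto_sub_patterns),
    }


# Made-up language lists for illustration
manual = ['en', 'fr', 'live_chat']
automatic = ['en', 'en-orig', 'de', 'yo-ro']
print(select_independently(manual, automatic, ['all', '-live_chat'], ['en.*']))
```

With two separate lists, "all" applies only to the manual subs, so the hundreds of auto-translated combinations are never selected unless asked for.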
use case
To download all subs + English auto-subs without running yt-dlp twice.
What parameters do I have to use to download ALL subtitles but only English automatic subtitles (instead of the myriad of automatic subtitles in all existing languages on YouTube)?
--sub-langs seems to control both --write-subs and --write-auto-subs, whereas I would need something like
--sub-langs all --auto-sub-langs en.*
which doesn't exist, right?
related
--embed-auto-subs #826
youtube:skip=translated_subs doesn't work for some videos. #3875
translated_subs and automatic_captions in codes. #6443
-o "auto-sub:" #6833