handle subs and auto-subs separately #2262

Open
chrizilla opened this issue Jan 8, 2022 · 19 comments
Labels
enhancement New feature or request

chrizilla commented Jan 8, 2022

problem

Currently, users of --sub-langs all cannot also download YouTube auto-subs in the same run; otherwise they end up with 100+ auto-subs plus several hundred auto-sub translation combinations like "yo-ro | Yoruba from Romanian".

proposal

It would be helpful to control subs and auto-subs independently:

subs                 | autosubs                  | thumbnails (for reference)
---------------------|---------------------------|---------------------------
--write-subs         | --write-auto-subs         | --write-thumbnail
--no-write-subs      | --no-write-auto-subs      | --no-write-thumbnail
--embed-subs         | --embed-auto-subs (#826)  | --embed-thumbnail
--no-embed-subs      | --no-embed-auto-subs      | --no-embed-thumbnail
--list-subs          | --list-auto-subs          | --list-thumbnails
--sub-langs          | --auto-sub-langs (#2205)  |
--sub-format (#2276) | --auto-sub-format         | --thumbnail-format (#237)
--convert-subs       | --convert-auto-subs       | --convert-thumbnails
-o "subtitle:"       | -o "auto-sub:" (#6833)    | -o "thumbnail:"

This is highly intuitive and makes the parameters of the different multimedia files consistent with each other.

It also solves the problem below elegantly: --sub-langs all --auto-sub-langs en

use case

To download all subs + English auto-subs without running yt-dlp twice.

What parameters do I have to use to download ALL subtitles but only English automatic subtitles (instead of the myriad of automatic subtitles in every language available on YouTube)?

--sub-langs seems to control both --write-subs and --write-auto-subs,
whereas I would need something like --sub-langs all --auto-sub-langs en.* which doesn't exist, right?

related

@chrizilla added the question label Jan 8, 2022
@pukkandan
Member

Right. You'll have to run yt-dlp twice, once with --sub-langs all --write-subs and again with --sub-langs en --write-auto-subs
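
The two-run workaround can be scripted so that the second run is not forgotten. This is only a minimal sketch: the helper names `build_two_pass_commands` and `run_two_passes` are hypothetical, and adding `--skip-download` to the second run is an assumption made here to avoid fetching the video twice; it is not part of the answer above.

```python
import subprocess


def build_two_pass_commands(url, auto_langs="en"):
    """Return the two yt-dlp command lines for the two-run workaround.

    Hypothetical helper: the first run fetches the video with all manual
    subtitles, the second fetches only automatic captions for `auto_langs`.
    """
    manual_pass = ["yt-dlp", "--write-subs", "--sub-langs", "all", url]
    # Assumption: --skip-download keeps the second run from downloading
    # the video again; only the automatic captions are written.
    auto_pass = [
        "yt-dlp", "--skip-download",
        "--write-auto-subs", "--sub-langs", auto_langs, url,
    ]
    return [manual_pass, auto_pass]


def run_two_passes(url, auto_langs="en"):
    """Run both commands in order, stopping at the first failure."""
    for cmd in build_two_pass_commands(url, auto_langs):
        subprocess.run(cmd, check=True)
```

Calling `run_two_passes(url)` requires yt-dlp to be on the PATH; `build_two_pass_commands` alone can be used to inspect the commands first.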

@chrizilla
Author

chrizilla commented Jan 8, 2022

thanks for the feedback. If you don't mind, I will file a feature request, because

  • it is easy to forget to run yt-dlp a second time when a user is downloading many files
  • and also when a user has not used yt-dlp for a while
  • it increases workload to always run yt-dlp twice
  • user has to remember which parameters to use for 1st and 2nd run

@pukkandan reopened this Jan 8, 2022
@pukkandan added the enhancement (New feature or request) label and removed the question label Jan 8, 2022
@pukkandan changed the title from "How to download all subs but only english auto subs ?" to "Select sub and auto-sub languages separately" Jan 8, 2022
@pukkandan
Member

pukkandan commented Jan 8, 2022

I have changed this to a feature request. But most people would expect --sub-lang to keep behaving as it currently does. Any implementation must not change that.

There are also some other issues with the current subtitle selection options:

  1. Unable to download multiple subtitles of same language #946
  2. [Feature Request] Add --embed-auto-subs #826 (see Subtitle file is not deleted after being embedded #630)

If possible, I want to have a single solution to all these issues. I was thinking of allowing --format-like syntax for --sub-format, which should address all of these. But this is complicated to implement.

@chrizilla mentioned this issue Jan 9, 2022
@chrizilla changed the title from "Select sub and auto-sub languages separately" to "handle subs and auto-subs separately" Jan 9, 2022
@chrizilla
Author

chrizilla commented Jan 9, 2022

@pukkandan said:
If possible, I want to have a single solution to all these issues.

I updated the OP. Maybe this is the kind of "single solution" you are looking for ?
 

@pukkandan said:
1. If 2 subs have the same language code (happens often for generic), you can't select which to download (#946)

--write-all-thumbnails handles it this way:

videotitle.1.jpg
videotitle.2.jpg
videotitle.3.jpg
etc.

Couldn't it be handled the same way? If 2 subs have the same language code, both would be downloaded like this:

videotitle.1.en.vtt
videotitle.2.en.vtt
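
The numbering idea above can be sketched in a few lines. This is an illustration of the proposal only, not how yt-dlp names files; the helper `numbered_sub_filenames` and its `(lang, ext)` input shape are hypothetical.

```python
from collections import Counter


def numbered_sub_filenames(title, tracks):
    """Build subtitle filenames, numbering tracks that share a language code.

    `tracks` is a list of (lang, ext) tuples in the order they were found.
    Languages that occur once keep the plain `title.lang.ext` form.
    """
    totals = Counter(lang for lang, _ in tracks)
    seen = Counter()
    names = []
    for lang, ext in tracks:
        if totals[lang] > 1:
            # Duplicate language code: insert a running index per language.
            seen[lang] += 1
            names.append(f"{title}.{seen[lang]}.{lang}.{ext}")
        else:
            names.append(f"{title}.{lang}.{ext}")
    return names
```

With two `en` tracks and one `de` track, this yields `videotitle.1.en.vtt`, `videotitle.2.en.vtt` and `videotitle.de.vtt`.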

@pukkandan said:
2. [Feature Request] Add --embed-auto-subs #826 (see Subtitle file is not deleted after being embedded #630)

this is also handled by the proposal in my OP

@NewUserHa

Using --sub-langs to select both 'normal subs' and 'auto-subs' would probably keep compatibility and make this issue easier to PR than adding new independent options.

@NewUserHa

Another simple way that doesn't break compatibility is to ignore all auto-translated subtitles (since they can't be translated correctly at all) unless users explicitly choose them.

@pukkandan
Member

Another simple way that doesn't break compatibility is to ignore all auto-translated subtitles (since they can't be translated correctly at all) unless users explicitly choose them.

see skip=translated_subs option in https://github.com/yt-dlp/yt-dlp#youtube. But how does that solve this issue?

@NewUserHa

Cool. I didn't know about that option since I hadn't seen it in the help.

The main issue is that --write-subs --write-auto-subs --sub-langs user_langs will always download the unwanted low-quality ones.
With --write-subs --write-auto-subs --sub-langs all skip=translated_subs together, I guess users can download all the good-quality subtitles that the YouTube player shows (usually fewer than 3), and just delete the unwanted ones without checking their quality.
If --write-subs --write-auto-subs --sub-langs users_wanted_langs skip=translated_subs works, then the issue is probably solved.
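
For the Python side, the combination discussed here could look roughly like the sketch below. The `extractor_args` entry is the API form of `--extractor-args "youtube:skip=translated_subs"`; the helper names are hypothetical, and whether this combination gives the result hoped for in this comment is not confirmed by the thread.

```python
def subs_opts_without_translations(langs):
    """Build YoutubeDL options for manual + automatic subs, minus translations.

    Hypothetical helper; `langs` uses the same values as --sub-langs.
    """
    return {
        "writesubtitles": True,
        "writeautomaticsub": True,
        "subtitleslangs": list(langs),
        # API form of: --extractor-args "youtube:skip=translated_subs"
        "extractor_args": {"youtube": {"skip": ["translated_subs"]}},
    }


def download_with_subs(url, langs=("all", "-live_chat")):
    """Download `url` with the options above (needs yt-dlp installed)."""
    import yt_dlp  # imported lazily so the options helper has no dependency

    with yt_dlp.YoutubeDL(subs_opts_without_translations(langs)) as ydl:
        ydl.download([url])
```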

@NewUserHa

In practice, users should always prefer 'auto-generated closed captions + good translated subtitles in their language' by default. The former can provide cues like [music], which could be useful for some users. So it would be even better if it were possible to download both, like 'en + en (auto-generated CC)'.
I don't know whether --write-subs --write-auto-subs --sub-langs 'en, en.*' skip=translated_subs can do that.

I guess nobody wants auto-translated subtitles because their content is inaccurate or completely wrong, so skip=translated_subs could probably be made independent of the hls and dash options.

@HobbyistDev
Contributor

Using --sub-langs to select both 'normal subs' and 'auto-subs' would probably keep compatibility and make this issue easier to PR than adding new independent options.

I agree with this. I think it's better if --sub-langs can accept both auto-subs and original subs. E.g. --sub-langs all,auto["en"], which would mean that all original subs are downloaded and auto-subs are downloaded only for en.

I think --write-subs and --write-auto-subs can be simplified too: --write-subs auto would be the same as --write-auto-subs, and --write-subs would have a default value like default or original.
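
To make the suggested `all,auto["en"]` form concrete, here is a rough sketch of how such a value might be split into manual and automatic language lists. The syntax is only a proposal from this comment and is not something yt-dlp accepts; the function name and the returned dict shape are hypothetical.

```python
import re

# Matches one auto[...] group, e.g. auto["en"] or auto["en","de"].
_AUTO_GROUP = re.compile(r"^auto\[(.*)\]$")


def parse_proposed_sub_langs(spec):
    """Split a proposed --sub-langs value into manual and auto languages."""
    manual, auto = [], []
    # Split on commas that are not inside a [...] group.
    for part in re.split(r",(?![^\[]*\])", spec):
        part = part.strip()
        if not part:
            continue
        match = _AUTO_GROUP.match(part)
        if match:
            for lang in match.group(1).split(","):
                lang = lang.strip().strip("\"'")
                if lang:
                    auto.append(lang)
        else:
            manual.append(part)
    return {"manual": manual, "auto": auto}
```

For `all,auto["en"]` this returns `{"manual": ["all"], "auto": ["en"]}`, which mirrors the `--sub-langs all --auto-sub-langs en` pair from the original proposal.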

@christoph-heinrich
Contributor

christoph-heinrich commented Aug 20, 2022

Would be great if there was some way of differentiating automatically generated subtitles from manually created ones in the final file.
Adding a suffix like -auto to the name should already be good enough.

@ToyKeeper

ToyKeeper commented Nov 12, 2022

Just another +1 on this. Would be nice if there was a way to make it automatically do the right thing. Even better if it did the right thing by default.

... and by "right thing", I mean embedding every subtitle plus the original language's automatic caption(s).

For the automatic captions, it'd also be nice if it could convert json3 captions to a format supported by ffmpeg, so the highest-quality subs would be used. That's a separate issue though.

@pukkandan moved this to Subtitle selection in Core enhancements Nov 21, 2022
@scrutinizer11

scrutinizer11 commented Jan 21, 2023

Chipping in: I'd like to set precedence by making sure the "orig" subs get downloaded first, and when they're missing then go for auto-subs.

Is that how it works? Will the options set out below act as I've described?

--write-subs --sub-langs "(en|ru)-orig,en" --write-auto-subs

Is there a difference between en-orig and en auto-generated? How do I identify manually written subs?

@NewUserHa

Answer: No, all subs you selected will be downloaded at once.

From the users' view, YouTube has 2 types of subs: 1: auto-generated + uploader-uploaded; 2: auto-translated.
For yt-dlp and youtube-dl, YouTube has 2(+2) types of subs: 1: uploader-uploaded; 2: auto-subs (auto-generated, auto-translated, automatic subs (whose content is auto-translated as well)).

So if you want to achieve that, you need to download them in two runs.


I think it's better to change to the users' view; then it could also easily filter out the auto-generated ones if only the uploader-uploaded subs are wanted, with --write-subs --sub-langs "*orig, en, ...", to solve this common problem.
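
The precedence asked about (manual subs first, automatic captions only when they are missing) can be illustrated on an already extracted info dict, which separates the two kinds under the `subtitles` and `automatic_captions` keys. The helper below is a hypothetical sketch that matches language codes exactly; it is not a description of yt-dlp's own selection logic.

```python
def pick_subs_with_fallback(info, langs):
    """For each language, prefer manual subs and fall back to auto captions.

    Returns {lang: (source, tracks)} where source is "manual" or "auto".
    Languages found in neither mapping are left out.
    """
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    chosen = {}
    for lang in langs:
        if lang in manual:
            chosen[lang] = ("manual", manual[lang])
        elif lang in auto:
            chosen[lang] = ("auto", auto[lang])
    return chosen
```

The `source` tag also gives a place to hang a marker such as the `-auto` filename suffix suggested earlier in the thread.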

@ToyKeeper

Quite often, the auto-generated subtitles on youtube are better than the manually-uploaded ones. It depends on the channel though, and how much effort they put into subtitles. So it'd be nice if yt-dlp would grab and embed both in a single run.

@chrizilla
Author

@pukkandan: Have you seen the table in my OP?
(Just making sure, because I added the table to the OP after you had replied to it.)
I hope you like it.
Inside the table there are links to issues/RFEs that the linked switch would solve if implemented.

@pukkandan

This comment was marked as off-topic.

@skyler14

skyler14 commented Aug 16, 2024

Can I get clarification on what the API for this looks like on the Python scripting side? I would like to default to downloading all manually uploaded subtitles, then allow users to download AI-generated ones, both, or none (doing that atm). Where is this accessed in the Python API?

@bashonly
Member

@skyler14 you can use devscripts/cli_to_api.py to convert CLI args to API params

e.g. you can run python devscripts/cli_to_api.py --write-sub --write-auto-sub --sub-lang all,-live_chat and it will give you:

ydl_opts = {
    'writesubtitles': True,
    'writeautomaticsub': True,
    'subtitleslangs': ['all', '-live_chat'],
}

which is how to download all manually uploaded subs and fall back to all automatic captions.

To download both types of subs, you would need to run two separate instances of yt-dlp, but you could extract the video info only once and reuse it in the 2nd invocation:

import yt_dlp

URL = 'test:youtube'

# First pass: download the video together with all manually uploaded subtitles.
manual_subs_opts = {
    'writesubtitles': True,
    'subtitleslangs': ['all', '-live_chat'],
}

with yt_dlp.YoutubeDL(manual_subs_opts) as ydl:
    info = ydl.extract_info(URL)

# Second pass: reuse the extracted info to write automatic captions only.
auto_subs_opts = {
    'writeautomaticsub': True,
    'subtitleslangs': ['all', '-live_chat'],
    'skip_download': True,
}

with yt_dlp.YoutubeDL(auto_subs_opts) as ydl:
    ydl.process_ie_result(info)
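
For the "manual, AI-generated, both, or none" choice asked about above, the option dicts from this answer could be wrapped in a small selector. This is a sketch under assumptions: the function name and the mode strings are hypothetical, and "both" simply follows the two-pass approach shown above.

```python
def sub_option_passes(mode, langs=("all", "-live_chat")):
    """Return one YoutubeDL options dict per pass for the chosen mode.

    The first dict is for the pass that downloads the video; a second
    dict, if present, is for a follow-up pass that only writes captions.
    """
    manual = {"writesubtitles": True, "subtitleslangs": list(langs)}
    auto = {"writeautomaticsub": True, "subtitleslangs": list(langs)}
    passes = {
        "none": [{}],
        "manual": [manual],
        "auto": [auto],
        # Second pass skips the video, as in the two-pass example above.
        "both": [manual, {**auto, "skip_download": True}],
    }
    try:
        return passes[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```

The first dict would go to `extract_info` and any second dict to `process_ie_result`, mirroring the two `with` blocks above.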

9 participants