[YouTube] youtube:skip=translated_subs doesn't work for some videos. #3875

Closed
7 tasks done
1sixth opened this issue May 26, 2022 · 14 comments
Labels
question Question

Comments

@1sixth

1sixth commented May 26, 2022

Checklist

Region

No response

Description

The translated_subs argument introduced in version 2022.04.08 only works in some cases.

yt-dlp works as intended in this example:

yt-dlp --ignore-config --write-subs --write-auto-subs --sub-langs all --extractor-args "youtube:skip=translated_subs" "https://www.youtube.com/watch?v=T2kS1gAbxhc"

It downloads 26 subtitles, skipping all the auto-translated ones.

But it doesn't work for this video: https://www.youtube.com/watch?v=bOXCLR3Wric

Verbose log

[debug] Command-line config: ['--verbose', '--ignore-config', '--write-subs', '--write-auto-subs', '--sub-langs', 'all', '--extractor-args', 'youtube:skip=translated_subs', 'https://www.youtube.com/watch?v=bOXCLR3Wric']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.05.18 [b14d52355]
[debug] Python version 3.9.12 (CPython 64bit) - Linux-5.17.9-x86_64-with-glibc2.34
[debug] Checking exe version: ffprobe -bsfs
[debug] Checking exe version: ffmpeg -bsfs
[debug] exe versions: ffmpeg 4.4.1 (setts), ffprobe 4.4.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.14.1, brotli-1.0.9, certifi-2021.10.08, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.1
[debug] Proxy map: {'no': '127.0.0.1,localhost', 'ftp': 'http://127.0.0.1:1081', 'https': 'http://127.0.0.1:1081', 'http': 'http://127.0.0.1:1081', 'all': 'http://127.0.0.1:1081', 'rsync': 'http://127.0.0.1:1081'}
[debug] [youtube] Extracting URL: https://www.youtube.com/watch?v=bOXCLR3Wric
[youtube] bOXCLR3Wric: Downloading webpage
[youtube] bOXCLR3Wric: Downloading android player API JSON
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, codec:vp9.2, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), acodec, lang, proto, filesize, fs_approx, tbr, vbr, abr, asr, vext, aext, hasaud, id
[debug] Downloading subtitles: en, af, sq, am, ar, hy, az, bn, eu, be, bs, bg, my, ca, ceb, zh-Hans, zh-Hant, co, hr, cs, da, nl, en-orig, eo, et, fil, fi, fr, gl, ka, de, el, gu, ht, ha, haw, iw, hi, hmn, hu, is, ig, id, ga, it, ja, jv, kn, kk, km, rw, ko, ku, ky, lo, la, lv, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, ny, or, ps, fa, pl, pt, pa, ro, ru, sm, gd, sr, sn, sd, si, sk, sl, so, st, es, su, sw, sv, tg, ta, tt, te, th, tr, tk, uk, ur, ug, uz, vi, cy, fy, xh, yi, yo, zu
[debug] Default format spec: bestvideo*+bestaudio/best
[info] bOXCLR3Wric: Downloading 1 format(s): 313+251
[info] Writing video subtitles to: The unreasonable effectiveness of complex numbers in discrete math [bOXCLR3Wric].en.vtt
[debug] Invoking http downloader on "https://www.youtube.com/api/timedtext?v=bOXCLR3Wric&caps=asr&xoaf=5&xosf=1&hl=en&ip=0.0.0.0&ipbits=0&expire=1653592120&sparams=ip%2Cipbits%2Cexpire%2Cv%2Ccaps%2Cxoaf&signature=CE03F84903AC811E4A7EC49EEAA05D7F498B2C3C.ADF5D40DE8FEAA19172D9F174D18C543149E267E&key=yt8&lang=en&fmt=vtt"
[download] Destination: The unreasonable effectiveness of complex numbers in discrete math [bOXCLR3Wric].en.vtt
[download] 100% of 65.30KiB in 00:00
[info] Writing video subtitles to: The unreasonable effectiveness of complex numbers in discrete math [bOXCLR3Wric].af.vtt
[debug] Invoking http downloader on "https://www.youtube.com/api/timedtext?v=bOXCLR3Wric&caps=asr&xoaf=5&xosf=1&hl=en&ip=0.0.0.0&ipbits=0&expire=1653592120&sparams=ip%2Cipbits%2Cexpire%2Cv%2Ccaps%2Cxoaf&signature=CE03F84903AC811E4A7EC49EEAA05D7F498B2C3C.ADF5D40DE8FEAA19172D9F174D18C543149E267E&key=yt8&kind=asr&lang=en&tlang=af&fmt=vtt"
[download] Destination: The unreasonable effectiveness of complex numbers in discrete math [bOXCLR3Wric].af.vtt
[download] 100% of 354.51KiB in 00:00
[info] Writing video subtitles to: The unreasonable effectiveness of complex numbers in discrete math [bOXCLR3Wric].sq.vtt
[debug] Invoking http downloader on "https://www.youtube.com/api/timedtext?v=bOXCLR3Wric&caps=asr&xoaf=5&xosf=1&hl=en&ip=0.0.0.0&ipbits=0&expire=1653592120&sparams=ip%2Cipbits%2Cexpire%2Cv%2Ccaps%2Cxoaf&signature=CE03F84903AC811E4A7EC49EEAA05D7F498B2C3C.ADF5D40DE8FEAA19172D9F174D18C543149E267E&key=yt8&kind=asr&lang=en&tlang=sq&fmt=vtt"
[download] Destination: The unreasonable effectiveness of complex numbers in discrete math [bOXCLR3Wric].sq.vtt
[download] 100% of 376.53KiB in 00:00
^C
ERROR: Interrupted by user
@1sixth added the site-bug and triage labels on May 26, 2022
@pukkandan
Member

skip=translated_subs hides subs that are auto-translated from normal subtitles. These are just auto-generated subs of different languages. Use --list-subs to see all the available subs
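To make the distinction concrete, here is a minimal sketch that classifies the timedtext URLs from the verbose log above by their query string. The reading of the parameters (`kind=asr` marks an auto-generated source track, `tlang` marks a translation target) is an assumption inferred from the URLs in this thread, not a documented API, and `describe_timedtext_url` is a hypothetical helper, not part of yt-dlp.

```python
from urllib.parse import parse_qs, urlsplit


def describe_timedtext_url(url: str) -> str:
    """Describe a subtitle URL based on its query parameters.

    Assumption (inferred from the log in this thread, not documented):
    kind=asr means the source track is auto-generated, and tlang names
    the language the source track is machine-translated into.
    """
    query = parse_qs(urlsplit(url).query)
    is_asr = query.get("kind") == ["asr"]
    source = query.get("lang", ["?"])[0]
    target = query.get("tlang", [None])[0]
    origin = "auto-generated" if is_asr else "regular"
    if target:
        return f"{target}: translated from {origin} {source} track"
    return f"{source}: {origin} track"


# Shortened versions of the two URL shapes seen in the log.
base = "https://www.youtube.com/api/timedtext?v=bOXCLR3Wric"
print(describe_timedtext_url(base + "&lang=en&fmt=vtt"))
print(describe_timedtext_url(base + "&kind=asr&lang=en&tlang=af&fmt=vtt"))
```

Under that reading, the `af` and `sq` files in the log are translations of the auto-generated English track, which is why skip=translated_subs does not hide them.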

@pukkandan added the question label and removed the site-bug and triage labels on May 26, 2022
@1sixth
Author

1sixth commented May 26, 2022

skip=translated_subs hides subs that are auto-translated from normal subtitles. These are just auto-generated subs of different languages.

It feels somewhat counterintuitive to me because initially I thought skip=translated_subs would skip everything in Auto-translate:

image

I wonder if there is a way to achieve that.

@pukkandan
Member

There is always exactly one "original" auto-sub. It is labeled (Original) in --list-subs, and its language code ends with -orig.

@1sixth
Author

1sixth commented May 28, 2022

Thank you for your explanation. My use case is to download all non auto-subs + English (auto-generated). Now I am able to do that with:

yt-dlp --write-subs --sub-langs all "https://www.youtube.com/watch?v=bOXCLR3Wric"
yt-dlp --write-auto-subs --sub-langs en-orig "https://www.youtube.com/watch?v=bOXCLR3Wric"
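For scripting this two-step workaround, a rough sketch along these lines might help. It only builds the two argument lists and prints them; nothing is executed. `subtitle_commands` and its `auto_lang` parameter are hypothetical names chosen for illustration, and the flags are exactly the ones from the two commands above.

```python
import shlex
from typing import List


def subtitle_commands(url: str, auto_lang: str = "en-orig") -> List[List[str]]:
    """Build the two yt-dlp invocations used in the workaround.

    The first fetches all regular subtitles, the second fetches a single
    auto-generated track (en-orig by default, as in this thread).
    """
    manual = ["yt-dlp", "--write-subs", "--sub-langs", "all", url]
    auto = ["yt-dlp", "--write-auto-subs", "--sub-langs", auto_lang, url]
    return [manual, auto]


if __name__ == "__main__":
    # Print shell-quoted commands for inspection instead of running them.
    for cmd in subtitle_commands("https://www.youtube.com/watch?v=bOXCLR3Wric"):
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run` as-is if actually running the commands is wanted.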

@NewUserHa

skip=translated_subs hides subs that are auto-translated from normal subtitles. These are just auto-generated subs of different languages. Use --list-subs to see all the available subs

But the contents of the "auto-generated" subs are still the same as the contents of the skipped "translated_subs".

Could yt-dlp add an option to avoid all machine-translated subs?

@pukkandan
Member

pukkandan commented Nov 18, 2022

Don't use --write-auto-subs? All auto-subs are machine generated

@NewUserHa

NewUserHa commented Nov 18, 2022

The "auto-generated" speech-to-text sub/caption (e.g. the English sub of an English video) is a special case. Because of non-native speakers or accent problems in some videos, it makes watching easier for many viewers.
Therefore the "auto-generated speech-to-text sub/caption" could be an exception to "--write-auto-subs".

@pukkandan
Member

pukkandan commented Nov 18, 2022

[info] Available automatic captions for xxx:
Language        Name                                               
en-orig         English                                            Speech to text, in spoken language
es              Spanish                                            Speech to text
es-en           Spanish from English                               Auto-translated from en subs

[info] Available subtitles for xxx:
Language Name                    
en       English                 Uploaded by creator

@NewUserHa

NewUserHa commented Nov 18, 2022

Both "en-orig" and "en".
The former contains additional "sound effects, relevant musical cues, and other relevant audio information", as well as any sentences missing from the latter.

On the YouTube website I usually see only one "auto-generated" sub, and it is usually English. So what is the "Spanish Speech to text" track when there is already an English speech-to-text sub?

@chrizilla

My use case is to download all non auto-subs + English (auto-generated).

@1sixth : Your use case is handled by this feature request: #2262

[info] Available automatic captions for xxx:
Language        Name                                               
en-orig         English                                            Speech to text, in spoken language
es              Spanish                                            Speech to text
es-en           Spanish from English                               Auto-translated from en subs

[info] Available subtitles for xxx:
Language Name                    
en       English                 Uploaded by creator

@pukkandan : Did you manually add the annotations (3rd column) or can yt-dlp output it?
--print subtitles_table and --print automatic_captions_table didn't include this column in my tests.

@pukkandan
Member

I wrote it to explain

@MC-dusk

MC-dusk commented Dec 5, 2023

Excuse me, but I'd really like to know whether this problem has been solved.
I want to download all normal subs plus the speech-to-text auto-sub labeled "orig" (there is actually only one per video), no matter which language it is. That means I don't need any auto-translated subs, whether they are translated from normal subs or from auto-generated subs.
I've also read #2262 and #4090. Unless I missed something, for now I still have to use two commands to do that, as 1sixth said above:

yt-dlp --write-subs --sub-langs all https://www.youtube.com/watch?v=bOXCLR3Wric
yt-dlp --write-auto-subs https://www.youtube.com/watch?v=bOXCLR3Wric

@pukkandan
Member

Issues are closed when solved. Which means, #2262 is not yet implemented and running 2 commands is your best workaround for now

@MC-dusk

MC-dusk commented Dec 7, 2023

Issues are closed when solved. Which means, #2262 is not yet implemented and running 2 commands is your best workaround for now

Thank you for your reply. I'm sorry, but I think there's another problem. The default auto-sub language is English, which means that if I don't pass --sub-langs es-orig, yt-dlp downloads the en sub (translated from es-orig), even when the original auto-sub is in another language such as Spanish. For example:

yt-dlp https://www.youtube.com/watch?v=8A4FW0NV95M --write-auto-subs

When using --list-subs, it shows:

es-orig  Spanish (Original)

The original auto-sub is Spanish, but after downloading there is only an en.vtt.

So the command should be modified to use a regex:

yt-dlp https://www.youtube.com/watch?v=8A4FW0NV95M --write-auto-subs --sub-langs .+-orig

Although this solves the problem, it is still strange, because each video has only one original auto-sub, which may not be en. Downloading the original language by default would be more reasonable.
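As a rough illustration of why the `.+-orig` pattern picks the original track whatever its language, here is a small sketch. It assumes the pattern must match the whole language code (full-match semantics); yt-dlp's actual matching logic may differ in detail. `match_sub_langs` and the `available` list are hypothetical and used only for illustration.

```python
import re
from typing import Iterable, List


def match_sub_langs(pattern: str, available: Iterable[str]) -> List[str]:
    """Return the language codes that the regex matches in full.

    Assumption: the pattern has to match the entire code, so "en"
    selects only "en" and not "en-orig".
    """
    compiled = re.compile(pattern)
    return [code for code in available if compiled.fullmatch(code)]


# Hypothetical set of auto-caption codes for a Spanish-language video.
available = ["en", "es", "es-orig", "fr"]

print(match_sub_langs(r".+-orig", available))  # ['es-orig']
print(match_sub_langs(r"en", available))       # ['en']
```

Because the pattern only requires the `-orig` suffix, the same command line works whether the spoken language is English, Spanish, or anything else.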
