File names use "full-width" special characters: '？' vs '?' #5014
It's on purpose. It's done here: Line 661 in 3c757d5
Take a look at the sanitize_filename function. It is performed when … There isn't a way to disable it unless you edit the code (change which characters are affected by the full-width replacement in the loop, or remove all calls to the function). Maybe we should add a new option that reduces sanitization to the bare minimum (only disallowing the forward slash and allowing everything else). I don't mind it personally, because the title is fully preserved in the accompanying metadata (embedded and in info.json). Alternatively, you can use --compat-options filename-sanitization to mimic youtube-dl's behavior, which doesn't do this.

git blame says this was added in 989a01c.
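As a rough illustration of the full-width substitution discussed above, the sketch below swaps a fixed set of ASCII characters for their full-width look-alikes. It is a simplified stand-in, not yt-dlp's sanitize_filename: the RESERVED set and the helper name to_fullwidth are assumptions, and the path separators are left out on purpose.

```python
# A simplified, hypothetical version of the full-width substitution.
# It is NOT yt-dlp's sanitize_filename; the character set is assumed.

# Assumed set: characters Windows forbids in file names, minus the path
# separators '/' and '\', which this sketch does not handle.
RESERVED = '"*:<>?|'

# Printable ASCII '!'..'~' (U+0021..U+007E) has full-width forms at
# U+FF01..U+FF5E, a fixed offset of 0xFEE0: '?' U+003F -> '？' U+FF1F.
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(title: str, reserved: str = RESERVED) -> str:
    """Replace each reserved ASCII character with its full-width look-alike."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if ch in reserved else ch
        for ch in title
    )
```

For example, to_fullwidth("Why? Because: reasons") gives "Why？ Because： reasons". The appeal of this approach over deleting characters is that the title stays readable and the original character can still be recovered.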
We are doing the bare minimum sanitization, except that the code doesn't know which characters the filesystem allows, so we sanitize for the most restrictive filesystem. If you know of a way to correctly identify which characters need to be sanitized, let us know in #4547.
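To make "sanitize for the most restrictive" concrete, here is an illustrative sketch comparing the characters rejected by POSIX filesystems with those rejected by Windows. The sets are simplified assumptions (Windows also reserves names such as CON and NUL and trailing dots or spaces), and chars_to_sanitize is a hypothetical helper, not part of yt-dlp.

```python
# Illustrative sketch: characters rejected in file names by two families of
# filesystems. Simplified: Windows reserved names (CON, NUL, ...) and
# trailing dots/spaces are not modelled here.
POSIX_FORBIDDEN = frozenset({"/", "\0"})
WINDOWS_FORBIDDEN = frozenset('<>:"/\\|?*') | frozenset(chr(c) for c in range(32))

# Without knowing which filesystem the output path is on, the safe choice
# is to treat the union of both sets as forbidden.
MOST_RESTRICTIVE = POSIX_FORBIDDEN | WINDOWS_FORBIDDEN


def chars_to_sanitize(name: str, forbidden: frozenset = MOST_RESTRICTIVE) -> list[str]:
    """Return the distinct characters in `name` that `forbidden` rejects, sorted."""
    return sorted({ch for ch in name if ch in forbidden})
```

For a title like "Why? a/b", only "/" is a problem on a typical Linux filesystem, but "?" must also be handled if the file might land on a Windows-formatted drive. That gap is the trade-off described in the comment above.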
Duplicate of #4767 and related issues.
Summary of all solutions:
OK, that explains why it's on purpose. (Also, I apologize for not seeing #4767; I searched for probably 15 minutes in total and could not find anything.)
No worries. GitHub issue search sucks unless you know the right search terms. But now that this issue is pinned, hopefully fewer people will open duplicates.
Although this is closed, I wanted to add some information. None of the above options (delete, make full-width, or make underscore) seem reasonable to me. I thought I could tell yt-dlp to ignore certain characters by simply replacing them with themselves in metadata, but that produces an error. I am currently working around this problem by allowing the default (full-width) option, which at least preserves the information about which character was there, so I can put it back on the command line with things like:

My ideal would be to let the user specify a string containing all the characters they DON'T want yt-dlp to help with.
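The workaround described above (keep the full-width default, then put chosen characters back) can be sketched as a post-processing step. The helper names restore and restore_in_dir are hypothetical, not yt-dlp features, and the sketch assumes the characters in keep were replaced using the fixed 0xFEE0 offset (for example ? : * " < > |).

```python
from pathlib import Path

# Offset between printable ASCII and its full-width forms (U+FF01..U+FF5E).
FULLWIDTH_OFFSET = 0xFEE0


def restore(name: str, keep: str) -> str:
    """Turn the full-width forms of the characters in `keep` back into ASCII.

    Hypothetical helper, not a yt-dlp feature. Only printable ASCII other
    than '/' is accepted, since '/' can never appear in a file name.
    """
    for ch in keep:
        if not ("!" <= ch <= "~") or ch == "/":
            raise ValueError(f"cannot restore {ch!r}")
    table = {ord(ch) + FULLWIDTH_OFFSET: ch for ch in keep}
    return name.translate(table)


def restore_in_dir(directory: str, keep: str) -> None:
    """Rename files in `directory` whose names contain the chosen full-width forms."""
    for path in Path(directory).iterdir():
        new_name = restore(path.name, keep)
        if new_name != path.name:
            # Note: no check for an existing file with the restored name.
            path.rename(path.with_name(new_name))
```

Two caveats apply: a title that genuinely contained a full-width character will also be converted, and the restored names are only valid on filesystems that accept those characters.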
The replacement of … As there doesn't seem to be an option to disable this, the following can be used instead:
or if you want to replace
or the "big"
Region
USA
Provide a description that is worded well enough to be understood
Recently (in the last month or so), all videos I download from YouTube get file names with "full-width" special characters (such as the question mark, exclamation mark, colon, semicolon, etc.).
Other characters such as [A-Za-z] are "normal".
The video name appears "normal" in the JSON file written by --write-info-json, so I guess there must be some problem when naming the actual video file.

Example of a "full-width" special character: ？ (U+FF1F) vs. the "normal" ? (U+003F).
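To check which characters in a downloaded file name are full-width, and what they correspond to, Python's standard unicodedata module is enough. The helper name describe_fullwidth is hypothetical; it relies on full-width forms carrying a "<wide>" compatibility decomposition.

```python
import unicodedata


def describe_fullwidth(name: str) -> list[tuple[str, str, str]]:
    """List the full-width characters in `name` with their NFKC (ASCII) equivalents."""
    found = []
    for ch in name:
        # Full-width forms carry a "<wide>" compatibility decomposition,
        # e.g. unicodedata.decomposition("\uff1f") == "<wide> 003F".
        if unicodedata.decomposition(ch).startswith("<wide>"):
            found.append(
                (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.normalize("NFKC", ch))
            )
    return found
```

NFKC is applied per character rather than to the whole name because it also rewrites other compatibility characters (for example the "ﬁ" ligature becomes "fi"), which would silently alter parts of a title that were never sanitized.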