Response::text errors on non-utf8 bytes #246
This is actually expected, as
seanmonstar changed the title from "reqwest can't handle http://google.com - Encoding problem!" to "Response::text errors on non-utf8 bytes" on Jan 16, 2018.
Closed
#256 will fix this.
seanmonstar pushed a commit that referenced this issue on Feb 15, 2018:

* Detect encoding and decode text response (Fixes #246)
* Try to get encoding from Content-Type header
* Remove uchardet encoding detection for now
* Add non utf-8 test case for Response::text()
* Reduce copies
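The second commit bullet takes the encoding from the `Content-Type` header. A minimal sketch of that step, assuming a hand-rolled parser: `charset_from_content_type` is a hypothetical helper, not reqwest's API or the code from #256, and it ignores quoted-string escapes and other edge cases that a full media-type parser would handle.

```rust
/// Hypothetical helper (not reqwest's API): extract the `charset` parameter
/// from a Content-Type value such as `text/html; charset=ISO-8859-1`.
/// Simplified: splits on ';' and does not handle quoted-string escapes.
pub fn charset_from_content_type(content_type: &str) -> Option<String> {
    content_type
        .split(';')
        .skip(1) // the first segment is the media type itself
        .filter_map(|param| {
            let (name, value) = param.split_once('=')?;
            if !name.trim().eq_ignore_ascii_case("charset") {
                return None;
            }
            // Strip optional surrounding quotes and normalize to lowercase.
            let value = value.trim().trim_matches('"');
            if value.is_empty() {
                None
            } else {
                Some(value.to_ascii_lowercase())
            }
        })
        .next()
}
```

For the header in this issue, `charset_from_content_type("text/html; charset=ISO-8859-1")` yields `Some("iso-8859-1")`; a header without a charset parameter yields `None`, leaving the caller to pick a default.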
@seanmonstar Thank you very much for this great crate! It is badly needed, and I appreciate that you shared it.
Run the following simple test:
This is the output:
The error happens because `Response::text()` ignores the `Content-Type: text/html; charset=ISO-8859-1` header from Google. `Response::text()` uses `read_to_string()` from the std library, which explicitly requires UTF-8 encoding.

I think it is a rather big problem if reqwest can't handle google.com. You could use the encoding crate and honor the charset in the header. As a short-term workaround, you could provide a method that returns a `&[u8]` rather than a `String`, so users can work around the bug.

NOTE: Google may change their page tomorrow and everything will work fine. Nonetheless, I am glad it broke, because otherwise this would have been hard to discover!
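The failure described above can be reproduced with the standard library alone, together with the bytes-first workaround. This is a hedged sketch: `decode_latin1`, `read_body`, and `read_to_string_fails_on_latin1` are hypothetical names, `std::io::Cursor` stands in for the response body, and only UTF-8 and ISO-8859-1 are handled. A real fix would delegate to an encoding library, as the issue suggests.

```rust
use std::io::{Cursor, ErrorKind, Read};

/// Decode ISO-8859-1 (Latin-1): every byte 0x00..=0xFF maps to the Unicode
/// code point with the same value, so this conversion cannot fail.
pub fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Hypothetical workaround: read the raw bytes first (instead of
/// `read_to_string`), then decode according to a lowercase charset label.
pub fn read_body<R: Read>(mut reader: R, charset: Option<&str>) -> std::io::Result<String> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    match charset {
        Some("iso-8859-1") | Some("latin1") => Ok(decode_latin1(&bytes)),
        // Default to UTF-8, like `read_to_string`, and report invalid data.
        Some("utf-8") | None => String::from_utf8(bytes)
            .map_err(|e| std::io::Error::new(ErrorKind::InvalidData, e)),
        Some(other) => Err(std::io::Error::new(
            ErrorKind::InvalidInput,
            format!("unsupported charset: {other}"),
        )),
    }
}

/// Reproduces the issue: `read_to_string` rejects Latin-1 encoded bytes.
pub fn read_to_string_fails_on_latin1() -> bool {
    // "café" in ISO-8859-1: 0xE9 is 'é', which is not valid UTF-8 on its own.
    let latin1 = vec![b'c', b'a', b'f', 0xE9];
    let mut s = String::new();
    match Cursor::new(latin1).read_to_string(&mut s) {
        Err(e) => e.kind() == ErrorKind::InvalidData,
        Ok(_) => false,
    }
}
```

Strict Latin-1 maps each byte to the code point of the same value. Browsers, following the WHATWG Encoding Standard, treat the `iso-8859-1` label as windows-1252, which differs in the 0x80..=0x9F range, so this sketch may disagree with a browser for those bytes.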
Here is the data in case Google changes their pages:
2018_01_16_www.google.com.data_non_utf8.txt.gz
2018_01_16_www.google.com.header.txt.gz
BTW, it is rather funny that the offending bytes are around the "Advertising Program" string :)