Response::text errors on non-utf8 bytes #246

Closed
@rusterize

Description

@seanmonstar Thank you very much for this great crate! It is badly needed, and I appreciate that you shared it.

Run the following simple test:

extern crate reqwest;
use reqwest::Error;

fn main() {
    match run() {
        Ok(_) => println!("success!"),
        Err(e) => eprintln!("Error: {}",e),
    }
}

fn run() -> Result<(), Error> {
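    // reqwest::get builds its own client, so no separate Client is needed.
    let mut res = reqwest::get("http://google.com")?;
    // This call fails: the body is ISO-8859-1, but text() expects UTF-8.
    let _text = res.text()?;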
    Ok(())
}

This is the output:

sh-4.4$ ./target/debug/rtest
Error: stream did not contain valid UTF-8

The error happens because Response::text() ignores the Content-Type: text/html; charset=ISO-8859-1 header that Google sends. Response::text() uses read_to_string() from the standard library, which requires the bytes to be valid UTF-8.
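The failure can be reproduced without any network access. The following is a minimal sketch using only the standard library; `read_body_as_utf8` is a hypothetical helper written for illustration, not part of reqwest, and it only imitates the read_to_string() behavior described above.

```rust
use std::io::{Cursor, Read};

/// Hypothetical helper: reads a body the way the issue describes,
/// by calling `read_to_string`, which accepts only valid UTF-8.
fn read_body_as_utf8(body: &[u8]) -> std::io::Result<String> {
    let mut text = String::new();
    Cursor::new(body).read_to_string(&mut text)?;
    Ok(text)
}

fn main() {
    // "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8.
    let latin1_body: [u8; 4] = [0x63, 0x61, 0x66, 0xE9];
    match read_body_as_utf8(&latin1_body) {
        Ok(text) => println!("decoded: {}", text),
        Err(e) => eprintln!("Error: {}", e),
    }
}
```

The read should fail with an error of kind `std::io::ErrorKind::InvalidData`, which is the same class of error reported above.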

I think it is a rather big problem if reqwest can't handle google.com. You could use the encoding crate and honor the charset declared in the Content-Type header. As a short-term workaround, you could provide a method that returns a &[u8] rather than a String, so that the user can decode the body themselves and work around the bug.
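To illustrate what honoring the header could look like, here is a rough sketch that works on raw bytes plus the Content-Type value, using only the standard library. The names `charset_from_content_type`, `decode_latin1`, and `decode_body` are hypothetical and are not reqwest APIs. It handles only UTF-8 and strict ISO-8859-1 (each byte maps to the code point of the same value); a real implementation would more likely delegate to an encoding crate, and browsers typically treat the ISO-8859-1 label as windows-1252, which this sketch does not do.

```rust
/// Hypothetical helper: extracts the `charset` parameter from a
/// Content-Type header value and returns it lowercased.
fn charset_from_content_type(content_type: &str) -> Option<String> {
    content_type
        .split(';')
        .skip(1) // the first segment is the media type, e.g. "text/html"
        .filter_map(|param| {
            let mut parts = param.splitn(2, '=');
            let name = parts.next()?.trim();
            let value = parts.next()?.trim().trim_matches('"');
            if name.eq_ignore_ascii_case("charset") {
                Some(value.to_ascii_lowercase())
            } else {
                None
            }
        })
        .next()
}

/// Decodes strict ISO-8859-1: every byte becomes the char with the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decodes a body according to the declared charset.
/// Assumption of this sketch: a missing charset is treated as UTF-8.
fn decode_body(bytes: &[u8], content_type: Option<&str>) -> Result<String, String> {
    let charset = content_type.and_then(charset_from_content_type);
    match charset.as_deref() {
        Some("iso-8859-1") | Some("latin1") => Ok(decode_latin1(bytes)),
        Some("utf-8") | None => {
            String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
        }
        Some(other) => Err(format!("unsupported charset: {}", other)),
    }
}

fn main() {
    // "café" in ISO-8859-1, with the header value quoted in this issue.
    let body: [u8; 4] = [0x63, 0x61, 0x66, 0xE9];
    let header = "text/html; charset=ISO-8859-1";
    match decode_body(&body, Some(header)) {
        Ok(text) => println!("decoded: {}", text),
        Err(e) => eprintln!("Error: {}", e),
    }
}
```

Working on a byte slice keeps the decoding step independent of the HTTP client, which is also why a method returning the raw bytes would be enough for users to work around the problem themselves.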

NOTE: Google may change their page tomorrow, and then everything will work fine. Nonetheless, I am glad it broke, because otherwise this would have been hard to discover!

Here is the data, in case Google changes their pages:
2018_01_16_www.google.com.data_non_utf8.txt.gz
2018_01_16_www.google.com.header.txt.gz

BTW, it is rather funny that the offending bytes are around the "Advertising Program" string :)
