-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BOM in Response::text_with_charset
#1897
Comments
This seems like a bug to me. Consider the following: fn main() {
for s in [&b"\xEF\xBB\xBFhelloo", &b"\xEF\xBB\xBFhello\xFF"] {
let mut server = mockito::Server::new();
let _mock = server.mock("GET", "/").with_body(s).create();
println!(
"{:?}",
reqwest::blocking::get(server.url())
.unwrap()
.error_for_status()
.unwrap()
.text()
);
}
} ( This outputs:
Surely whether the BOM gets or doesn't get stripped shouldn't depend on whether there were any replacements for invalid codepoints or not. Here's a proposed fix: diff --git a/src/async_impl/response.rs b/src/async_impl/response.rs
index 340e541..2251452 100644
--- a/src/async_impl/response.rs
+++ b/src/async_impl/response.rs
@@ -185,13 +185,9 @@ pub async fn text_with_charset(self, default_encoding: &str) -> crate::Result<St
let full = self.bytes().await?;
let (text, _, _) = encoding.decode(&full);
- if let Cow::Owned(s) = text {
- return Ok(s);
- }
- unsafe {
- // decoding returned Cow::Borrowed, meaning these bytes
- // are already valid utf8
- Ok(String::from_utf8_unchecked(full.to_vec()))
+ match text {
+ Cow::Owned(s) => Ok(s),
+ Cow::Borrowed(s) => Ok(s.to_owned()),
}
}
I wonder why did this code ever bother doing the |
Actually, why not just: diff --git a/src/async_impl/response.rs b/src/async_impl/response.rs
index 340e541..eaa5b20 100644
--- a/src/async_impl/response.rs
+++ b/src/async_impl/response.rs
@@ -1,4 +1,3 @@
-use std::borrow::Cow;
use std::fmt;
use std::net::SocketAddr;
use std::pin::Pin;
@@ -185,14 +184,7 @@ pub async fn text_with_charset(self, default_encoding: &str) -> crate::Result<St
let full = self.bytes().await?;
let (text, _, _) = encoding.decode(&full);
- if let Cow::Owned(s) = text {
- return Ok(s);
- }
- unsafe {
- // decoding returned Cow::Borrowed, meaning these bytes
- // are already valid utf8
- Ok(String::from_utf8_unchecked(full.to_vec()))
- }
+ Ok(text.into_owned())
}
/// Try to deserialize the response body as JSON. That's the same thing as the match above. |
This was the original pull request: #256 I cannot tell why it was written that way. If there's no significant reason to leave it alone, then seems worth fixing. |
It looks like you suggested that piece of code back then and the reason for doing so was to avoid a copy, and it indeed did avoid a copy back then because A bunch of refactorings later, the current code no longer avoids a copy because it does |
Oh, indeed I did. I should have read a bit farther down that PR. I agree with your assessment. |
The byte order mark (BOM) is now stripped from utf-8 encoded response bodies when calling `Response::text` and `Response::text_with_charset`. This should prevent surprising behaviour when trying to use the returned String. Closes seanmonstar#1897
The byte order mark (BOM) is now stripped from utf-8 encoded response bodies when calling `Response::text` and `Response::text_with_charset`. This should prevent surprising behaviour when trying to use the returned String. Closes seanmonstar#1897
The byte order mark (BOM) is now stripped from utf-8 encoded response bodies when calling `Response::text` and `Response::text_with_charset`. This should prevent surprising behaviour when trying to use the returned String. Closes seanmonstar#1897
The byte order mark (BOM) is now stripped from utf-8 encoded response bodies when calling `Response::text` and `Response::text_with_charset`. This should prevent surprising behaviour when trying to use the returned String. Closes seanmonstar#1897
The byte order mark (BOM) is now stripped from utf-8 encoded response bodies when calling `Response::text` and `Response::text_with_charset`. This should prevent surprising behaviour when trying to use the returned String. Closes #1897
The byte order mark (BOM) is now stripped from utf-8 encoded response bodies when calling `Response::text` and `Response::text_with_charset`. This should prevent surprising behaviour when trying to use the returned String. Closes seanmonstar#1897
Hi, just seeking a bit of clarification on whether including the byte order mark in decoded UTF-8 is intentional or not when calling
Reponse::text
&Response::text_with_charset
.By ignoring the
Cow::Borrowed
branch we're returningfull
which can include a BOM, whereas the str slice in theCow::Borrowed
returned by the call todecode
strips the BOM.This tripped us up when trying to deserialize an XML response after calling
.text()
and I'm unclear on where the responsibility for removing the BOM should live.The text was updated successfully, but these errors were encountered: