Add tests for get_title with multibyte characters #46584

haruhisa-shin (Contributor)

This PR adds tests for <title> tag values that contain entity references or multibyte characters, such as CJK text.

We want to use these tests to confirm the bug reported in WebKit: https://bugs.webkit.org/show_bug.cgi?id=270063
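
As a rough illustration of the title values such tests compare against, the sketch below extracts a <title> from markup using only the Python standard library. It is not the test code added in this PR, which drives a real browser through WebDriver. `TitleExtractor` and `extract_title` are hypothetical names, and the whitespace handling only approximates what document.title does.

```python
import re
from html.parser import HTMLParser

# ASCII whitespace as used by HTML's "strip and collapse" rule.
_ASCII_WS = " \t\n\f\r"


class TitleExtractor(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        # convert_charrefs=True decodes entity references such as &copy;.
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of ASCII whitespace, then strip both ends.
        text = re.sub("[" + _ASCII_WS + "]+", " ", "".join(self._parts))
        return text.strip(_ASCII_WS)


def extract_title(markup):
    """Return the decoded, whitespace-normalised title of an HTML string."""
    parser = TitleExtractor()
    parser.feed(markup)
    parser.close()
    return parser.title
```

A WebDriver "Get Title" call on a page with the same markup would be expected to return the same decoded string, which is what the new tests check against a real browser.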

whimboo (Contributor) left a comment

Thanks for your help in extending the WebDriver tests for the classic protocol! We still lack coverage in a lot of places...

I've taken a look and both of the newly added tests work well for all the supported browsers.

whimboo merged commit 27c0328 into web-platform-tests:master on Jun 3, 2024
19 checks passed
webkit-commit-queue pushed a commit to haruhisa-shin/WebKit that referenced this pull request Jun 6, 2024
…retrieved correctly

https://bugs.webkit.org/show_bug.cgi?id=270063

Reviewed by Alexey Proskuryakov.

If the document title contains multibyte characters, such as Japanese
text, or entity references, the "Get Title" result is garbled.

The title string is obtained from JavaScript's "document.title" property,
and this data is encoded in UTF-8.
However, the StringBuilder.append() function used to create HTTP messages
uses fromLatin1() internally to generate strings from byte data.
This seems to be what garbles the multibyte characters.

This patch calls String::fromUTF8() before concatenation so that the
correct WTF::String is restored even if it contains multibyte characters.

Also, the change to get.py is a regression test for this issue.
This is an export from web-platform-tests/wpt#46584.

* Source/WebDriver/socket/HTTPServerSocket.cpp:
(WebDriver::HTTPRequestHandler::packHTTPMessage const):
* WebDriverTests/imported/w3c/webdriver/tests/classic/get_title/get.py:
(test_strip_and_collapse):
(test_title_included_entity_references):
(test_title_included_multibyte_char):

Canonical link: https://commits.webkit.org/279767@main
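
The garbling described in the commit message above can be reproduced outside WebKit. The sketch below is a Python analogy, not the WebKit C++ code: `pack_body_latin1` and `pack_body_utf8` are hypothetical stand-ins for building a string from UTF-8 bytes with a Latin-1 interpretation versus a UTF-8 interpretation.

```python
def pack_body_latin1(body: bytes) -> str:
    """Hypothetical stand-in for the buggy path: every byte becomes one character."""
    return body.decode("latin-1")


def pack_body_utf8(body: bytes) -> str:
    """Hypothetical stand-in for the fixed path: bytes are interpreted as UTF-8."""
    return body.decode("utf-8")


title = "日本語"
body = title.encode("utf-8")          # the title as UTF-8 bytes

garbled = pack_body_latin1(body)      # one character per byte, not the original text
restored = pack_body_utf8(body)       # the original title again

# Latin-1 decoding is lossless at the byte level, so the damage is only in
# the interpretation: re-encoding as Latin-1 recovers the original bytes.
recovered = garbled.encode("latin-1").decode("utf-8")
```

For pure ASCII titles both paths give the same result, which is consistent with the problem only showing up once the title contains multibyte characters.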
mnutt pushed a commit to movableink/webkit that referenced this pull request Aug 27, 2024