-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Win32UnicodeEnvironmentTests.test_encodableUnicodeEnvironment for Azure pipelines #1302
Conversation
It seems that something is still failing but I don't seem to grasp the error. |
The travis build failed, due to lint: https://travis-ci.com/github/twisted/twisted/jobs/351767847 |
3234199
to
bf1f307
Compare
This post from @hawkowl is a very good description of UTF-8 vs. CP1252 encoding issues on Windows: https://twistedmatrix.com/pipermail/twisted-python/2020-June/065140.html |
edfaafe
to
088d4bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change.
I would like to have the docstring of the tests updated, to make it clear what is the expected behaviour.
At the same time, I think that this should be a bugfix.
I guess that this does not affect only our CI, but also anyone trying to use spawn process on WIndows with unicode EVN values.
So the newsfragment can be something like this:
twisted.internet.reactor.spawnProcess can now be used with Unicode environment variables.
It would also have to have a full functional/integrated test to check if the unicode environment variables are available in side the new process as unicode or as UTF-8 bytes.
That is, extend Win32ProcessTests to have an test_osEnviron_bytes and test_osEnviron_unicode , similar to test_stdinReader_bytesArgs/test_stdinReader_unicodeArgs
Let me know if you need help with the tests and I suggest a path for such a test.
In the unit tests I see that the environ is converted to UTF-8, but is not clear how the environmental is actually available/visible inside the spawned process.
And I think that this should be a generic test, for all reactors and not only for Windows / iocp.
I see that you made a change to posix_reactor, but I don't see any updated tests for Unicode env vars.
Does it mean that posix Unicode env vars were already working, and we already have a test for that.
In that case, I think that those existing tests should also be enabled for Windows.
src/twisted/test/test_process.py
Outdated
def test_UTF8StringInEnvironment(self): | ||
""" | ||
L{os.environ} (inherited by every subprocess on Windows) can | ||
contain a UTF-8 string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand this docstring. ... will it handle Unicode key or value.
It looks like in this test, the environment contain Unicode (not UTF-8) and once GetEnvironmentDictionary is called the value is automatically converted/encoded to UTF-8.
Does it mean that only the value can be Unicode?
I looks like both tests will have the same unicode, so my suggestion is to have a single test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is documented in the Python 3 docs: https://docs.python.org/3/library/os.html#os.environ os.environ
contains str
keys and values. str
is Unicode on Python 3.
I think changing the Reactor portion is wrong -- Windows environment variables are stored internally as Unicode, and the APIs take Unicode. Therefore, we should ask users to provide the native (Unicode) representation in the API, decoding it ourselves from bytes if need be, since if we pass bytes to the Python APIs, it may do implicit decoding based on the character map. There is no real bytes encoding of a Windows environment variable, unless we (or Python) encodes it. The solution is basically not to change any of the code, but to ensure that these tests run under an accurate code page. The Unicode tests fail because of the Latin-1 code page, and so them failing is "correct". The issue is that we're dealing with Unicode, so we want a working Unicode input/output system. |
088d4bf
to
f7162b9
Compare
Also, remove an extraneous comment about Python 2, since the minimum Python version that Twisted supports is Python 3.5
f7162b9
to
4f807ed
Compare
On Python 3, |
@adiroiban I don't understand what you are asking for here. On Windows, |
4b10cda
to
6c0585b
Compare
6c0585b
to
ae8a231
Compare
@hawkowl OK, I took out the change which eliminated |
on Windows Set PYTHONIOENCODING variable to utf8 for Python 3.5, 3.6
2ea4bbe
to
e0cd02f
Compare
e0cd02f
to
85069b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The narrower change here LGTM. Let’s land this so we can get non-ASCII contributors in here ASAP :)
@@ -496,18 +496,18 @@ def parseChunks(self, chunks): | |||
strings giving key value pairs of the environment from that process. | |||
Return this as a dictionary. | |||
""" | |||
environString = b''.join(chunks) | |||
if not environString: | |||
environBytes = b''.join(chunks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 love the more descriptive names
@mgorny Thanks a lot to Michał Górny who was patient in helping me track down the problem and come up with a fix. I found it very weird that if I did a PR from the exact same branch as @mgorny, my branch would pass CI, but if @mgorny did another PR from the same branch, the CI on his PR would fail. It took me a long time to track it down to an environment variable:
which had non-ASCII characters in the value, and thus triggered CI failures on Windows. :) |
https://twistedmatrix.com/trac/ticket/9858
@mgorny helped me debug this under this PR: #1305
When mgorny does a pull request from his GitHub ID, they following environment variable is set on Azure pipelines:
That triggered a codepath which caused the CI to fail on Windows, even if he did trivial changes like only changing whitespace.
This patch fixes the problem. I also added another test, putting an environment variable similar to Michał's name.