Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Win32UnicodeEnvironmentTests.test_encodableUnicodeEnvironment for Azure pipelines #1302

Merged
merged 10 commits into from
Aug 3, 2020

Conversation

rodrigc
Copy link
Contributor

@rodrigc rodrigc commented Jun 20, 2020

https://twistedmatrix.com/trac/ticket/9858

@mgorny helped me debug this under this PR: #1305

When mgorny does a pull request from his GitHub ID, they following environment variable is set on Azure pipelines:

BUILD_SOURCEVERSIONAUTHOR Michał Górny

That triggered a codepath which caused the CI to fail on Windows, even if he did trivial changes like only changing whitespace.

This patch fixes the problem. I also added another test, putting an environment variable similar to Michał's name.

@rodrigc
Copy link
Contributor Author

rodrigc commented Jun 20, 2020

@mgorny can you code review this? I am pretty sure that this test bug on Windows has been breaking the CI on your PR's. I have cherry picked this fix to your PR's: #1278
#1275

to try it out.

I want to get your Python 3.9 PR's in (with your full non-ASCII name in the checkin comments!!)

@mgorny
Copy link
Contributor

mgorny commented Jun 20, 2020

It seems that something is still failing but I don't seem to grasp the error.

@rodrigc
Copy link
Contributor Author

rodrigc commented Jun 20, 2020

The travis build failed, due to lint: https://travis-ci.com/github/twisted/twisted/jobs/351767847
I've pushed another fix for that.

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch 3 times, most recently from 3234199 to bf1f307 Compare June 20, 2020 22:06
@rodrigc
Copy link
Contributor Author

rodrigc commented Jun 20, 2020

This post from @hawkowl is a very good description of UTF-8 vs. CP1252 encoding issues on Windows: https://twistedmatrix.com/pipermail/twisted-python/2020-June/065140.html

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch 3 times, most recently from edfaafe to 088d4bf Compare June 23, 2020 01:53
@rodrigc rodrigc requested review from glyph, hawkowl and a team and removed request for glyph and hawkowl June 23, 2020 02:58
Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change.

I would like to have the docstring of the tests updated, to make it clear what is the expected behaviour.

At the same time, I think that this should be a bugfix.
I guess that this does not affect only our CI, but also anyone trying to use spawn process on WIndows with unicode EVN values.

So the newsfragment can be something like this:

twisted.internet.reactor.spawnProcess can now be used with Unicode environment variables.

It would also have to have a full functional/integrated test to check if the unicode environment variables are available in side the new process as unicode or as UTF-8 bytes.

That is, extend Win32ProcessTests to have an test_osEnviron_bytes and test_osEnviron_unicode , similar to test_stdinReader_bytesArgs/test_stdinReader_unicodeArgs

Let me know if you need help with the tests and I suggest a path for such a test.

In the unit tests I see that the environ is converted to UTF-8, but is not clear how the environmental is actually available/visible inside the spawned process.

And I think that this should be a generic test, for all reactors and not only for Windows / iocp.

I see that you made a change to posix_reactor, but I don't see any updated tests for Unicode env vars.
Does it mean that posix Unicode env vars were already working, and we already have a test for that.
In that case, I think that those existing tests should also be enabled for Windows.

src/twisted/test/test_process.py Outdated Show resolved Hide resolved
src/twisted/test/test_process.py Outdated Show resolved Hide resolved
src/twisted/test/test_process.py Outdated Show resolved Hide resolved
def test_UTF8StringInEnvironment(self):
"""
L{os.environ} (inherited by every subprocess on Windows) can
contain a UTF-8 string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand this docstring. ... will it handle Unicode key or value.

It looks like in this test, the environment contain Unicode (not UTF-8) and once GetEnvironmentDictionary is called the value is automatically converted/encoded to UTF-8.

Does it mean that only the value can be Unicode?

I looks like both tests will have the same unicode, so my suggestion is to have a single test.

Copy link
Contributor Author

@rodrigc rodrigc Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is documented in the Python 3 docs: https://docs.python.org/3/library/os.html#os.environ os.environ contains str keys and values. str is Unicode on Python 3.

azure-pipelines/windows_test_jobs.yml Show resolved Hide resolved
@hawkowl
Copy link
Member

hawkowl commented Jun 24, 2020

I think changing the Reactor portion is wrong -- Windows environment variables are stored internally as Unicode, and the APIs take Unicode. Therefore, we should ask users to provide the native (Unicode) representation in the API, decoding it ourselves from bytes if need be, since if we pass bytes to the Python APIs, it may do implicit decoding based on the character map. There is no real bytes encoding of a Windows environment variable, unless we (or Python) encodes it.

The solution is basically not to change any of the code, but to ensure that these tests run under an accurate code page. The Unicode tests fail because of the Latin-1 code page, and so them failing is "correct". The issue is that we're dealing with Unicode, so we want a working Unicode input/output system.

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch from 088d4bf to f7162b9 Compare July 19, 2020 17:30
rodrigc added 2 commits July 19, 2020 10:31
Also, remove an extraneous comment about Python 2, since the minimum
Python version that Twisted supports is Python 3.5
@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch from f7162b9 to 4f807ed Compare July 19, 2020 17:31
@rodrigc
Copy link
Contributor Author

rodrigc commented Jul 19, 2020

I see that you made a change to posix_reactor, but I don't see any updated tests for Unicode env vars.
Does it mean that posix Unicode env vars were already working, and we already have a test for that.
In that case, I think that those existing tests should also be enabled for Windows.

On Python 3, os.environ is str on all platforms, including POSIX. In Python 3, str is Unicode, including on POSIX.

@rodrigc
Copy link
Contributor Author

rodrigc commented Jul 19, 2020

That is, extend Win32ProcessTests to have an test_osEnviron_bytes and test_osEnviron_unicode , similar to test_stdinReader_bytesArgs/test_stdinReader_unicodeArgs

Let me know if you need help with the tests and I suggest a path for such a test.

@adiroiban I don't understand what you are asking for here. On Windows, os.environ is Unicode.
On Windows, there is no os.environbbecause os.supports_bytes_environ is always False.
See:
https://docs.python.org/3/library/os.html#os.environ
https://docs.python.org/3/library/os.html#os.environb
https://docs.python.org/3/library/os.html#os.supports_bytes_environ

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch 3 times, most recently from 4b10cda to 6c0585b Compare July 19, 2020 20:38
@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch from 6c0585b to ae8a231 Compare July 19, 2020 21:08
@rodrigc
Copy link
Contributor Author

rodrigc commented Jul 19, 2020

I think changing the Reactor portion is wrong -- Windows environment variables are stored internally as Unicode, and the APIs take Unicode. Therefore, we should ask users to provide the native (Unicode) representation in the API, decoding it ourselves from bytes if need be, since if we pass bytes to the Python APIs, it may do implicit decoding based on the character map. There is no real bytes encoding of a Windows environment variable, unless we (or Python) encodes it.

The solution is basically not to change any of the code, but to ensure that these tests run under an accurate code page. The Unicode tests fail because of the Latin-1 code page, and so them failing is "correct". The issue is that we're dealing with Unicode, so we want a working Unicode input/output system.

@hawkowl OK, I took out the change which eliminated argChecker() from the reactor.
I made one slight change to twisted/internet/base.py to put the actual bad key or value in the TypeError message,
to make it a bit easier to debug when something goes wrong.

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch 2 times, most recently from 2ea4bbe to e0cd02f Compare July 19, 2020 22:04
@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch from e0cd02f to 85069b5 Compare July 19, 2020 22:22
@rodrigc rodrigc requested a review from adiroiban July 20, 2020 02:53
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The narrower change here LGTM. Let’s land this so we can get non-ASCII contributors in here ASAP :)

@@ -496,18 +496,18 @@ def parseChunks(self, chunks):
strings giving key value pairs of the environment from that process.
Return this as a dictionary.
"""
environString = b''.join(chunks)
if not environString:
environBytes = b''.join(chunks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 love the more descriptive names

@rodrigc rodrigc merged commit 484d276 into trunk Aug 3, 2020
@rodrigc rodrigc deleted the 9858-rodrigc-mgorny-unicode branch August 3, 2020 18:27
@rodrigc
Copy link
Contributor Author

rodrigc commented Aug 6, 2020

@mgorny Thanks a lot to Michał Górny who was patient in helping me track down the problem and come up with a fix. I found it very weird that if I did a PR from the exact same branch as @mgorny, my branch would pass CI, but if @mgorny did another PR from the same branch, the CI on his PR would fail.

It took me a long time to track it down to an environment variable:

BUILD_SOURCEVERSIONAUTHOR Michał Górny

which had non-ASCII characters in the value, and thus triggered CI failures on Windows. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants