Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] PR for testing windows encoding problems #1305

Merged
merged 7 commits into from
Aug 3, 2020

Conversation

mgorny
Copy link
Contributor

@mgorny mgorny commented Jun 21, 2020

Followup for #1278 (comment)

CC @rodrigc

@mgorny mgorny changed the base branch from 9858-rodrigc-mgorny-unicode to trunk June 21, 2020 07:23
@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

Thanks for filing this PR to test things out. As I predicted, the CI on Windows failed, even though your commit was trivial.
The first thing I notice is this:

https://dev.azure.com/twistedmatrix/twisted/_build/results?buildId=1970&view=logs&j=7f8b683a-20ec-5925-1eb0-0f9b67db6dcc&t=75248844-3620-5e60-359d-192232fba6b9

This:

python -c "import os; print(os.environ)"

fails with:

2020-06-21T07:24:15.5571943Z Traceback (most recent call last):
2020-06-21T07:24:15.5580812Z   File "<string>", line 1, in <module>
2020-06-21T07:24:15.5581991Z   File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\encodings\cp1252.py", line 19, in encode
2020-06-21T07:24:15.5583012Z     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2020-06-21T07:24:15.5584308Z UnicodeEncodeError: 'charmap' codec can't encode character '\u0142' in position 2585: character maps to <undefined>
2020-06-21T07:24:15.6292586Z 64

Looks like the default encoding for the console is CP1252, and not UTF-8.

@hawkowl posted a good writeup of this here: https://twistedmatrix.com/pipermail/twisted-python/2020-June/065140.html

I need to figure out a way to force the Azure Pipeline's Window console to be UTF-8. I'm not sure how to do that.

@hawkowl
Copy link
Member

hawkowl commented Jun 21, 2020 via email

@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

@hawkowl yes I think you are right. On the Azure pipeline side, the docs for configuring an environment variable is here:

https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch

and on the Python side, the docs for those variables is here:

https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING

So if we can set the right value in the Azure pipeline yaml, we might be able to get this to work.

@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

We might need to CHCP 65001 to force the codepage to UTF-8,
in addition to setting the PYTHONIOENCODING and PYTHONUTF8 environment variables.

https://ss64.com/nt/chcp.html

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch 3 times, most recently from 6e7d817 to b41a613 Compare June 21, 2020 09:54
@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

@hawkowl so I set PYTHONUTF8=1 in the azure yaml for windows, and I got the Python 3.7 and 3.8 builds to work. Looks like PYTHONUTF8 is only on Python 3.7+

In this log: https://dev.azure.com/twistedmatrix/twisted/_build/results?buildId=1975&view=logs&j=a9f5be88-9f44-53f4-896b-687d3085a015&t=2d2e9c85-c051-5b0a-8878-a82a75551fb6

I see an environment variable is getting set:

BUILD_SOURCEVERSIONAUTHOR Michał Górny

This confirm's my suspicion that Michał's non-ASCII name was getting exported as an environment variable somehow on the Windows build.

@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

@hawkowl do you know what the correct incantation of PYTHONIOENCODING is for Python 3.5 and 3.6, but won't interfere with the PYTHONUTF8 variable on Python 3.7+?

@hawkowl
Copy link
Member

hawkowl commented Jun 21, 2020 via email

@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

@hawkowl OK, I managed to set PYTHONIOENCODING=utf-8:surrogateescape in the Azure pipeline, and things got further along.

However on Python 3.5 and 3.6 some tests are failing:

https://dev.azure.com/twistedmatrix/twisted/_build/results?buildId=1982&view=logs&j=b37a2146-32d6-5808-6b10-83380cf11296&t=f9c15f0d-fc29-520b-b028-e1b618593e2c&l=16499


Traceback (most recent call last):
  File "d:\a\1\s\build\alldeps-withcov-windows\lib\site-packages\twisted\test\test_process.py", line 2260, in test_stdinReader_bytesArgs
    d = self._test_stdinReader(pyExe, args, env, path)
  File "d:\a\1\s\build\alldeps-withcov-windows\lib\site-packages\twisted\test\test_process.py", line 2237, in _test_stdinReader
    reactor.spawnProcess(p, pyExe, args, env, path)
  File "d:\a\1\s\build\alldeps-withcov-windows\lib\site-packages\twisted\internet\posixbase.py", line 337, in spawnProcess
    args, env = self._checkProcessArgs(args, env)
  File "d:\a\1\s\build\alldeps-withcov-windows\lib\site-packages\twisted\internet\base.py", line 1014, in _checkProcessArgs
    "non-string value: {}".format(val))
builtins.TypeError: Environment contains a non-string value: Michał Górny

twisted.test.test_process.Win32ProcessTests.test_stdinReader_bytesArgs

In twisted/internet/base.py there is a function _checkProcessArgs() which
validates all arguments against sys.getfilesystemencoding().

It looks like on Python 3.5 and 3.6, setting PYTHONIOENCODING=utf-8:surrogateescape does not cause sys.getfilesystemencoding() to change.

My recommendation is to nuke the _checkProcessArgs() function from twisted/internet/base.py.

This was added 12 years ago, and is a leftover from Python 2.

In Python 3, os.environ is unicode on all platforms.

@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch from 3d410c6 to 3962f6a Compare June 21, 2020 18:46
@rodrigc
Copy link
Contributor

rodrigc commented Jun 21, 2020

So I deleted _checkProcessArgs() and one accompanying test, and managed to get all the tests to pass on all platforms, with this environment variable set:

BUILD_SOURCEVERSIONAUTHOR Michał Górny

@hawkowl I can submit a separate PR to eliminate _checkProcessArgs(), but I don't know if you think that is the way to go in terms of solving this problem.

@rodrigc
Copy link
Contributor

rodrigc commented Jun 23, 2020

@mgorny thanks for helping me test this under your ID. I submitted a full PR here: #1302

@twm
Copy link
Contributor

twm commented Jul 18, 2020

I'm closing this PR as it appears to have been superseded by #1302. If I'm mistaken please feel free to re-open. Thank you all!

@twm twm closed this Jul 18, 2020
rodrigc added 2 commits July 19, 2020 10:31
Also, remove an extraneous comment about Python 2, since the minimum
Python version that Twisted supports is Python 3.5
@rodrigc
Copy link
Contributor

rodrigc commented Jul 19, 2020

#1302 is not merged yet. We are leaving this PR open, to test when someone with non-ASCII characters in their name, specifically Michał Górny, files a PR.

@rodrigc rodrigc reopened this Jul 19, 2020
@rodrigc rodrigc force-pushed the 9858-rodrigc-mgorny-unicode branch from af4d5ab to 85069b5 Compare July 21, 2020 07:18
@rodrigc rodrigc merged commit cef8bd8 into twisted:trunk Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants