Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode isses in pip install #7138

Closed
wants to merge 12 commits into from
Closed

Fix unicode isses in pip install #7138

wants to merge 12 commits into from

Conversation

julienmalard
Copy link

Continues #5712

This solves problems when installing packages with unicode module or package names.
@julienmalard julienmalard changed the title My fixes Fix unicode isses in pip install Oct 4, 2019
@pradyunsg pradyunsg added C: encoding Related to text encoding and likely, UnicodeErrors OS: windows Windows specific state: needs discussion This needs some more discussion type: bugfix labels Oct 7, 2019
@julienmalard
Copy link
Author

I changed a few things and all tests seem to be passing except Python 2.7 on Windows. I am a bit confused with the Pytest debugger output. If anyone could help out please do let me know. (Note: Appveyor Python 3.x on Windows also does not run because it gets aborted after 2.7 fails, but Python 3.x on Azure pipelines and Windows is functioning.)

@pradyunsg
Copy link
Member

ISTM that the tests are failing in Python 2 in general.

@julienmalard
Copy link
Author

Yes, it seems that my new unicode wheel test broke it. I will check. Thanks!

@julienmalard
Copy link
Author

@pradyunsg : It is back to failing only for Windows Python 2.7. Any idea why this specific error is happening? Thanks!

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Nov 6, 2019
@pradyunsg
Copy link
Member

Hi @julienmalard! I'm not sure if I know enough about Windows + Python 2 to help here.

@uranusjr
Copy link
Member

uranusjr commented Jan 8, 2020

Huh Azure is saying build not found… maybe we need to give it a kick to make a new build?

@uranusjr
Copy link
Member

uranusjr commented Jan 8, 2020

My assumption is io.open() accepts Unicode filenames, but you’re passing in str so Python 2 tries to implicitly convert it to Unicode using a wrong encoding. I’ll revisit this when I have time.

@pradyunsg pradyunsg reopened this Jan 8, 2020
@pradyunsg
Copy link
Member

I’ll revisit this when I have time.

That was a fun PR to work on.

uranusjr added a commit to uranusjr/pip that referenced this pull request Jan 8, 2020
Co-authored-by: Julien Malard <julien.malard@mail.mcgill.ca>
uranusjr added a commit to uranusjr/pip that referenced this pull request Jan 8, 2020
Co-authored-by: Julien Malard <julien.malard@mail.mcgill.ca>
uranusjr added a commit to uranusjr/pip that referenced this pull request Jan 10, 2020
Co-authored-by: Julien Malard <julien.malard@mail.mcgill.ca>
@justinmayer
Copy link

Hi @julienmalard / @uranusjr / @pradyunsg. Pelican maintainer here. Many thanks for all you do to make Pip great! 👏

We have an open issue about including our tests in the source tarball that is published to PyPI, which we reluctantly stopped doing at some point because in the past including the tests appeared to cause installation failures on some Windows systems, presumably due to the Unicode-related problems mentioned in this issue (and/or potentially-related issue pypa/setuptools#1399).

After doing some tests today building Pelican master with Poetry and installing the build artifacts on a Windows VM via GitHub Actions CI, I was able to successfully install both the sdist and wheel, even with the tests included: https://github.com/getpelican/pelican/runs/660597734?check_suite_focus=true#step:4:126

Subsequently running pip uninstall pelican, on the other hand, resulted in a UnicodeDecodeError: https://github.com/getpelican/pelican/runs/660612810?check_suite_focus=true#step:4:150

I don't know if the above error is directly related to the problem this PR aims to solve. I don't know much about UnicodeDecodeError and even less about Windows. With those caveats aside, is there anything I can do to help narrow down the problem so we can hopefully get a fix merged soon?

@uranusjr
Copy link
Member

I seems like the installation method (Poetry?) writes the RECORD file (which is used to list installed files so they can be uninstalled later) in a wrong encoding. The RECORD file should always be written in UTF-8.

@uranusjr
Copy link
Member

Actually I think this is now easier to implement with the recent refactorings to CSV code (the ones that conflict with this PR). I’ll spend some time today and see if I can make this work.

@pfmoore
Copy link
Member

pfmoore commented May 12, 2020

Note the comment here - as far as I know, there's no defined standard encoding for RECORD. Personally I can't imagine anything other than UTF-8 being what we would standardise, though, so I'd view that as an omission in the standard, rather than permission to use anything else 😉

@uranusjr
Copy link
Member

uranusjr commented May 12, 2020

I was hoping nobody would notice this lack of spec issue :p

Both pkg_resources and importlib.metadata already use UTF-8, so we should fix the specification at this point.

@julienmalard
Copy link
Author

@justinmayer Thanks for the offer to help! I do remember pip uninstall causing a problem as well, but had decided to focus on fixing installation first! But if you can help fix both at the same time, that would be great.

I had gotten stuck on trying to figure out why the Windows builds were not working (and the links above to the most recent tests do not seem to be working either). My main problem was being unable to track down the exact lines in the pip code that caused the crash on Windows build. If you could help me with that, I might be able to fix it and update the pull request (as I do unfortunately have experience with Windows).

@uranusjr Please do let me know what you are able to do and if there are any parts of this puzzle that I could help you on without duplicating efforts.

Thank you all!

@uranusjr
Copy link
Member

@julienmalard I’m working on a fix in #8223. It does not use io.open() (csv on Python 2 has trouble interpreting Unicode), but uses type hints to handle encodings across the encode/decode boundaries.

@julienmalard
Copy link
Author

@uranusjr Thank you! Would that fix all that this pull request attempted to do then? If not, please let me know if there is anything you would like me to contribute.

@uranusjr
Copy link
Member

I hope so, yes. It’d be awesome if you could take a look at the PR and test some actual use cases so we can catch any obvious failures!

@justinmayer
Copy link

Following up on my comment above... Unless I am mistaken, this test run indicates that the Pip changes in #8223 do indeed fix installing Pelican sdist/wheel on Windows, even when tests are included in the package.

I'm not sure when Pip 20.2 is due for release, but once it has been released, I think we can safely resume including tests in Pelican packages. Many thanks to everyone who helped make that possible! 🎉🎈🎊

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C: encoding Related to text encoding and likely, UnicodeErrors needs rebase or merge PR has conflicts with current master OS: windows Windows specific state: needs discussion This needs some more discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants