-
-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get hashes from PyPI JSON API #1109
Get hashes from PyPI JSON API #1109
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1109 +/- ##
==========================================
+ Coverage 99.44% 99.46% +0.01%
==========================================
Files 36 36
Lines 2540 2631 +91
Branches 312 318 +6
==========================================
+ Hits 2526 2617 +91
Misses 8 8
Partials 6 6
Continue to review full report at Codecov.
|
it might be worth always getting the hashes now rather than skipping them if the hashes are already in the requirements.txt (this behaviour causes me problems, eg when new wheels are uploaded to an old version I don't get the hash of the new wheel |
We could ignore current hashes using |
But if think it deserves a discussion in a separate issue. |
piptools/repositories/pypi.py
Outdated
for package_index in package_indexes: | ||
url = "{url}/{name}/json".format(url=package_index.pypi_url, name=ireq.name) | ||
try: | ||
data = self.session.get(url).json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably remember which URLs support the PyPI JSON api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we know that JSON API of a custom index URL is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this URL doesn't 404
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it raises 404 then it falls back to the files hash checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps 404 should be handled explicitly because currently, it falls back on every HTTP/JSON error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I might be missing your point, but I'll try to describe mine in detail.
Using this heuristic, we can't be sure that JSON API is not supported. Consider the following case:
- package1 from https://pypi1-server/pypi/package1/json
- package2 from https://pypi2-server/pypi/package2/json
- package3 from https://pypi1-server/pypi/package1/json
Suppose we run the following command:
$ pip-compile --index-url=https://pypi1-server/ --extra-index-url=https://pypi2-server/
Step 1. When pip-tools processes package1
everything is okay because it hits https://pypi1-server/pypi/package1/json which responses 200.
Step2. After, pip-tools compiles package2
and hit the first index URL, which is https://pypi1-server/pypi/package2/json. But this URL raises 404 because the package is on pypi2-server. So it has to go to the second index URL, which is https://pypi2-server/pypi/package2/json, which responses 200.
Step3. Then pip-tools processes package3
the same as in step 1 with the first index URL.
How would we know that JSON API of a custom index URL is supported?
If this URL doesn't 404.
Considering the above statement the test case will be failed on step 3 because pypi1-server
would be marked as "doesn't support JSON API" on step 2 because pip-tools couldn't find package2 on pypi1-server
due to it responded 404.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the simple index and the JSON API 404s then you don't know
if the simple index 200s and the JSON API 404s then you know the API is unsupported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yes, that would work. Assuming that suggestion is to hit less the PyPI servers now we have to hit the servers twice on 404. So what's the point of the optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
graingert: You have to hit it twice to download the hashes from indexes that don't support the JSON API
Sorry, I don't understand what are you trying to achieve by that. Could you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think this should be fixed anyway please feel free to open an issue/PR. Thank you for the help!
You have to hit it twice to download the hashes from indexes that don't
support the JSON API
…On Sun, 19 Apr 2020, 15:23 Albert Tugushev, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In piptools/repositories/pypi.py
<#1109 (comment)>:
> - try:
- data = self.session.get(url).json()
- except (ValueError, HTTPError) as e:
- log.error(
- "Fetch package info from PyPI failed: {url}: {e}".format(url=url, e=e)
- )
- return None
- return data
+ package_indexes = [
+ PackageIndex(url=index_url, file_storage_domain="")
+ for index_url in self.finder.search_scope.index_urls
+ ]
+ for package_index in package_indexes:
+ url = "{url}/{name}/json".format(url=package_index.pypi_url, name=ireq.name)
+ try:
+ data = self.session.get(url).json()
I see. Yes, that would work. Assuming that suggestion is to hit less the
PyPI servers now we have to hit the servers twice on 404. So what's the
point?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1109 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADFATBWV3HX4WUC6BEFYY3RNMCNHANCNFSM4MLXNCFQ>
.
|
Okay, it's ready for review now. |
I may be missing something in If so, I notice there's one other case like that in I'll take a closer look at this whole thing tonight, but it looks really nice, especially that timed example you've provided. Thanks! |
It respects pip's |
Great! This is a much appreciated improvement. If you like, minor English edits in the docstrings:
|
Thanks @AndydeCleyre for reviewing this! Your suggestions addressed in a03f3fd17dafd0e8477f11785f03099e224f222c. |
a03f3fd
to
82d093f
Compare
Rebased onto the master and squashed commits. |
@AndydeCleyre @auvipy @graingert Thanks for reviewing this! 🙏 |
@atugushev the speedup here is going to be incredible! ❤️ ❤️ ❤️ ❤️ 🎉 I was just rehashing a requirements that uses numpy pandas and lxml and thinking longingly of this PR. @atugushev I think it's worth collecting some benchmarks for the changelog, like "generate-hashes in 5.1 will take 0.001x the time 5.0 used to" |
|
@atugushev fyi this PR means you now get .egg hashes in the requirements.txt, eg:
|
Erm, I wonder should we skip egg hashes? |
@atugushev what mechanism skipped them before? |
pip doesn't support installation from eggs, so it picks wheels and source dists. |
So we could skip |
Resolves #543.
Before
After
Changelog-friendly one-liner: Get hashes from PyPI JSON API.
Contributor checklist