Schema validation crashes when running in an environment without internet access #1916
Comments
Related to this is a large increase in the execution time of
$ python -m timeit -s 'import pyhf' 'pyhf.simplemodels.uncorrelated_background(signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0])'
50 loops, best of 5: 4.36 msec per loop
And for v0.7.0rc1:
$ python -m timeit -s 'import pyhf' 'pyhf.simplemodels.uncorrelated_background(signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0])'
1 loop, best of 5: 218 msec per loop
For any code that builds many |
The network request should at least be cached for the next time the model is built. Your |
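To illustrate the caching idea in the comment above, here is a minimal sketch using only the standard library. `fetch_schema` is a hypothetical stand-in for the network request (it is not part of pyhf or jsonschema), and the counter exists only to show how often the simulated network is hit.

```python
from functools import lru_cache

FETCH_COUNT = 0  # how many times the simulated network request ran


def fetch_schema(uri):
    """Hypothetical stand-in for a network request; returns a dummy schema."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    return {"$id": uri, "type": "object"}


@lru_cache(maxsize=None)
def cached_schema(uri):
    """Fetch a schema once per URI; later calls reuse the cached object."""
    return fetch_schema(uri)


uri = "https://scikit-hep.org/pyhf/schemas/1.0.0/defs.json"
for _ in range(3):
    cached_schema(uri)  # only the first call reaches fetch_schema
```

Note that `lru_cache` hands back the same dictionary object on every call, so callers should treat the cached schema as read-only.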
Yeah, this is a regression. I thought I had this fixed because I remember mentioning it somewhere. |
It's not starting a new session:
$ python -m timeit -s 'import pyhf' -v 'pyhf.simplemodels.uncorrelated_background(signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0])'
1 loop -> 0.491 secs
raw times: 209 msec, 232 msec, 242 msec, 229 msec, 218 msec
1 loop, best of 5: 209 msec per loop
You can see the first |
That's weird as the resolver shouldn't use https calls once cached... so there must be something else going on here? |
Every time
Edit: Ah, but this part hasn't changed since the last release. You're talking about the other cache (in
Edit 2: Okay, the answer there is similar and still in that same code (https://github.com/python-jsonschema/jsonschema/blob/v4.7.1/jsonschema/validators.py#L696). |
Interesting, then
So we'll have to refactor this based on the unexpected behavior in jsonschema. EDIT: a naive solution would be to add the full URI for |
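As a rough illustration of the "full URI in the store" idea, the sketch below resolves a `$ref` purely from a local dictionary keyed by each schema's full `$id`. It is a simplified stand-in written with the standard library, not jsonschema's actual resolver; the helper names `build_store` and `resolve_ref`, the `model.json` URI, and the toy schema contents are assumptions made for the example.

```python
from urllib.parse import urldefrag, urljoin


def build_store(schemas):
    """Map each schema's full ``$id`` URI to the schema itself."""
    return {schema["$id"]: schema for schema in schemas}


def resolve_ref(ref, base_uri, store):
    """Resolve ``ref`` against ``base_uri`` using only the local store."""
    full_uri = urljoin(base_uri, ref)
    document_uri, fragment = urldefrag(full_uri)
    if document_uri not in store:
        # A real resolver might fall back to an HTTP request here.
        raise LookupError(f"{document_uri} is not pre-loaded")
    node = store[document_uri]
    # Walk a JSON-pointer-style fragment such as "/definitions/model".
    # Pointer escapes ("~0", "~1") are ignored in this sketch.
    for part in fragment.split("/"):
        if part:
            node = node[part]
    return node


# Toy schema contents, for illustration only.
defs = {
    "$id": "https://scikit-hep.org/pyhf/schemas/1.0.0/defs.json",
    "definitions": {"model": {"type": "object"}},
}
store = build_store([defs])
base = "https://scikit-hep.org/pyhf/schemas/1.0.0/model.json"
resolved = resolve_ref("defs.json#/definitions/model", base, store)
```

Because the relative reference is joined with the base URI before the lookup, the store only finds the document if it is keyed by the full URI, which is the point of the suggestion above.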
Sorry if I am missing your point, but is that not the behavior of |
Summary
In master and the 0.7.0 release candidate, pyhf operations involving model validation crash in offline environments with a RefResolutionError. This is a common situation, e.g. on worker nodes of HTC clusters.
The bug was introduced after 0.6.3, I think in #1753 where the pre-loading was dropped.
OS / Environment
Steps to Reproduce
I don't know a good way to prepare the environment to demonstrate this.
However, the test below exposes the RefResolver's attempt to resolve the schema id through the https URL: it fails against the release candidate/master but passes in 0.6.3.
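One way to approximate an offline environment, offered as a sketch rather than as the test referred to above, is to make every outgoing socket connection fail while the code under test runs. The `no_network` context manager is a hypothetical helper built on `unittest.mock`; the error number 101 mirrors the "Network is unreachable" message in the reported traceback.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def no_network():
    """Make outgoing socket connections fail, as on an offline worker node."""

    def refuse(*args, **kwargs):
        raise OSError(101, "Network is unreachable")

    # Patch both entry points: the socket method and the module-level helper.
    with mock.patch.object(socket.socket, "connect", refuse):
        with mock.patch.object(socket, "create_connection", refuse):
            yield


# Any connection attempt inside the block raises OSError.
with no_network():
    try:
        socket.create_connection(("127.0.0.1", 9))
        connection_refused = False
    except OSError as exc:
        connection_refused = exc.errno == 101
```

Under this assumption, a regression test could call something like `pyhf.simplemodels.uncorrelated_background(...)` inside `with no_network():` and check that validation still succeeds.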
Expected Results
I expect schema validation to succeed without crashing even when there is no network access that allows resolving the https schema-ids.
Actual Results
jsonschema.exceptions.RefResolutionError: HTTPSConnectionPool(host='scikit-hep.org', port=443): Max retries exceeded with url: /pyhf/schemas/1.0.0/defs.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2b2bb8457c40>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
pyhf Version
pyhf, version 0.7.0rc2