Add a method to check if the OpenSearch query is too long #120
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #120    +/-   ##
========================================
+ Coverage    97.4%   97.51%   +0.1%
========================================
  Files           3        3
  Lines         386      402     +16
========================================
+ Hits          376      392     +16
  Misses         10       10
Continue to review full report at Codecov.
Plus symbols were being interpreted as spaces in queries. Covered Solr's date parsing spec in full in `_format_query_date()`.
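The underlying pitfall is easy to reproduce: in URL-encoded query strings a bare `+` decodes to a space on the server, so literal plus signs (e.g. in timezone offsets) must be sent as `%2B`. A minimal sketch of such a workaround, not the PR's actual implementation:

```python
from urllib.parse import quote

def encode_query_value(value: str) -> str:
    """Percent-encode a query value so literal plus signs survive.

    In form-urlencoded data a bare '+' decodes to a space server-side,
    so it must be transmitted as '%2B'.
    """
    # safe='' escapes every reserved character, including '+' and ':'
    return quote(value, safe='')

# A timestamp containing '+' (a timezone offset) stays intact:
encoded = encode_query_value("2015-01-01T00:00:00+01:00")
# encoded == "2015-01-01T00%3A00%3A00%2B01%3A00"
```

Without the escaping, the server would decode the `+01:00` offset as `" 01:00"` and silently change the query's meaning.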
Should we maybe open a ticket upstream? The only limitation I can find in the documentation is:

A quick look through the DHuS Open Source Release also didn't turn up any limits, so I am guessing it is enforced server-side, possibly by Solr. Edit: a little digging suggests this might be a POST vs. GET limitation imposed by Tomcat or Jetty, not Solr.
tests/test_mod.py (Outdated)
@@ -202,26 +202,25 @@ def test_large_query():
@pytest.mark.scihub
This test will only fail if cassettes are disabled or newly recorded, which also means it won't catch configuration changes on the Copernicus server side. For the sake of test speed I would still keep the vcr decorator, but maybe add a line to the test documentation or the testing instructions in the README or #119 saying that `--vcr record_new` should be run on a semi-regular basis to catch such gremlins.
Nice, thanks for finding a reference for the 8k character limit. 👍 By the way, this PR also adds a workaround for the minor issue of plus symbols getting misinterpreted as spaces on the server side. This is a clear bug in the DHuS, so I opened an issue: SentinelDataHub/DataHubSystem#23

And the 200-point polygon limit that is mentioned in the docs appears to be just a consequence of this query limit. When experimenting with the query limits I managed to get successful responses with polygons containing more than 300 points. The only problem is that you have to reduce the number of decimal places significantly to be able to fit that many in a query.
I went up to 400 once. With the new function you added, one can easily test how much the decimal accuracy has to be reduced for the query to succeed, without brute-forcing the Open Access Hub, which is an added bonus. 👍
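The decimal-trimming experiment described above can be sketched as follows. The helper, the 400-point ring, and the precision values are all made up for illustration; the real test would feed the resulting WKT into the new query-length check instead of just comparing string lengths:

```python
import math

def wkt_polygon(points, decimals):
    """Format (lon, lat) pairs as a WKT POLYGON at a chosen coordinate precision."""
    coords = ", ".join(f"{lon:.{decimals}f} {lat:.{decimals}f}" for lon, lat in points)
    return f"POLYGON(({coords}))"

# Hypothetical 400-point ring around (10, 50):
ring = [(10 + math.cos(2 * math.pi * i / 400), 50 + math.sin(2 * math.pi * i / 400))
        for i in range(400)]
ring.append(ring[0])  # WKT rings must be closed

full = wkt_polygon(ring, 6)   # typical GPS-grade precision
short = wkt_polygon(ring, 2)  # reduced precision, much shorter query
```

Dropping from 6 to 2 decimals shrinks the polygon string by roughly a third, which is the kind of headroom needed to fit 300+ points under the query limit.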
I noticed that SciHub has an effective query length limit of about 2700–3600 bytes (depending on the content) when I tried to use an overly detailed WKT polygon.
I experimented a bit and determined that the query size is effectively limited by the formatting and length of the server's internal query. I added a method `api.check_query_length()` that can calculate, with fairly good accuracy, whether a query is too long. I also made the `SentinelAPIError` error messages for such situations more informative (the server reports back no useful information in this case). I set up the relevant unit test to fail if the server's limit should change at some point in the future.
Edit: it turns out the effective query length limit is approximately 3893 characters, but any special characters are counted twice toward that total.
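The heuristic described above can be sketched as follows. The 3893-character budget comes from the observation in this thread, but the exact set of "special" characters that count double is an assumption here (everything outside letters, digits and spaces); the library's actual `check_query_length()` implementation may classify them differently:

```python
import string

# Observed server-side budget from the discussion above; may change over time.
MAX_EFFECTIVE_LENGTH = 3893

def effective_query_length(query: str) -> int:
    """Estimate the server-side length of a query.

    Special characters appear to count twice toward the limit. Which
    characters qualify is an assumption: here, anything that is not a
    letter, digit or space.
    """
    plain = set(string.ascii_letters + string.digits + " ")
    return sum(1 if ch in plain else 2 for ch in query)

def query_is_too_long(query: str) -> bool:
    """True if the estimated effective length exceeds the observed budget."""
    return effective_query_length(query) > MAX_EFFECTIVE_LENGTH
```

This also explains why heavily punctuated queries (WKT polygons are mostly digits, dots, commas and parentheses) hit the limit well before 3893 literal characters.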