
Add a method to check if the OpenSearch query is too long #120

Merged: 4 commits into sentinelsat:master, Jun 8, 2017

Conversation

valgur (Member) commented on Jun 2, 2017

I noticed that SciHub has an effective query length limit of about 2700–3600 bytes (depending on the content) when I tried to use an overly detailed WKT polygon.
I experimented a bit and determined that the query size is effectively limited by the formatting and length of the server's internal query. I added a method api.check_query_length() that can determine with fairly good accuracy whether a query is too long.
I also made the SentinelAPIError messages for such situations more informative (the server reports no useful information back in this case).
I set up the relevant unit test to fail if the server's limit should change at some point in the future.

Edit: turns out the effective query length limit is approximately 3893 characters, but any special characters are counted twice towards that.
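For illustration, a minimal sketch of the length heuristic described above. This is not the PR's actual check_query_length() implementation, and treating "special" as "non-alphanumeric" is an assumption:

```python
# Hedged sketch of the heuristic: a ~3893-character budget, with special
# characters counted twice. "Special" is assumed here to mean non-alphanumeric.
def effective_query_length(query):
    return len(query) + sum(1 for c in query if not c.isalnum())

query = 'beginposition:[2017-06-01T00:00:00Z TO NOW] AND footprint:"Intersects(POLYGON((0 0, 1 0, 1 1, 0 0)))"'
if effective_query_length(query) > 3893:
    print("Query is probably too long for SciHub")
```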

codecov bot commented on Jun 2, 2017

Codecov Report

Merging #120 into master will increase coverage by 0.1%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master     #120     +/-   ##
=========================================
+ Coverage    97.4%   97.51%   +0.1%     
=========================================
  Files           3        3             
  Lines         386      402     +16     
=========================================
+ Hits          376      392     +16     
  Misses         10       10
Impacted Files             Coverage Δ
sentinelsat/sentinel.py    98.39% <100%> (+0.08%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f14e27a...076f7ce.

valgur changed the title from "Print a warning if the used query was probably too long" to "Add a method to check if the OpenSearch query is too long" on Jun 3, 2017
valgur added 2 commits on June 3, 2017 at 17:32:
- Plus symbols were being interpreted as spaces in queries.
- Covered Solr's date parsing spec in full in _format_query_date() (see the sketch below).
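As a rough illustration of what covering Solr's date parsing spec entails, a hedged sketch (the real _format_query_date() is not shown in this thread; Solr accepts both ISO 8601 timestamps and date-math expressions such as NOW and NOW-1DAY):

```python
from datetime import datetime

# Hypothetical sketch, not the PR's code: normalize datetime objects to the
# UTC timestamp format Solr expects, and pass strings (assumed to already be
# ISO 8601 or Solr date-math expressions like NOW-1DAY) through unchanged.
def format_query_date(value):
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return str(value)

print(format_query_date(datetime(2017, 6, 1)))  # 2017-06-01T00:00:00.000Z
print(format_query_date("NOW-1DAY"))            # NOW-1DAY
```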
kr-stn (Member) commented on Jun 4, 2017

Should we maybe open a ticket with eosupport@copernicus.esa.int and ask for the exact query length limit? We could then fix the maximum query length to a precise value instead of a best estimate.

The only limitation I can find in the documentation is:

The polygon describing the geographical area can have a maximum of 200 points that must be within an area described by 10 degrees of latitude and 10 degrees of longitude.

A quick look through the DHuS Open Source Release also didn't turn up any limits, so I am guessing it is enforced server-side, possibly by Solr.

edit: a little digging suggests this might be a POST vs GET limitation imposed by Tomcat or Jetty, not Solr.
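To make that concrete: Tomcat's maxHttpHeaderSize and Jetty's requestHeaderSize both default to 8192 bytes, which caps the whole request line of a GET (the defaults are documented for those servers; whether the DHuS deployment overrides them is unknown). A rough way to see how close a query gets:

```python
from urllib.parse import quote

# Rough illustration under the assumed 8 KB default: the entire GET request
# line, including the percent-encoded query, must fit the header buffer.
query = 'footprint:"Intersects(POLYGON((0 0, 1 0, 1 1, 0 0)))"'
base = "https://scihub.copernicus.eu/apihub/search?q="
request_line = "GET {}{} HTTP/1.1".format(base, quote(query))
print(len(request_line), "bytes of an ~8192-byte budget")
```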

@@ -202,26 +202,25 @@ def test_large_query():
@pytest.mark.scihub
A Member commented:

This test will only fail if cassettes are disabled or newly recorded, which also means that it won't fail after configuration changes on the Copernicus server side. For the sake of test speed I would still keep the vcr decorator, but maybe add a line to the test documentation or the testing instructions in the Readme or #119 saying that --vcr record_new should be performed on a semi-regular basis to catch such gremlins.
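The pattern under discussion, sketched with vcrpy's standard decorator (illustrative only; the suite's actual fixture names, cassette paths and record mode may differ):

```python
import pytest
import vcr

# Illustrative setup: responses are replayed from cassettes, so the test only
# exercises the live server when cassettes are re-recorded (--vcr record_new).
my_vcr = vcr.VCR(cassette_library_dir="tests/fixtures/vcr_cassettes")

@my_vcr.use_cassette
@pytest.mark.scihub
def test_large_query():
    ...  # build an oversized query and assert the expected failure
```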

kr-stn modified the milestone: next Minor release, on Jun 4, 2017
valgur (Member, Author) commented on Jun 4, 2017

Nice, thanks for finding a reference for the 8k character limit. 👍
I don't think asking support would help much, since this seems to be a fairly arbitrary limit that is not explicitly specified anywhere in the server-side configuration.

By the way, this PR also adds a workaround for the minor issue of plus symbols getting misinterpreted as spaces on the server side. This is a clear bug in the DHuS, so I opened an issue: SentinelDataHub/DataHubSystem#23
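The workaround amounts to percent-encoding the query before it goes into the URL, so a '+' reaches the server as %2B instead of being decoded as a space. A minimal sketch (not necessarily the PR's exact call site):

```python
from urllib.parse import quote

# '+' in a raw query string is decoded as a space server-side (the DHuS bug
# above), so percent-encode the query; '+' becomes %2B.
query = "+platformname:Sentinel-1 +producttype:GRD"
url = "https://scihub.copernicus.eu/apihub/search?q=" + quote(query)
print(url)  # ...?q=%2Bplatformname%3ASentinel-1%20%2Bproducttype%3AGRD
```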

valgur (Member, Author) commented on Jun 4, 2017

And the 200-point polygon limit that is mentioned in the docs appears to be just a consequence of this query limit. When experimenting with the query limits I managed to get successful responses with polygons containing more than 300 points. The only problem is that you have to reduce the number of decimal places significantly to be able to fit that many in a query.

kr-stn (Member) commented on Jun 4, 2017

> …containing more than 300 points. The only problem is that you have to reduce the number of decimal places significantly to be able to fit that many in a query.

I went up to 400 once. With the new function you added, one can easily test how much the decimal accuracy has to be reduced for the query to succeed, without brute-forcing the Open Access Hub, which is an added bonus. 👍
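A sketch of that test, assuming shapely for WKT re-serialisation and that check_query_length() returns the ratio of the query length to the maximum (its exact return value isn't stated in this thread):

```python
from sentinelsat import SentinelAPI
from shapely import wkt

api = SentinelAPI("user", "password")  # placeholder credentials

# Hedged sketch: drop decimal places from a detailed footprint until the
# query fits, with values <= 1 assumed to mean the query is accepted.
geom = wkt.loads(open("detailed_footprint.wkt").read())  # hypothetical file
for precision in range(10, 0, -1):
    footprint = wkt.dumps(geom, rounding_precision=precision)
    query = 'footprint:"Intersects({})"'.format(footprint)
    if api.check_query_length(query) <= 1:
        print("Query fits at {} decimal places".format(precision))
        break
```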

valgur merged commit c7e9ccc into sentinelsat:master on Jun 8, 2017
valgur deleted the length-warning branch on June 9, 2017
kr-stn modified the milestones: v0.12, next Minor release, on Jul 31, 2017