
Add a method to check if the OpenSearch query is too long #120

Merged: 4 commits into sentinelsat:master, Jun 8, 2017

Conversation

valgur (Member) commented on Jun 2, 2017

I noticed that SciHub has an effective query length limit of about 2700–3600 bytes (depending on the content) when I tried to use an overly detailed WKT polygon.
I experimented a bit and determined that the query size is effectively limited by the formatting and length of the server's internal query. I added a method api.check_query_length() that can determine with fairly good accuracy whether a query is too long.
I also made the SentinelAPIError messages for such situations more informative (the server reports no useful information back in this case).
I set up the relevant unit test to fail if the server's limit should change at some point in the future.

Edit: turns out the effective query length limit is approximately 3893 characters, but any special characters are counted twice towards that.
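For illustration, a minimal sketch of the length heuristic described above. This is not the PR's actual check_query_length() implementation, and treating "special" as "non-alphanumeric" is an assumption:

```python
# Hedged sketch of the heuristic: a ~3893-character budget, with special
# characters counted twice. "Special" is assumed here to mean non-alphanumeric.
def effective_query_length(query):
    return len(query) + sum(1 for c in query if not c.isalnum())

query = 'beginposition:[2017-06-01T00:00:00Z TO NOW] AND footprint:"Intersects(POLYGON((0 0, 1 0, 1 1, 0 0)))"'
if effective_query_length(query) > 3893:
    print("Query is probably too long for SciHub")
```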

codecov bot commented on Jun 2, 2017

Codecov Report

Merging #120 into master will increase coverage by 0.1%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master     #120     +/-   ##
=========================================
+ Coverage    97.4%   97.51%   +0.1%     
=========================================
  Files           3        3             
  Lines         386      402     +16     
=========================================
+ Hits          376      392     +16     
  Misses         10       10
Impacted Files             Coverage Δ
sentinelsat/sentinel.py    98.39% <100%> (+0.08%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f14e27a...076f7ce.

valgur changed the title from "Print a warning if the used query was probably too long" to "Add a method to check if the OpenSearch query is too long" on Jun 3, 2017
valgur added 2 commits on June 3, 2017 at 17:32:
- Plus symbols were being interpreted as spaces in queries.
- Covered Solr's date parsing spec in full in _format_query_date() (see the sketch below).
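As a rough illustration of what covering Solr's date parsing spec entails, a hedged sketch (the real _format_query_date() is not shown in this thread; Solr accepts both ISO 8601 timestamps and date-math expressions such as NOW and NOW-1DAY):

```python
from datetime import datetime

# Hypothetical sketch, not the PR's code: normalize datetime objects to the
# UTC timestamp format Solr expects, and pass strings (assumed to already be
# ISO 8601 or Solr date-math expressions like NOW-1DAY) through unchanged.
def format_query_date(value):
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return str(value)

print(format_query_date(datetime(2017, 6, 1)))  # 2017-06-01T00:00:00.000Z
print(format_query_date("NOW-1DAY"))            # NOW-1DAY
```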
kr-stn (Member) commented on Jun 4, 2017

Should we maybe open a ticket with eosupport@copernicus.esa.int and ask for the exact query length limit? We could then fix the maximum query length to a precise value instead of a best estimate.

The only limitation I can find in the documentation is:

The polygon describing the geographical area can have a maximum of 200 points that must be within an area described by 10 degrees of latitude and 10 degrees of longitude.

A quick look through the DHuS Open Source Release also didn't turn up any limits, so I am guessing it is enforced server-side, possibly by Solr.

edit: a little digging suggests this might be a POST vs GET limitation imposed by Tomcat or Jetty, not Solr.
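To make that concrete: Tomcat's maxHttpHeaderSize and Jetty's requestHeaderSize both default to 8192 bytes, which caps the whole request line of a GET (the defaults are documented for those servers; whether the DHuS deployment overrides them is unknown). A rough way to see how close a query gets:

```python
from urllib.parse import quote

# Rough illustration under the assumed 8 KB default: the entire GET request
# line, including the percent-encoded query, must fit the header buffer.
query = 'footprint:"Intersects(POLYGON((0 0, 1 0, 1 1, 0 0)))"'
base = "https://scihub.copernicus.eu/apihub/search?q="
request_line = "GET {}{} HTTP/1.1".format(base, quote(query))
print(len(request_line), "bytes of an ~8192-byte budget")
```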

@@ -202,26 +202,25 @@ def test_large_query():
@pytest.mark.scihub
A Member commented:

This test will only fail if cassettes are disabled or newly recorded, which also means that it won't fail after configuration changes on the Copernicus server side. For the sake of test speed I would still keep the vcr decorator, but maybe add a line to the test documentation or the testing instructions in the Readme or #119 saying that --vcr record_new should be performed on a semi-regular basis to catch such gremlins.
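The pattern under discussion, sketched with vcrpy's standard decorator (illustrative only; the suite's actual fixture names, cassette paths and record mode may differ):

```python
import pytest
import vcr

# Illustrative setup: responses are replayed from cassettes, so the test only
# exercises the live server when cassettes are re-recorded (--vcr record_new).
my_vcr = vcr.VCR(cassette_library_dir="tests/fixtures/vcr_cassettes")

@my_vcr.use_cassette
@pytest.mark.scihub
def test_large_query():
    ...  # build an oversized query and assert the expected failure
```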

kr-stn modified the milestone: next Minor release, on Jun 4, 2017
valgur (Member, Author) commented on Jun 4, 2017

Nice, thanks for finding a reference for the 8k character limit. 👍
I don't think asking support would help much, since this seems to be a fairly arbitrary limit that is not explicitly specified anywhere in the server-side configuration.

By the way, this PR also adds a workaround for the minor issue of plus symbols getting misinterpreted as spaces on the server side. This is a clear bug in the DHuS, so I opened an issue: SentinelDataHub/DataHubSystem#23
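The workaround amounts to percent-encoding the query before it goes into the URL, so a '+' reaches the server as %2B instead of being decoded as a space. A minimal sketch (not necessarily the PR's exact call site):

```python
from urllib.parse import quote

# '+' in a raw query string is decoded as a space server-side (the DHuS bug
# above), so percent-encode the query; '+' becomes %2B.
query = "+platformname:Sentinel-1 +producttype:GRD"
url = "https://scihub.copernicus.eu/apihub/search?q=" + quote(query)
print(url)  # ...?q=%2Bplatformname%3ASentinel-1%20%2Bproducttype%3AGRD
```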

valgur (Member, Author) commented on Jun 4, 2017

And the 200-point polygon limit that is mentioned in the docs appears to be just a consequence of this query limit. When experimenting with the query limits I managed to get successful responses with polygons containing more than 300 points. The only problem is that you have to reduce the number of decimal places significantly to be able to fit that many in a query.

kr-stn (Member) commented on Jun 4, 2017

> …containing more than 300 points. The only problem is that you have to reduce the number of decimal places significantly to be able to fit that many in a query.

I went up to 400 once. With the new function you added, one can easily test how much the decimal accuracy has to be reduced for the query to succeed, without brute-forcing the Open Access Hub, which is an added bonus. 👍
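A sketch of that test, assuming shapely for WKT re-serialisation and that check_query_length() returns the ratio of the query length to the maximum (its exact return value isn't stated in this thread):

```python
from sentinelsat import SentinelAPI
from shapely import wkt

api = SentinelAPI("user", "password")  # placeholder credentials

# Hedged sketch: drop decimal places from a detailed footprint until the
# query fits, with values <= 1 assumed to mean the query is accepted.
geom = wkt.loads(open("detailed_footprint.wkt").read())  # hypothetical file
for precision in range(10, 0, -1):
    footprint = wkt.dumps(geom, rounding_precision=precision)
    query = 'footprint:"Intersects({})"'.format(footprint)
    if api.check_query_length(query) <= 1:
        print("Query fits at {} decimal places".format(precision))
        break
```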

valgur merged commit c7e9ccc into sentinelsat:master on Jun 8, 2017
valgur deleted the length-warning branch on June 9, 2017
kr-stn modified the milestones: v0.12, next Minor release, on Jul 31, 2017