Currently `RangeReader` implementations will clip read requests to the known maximum file size (geotrellis/util/src/main/scala/geotrellis/util/RangeReader.scala, lines 41 to 42 at 514e6fa). In order to do this, a HEAD request typically needs to be made. This is sometimes problematic on S3 because public requester-pays buckets often allow only GET and not HEAD requests.
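The clipping described above can be sketched roughly as follows (a hypothetical helper for illustration, not the actual GeoTrellis code):

```scala
// Hypothetical sketch of the clipping a RangeReader performs: a requested
// range is trimmed so it never extends past the known total length --
// which is the only reason the total length (and hence a HEAD request)
// is needed in the first place.
def clipLength(start: Long, length: Int, totalLength: Long): Int =
  if (start + length <= totalLength) length
  else math.max(0L, totalLength - start).toInt
```

For example, a 100-byte read starting at offset 40 of a 50-byte file would be clipped to 10 bytes, and a read starting past the end of the file would be clipped to 0.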
In PR #3025 we tried to work around this restriction by opening a GetObject stream and reading the length metadata, followed by abort/close calls to avoid actually pulling down any data (geotrellis/s3/src/main/scala/geotrellis/store/s3/util/S3RangeReader.scala, lines 55 to 60 at 514e6fa).
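The workaround looks roughly like this sketch using the AWS SDK v2 (bucket and key names are placeholders; this requires credentials and network access, so it is illustrative only):

```scala
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.GetObjectRequest

// Sketch: open a GetObject stream just to read Content-Length from the
// response metadata, then abort so no body bytes are actually downloaded.
val client  = S3Client.create()
val request = GetObjectRequest.builder().bucket("my-bucket").key("my.tif").build()

val stream      = client.getObject(request)             // ResponseInputStream[GetObjectResponse]
val totalLength = stream.response().contentLength()     // length from metadata, no body read
stream.abort()                                          // drop the connection without reading the body
stream.close()                                          // closing here is where the "Premature end of
                                                        // Content-Length delimited message body" can surface
```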
However, some combination of the AWS SDK and the Apache HTTP library can produce the following exception:
```
org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 2,196,257,163; received: 0)
  at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
  at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:198)
  at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:101)
  at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142)
  at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
  at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
  at java.base/java.io.FilterInputStream.close(FilterInputStream.java:180)
  at software.amazon.awssdk.services.s3.checksums.ChecksumValidatingInputStream.close(ChecksumValidatingInputStream.java:162)
  at java.base/java.io.FilterInputStream.close(FilterInputStream.java:180)
  at software.amazon.awssdk.core.io.SdkFilterInputStream.close(SdkFilterInputStream.java:83)
  at geotrellis.store.s3.util.S3RangeReader.totalLength$lzycompute(S3RangeReader.scala:59)
  at geotrellis.store.s3.util.S3RangeReader.totalLength(S3RangeReader.scala:44)
  at geotrellis.util.StreamingByteReader.ensureChunk(StreamingByteReader.scala:109)
  at geotrellis.util.StreamingByteReader.get(StreamingByteReader.scala:130)
  at geotrellis.raster.io.geotiff.reader.GeoTiffInfo$.read(GeoTiffInfo.scala:127)
```
Issue
This issue is to remove all usage of the `RangeReader.totalLength` method and deprecate it for removal in the next major version.
Ultimately we do not need to know the total length of the file. The `RangeReader` interface is used exclusively to read GeoTiffs (and possibly similar files). The first few KB of such a file are a header that contains, or points to, a table of offsets. Consequently, when we read GeoTiff segments we always base the read ranges on the offset table and never have a chance to walk off the end of the file. The one exception is the initial header read. In theory the file should never be smaller than that request, but we should double-check the behavior of the GeoTiff reader if it is.
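To illustrate why the total length is unnecessary: the initial header read is a small fixed-size request, and every subsequent range comes from offsets recorded in the file itself. A minimal sketch of decoding a classic TIFF header's first 8 bytes (for illustration only, not the GeoTrellis reader):

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Sketch: the first 8 bytes of a classic TIFF are enough to locate the
// offset table -- a byte-order mark ("II" or "MM"), the magic number 42,
// and the offset of the first IFD. None of this requires the file length.
def firstIfdOffset(header: Array[Byte]): Long = {
  val buf = ByteBuffer.wrap(header)
  val order =
    if (header(0) == 'I'.toByte && header(1) == 'I'.toByte) ByteOrder.LITTLE_ENDIAN
    else if (header(0) == 'M'.toByte && header(1) == 'M'.toByte) ByteOrder.BIG_ENDIAN
    else sys.error("not a TIFF header")
  buf.order(order)
  require(buf.getShort(2) == 42, "bad TIFF magic number")
  buf.getInt(4).toLong & 0xFFFFFFFFL // unsigned 32-bit offset of the first IFD
}
```

For a little-endian header `II, 42, 8` this yields offset 8, i.e. the IFD immediately follows the header, and all further reads are driven by that table.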
As a side benefit, we can remove one extra HEAD/GET request per file and avoid paying the AWS costs for it.