Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix S3LayerWriter NullPointerException #3096

Conversation

CloudNiner
Copy link
Contributor

@CloudNiner CloudNiner commented Sep 30, 2019

Overview

AmazonS3URI will have a null key if the user passes a uri that doesn't contain one, such as s3://foo. This gets passed down into the various S3 reader/writer classes in GeoTrellis, and is converted to an empty string in some cases but not others.

In some cases, an S3LayerWriter will be constructed using the attached S3AttributeStore bucket + keyPrefix, if we know we're getting an S3AttributeStore. Otherwise the S3LayerWriter will be constructed using passed in bucket/keyPath values which can come from anywhere, including the nullable AmazonS3URI.key.

The quickest, least disruptive, and most comprehensive method seemed to be to fixup the AmazonS3URI.getKey method to return "" instead of null and also add an additional error check in the S3AttributeStore apply method that converts incoming nulls in prefix to empty string. More comprehensive error handling would take two forms. The "easier" way would be to perform the check in every apply in the downstream S3Layer[Reader|Writer|...] classes, of which there are quite a few. The "harder" way would be to abstract the bucket and prefix properties to a common trait, and introduce the null check and conversion to empty string there. I went with this intermediate approach for now as its lighter weight on the eve of the GT3.0 release cutoff.

Checklist

  • docs/CHANGELOG.rst updated, if necessary
  • Unit tests added for bug-fix or new feature

Notes

I added a new script that starts the docker container necessary for running tests that target S3 via the s3 client.

Closes #2857

@echeipesh
Copy link
Contributor

echeipesh commented Oct 1, 2019

Could always throw a test in here where you write/read a layer to catalog that has no key:
https://github.com/locationtech/geotrellis/blob/master/s3-spark/src/test/scala/geotrellis/spark/store/s3/S3SpatialSpec.scala

Its a little weird but its a normal spec that gets its tests from https://github.com/locationtech/geotrellis/blob/master/spark-testkit/src/main/scala/geotrellis/spark/testkit/io/PersistenceSpec.scala -- so you can add more describe and it blocks. The test layers and readers are available to you right there.

Edit: Also the approach seems reasonable, I have nothing to disagree with

@CloudNiner
Copy link
Contributor Author

👍 I'll give that a shot, thanks!

@pomadchin pomadchin changed the title [WIP] Fix S3LayerWriter NullPointerException Fix S3LayerWriter NullPointerException Oct 1, 2019
@CloudNiner CloudNiner force-pushed the feature/awf/s3-layer-writer-prefix#2857 branch 2 times, most recently from 02dd689 to fbe4fa6 Compare October 1, 2019 17:18
@CloudNiner CloudNiner requested a review from echeipesh October 1, 2019 17:25
@@ -217,7 +217,7 @@ object S3AttributeStore {
final val SEP = "__"

def apply(bucket: String, root: String, s3Client: => S3Client = S3ClientProducer.get()) =
new S3AttributeStore(bucket, root, s3Client)
new S3AttributeStore(bucket, Option(root).getOrElse(""), s3Client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine either way, no change request but question: Is this left over from initial approach or was this a special case that needed to be handled and if so why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most likely way to get nulls in the first place is addressed by the change to AmazonS3URI, but this was just another spot where a null could exist if a user is passing strings generated by some other Java API. This is a middle ground band-aid versus a more comprehensive update to ensure that all the downstream Layer classes use cleaned bucket/prefix values on the S3AttributeStore, which sometimes doesn't happen now -- each downstream Layer class holds copies of the bucket/prefix values that should theoretically already exist on the S3AttributeStore.

@echeipesh
Copy link
Contributor

Should be squashed for cleaner history, atomic change otherwise.

...when performing operations against a
bucket with no prefix, e.g.
"s3://example-bucket".

Fixed by preventing AmazonS3URI.getKey from
returning nulls. This _should_ be safe as a
null path prefix is functionally equivalent
to an empty string in that they both represent
"no path prefix".

This change was made to reduce the chance that
nulls escape downstream into our various
Layer[Writer|Reader|Copier|...] methods that
don't have a consistent way to handle a null
prefix.

We also check for null in S3AttributeStore as
a layered defense. More aggressive checking in
each of the downstream S3Layer classes would
be prohibitive ATM because the bucket/prefix
information is copied in each. Users should
prefer use of the apply method over the class
constructor in order to take advantage of this
safety check.
@CloudNiner CloudNiner force-pushed the feature/awf/s3-layer-writer-prefix#2857 branch from cbddf7e to c146f4e Compare October 2, 2019 16:16
@CloudNiner CloudNiner merged commit b9ca38c into locationtech:master Oct 2, 2019
@CloudNiner CloudNiner deleted the feature/awf/s3-layer-writer-prefix#2857 branch October 2, 2019 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

S3LayerWriter Cannot Write to URIs that Lack a Key Prefix
3 participants