Fix S3LayerWriter NullPointerException #3096
Conversation
Could always throw a test in here where you write/read a layer to a catalog that has no key. It's a little weird, but it's a normal spec that gets its tests from https://github.com/locationtech/geotrellis/blob/master/spark-testkit/src/main/scala/geotrellis/spark/testkit/io/PersistenceSpec.scala -- so you can add more. Edit: Also, the approach seems reasonable; I have nothing to disagree with.
👍 I'll give that a shot, thanks!
Force-pushed from 02dd689 to fbe4fa6
```diff
@@ -217,7 +217,7 @@ object S3AttributeStore {
   final val SEP = "__"

   def apply(bucket: String, root: String, s3Client: => S3Client = S3ClientProducer.get()) =
-    new S3AttributeStore(bucket, root, s3Client)
+    new S3AttributeStore(bucket, Option(root).getOrElse(""), s3Client)
```
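As a minimal standalone sketch (hypothetical names, not the actual GeoTrellis source), the `Option` wrapping in the apply method works because `Option(null)` is `None` in Scala, so a null root collapses to an empty string before it can reach downstream string operations:

```scala
// Hypothetical sketch of the null guard used in the apply method above;
// `PrefixGuard.clean` mirrors Option(root).getOrElse("").
object PrefixGuard {
  // Option(null) evaluates to None, so a null prefix becomes ""
  // instead of propagating onward and triggering an NPE later.
  def clean(root: String): String = Option(root).getOrElse("")
}
```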
I think this is fine either way; no change requested, just a question: is this left over from the initial approach, or was this a special case that needed to be handled, and if so, why?
The most likely way to get nulls in the first place is addressed by the change to AmazonS3URI, but this was just another spot where a null could exist if a user is passing strings generated by some other Java API. This is a middle-ground band-aid versus a more comprehensive update that would ensure all the downstream Layer classes use cleaned bucket/prefix values from the S3AttributeStore, which sometimes doesn't happen now: each downstream Layer class holds its own copy of the bucket/prefix values that should theoretically already exist on the S3AttributeStore.
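The duplication described above can be sketched as follows (hypothetical class and member names, not the actual GeoTrellis types): each downstream class carries its own copy of bucket/prefix, so cleaning the values in one constructor is not enough, whereas a shared trait would centralize the null check:

```scala
// Hypothetical sketch: a common trait that normalizes the prefix once,
// so every class that mixes it in gets the same null-to-"" conversion.
trait S3Location {
  def bucket: String
  protected def rawPrefix: String
  // Single normalization point: a null prefix becomes "".
  final def prefix: String = Option(rawPrefix).getOrElse("")
}

// Each downstream class holds its own copy of bucket/prefix, but the
// cleaning logic lives in the trait rather than being repeated.
final class AttributeStoreSketch(val bucket: String, protected val rawPrefix: String) extends S3Location
final class LayerWriterSketch(val bucket: String, protected val rawPrefix: String) extends S3Location
```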
Should be squashed for a cleaner history; otherwise it's an atomic change.
...when performing operations against a bucket with no prefix, e.g. "s3://example-bucket". Fixed by preventing AmazonS3URI.getKey from returning nulls. This _should_ be safe, as a null path prefix is functionally equivalent to an empty string: both represent "no path prefix". This change was made to reduce the chance that nulls escape downstream into our various Layer[Writer|Reader|Copier|...] methods, which don't have a consistent way to handle a null prefix. We also check for null in S3AttributeStore as a layered defense. More aggressive checking in each of the downstream S3Layer classes would be prohibitive at the moment because the bucket/prefix information is copied into each. Users should prefer the apply method over the class constructor in order to take advantage of this safety check.
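The failure mode described in the commit message can be illustrated with a small, self-contained sketch (this is not the AWS SDK's AmazonS3URI, just a stand-in parser): a URI with no path component should yield an empty key rather than null.

```scala
// Hypothetical illustration of the S3 URI key behavior the fix targets.
// A real AmazonS3URI("s3://example-bucket") had a null key; returning
// "" instead keeps downstream string handling NPE-free.
object S3UriSketch {
  // Returns (bucket, key); key is "" when the URI has no path prefix.
  def parse(uri: String): (String, String) = {
    val rest = uri.stripPrefix("s3://")
    rest.split("/", 2) match {
      case Array(bucket)      => (bucket, "")   // no prefix: empty, never null
      case Array(bucket, key) => (bucket, key)
    }
  }
}
```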
Force-pushed from cbddf7e to c146f4e
Overview
AmazonS3URI will have a null key if the user passes a URI that doesn't contain one, such as s3://foo. This gets passed down into the various S3 reader/writer classes in GeoTrellis, and is converted to an empty string in some cases but not others.

In some cases, an S3LayerWriter will be constructed using the attached S3AttributeStore bucket + keyPrefix, if we know we're getting an S3AttributeStore. Otherwise the S3LayerWriter will be constructed using passed-in bucket/keyPath values, which can come from anywhere, including the nullable AmazonS3URI.key.

The quickest, least disruptive, and most comprehensive fix seemed to be to patch the AmazonS3URI.getKey method to return "" instead of null, and also to add an error check in the S3AttributeStore apply method that converts an incoming null prefix to an empty string. More comprehensive error handling would take two forms. The "easier" way would be to perform the check in every apply method in the downstream S3Layer[Reader|Writer|...] classes, of which there are quite a few. The "harder" way would be to abstract the bucket and prefix properties into a common trait and introduce the null check and empty-string conversion there. I went with this intermediate approach for now, as it's lighter weight on the eve of the GT 3.0 release cutoff.

Checklist

- docs/CHANGELOG.rst updated, if necessary

Notes
I added a new script that starts the Docker container necessary for running tests that target S3 via the S3 client.
Closes #2857