Make export configurable with asset types #5619

Merged (43 commits) on Sep 1, 2021

@pcaisse pcaisse commented Aug 31, 2021

Overview

Brief description of what this PR does, and why it is needed.

Checklist

  • Description of PR is in an appropriate section of the changelog and grouped with similar changes if possible
  • Swagger specification updated
  • New tables and queries have appropriate indices added
  • Any content changes are properly templated using BUILDCONFIG.APP_NAME
  • Any new SQL strings have tests
  • Any new endpoints have scope validation and are included in the integration test csv

Demo

$ tree ./scratch/catalog/
./scratch/catalog/
├── catalog.json
├── images
│   ├── 17c2d044-a906-4cb6-a720-dc72f1a16eeb
│   │   ├── CatrimaniMucajaiDG-20150208-clipped.tif
│   │   └── item.json
│   ├── 6b399450-c4c6-4b9d-a950-52af1be6630d
│   │   ├── cog.tif
│   │   └── item.json
│   ├── collection.json
│   └── d0c09074-8410-4172-affa-ee0fbdf259d6
│       ├── cog.tif
│       └── item.json
├── labels
│   ├── 3d24f67c-e619-4b7b-8d5a-f7b54d87c697.json
│   ├── aa6809cc-4fee-4882-8ceb-ee52b4507797.json
│   ├── be6d0fc6-8f03-413a-ab02-00d4424046da.json
│   ├── collection.json
│   └── data
│       ├── 3d24f67c-e619-4b7b-8d5a-f7b54d87c697.geojson
│       ├── aa6809cc-4fee-4882-8ceb-ee52b4507797.geojson
│       └── be6d0fc6-8f03-413a-ab02-00d4424046da.geojson
└── README.md

6 directories, 17 files

Notes

  • Functionality-wise, this differs slightly from what's described in I3 of the scope issue. COG filenames are not unique (all existing uploads are called cog.tif), so with the previous organization, where everything sat in the same directory, the files would overwrite each other in the export (see 6302e11). To avoid this, I grouped each item's JSON and its corresponding COG (if configured) in a directory named after the item id.
  • export_asset_types is just a varchar[] since, as I understand it, the JDBC driver doesn't support fancier array types. I think enforcing everything via Scala types at the application and batch layers should give us enough safety in any case.
  • This removes the defunct STAC export v1 code.
  • I added logging to the campaign export, which was helpful for some debugging I did and is also nice to have in general.
  • Please verify that I didn't break anything in 8d6f4c5.
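
To illustrate the filename collision described in the first note, here is a minimal sketch, not the code from this PR. The object and function names are hypothetical, and the flat layout assumes assets previously landed directly under images/.

```scala
// Hypothetical helpers, for illustration only; not the PR's implementation.
object ExportLayout {
  // Flat layout (assumed previous organization): every asset lands directly
  // in images/, so the item id plays no part in the path.
  def flatAssetPath(itemId: String, filename: String): String =
    s"images/$filename"

  // Nested layout: the item JSON and its COG share a directory named after
  // the item id, which keeps paths unique even when filenames repeat.
  def nestedAssetPath(itemId: String, filename: String): String =
    s"images/$itemId/$filename"

  def nestedItemPath(itemId: String): String =
    s"images/$itemId/item.json"
}
```

With two items that both carry a cog.tif, the flat paths are identical (one file overwrites the other), while the nested paths differ.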

Testing Instructions

Reassemble API and batch and restart the server.

Log into Groundwork as dev@rasterfoundry.com and get a JWT. Export it so jwt-httpie-auth can find
it:

export JWT_AUTH_TOKEN=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6Ik9URXpORGRDUkRRM1JUa3hOVGc1TVVFME9UQTNOekJGTVRZeU0wTkRNekV5TXpoRk1qUXpOQSJ9.eyJodHRwczovL2FwcC5yYXN0ZXJmb3VuZHJ5LmNvbTtwbGF0Zm9ybSI6IjMxMjc3NjI2LTk2OGItNGU0MC04NDBiLTU1OWQ5YzY3ODYzYyIsImh0dHBzOi8vYXBwLnJhc3RlcmZvdW5kcnkuY29tO29yZ2FuaXphdGlvbiI6ImRmYWM2MzA3LWI1ZWYtNDNmNy1iZWRhLWI5ZjIwOGJiNzcyNiIsImh0dHBzOi8vYXBwLnJhc3RlcmZvdW5kcnkuY29tO2Fubm90YXRlQXBwIjp0cnVlLCJodHRwczovL2FwcC5yYXN0ZXJmb3VuZHJ5LmNvbTtzY29wZXMiOiJncm91bmR3b3JrVXNlciIsImh0dHBzOi8vYXBwLnJhc3RlcmZvdW5kcnkuY29tO2dyb3VuZHdvcmtQcm9Vc2VyIjpmYWxzZSwibmlja25hbWUiOiJkZXYiLCJuYW1lIjoiZGV2QHJhc3RlcmZvdW5kcnkuY29tIiwicGljdHVyZSI6Imh0dHBzOi8vcy5ncmF2YXRhci5jb20vYXZhdGFyLzIzMzcyNmM0MWJiNTZmMGM2NWIwNzc5ZjQ0OWJkNzQ2P3M9NDgwJnI9cGcmZD1odHRwcyUzQSUyRiUyRmNkbi5hdXRoMC5jb20lMkZhdmF0YXJzJTJGZGUucG5nIiwidXBkYXRlZF9hdCI6IjIwMjEtMDgtMzBUMjE6Mjg6MDUuMjkyWiIsImVtYWlsIjoiZGV2QHJhc3RlcmZvdW5kcnkuY29tIiwiZW1haWxfdmVyaWZpZWQiOnRydWUsImlzcyI6Imh0dHBzOi8vcmFzdGVyLWZvdW5kcnktZGV2LmF1dGgwLmNvbS8iLCJzdWIiOiJhdXRoMHw1OTMxOGE5ZDJmYmJjYTNlMTZiY2ZjOTIiLCJhdWQiOiJCTkFKclpjMmhTRzdtNjBEZEJTd2hnaE05QnZSZVQ0dSIsImlhdCI6MTYzMDQyMzA2NSwiZXhwIjoxNjMwNDU5MDY1LCJub25jZSI6IlptaEpia1E0ZDBSclZVUlNaMlJFVlZsb2NqZzJVaTVIYzJoVWJYZEJaakp0ZFdkS1JWUkJRbFk0ZWc9PSJ9.c1IVh-p4_LrTyKhfM_NFG6WcW3p2_lpqlht-BP3w-pG2V2QLHTGVcjfZofQPRnrvPo_QHxPzek0URVOm08QzQR8Yi5oGPE_P1npdjtQYP9vCwGsnEYOgiv_qFN3wP4HX0gXdaQBM4_L3w2G0rHzKRmonOKxDwPY4DeKLqF1Th9O0v4h_1veZLlUrvQ2y3xDdDvxQZl9NBhoSDncbOqdmxDEAD1WF2iukC6B09m0tyBuXn9ftGf8M-XNkpIdL4aVKWqL9KNC8vT-8ozd2PH8kescdhG13X5FKAVJuCB7Zpm11Esu5kB4JqRrz5NdX-LC1fxHxTzGG398qynZeaYoU9g

Create an export via API request (this assumes you have dev data loaded because it depends on a
campaign with this id being in the database to work):

echo '{
    "name": "sample export",
    "taskStatuses": [],
    "license": {
        "license": "proprietary",
        "url": null
    },
    "exportAssetTypes": ["cog", "signed_url"],
    "campaignId": "8d5f36b7-83a5-4115-975b-278bbc059c09"
}' | http --auth-type=jwt POST :9100/api/stac/

Ensure that only those two strings are accepted as export asset types, that the list cannot be empty, and that the property is optional (it can be null or missing).
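
The three rules above can be sketched with plain Scala types. This is an assumption-laden illustration, not the PR's code: the names ExportAssetType, COG, SignedURL, and validate are hypothetical, and only the strings "cog" and "signed_url" come from the request body shown earlier.

```scala
// Hypothetical model of the accepted asset types; names are illustrative.
sealed abstract class ExportAssetType(val repr: String)

object ExportAssetType {
  case object COG extends ExportAssetType("cog")
  case object SignedURL extends ExportAssetType("signed_url")

  val all: List[ExportAssetType] = List(COG, SignedURL)

  // Only the known string representations are accepted.
  def fromString(s: String): Either[String, ExportAssetType] =
    all.find(_.repr == s).toRight(s"Unknown export asset type: $s")

  // None models a null or missing property; an empty list is rejected.
  def validate(
      raw: Option[List[String]]
  ): Either[String, Option[List[ExportAssetType]]] =
    raw match {
      case None      => Right(None)
      case Some(Nil) => Left("exportAssetTypes must not be empty when provided")
      case Some(values) =>
        values
          .foldRight[Either[String, List[ExportAssetType]]](Right(Nil)) {
            (s, acc) =>
              for {
                parsed <- fromString(s)
                rest <- acc
              } yield parsed :: rest
          }
          .map(Option(_))
    }
}
```

For example, validate(None) succeeds with no asset types, validate(Some(Nil)) fails, and validate(Some(List("cog", "tiff"))) fails on the unknown string.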

Run a STAC export using the export id returned from the response:

./scripts/console batch 'java -cp /opt/raster-foundry/jars/batch-assembly.jar com.rasterfoundry.batch.Main write_stac_catalog <export_id>'

Download the catalog at s3://rasterfoundry-development-data-us-east-1/stac-exports/<export_id>/catalog.zip and unzip.

Inspect the catalog to make sure the collection JSON and the item JSON reference each other correctly via links and that the images come down in the appropriate folder grouped by item id.

Ensure the assets on each item include both the signed URL with its expiration date and the COG with appropriate metadata.

Also check the STAC assets and links to make sure the link and media types are correct.

Closes #1352

pcaisse added 30 commits August 31, 2021 17:31
  • since custom types in arrays aren't supported by JDBC
  • Only download scene layer items if export definition includes that as an asset type.
  • if the original filename is missing
  • The STAC link doesn't actually have the S3 URI to the TIFF on it which is on the STAC item.
  • Without this, COGs with the same name (e.g. cog.tif) will overwrite each other.
  • This function seems to not do anything for two reasons:
    1. It doesn't look like there's a data key for scene item assets. There is for label items so it's possible this was copied over from that.
    2. The maybeSignUri function is never passed in a whitelist so it looks like it would never have an effect and would just return the URI, though it doesn't seem like it's getting that far.
@pcaisse pcaisse requested a review from jisantuc August 31, 2021 23:00
.transact(xa)
.unsafeToFuture
}
case StacExport.CampaignExport(_, _, _, _, campaignId) =>
@pcaisse pcaisse commented

This was the only line I changed in this file, FYI. The rest is wacky formatting changes from scalafmt.

@jisantuc jisantuc left a comment

Two things that are no longer correct are:

  • the README links to items at the location where they used to live, instead of at the new location (e.g. images/6c73b56a-6fd1-4c02-888b-320f0545fd87.json instead of images/6c73b56a-6fd1-4c02-888b-320f0545fd87/item.json)
  • the label items' source links do the same (../images/415a9426-671d-4a89-b0fd-cb2e4fe4660b.json instead of ../images/415a9426-671d-4a89-b0fd-cb2e4fe4660b/item.json)

Other than that, though, looks good 😎 Nice work on something that turned out to be way larger than advertised.
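
A minimal sketch of the link fix described in the two bullets above, assuming item ids are 36-character UUIDs as in the examples. The object and function names are hypothetical, and this is not the change that was actually committed.

```scala
// Hypothetical helper: rewrites an old-style item href (images/<id>.json)
// to the new per-item directory form (images/<id>/item.json).
object ItemHrefs {
  // Matches an optional prefix ending in "images/", then a 36-character
  // UUID-like id, then ".json". Regex extractors match the whole string.
  private val OldStyle = """^(.*images/)([0-9a-fA-F-]{36})\.json$""".r

  def toNestedHref(href: String): String =
    href match {
      case OldStyle(prefix, id) => s"$prefix$id/item.json"
      case _                    => href // already nested, or not an item link
    }
}
```

Links such as images/collection.json, or hrefs already in the nested form, pass through unchanged.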

Comment on lines +173 to +180
fs2.io
.readInputStream(
IO(s3Object.getObjectContent().getDelegateStream),
// Chunk size is set to default used for S3 CLI multipart (8MB)
// See: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#multipart-chunksize
8 * 1024 * 1024,
ExportData.fileBlocker
)

Nice, I remember trying to get the s3 client and fs2 to cooperate forever ago and I never figured out the magic combination of accesses on s3Object to get it to work 😎
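
For reference, a quick check of the byte arithmetic behind the chunk-size comment in the snippet above, assuming the chunk size is expressed in bytes (as I understand fs2's readInputStream to expect). The object and value names are hypothetical.

```scala
// Hypothetical constants, for illustration only.
object ChunkSizes {
  // 8 KiB expressed in bytes
  val eightKiB: Int = 8 * 1024
  // 8 MiB expressed in bytes, the size the snippet's comment refers to
  val eightMiB: Int = 8 * 1024 * 1024
}
```

The two differ by a factor of 1024, so it is worth confirming which one a given expression produces.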

Decoder.decodeString.emap { str =>
Either
.catchNonFatal(unsafeFromString(str))
.leftMap(_ => "ExportAssetType")

You might want to provide a nicer error message here, since when bad strings are passed the user-facing message just says DecodingFailure at .exportAssetTypes[0]: ExportAssetType
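
One possible shape for a friendlier message, sketched without circe so it stays self-contained. The object and function names are hypothetical. The idea is that the Left string (which, as far as I know, Decoder.emap surfaces as the failure message) names the bad value and lists the accepted ones.

```scala
// Hypothetical sketch; the real decoder in the PR works on ExportAssetType
// values rather than raw strings.
object ExportAssetTypeMessage {
  // The two accepted strings come from the sample request in this PR.
  val accepted: List[String] = List("cog", "signed_url")

  def parse(str: String): Either[String, String] = {
    val expected = accepted.mkString(", ")
    if (accepted.contains(str)) Right(str)
    else
      Left(s"'$str' is not a valid export asset type; expected one of: $expected")
  }
}
```

In the decoder, a function like this could replace the .leftMap(_ => "ExportAssetType") step so the user sees which value was rejected and what is allowed.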

@pcaisse pcaisse merged commit c8cee0f into develop Sep 1, 2021
@pcaisse pcaisse deleted the feature/pc/update-export-with-asset-types branch September 1, 2021 18:54