Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google-cloud-nio 0.123.23: certain non-requester-pays accesses fail when --gcs-project-for-requester-pays is specified #7716

Closed
droazen opened this issue Mar 11, 2022 · 8 comments · Fixed by #7730

Comments

@droazen
Copy link
Contributor

droazen commented Mar 11, 2022

As reported by @jkobject testing our latest gatk-nightly image, certain non-requester-pays accesses fail with the latest google-cloud-nio version (0.123.23) when --gcs-project-for-requester-pays is specified.

The specific issue appears to be checks for the existence of non-existent files in non-requester-pays buckets when --gcs-project-for-requester-pays is set, resulting in a "User project specified in the request is invalid" error:

code:      400
message:   User project specified in the request is invalid.
reason:    invalid
location:  null
retryable: false
com.google.cloud.storage.StorageException: User project specified in the request is invalid.
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:233)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:376)
	at com.google.cloud.storage.StorageImpl.lambda$listBlobs$11(StorageImpl.java:391)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:51)
	at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:388)
	at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:359)
	at com.google.cloud.storage.contrib.nio.CloudStoragePath.seemsLikeADirectoryAndUsePseudoDirectories(CloudStoragePath.java:118)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:743)
	at java.nio.file.Files.exists(Files.java:2385)
	at htsjdk.tribble.util.ParsingUtils.resourceExists(ParsingUtils.java:418)
	at htsjdk.tribble.TribbleIndexedFeatureReader.loadIndex(TribbleIndexedFeatureReader.java:162)
	at htsjdk.tribble.TribbleIndexedFeatureReader.hasIndex(TribbleIndexedFeatureReader.java:228)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:331)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:236)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:204)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:191)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:154)
	at org.broadinstitute.hellbender.utils.IntervalUtils.featureFileToIntervals(IntervalUtils.java:356)
	at org.broadinstitute.hellbender.utils.IntervalUtils.parseIntervalArguments(IntervalUtils.java:319)
	at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:239)
	at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:200)
	at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:180)
	at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
	at org.broadinstitute.hellbender.engine.GATKTool.initializeIntervals(GATKTool.java:525)
	at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:728)
	at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
	at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
GET https://storage.googleapis.com/storage/v1/b/fc-secure-bd7b8bc9-f665-4269-997e-5a402088a369/o?maxResults=1&prefix=5c2db926-3b1c-479c-9ed3-a99ce518de91/omics_mutect2/60955825-7723-4bc9-8202-bdd9975bb5c0/call-mutect2/Mutect2/7d737efc-c8be-4a6d-8803-4f786129521a/call-SplitIntervals/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list.idx/&projection=full&userProject
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "User project specified in the request is invalid.",
    "reason" : "invalid"
  } ],
  "message" : "User project specified in the request is invalid."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:366)
	... 33 more

This can be triggered by providing an interval_list file in a non-RP bucket to the -L argument, since GATK will check for the existence of an index on the interval_list file and fail with the above error.

@droazen
Copy link
Contributor Author

droazen commented Mar 11, 2022

@jkobject After further testing we think the issue might be more specific: the error seems to occur only when checking for the existence of non-existent files in non-requester-pays buckets when a requester-pays project is specified. In your case, it is checking for the existence of an index for your interval_list file and failing, since interval_list files don't have indices.

As another experiment, it would be helpful to know whether your Mutect2 command succeeds if you specify a specific interval string like "1:1000000-2000000" (doesn't matter which interval) for the -L argument instead of the interval_list file (but keeping all other inputs the same).

@jkobject
Copy link

Yes it works with this input

@droazen
Copy link
Contributor Author

droazen commented Mar 11, 2022

@jkobject Aha, thanks for the confirmation!

@droazen
Copy link
Contributor Author

droazen commented Mar 11, 2022

PR to fix this is up at googleapis/java-storage-nio#858

@droazen
Copy link
Contributor Author

droazen commented Mar 16, 2022

Our PR has been merged, and a release with the fix is in progress: googleapis/java-storage-nio#864

lbergelson added a commit that referenced this issue Mar 18, 2022
* The previous attempt to fix requester pays didn't fix it in many cases.
This incorporates a newer version of the NIO library with several patches to fix
edge cases we were hitting.
  * googleapis/java-storage-nio#850
  * googleapis/java-storage-nio#856
  * googleapis/java-storage-nio#857
* upgrade com.google.cloud:google-cloud-nio:0.123.23 ->0.123.25
* fixes #7716
lbergelson added a commit that referenced this issue Mar 18, 2022
* The previous attempt to fix requester pays didn't fix it in many cases.
This incorporates a newer version of the NIO library with several patches to fix
edge cases we were hitting.
  * googleapis/java-storage-nio#849
  * googleapis/java-storage-nio#856
  * googleapis/java-storage-nio#857
* upgrade com.google.cloud:google-cloud-nio:0.123.23 ->0.123.25
* fixes #7716
lbergelson added a commit that referenced this issue Mar 18, 2022
* The previous attempt to fix requester pays didn't fix it in many cases.
This incorporates a newer version of the NIO library with several patches to fix
edge cases we were hitting.
  * googleapis/java-storage-nio#849
  * googleapis/java-storage-nio#856
  * googleapis/java-storage-nio#857
* upgrade com.google.cloud:google-cloud-nio:0.123.23 ->0.123.25
* fixes #7716
@jkobject
Copy link

Hello!

Any update on this?

Best,

@droazen
Copy link
Contributor Author

droazen commented Mar 24, 2022

@jkobject We are currently testing the new google-cloud-nio release in #7730. In that branch, we added tests for all possible combinations of requester-pays and non-requester-pays inputs, but not all of the tests are passing yet. We're still looking into the reasons for the failures -- it could be a flaw in the way the tests are set up, or it could be that the failures are real. We should have an update by end of the week.

@droazen
Copy link
Contributor Author

droazen commented Mar 25, 2022

@jkobject Here's an update for you on this issue, as promised: we've tested the patch in #7730 extensively on both our local machines and on a clean Google Cloud VM, and found it to work perfectly with all kinds of requester pays inputs to GATK (fastas, bams, vcfs, interval_lists, etc.). We now believe that the test failures in the PR are artifacts of some configuration issue in our Travis CI test environment, and that the PR does actually fix requester pays support in GATK.

We are considering merging and releasing the branch as-is, and dealing with the issues in our test suite post-release. It would make us more confident doing this if you could test the branch out as well on your end and confirm whether it works for you. You can do this using the following commands:

git clone https://github.com/broadinstitute/gatk.git gatk
cd gatk
git fetch
git checkout -b lb_refix_requester_pays origin/lb_refix_requester_pays
./gradlew clean installDist
./gatk <a GATK command that failed for you previously>

droazen added a commit that referenced this issue Apr 7, 2022
* The previous attempt to fix requester pays didn't fix it in many cases.
This incorporates a newer version of the NIO library with several patches to fix
edge cases we were hitting.
  * googleapis/java-storage-nio#849
  * googleapis/java-storage-nio#856
  * googleapis/java-storage-nio#857
* upgrade com.google.cloud:google-cloud-nio:0.123.23 ->0.123.25

fixes #7716

Co-authored-by: David Roazen <droazen@broadinstitute.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants