‼️ NOTICE custom-resources: various custom resources may fail to deploy / destroy #26325
Description
Status: In-Progress
What is the issue?
In #26212, we upgraded our Node.js runtime to Node.js 18, which meant all our custom resources now needed to operate on AWS SDK for JavaScript v3. There were a few places that we missed:
- eks: cluster-resource-handler onDelete fails for fargate and cluster events (Fix)
- s3: autoDeleteObjects fails when bucket doesn't exist on resource deletion (Fix)
- core: Custom::CrossRegionExportWriter fails (Fix) (Fix)
- integ-test-alpha: assertions are not working (Fix)
- custom-resource: ignoreErrorCodesMatching broken on AwsCustomResource with Node.js 18/AWS SDK v3 (Fix)
- redshift-alpha: Custom resource unable to find AWS SDK v3 (Fix)
- redshift-alpha: redshift cluster-reboot fails (Fix)
Who is affected?
Users of aws-cdk-lib version 2.87.0
How do I resolve this?
Upgrade to a version higher than 2.87.0
Workaround
No workaround
Original posting
Describe the bug
Running the integration tests for aws-eks or aws-stepfunctions-tasks in which the cluster-resource-handler is invoked results in a failure when onDelete is called. This is because the code key on the error object, which the handler checks when it catches the exception, no longer exists. Fargate's handler is similarly affected.
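To illustrate the failure mode, the sketch below runs a v2-style check against an error shaped like the one in the log in this issue. The name `legacyIsNotFound` and the exact comparison are assumptions made for illustration; they are not copied from the handler's source.

```typescript
// Fields relevant to this issue, modeled on the logged error output.
interface SdkErrorLike {
  name: string;
  $fault: string;
  $metadata: { httpStatusCode?: number };
  code?: string; // absent on the error logged in this issue
}

// Hypothetical v2-style check (assumption): relies on the `code` key.
function legacyIsNotFound(e: SdkErrorLike): boolean {
  return e.code === 'ResourceNotFoundException';
}

// Error object with the same shape as the one in the log.
const loggedError: SdkErrorLike = {
  name: 'ResourceNotFoundException',
  $fault: 'client',
  $metadata: { httpStatusCode: 404 },
};

// `code` is undefined here, so the check never matches and the
// "not found" case would be rethrown as a real failure.
const legacyResult: boolean = legacyIsNotFound(loggedError);
```

With a check like this, a successful deletion is indistinguishable from any other error, which matches the behavior described in this report.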
Expected Behavior
When calling the integration tests I expected the clusters to successfully create, update, and delete themselves.
Current Behavior
The final step of deleting the cluster fails with:
2023-07-10T18:59:56.441Z fdc661e6-24f7-4d12-ac99-51342071842d ERROR Invoke Error {
"errorType": "ResourceNotFoundException",
"errorMessage": "No cluster found for name: integrationtesteksclusterE5C0ED98-41454fc08d0746558ff42bc9a701230c.",
"name": "ResourceNotFoundException",
"$fault": "client",
"$metadata": {
"httpStatusCode": 404,
"requestId": "75681069-c732-4896-98f7-ce3fb2f8e777",
"attempts": 1,
"totalRetryDelay": 0
},
"clusterName": "integrationtesteksclusterE5C0ED98-41454fc08d0746558ff42bc9a701230c",
"nodegroupName": null,
"fargateProfileName": null,
"addonName": null,
"stack": [
"ResourceNotFoundException: No cluster found for name: integrationtesteksclusterE5C0ED98-41454fc08d0746558ff42bc9a701230c.",
" at deserializeAws_restJson1ResourceNotFoundExceptionResponse (/var/runtime/node_modules/@aws-sdk/client-eks/dist-cjs/protocols/Aws_restJson1.js:2586:23)",
" at deserializeAws_restJson1DescribeClusterCommandError (/var/runtime/node_modules/@aws-sdk/client-eks/dist-cjs/protocols/Aws_restJson1.js:1492:25)",
" at process.processTicksAndRejections (node:internal/process/task_queues:95:5)",
" at async /var/runtime/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24",
" at async /var/runtime/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:13:20",
" at async StandardRetryStrategy.retry (/var/runtime/node_modules/@aws-sdk/middleware-retry/dist-cjs/StandardRetryStrategy.js:51:46)",
" at async /var/runtime/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:6:22",
" at async ClusterResourceHandler.isDeleteComplete (/var/task/cluster.js:69:26)"
]
}
The cluster itself is being deleted, but our evaluation of the result fails, so the deletion is treated as a failure.
Reproduction Steps
Check out the most recent build of aws-cdk and run any eks test which includes a fargate profile (e.g. integ.eks-cluster-ipv6)
Possible Solution
We can change the current evaluation to use e.$metadata.httpStatusCode === 404 instead of a string evaluation against the message.
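A minimal sketch of this idea follows, using the field names as they appear in the logged error (`$metadata.httpStatusCode`). The helper names `isNotFound` and `isDeleteComplete`, and the `describe` callback standing in for the real describe call, are hypothetical and only illustrate the approach; the actual fix may differ.

```typescript
// Returns true when the thrown value carries an HTTP 404 in `$metadata`,
// the field shown in the logged error.
function isNotFound(e: unknown): boolean {
  if (typeof e !== 'object' || e === null) {
    return false;
  }
  const err = e as { $metadata?: { httpStatusCode?: unknown } };
  return err.$metadata?.httpStatusCode === 404;
}

// Hypothetical polling helper: `describe` is a stand-in for whatever call
// looks up the resource (an assumption, not the handler's real signature).
async function isDeleteComplete(describe: () => Promise<unknown>): Promise<boolean> {
  try {
    await describe();
    return false; // the resource still exists, keep waiting
  } catch (e) {
    if (isNotFound(e)) {
      return true; // the resource is gone, deletion is complete
    }
    throw e; // anything else is a genuine failure
  }
}
```

Checking the status code avoids depending on the wording of the error message, at the cost of treating every 404 from the call as "resource gone".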
Additional Information/Context
There's a bunch of other stuff that's broken in the eks tests, especially with the helm chart for the kubernetes-dashboard. I've been working on a fix for the better part of 3 days and still haven't hit the bottom of the breakage.
This is affecting three tests in aws-stepfunctions-tasks.
I believe these failures are related to #26212 but I haven't had the time to identify the exact changes. The upgrade from aws-sdk-js v2 to v3 would have ideally triggered a re-run of all of the integration tests which use the SDK, but I don't believe the resource trees can see that difference.
CDK CLI Version
0.0.0 (build c38e784) (npx cdk)
2.42.0 (build 7d8ef0b) (local install)
Framework Version
No response
Node.js Version
v18.16.0
OS
macOS 13.4
Language
TypeScript
Language Version
No response
Other information
No response