
‼️ NOTICE custom-resources: various custom resources may fail to deploy / destroy #26325

Closed
@kishiel

Description

Status: In-Progress

What is the issue?

In #26212, we upgraded our Node.js runtime to Node.js 18, which meant all our custom resources now needed to operate on AWS SDK for JavaScript v3. There were a few places that we missed.

Who is affected?

Users of aws-cdk-lib version 2.87.0

How do I resolve this?

Upgrade to a version higher than 2.87.0

Workaround

No workaround


Original posting

Describe the bug

Integration tests for aws-eks or aws-stepfunctions-tasks that invoke the cluster-resource-handler fail when onDelete is called. This is because the `code` key that the handler checks on the caught exception no longer exists on the error. Fargate's handler is similarly affected.

Expected Behavior

When running the integration tests, I expected the clusters to be created, updated, and deleted successfully.

Current Behavior

The final step of deleting the cluster fails with:

2023-07-10T18:59:56.441Z    fdc661e6-24f7-4d12-ac99-51342071842d    ERROR   Invoke Error    {
    "errorType": "ResourceNotFoundException",
    "errorMessage": "No cluster found for name: integrationtesteksclusterE5C0ED98-41454fc08d0746558ff42bc9a701230c.",
    "name": "ResourceNotFoundException",
    "$fault": "client",
    "$metadata": {
        "httpStatusCode": 404,
        "requestId": "75681069-c732-4896-98f7-ce3fb2f8e777",
        "attempts": 1,
        "totalRetryDelay": 0
    },
    "clusterName": "integrationtesteksclusterE5C0ED98-41454fc08d0746558ff42bc9a701230c",
    "nodegroupName": null,
    "fargateProfileName": null,
    "addonName": null,
    "stack": [
        "ResourceNotFoundException: No cluster found for name: integrationtesteksclusterE5C0ED98-41454fc08d0746558ff42bc9a701230c.",
        "    at deserializeAws_restJson1ResourceNotFoundExceptionResponse (/var/runtime/node_modules/@aws-sdk/client-eks/dist-cjs/protocols/Aws_restJson1.js:2586:23)",
        "    at deserializeAws_restJson1DescribeClusterCommandError (/var/runtime/node_modules/@aws-sdk/client-eks/dist-cjs/protocols/Aws_restJson1.js:1492:25)",
        "    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)",
        "    at async /var/runtime/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24",
        "    at async /var/runtime/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:13:20",
        "    at async StandardRetryStrategy.retry (/var/runtime/node_modules/@aws-sdk/middleware-retry/dist-cjs/StandardRetryStrategy.js:51:46)",
        "    at async /var/runtime/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:6:22",
        "    at async ClusterResourceHandler.isDeleteComplete (/var/task/cluster.js:69:26)"
    ]
}

The cluster itself is being deleted, but our evaluation of the result fails, so the deletion is treated as a failure.
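
The mismatch can be illustrated with an object shaped like the logged error above. This is only a sketch, not the handler's actual code: `loggedError` copies a few fields visible in the log, and the v2-style comparison against `code` is an assumption about what the old check looked like.

```typescript
// Fields copied from the logged error above; nothing else is assumed
// about the real exception object.
const loggedError: Record<string, unknown> = {
  name: "ResourceNotFoundException",
  $fault: "client",
  $metadata: { httpStatusCode: 404, attempts: 1, totalRetryDelay: 0 },
};

// Assumed v2-style check: the error type is looked up under `code`,
// which is absent from the logged error.
const matchesV2Style = loggedError["code"] === "ResourceNotFoundException";

// The logged error carries the error type under `name` instead.
const matchesV3Style = loggedError["name"] === "ResourceNotFoundException";

console.log({ matchesV2Style, matchesV3Style });
```

With an error of this shape, a check keyed on `code` never matches, so the "not found" case falls through and is rethrown.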

Reproduction Steps

Check out the most recent build of aws-cdk and run any EKS test that includes a Fargate profile (e.g. integ.eks-cluster-ipv6).

Possible Solution

We can change the current evaluation to use e.$metadata.httpStatusCode === 404 instead of a string evaluation against the message.
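
A minimal sketch of such a status-code check follows. The field names `$metadata` and `httpStatusCode` are taken from the logged error above. `SdkV3ErrorLike`, `isNotFound`, and `isDeleteComplete` are hypothetical names used for illustration, and the `describe` callback stands in for whatever describe call the handler makes. This is not the actual handler implementation.

```typescript
// Hypothetical minimal shape of the error, based on the fields in the log.
interface SdkV3ErrorLike {
  name?: string;
  $metadata?: { httpStatusCode?: number };
}

// Hypothetical helper: true when the caught value looks like an HTTP 404
// response from the service.
function isNotFound(e: unknown): boolean {
  if (typeof e !== "object" || e === null) {
    return false;
  }
  const err = e as SdkV3ErrorLike;
  return err.$metadata?.httpStatusCode === 404;
}

// Hypothetical wrapper: deletion counts as complete once describing the
// resource fails with "not found"; any other error is rethrown.
async function isDeleteComplete(describe: () => Promise<unknown>): Promise<boolean> {
  try {
    await describe();
    return false; // the resource still exists
  } catch (e) {
    if (isNotFound(e)) {
      return true;
    }
    throw e;
  }
}
```

Checking the status code avoids depending on the wording of the error message. Comparing `name` against `"ResourceNotFoundException"` would be an alternative that is specific to that one error type.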

Additional Information/Context

There's a bunch of other stuff that's broken in the eks tests, especially with the helm chart for the kubernetes-dashboard. I've been working on a fix for the better part of 3 days and still haven't hit the bottom of the breakage.

This is affecting three tests in aws-stepfunctions-tasks.

I believe these failures are related to #26212 but I haven't had the time to identify the exact changes. The upgrade from aws-sdk-js v2 to v3 would have ideally triggered a re-run of all of the integration tests which use the SDK, but I don't believe the resource trees can see that difference.

CDK CLI Version

0.0.0 (build c38e784) (npx cdk)
2.42.0 (build 7d8ef0b) (local install)

Framework Version

No response

Node.js Version

v18.16.0

OS

macOS 13.4

Language

TypeScript

Language Version

No response

Other information

No response

    Labels

    @aws-cdk/aws-eks: Related to Amazon Elastic Kubernetes Service
    bug: This issue is a bug.
    effort/medium: Medium work item – several days of effort
    p1
