
Performance degradation in 3033.2.0? #597

Closed
@dee-kryvenko

Description

We are upgrading from 2905.2.4 to 3033.2.0 on AWS managed with Kops using the following AMI:

data "aws_ami" "flatcar" {
  owners      = ["075585003325"]
  most_recent = true

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "name"
    values = ["Flatcar-stable-${var.flatcar_version}*"]
  }
}

After the upgrade we are seeing what appears to be a performance regression. Our workloads run with tight resource limits:

        resources:
          requests:
            memory: 128Mi
            cpu: 50m
          limits:
            memory: 128Mi
            cpu: 500m

Some of them (specifically, Java Spring Boot based services) are simply unable to start after the upgrade. They take ages to initialize the Java code, until the probe backs off and the container is restarted. We have ruled out everything else (kops version, Kubernetes version, etc.): merely swapping the node group AMI from 2905.2.4 to 3033.2.0 triggers this behavior, under the same resource constraints and probe configuration.
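As a point of reference while the regression is investigated, one common way to keep a probe from killing a slow-starting JVM is a `startupProbe` with generous thresholds, so liveness checks only begin once the app is up. A minimal sketch, assuming an HTTP health endpoint; the path and port are placeholders, not taken from our actual manifests:

```yaml
        # Hypothetical probe config; /actuator/health and port 8080 are
        # placeholders, not our real values.
        startupProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          # Allow up to 30 * 10s = 300s for JVM initialization before
          # the container is considered failed and restarted.
          failureThreshold: 30
          periodSeconds: 10
```

This only masks the symptom, of course; it does not explain why the same workloads initialized in time on 2905.2.4.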

Impact

We have detected this in our test clusters, and we are not able to upgrade our prod clusters. If a bunch of workloads are simply unable to start after the rolling upgrade in prod, we will have a major outage on our hands.

Environment and steps to reproduce

K8s 1.20.14, kops 1.20.3, AWS.

Expected behavior

I'd expect containers to be able to start with the same probe and resource constraints as they had on previous versions.

Additional information

N/A
