Description
We are upgrading from Flatcar 2905.2.4 to 3033.2.0 on AWS, managed with kops, using the following AMI lookup:
data "aws_ami" "flatcar" {
owners = ["075585003325"]
most_recent = true
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-stable-${var.flatcar_version}*"]
}
}
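For context, the AMI resolved by this data source ends up as the node instance group image, roughly as in the sketch below. This is illustrative only: the cluster label, group name, machine type and sizes are placeholders, not our real configuration.

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
  labels:
    kops.k8s.io/cluster: example.k8s.local   # placeholder cluster name
spec:
  role: Node
  # Swapping this image between 2905.2.4 and 3033.2.0 is the only change we make.
  image: 075585003325/Flatcar-stable-3033.2.0-hvm
  machineType: m5.xlarge                     # placeholder
  minSize: 3
  maxSize: 6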
After the upgrade we see what looks like a performance hit. Our workloads are tightly resource-limited:
resources:
  requests:
    memory: 128Mi
    cpu: 50m
  limits:
    memory: 128Mi
    cpu: 500m
Some of them (specifically, the Java Spring Boot based services) are simply unable to start after the upgrade: Java initialization takes so long that the probe gives up and the container is restarted. We have ruled out everything else (kops version, Kubernetes version, etc.): swapping the node group AMI from 2905.2.4 to 3033.2.0, under the same resource constraints and probe configuration, is what triggers this behavior. A sketch of the kind of probe configuration in use follows below.
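A minimal sketch of the probe configuration these services use; the paths assume Spring Boot Actuator health groups, and the port and timings are illustrative rather than our exact values:

# Illustrative probe configuration only; actual values differ per service.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness    # assumes Spring Boot Actuator health groups
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

With 2905.2.4 the same services start within this probe window; with 3033.2.0 they no longer do.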
Impact
We detected this in our test clusters and cannot upgrade our prod clusters. If a number of workloads are simply unable to start after the rolling upgrade in prod, we will have a major outage on our hands.
Environment and steps to reproduce
K8s 1.20.14, kops 1.20.3, AWS.
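A hypothetical minimal reproduction, under the assumption that any JVM-heavy Spring Boot image behaves similarly: deploy a manifest like the one below (image name and probe values are placeholders) on a 2905.2.4 node group, then roll the node group to 3033.2.0 and observe the pod failing its probe and restarting.

# Hypothetical reproduction manifest; image and values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot-repro
  template:
    metadata:
      labels:
        app: springboot-repro
    spec:
      containers:
        - name: app
          image: registry.example.com/springboot-app:latest   # placeholder image
          resources:
            requests:
              memory: 128Mi
              cpu: 50m
            limits:
              memory: 128Mi
              cpu: 500m
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            failureThreshold: 3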
Expected behavior
I'd expect containers to be able to start with the same probes and resource constraints as they did on previous versions.
Additional information
N/A