
Limitrange request causes indefinite amount of pods spawned #93750

Closed
dza89 opened this issue Aug 6, 2020 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments


dza89 commented Aug 6, 2020

What happened:
Pods keep spawning indefinitely with an OutOfcpu error

What you expected to happen:
A single error returned

How to reproduce it (as minimally and precisely as possible):

Add a LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
  namespace: "{{ meta.name }}"
spec:
  limits:
  - default:
      cpu: 1
      memory: 512Mi
    defaultRequest:
      cpu: 1
      memory: 256Mi
    type: Container

Make sure your nodes have less than 1 CPU available (the default request is set high for this example)

Create a dummy deployment without any resources set
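
For example (a sketch; the name and image are illustrative and not from the original report), a minimal deployment with no resources block could look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dummy
  template:
    metadata:
      labels:
        app: dummy
    spec:
      containers:
      - name: dummy
        image: nginx
        # no resources block, so the LimitRange defaults above are applied at admission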

Anything else we need to know?:

My findings:
Since the limits are set at the container level, the kubelet receives the pod, the request/limit from the LimitRange is added, and the container can no longer be scheduled; the kubelet returns OutOfcpu and the ReplicaSet deploys a new pod. This starts an endless loop of pod spawning.
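
For illustration (a sketch assuming the LimitRange above and a container that sets no resources of its own), the effective container resources after admission would be:

resources:
  requests:
    cpu: 1          # from defaultRequest in the LimitRange
    memory: 256Mi
  limits:
    cpu: 1          # from default in the LimitRange
    memory: 512Mi

With less than 1 CPU available on the node, that request cannot be satisfied, which is where the OutOfcpu rejection comes from.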

Environment:

  • Kubernetes version: 1.16
  • Cloud provider or hardware configuration: EKS
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 4.14.186-146.268.amzn2.x86_64
  • Others:
    kubeletVersion: v1.16.13-eks-2ba888
@dza89 dza89 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 6, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 6, 2020

dza89 commented Aug 6, 2020

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 6, 2020

cablespaghetti commented Aug 6, 2020

Oh great, it's not just us! We're also on EKS v1.16. This started last week for us... I wonder if it's an EKS problem specifically.

edit: That said, we're not using LimitRange.


dza89 commented Aug 6, 2020

@cablespaghetti
Hmm, it was exactly the same for us after swapping to the latest EKS AMI. I suspect a different version of the kubelet.
Maybe it's not limited to LimitRange? Removing it did, however, fix our issue.


cablespaghetti commented Aug 6, 2020

I'm downgrading to ami-05ac566a7ec2378db, which is from May but was the previous AMI we were running. Will report back if that fixes it...

edit: To clarify, this is going from a 1.16.12 or 1.16.13 AMI to a 1.16.8 one.


dza89 commented Aug 6, 2020

You don't have anything else, like a policy that applies resources at the container level?


cablespaghetti commented Aug 6, 2020

We don't, no. Just heard back from AWS:

Based on the details you provided, I understand that many of your pods are failing to start due to "OutOfcpu" although worker instances have enough CPU available for pods.

As you suggested, I have looked into similar reports by different customers and indeed I detected that this behavior was reported and correlated to EKS AMI versions 1.16.10+, as it seems to be related to this Kubernetes issue [1], which was reported for K8s version 1.16.10.

Based on the previous analysis of this issue, our suggested approach is to update your cluster to version 1.17 if applicable; otherwise, if you prefer not to update, you can retain an earlier AMI such as 1.16.8, as you are currently considering.

On behalf of AWS, I apologize for any inconvenience caused by this issue. I hope this information assists you in addressing it; meanwhile, feel free to update me with any further queries or inputs.

References:
[1] #90455

I spent a while puzzling over this, and it seems this is a problem because the control plane on EKS 1.16 is 1.16.8, which pre-dates this change in later versions of the kubelet.
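
As a quick sanity check (assuming kubectl access to the affected cluster), the skew is visible by comparing the server and node versions:

kubectl version --short   # server version, e.g. v1.16.8 on EKS 1.16
kubectl get nodes         # VERSION column shows the kubelet version, e.g. v1.16.13-eks-2ba888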


dza89 commented Aug 6, 2020

Ah ok, I saw it happen numerous times with init containers.
Removing the LimitRange worked because then there were no more resources defined.
/close

@k8s-ci-robot
Contributor

@dza89: Closing this issue.

In response to this:

Ah ok, I saw it happen numerous times with init containers.
Removing the LimitRange worked because then there were no more resources defined.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
