Make maximum number of EBS attached volumes a tunable option #22994

Closed
jeremyeder opened this issue Mar 15, 2016 · 4 comments
Comments

@jeremyeder

We would like to be able to override the maximum number of EBS volumes attached to one node. It can be wrapped in warnings about it being unsupported, etc. The purpose is to do exploratory work on that limit without having to recompile the application.

One suitable option is mentioned by @jsafrane here: #22942 (comment)

In other words, add `KUBE_MAX_PD_VOLS=200` to /etc/sysconfig/atomic-openshift-master and restart.

/cc @ekuric @justinsb
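
For reference, a minimal sketch of how such an environment override might be read, assuming a compiled-in default of 39 for EBS when `KUBE_MAX_PD_VOLS` is unset or not a positive integer (the function and constant names below are illustrative, not the actual scheduler code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMaxEBSVolumes is the compiled-in EBS attach limit assumed here.
const defaultMaxEBSVolumes = 39

// maxPDVolumes returns the KUBE_MAX_PD_VOLS override if it is a positive
// integer, and the compiled-in default otherwise.
func maxPDVolumes() int {
	if v := os.Getenv("KUBE_MAX_PD_VOLS"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultMaxEBSVolumes
}

func main() {
	fmt.Println("max PD volumes per node:", maxPDVolumes())
}
```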

@jsafrane
Member

I have a sort of proof of concept that removes all checks from the kubelet. This way, only the scheduler enforces the limit on the number of volumes attached to a node. And it's wrong:

  1. I have 39 pods, each with its own EBS volume, and they run happily.
  2. I schedule another 39 pods. The scheduler leaves them as 'Pending' / 'failed to fit in any node'. So far so good.
  3. Now I delete my 39 running pods. The scheduler no longer sees them and starts the new pods.
  4. The kubelet starts slowly attaching volumes for the new pods and detaching volumes for the old pods, and at one point I had ~50 volumes attached to my node. That's bad, IMO. Of course, the kubelet cleaned up the mess eventually and I ended up with 39 volumes attached; still, we should not go over the limit.

Any suggestions? I'll probably restore my old patch that checks KUBE_MAX_PD_VOLS in the kubelet as well.
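
A minimal sketch of the kind of kubelet-side guard described above (illustrative names only, not the real kubelet code): before attaching another EBS volume, compare the number already attached to the node against the same limit the scheduler enforces.

```go
package kubeletguard // illustrative package name

import "fmt"

// canAttachOneMore refuses another attachment once the node already holds
// `limit` volumes, where limit would come from KUBE_MAX_PD_VOLS (or the
// default of 39), just as in the scheduler.
func canAttachOneMore(attached, limit int) error {
	if attached >= limit {
		return fmt.Errorf("node already has %d of %d allowed volumes attached", attached, limit)
	}
	return nil
}
```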

@childsb
Contributor

childsb commented Mar 18, 2016

@jsafrane can you explain the bug if we leave the kubelet code alone? Is the upper bound on attachable AWS /dev/ devices always 39? Where is the second check in the kubelet? (I'm looking at https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L908.)

This spot looks like it's just using a range of possible devices, not a second quota check/enforcement.
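
To illustrate the distinction being drawn here, a rough sketch (not the actual aws.go code) of what "using a range of possible devices" looks like: the only bound such code imposes is running out of names in a fixed range. The `/dev/xvd` prefix and the 'ba'..'zz' range are assumptions for the example.

```go
package awsdevices // illustrative package name

import "fmt"

// nextDeviceName returns the first /dev/xvdXX name in a fixed two-letter
// range that is not already in use. The size of the range is the only
// limit this code enforces; there is no separate quota counter here.
func nextDeviceName(inUse map[string]bool) (string, error) {
	for first := 'b'; first <= 'z'; first++ {
		for second := 'a'; second <= 'z'; second++ {
			name := fmt.Sprintf("/dev/xvd%c%c", first, second)
			if !inUse[name] {
				return name, nil
			}
		}
	}
	return "", fmt.Errorf("no free device names left in the range")
}
```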

@jsafrane
Member

> can you explain the bug if we leave the kubelet code alone?

The scheduler recognizes an env. variable to tune the number of EBS/PD volumes that can be attached to an AWS/GCE node. If it's set higher than 39, any pod assigned to a node that already has 39 EBS volumes attached will fail with "Too many EBS volumes attached to node".

> Where is the second check in the kubelet?

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1064

@jsafrane
Member

I've created PR #23254; it will just attach as many volumes as the scheduler wants. This can lead to more than KUBE_MAX_PD_VOLS volumes being attached to a node. I'd say it's up to any future attach controller to enforce KUBE_MAX_PD_VOLS properly.

Also, we should not depend on magic env. variables. There should be a clear configuration option for such values.

k8s-github-robot pushed a commit that referenced this issue May 19, 2016
Automatic merge from submit-queue

AWS: Move enforcement of attached AWS device limit from kubelet to scheduler

The limit on the number of EBS volumes attached to a node is now enforced by the scheduler. It can be adjusted with the `KUBE_MAX_PD_VOLS` env. variable there, so we don't need the same check in the kubelet. If the system admin wants to attach more, we should allow it.

The kubelet limit is now 650 attached volumes ('ba'..'zz').

Note that the scheduler counts only *pods* assigned to a node. When a pod is deleted and a new pod is scheduled on the node, the kubelet starts (slowly) detaching the old volume and (slowly) attaching the new volume. Depending on AWS speed, **it may happen that more than KUBE_MAX_PD_VOLS volumes are actually attached to a node for some time!** The kubelet will clean this up in a few seconds / minutes (both attach and detach are quite slow).

Fixes #22994
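
As a quick check of the 650 figure above: the suffixes run from 'ba' to 'zz', so the first letter has 25 possible values (b..z) and the second has 26 (a..z), giving 25 × 26 = 650 device names. A throwaway snippet confirming the count:

```go
package main

import "fmt"

func main() {
	count := 0
	for first := 'b'; first <= 'z'; first++ { // 25 letters: b..z
		for second := 'a'; second <= 'z'; second++ { // 26 letters: a..z
			count++
		}
	}
	fmt.Println(count) // prints 650
}
```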