Introduce Allocable to API Node.Status #13984

Closed
dchen1107 opened this issue Sep 15, 2015 · 10 comments

@dchen1107 (Member) commented Sep 15, 2015

Currently Node.Status has Capacity, but no concept of machine Allocable. Introducing it would serve several purposes (a sketch of the proposed shape follows the list):

  • For the Kubernetes 1.0 release, we introduced raw containers such as "/docker-daemon", "/kubelet", "/kube-proxy", and "/system" so that we could monitor their resource usage patterns and detect regressions easily. In the long run, we want to cap their usage under certain limits / requests. We don't do that yet because 1) docker still uses a lot of compute resources, and the consequences of constraining docker's resource consumption are severe; 2) there is no NodeSpec yet, we cannot fully control Kubernetes nodes, and OSS users might introduce arbitrary daemons to a given node, which makes /system unmanageable. Even with the raw containers above, we cannot do full resource management / control on the node, but introducing an Allocable concept to the node could prevent really bad resource overcommit.
  • For Mesos, Hadoop, etc. integrations, those systems might want to partition the compute resources on a given node and limit how much the Kubelet uses; meanwhile they can query the Kubelet and reserve some portion of the rest for their own purposes.
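A minimal sketch of what the proposed shape could look like, assuming Allocable sits next to Capacity in NodeStatus. The types below are simplified stand-ins for the real api package (api.ResourceList maps resource names to resource.Quantity values), and the "allocable" field name follows this proposal rather than a settled API:

```go
// Hypothetical sketch only: simplified stand-ins for api.ResourceList and
// the proposed NodeStatus field; not the final API.
package api

// ResourceList maps a resource name (e.g. "cpu", "memory") to a quantity
// string (e.g. "3500m", "7Gi"); the real type uses resource.Quantity values.
type ResourceList map[string]string

type NodeStatus struct {
	// Capacity is the total amount of resources on the node.
	Capacity ResourceList `json:"capacity,omitempty"`
	// Allocable is the portion of Capacity available to pods: Capacity minus
	// whatever is reserved for system daemons (docker, kubelet, kube-proxy,
	// /system, ...). The name follows this proposal.
	Allocable ResourceList `json:"allocable,omitempty"`
	// ... existing NodeStatus fields unchanged ...
}
```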

I propose:

  1. Introduce a flag called --allocable-resources to the Kubelet for now, and report Allocable to upstream layers. In the long run, we could replace such a flag with more sophisticated configuration through some machine / node management components.
  2. All upstream control components, including the scheduler and kubelet, should do feasibility checking against Node.Status.Allocable instead of Capacity (see the sketch after this list).
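A minimal sketch of point 2, assuming a simplified scheduler-style feasibility check; the fits function, the reduced resource model (CPU and memory only), and the example numbers are illustrative, not actual scheduler code:

```go
// Sketch of checking pod feasibility against Allocable rather than Capacity.
package main

import "fmt"

// millicores / bytes keep the arithmetic simple for the sketch.
type resources struct {
	milliCPU int64
	memory   int64
}

type nodeStatus struct {
	capacity  resources
	allocable resources // capacity minus reservations for system daemons
}

// fits reports whether the pod's requests, added to what is already
// requested on the node, stay within Allocable.
func fits(podRequest, requested resources, status nodeStatus) bool {
	return podRequest.milliCPU+requested.milliCPU <= status.allocable.milliCPU &&
		podRequest.memory+requested.memory <= status.allocable.memory
}

func main() {
	node := nodeStatus{
		capacity:  resources{milliCPU: 4000, memory: 8 << 30},
		allocable: resources{milliCPU: 3500, memory: 7 << 30},
	}
	requested := resources{milliCPU: 3000, memory: 5 << 30}
	pod := resources{milliCPU: 600, memory: 1 << 30}
	// Fits under Capacity (3600m <= 4000m) but not under Allocable (3600m > 3500m).
	fmt.Println("fits:", fits(pod, requested, node))
}
```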

cc/ @bgrant0607 @davidopp @sttts @karlkfi @vishh

@dchen1107 dchen1107 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/api Indicates an issue on api area. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 15, 2015
@dchen1107 (Member, Author)

cc/ @kubernetes/goog-node

@sttts (Contributor) commented Sep 15, 2015

Allocable values should be changeable during the node life cycle. In the Mesos case, the resources of a slave might change dynamically (technically, when the executor re-registers). It's enough if the values can be patched on the apiserver object by the executor.
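A minimal sketch of that flow, assuming the executor patches the node status subresource over the apiserver's REST API. The "allocable" field name follows this proposal, and the apiserver address, node name, and quantities are placeholders (auth omitted):

```go
// Hypothetical sketch: PATCH the node status subresource with new Allocable values.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Field name "allocable" and the quantities are illustrative only.
	patch := []byte(`{"status":{"allocable":{"cpu":"3","memory":"6Gi"}}}`)

	req, err := http.NewRequest(http.MethodPatch,
		"http://localhost:8080/api/v1/nodes/node-1/status", bytes.NewReader(patch))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/strategic-merge-patch+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```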

@bgrant0607 (Member)

I assume the kubelet will post the value to the apiserver.

Would a Kubelet config file be easier for you to update than its flags? #12245 is in progress, and we also hope to wrap up #1627 in the 1.2 timeframe.

@karlkfi (Contributor) commented Sep 17, 2015

As long as the config is settable for initial launch of a new kubelet and updatable at runtime, I'm not sure the method is super important.

@dchen1107 (Member, Author)

#14532

@derekwaynecarr (Member)

The primary operator goal here is that I should be able to eliminate the need to run a static pod for resource reservation, and the kubelet should support a dynamic resource reservation model for incompressible resources like memory/disk. For things like CPU, I know we have issues where CPU usage spikes as the number of pods on the node increases, but I am less concerned about that in the near term.

I need to take a deeper look tomorrow, but I think I recall that there are open issues to resolve around how we re-parent system daemons when running in a systemd environment.

Open question:

If/when we reparent all containers in a common cgroup based on qos tier, do you guys have any thoughts on differentiating allocable based on qos tier at all?

@timstclair

I'm not sure I understand the question. Are you proposing having different reservations at different QoS tiers? I don't see how that would work since kubelet doesn't control what is running in the reserved portions.

@vishh (Contributor) commented Nov 16, 2015

> I need to take a deeper look tomorrow, but I think I recall that there are open issues to resolve around how we re-parent system daemons when running in a systemd environment.

Kubelet can auto-detect systemd deployments and avoid re-parenting system daemons.

> If/when we reparent all containers in a common cgroup based on qos tier, do you guys have any thoughts on differentiating allocable based on qos tier at all?

Are you referring to per-QoS-class quotas? If the node exposes detailed usage information, the policy around how resources are distributed across QoS classes can probably be managed in higher layers.

@derekwaynecarr (Member)

@vishh - makes sense.

@dchen1107 (Member, Author)

I am closing this one. We are going to take measurements once the release is cut and decide on the values for those flags.
