Allow admin user to explicitly unschedule the node #4585
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/. If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits. Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name.
Related discussion in #3885.
Covered under the Red Hat corporate CLA.
We shouldn't have this custom logic in the apiserver. Also, as @bgrant0607 mentioned, status should be reconstructable and observable by k8s components; I don't think we can have users update status. An alternative proposal is to update the node spec, have the node controller check the spec, and update status accordingly.
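A minimal sketch of that alternative, with invented names (`Unschedulable` in the spec, `Schedulable` in the status) rather than the actual API types: the user writes only the spec, and the node controller reconstructs status from it.

```
package main

import "fmt"

// Hypothetical shapes for illustration only; not the real API types.
type NodeSpec struct {
	Unschedulable bool // written by the admin via the apiserver
}

type NodeStatus struct {
	Schedulable bool // derived by the node controller, never set by users
}

type Node struct {
	Spec   NodeSpec
	Status NodeStatus
}

// reconcile is what the node controller would run on each sync: observe
// the spec and reflect it into status, keeping status reconstructable.
func reconcile(n *Node) {
	n.Status.Schedulable = !n.Spec.Unschedulable
}

func main() {
	n := &Node{Spec: NodeSpec{Unschedulable: true}}
	reconcile(n)
	fmt.Println("schedulable:", n.Status.Schedulable) // prints: schedulable: false
}
```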
Discussed w/ @ddysher today. There should be 2 fields in NodeSpec: Schedulable and Runnable. These reflect disabling of scheduling new pods and execution of any user pods, respectively, by administrative action, independent of unplanned failures. There should also be 2 new NodeConditions, Schedulable and Runnable. The spec fields and other conditions (e.g., Ready) would be combined to provide the status values. The scheduler would then only need to look at the Schedulable condition, in the normal case.
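A rough sketch of that combination, under my simplifying assumption that conditions are plain booleans (the real API would use richer condition structs); the point is that the scheduler consults only the derived Schedulable condition:

```
package node

// Illustrative types only.
type NodeSpec struct {
	Schedulable bool // admin intent: allow scheduling of new pods
	Runnable    bool // admin intent: allow execution of user pods
}

type ConditionKind string

const (
	Ready       ConditionKind = "Ready"
	Schedulable ConditionKind = "Schedulable"
	Runnable    ConditionKind = "Runnable"
)

// deriveConditions combines administrative intent from the spec with the
// observed Ready condition, so consumers like the scheduler only need to
// look at a single condition.
func deriveConditions(spec NodeSpec, ready bool) map[ConditionKind]bool {
	return map[ConditionKind]bool{
		Ready:       ready,
		Schedulable: ready && spec.Schedulable,
		Runnable:    ready && spec.Runnable,
	}
}
```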
Should these be label selectors rather than binary values? Internally we have a number of use cases where the administrator wants to make some subset of pods un-{schedulable,runnable}, as opposed to it being a binary decision.
I'm not completely opposed to this, but it seems a little weird to put the logic for determining whether a node is schedulable into a controller rather than in the scheduler. Runnable seems less weird, since the node controller is already making decisions based on health checks.
Yes, this model was deliberately simpler than we know it will eventually need to be. We don't have any notion of priority yet, nor forgiveness, nor disruption SLOs, nor pre-stop hook timeouts (graceful termination deadlines). I also thought about making nodes fit into the model for graceful termination of all objects: #1535. That's intended as preparation for object deletion, which wouldn't normally be the case for maintenance, but we could do something similar. We could add a stop time goal, expected duration, and reason to the Spec, with no new conditions, and leave it entirely to the scheduler to assess schedulability and to the controller to determine runnability for any particular pod. That would provide enough information for forgiveness, disruption SLOs, and respecting graceful termination expectations. The scenarios where one wants to selectively disable certain classes of workloads in the cluster or on individual machines for other reasons are rarer. However, I'm concerned that a more sophisticated model would be harder for users to understand, and we don't yet have any of the features that would make it necessary. If we wanted the scheduler to only need to look at the node's status and not the spec, we could simply reflect the administrative bits into the conditions, without combining with other conditions, which is what I had previously proposed in the original NodeCondition PR.
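To make that richer alternative concrete, here is a purely speculative sketch of what such spec fields might look like; none of these fields exist, and every name below is invented for illustration:

```
package node

import "time"

// Speculative maintenance fields for NodeSpec, per the idea above. The
// scheduler would assess schedulability, and the controller runnability,
// per pod from these fields, with no new conditions added.
type MaintenanceSpec struct {
	StopTimeGoal     time.Time     // when the node should be quiesced by
	ExpectedDuration time.Duration // how long the maintenance is expected to last
	Reason           string        // input for forgiveness and disruption-SLO decisions
}
```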
@pravisankar Some summary and background: First, we need two new spec fields: Runnable and Schedulable. We can start with simple bools, but eventually we'll add more information to support pod forgiveness (#1574), etc. Second, we have three conditions: Runnable, Schedulable, and Ready (we'll remove Reachable for now). Kubelet will periodically push Ready status; the node controller will periodically check the spec and Ready status to populate the other two conditions. For the moment, the node controller will also do pod eviction if Ready status is not updated for a long time. Third, the scheduler will only schedule pods to schedulable nodes. As a precaution, it's up to the scheduler to check the other conditions too. As @bgrant0607 mentioned, there are a lot of policies omitted, but let's make it fancier later. The change involves the kubelet, scheduler, and node controller. I'll start on the kubelet POSTing status shortly; let me know how I can help on your side.

@davidopp Runnable and schedulable are notions of a node; by "make subset of pods un-schedulable/runnable", I guess you mean making a node unschedulable for certain types of pods? I don't think we have enough features to do this, and even if we did, I think we should still use NodeSpec. Labels are for identifying information, whereas schedulable/runnable is status (a reflection of the specification). A label can be anything, but we need a clear spec for the model.
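A hedged sketch of that controller loop, with made-up names and an assumed grace period (the real values and types would differ): the kubelet POSTs Ready heartbeats, the controller derives the other two conditions from the spec, and eviction fires when heartbeats go stale.

```
package controller

import "time"

// Illustrative types; the grace period and all names are assumptions.
type NodeSpec struct {
	Schedulable bool
	Runnable    bool
}

type Node struct {
	Spec          NodeSpec
	LastReadyPost time.Time       // updated by kubelet's periodic status POST
	Conditions    map[string]bool // populated by the node controller
}

const readyGracePeriod = 40 * time.Second // assumed value

// syncNode is one pass of the node controller: derive conditions from
// the spec and heartbeat freshness, and trigger pod eviction if the node
// has not reported Ready for too long.
func syncNode(n *Node, now time.Time, evictPods func(node *Node)) {
	ready := now.Sub(n.LastReadyPost) < readyGracePeriod
	n.Conditions = map[string]bool{
		"Ready":       ready,
		"Schedulable": ready && n.Spec.Schedulable,
		"Runnable":    ready && n.Spec.Runnable,
	}
	if !ready {
		evictPods(n) // eviction when Ready is stale
	}
}
```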
will not affect any existing pods on the node but it will disable creation of
any new pods on the node. Node deactivate example:
```
kubectl update nodes 10.1.2.3 --patch='{"apiVersion": "v1beta1", "deactivate": true}'
```
Since it only blocks scheduling, maybe call it "stopScheduling" or something like that? "deactivate" seems ambiguous, and might be interpreted as also killing the running pods.
As per the feedback,
Capacity ResourceList `json:"capacity,omitempty"`

// Unschedule controls node schedulability of new pods. By default node is schedulable.
Unschedule bool `json:"unschedule,omitempty"`
Please call this MakeUnschedulable (and make a similar change in the versioned structs)
@@ -704,6 +706,8 @@ type Minion struct {
NodeResources NodeResources `json:"resources,omitempty" description:"characterization of node resources"`
// Pod IP range assigned to the node
PodCIDR string `json:"cidr,omitempty" description:"IP range assigned to the node"`
// Unschedulable controls node schedulability of new pods. By default node is schedulable.
Unschedulable bool `json:"unschedulable,omitempty" description:"enable or disable pod scheduling on the node"`
Actually, true => disable, so I'd omit "enable or".
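For illustration, a minimal sketch (not the actual scheduler code) of how a scheduler could honor this field, with true meaning scheduling is disabled; the node shape is stripped down to just the field under discussion:

```
package scheduler

// Illustrative node shape; only the relevant field is shown.
type Node struct {
	Name          string
	Unschedulable bool
}

// schedulableNodes filters out nodes an admin has marked unschedulable.
// Pods already running on filtered nodes are left untouched.
func schedulableNodes(nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if !n.Unschedulable {
			out = append(out, n)
		}
	}
	return out
}
```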
@pravisankar Thanks! A few minor comments.
Needs rebase.
Rebased, PTAL.
Needs rebase. LGTM.
Rebased and updated the swagger spec using hack/update-swagger-spec.sh.
/cc @nikhiljindal Can you take a look at the swagger spec?
@ddysher Unfortunately, the swagger contents are rearranged every time that script is run, so it's hard to inspect them carefully. Hopefully we'll fix that soon. For now, we should just merge if we're happy with the changes.
Unfortunately, it needs to be rebased again, though, and the swagger probably needs to be re-updated.
Setting Unschedulable on the node will not touch any existing pods on the node but will block scheduling of new pods on the node.
Rebased (updated the swagger spec one more time).
LGTM, thanks for your work!
Allow admin user to explicitly unschedule the node
Node deactivation will block scheduling of new pods on the node.