Implement alpha version of PreferAvoidPods #20699

jiangyaoguo · 2016-02-05T08:25:01Z

This is part of #18853

This change is

k8s-github-robot · 2016-02-05T10:13:33Z

The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?

k8s-github-robot · 2016-02-05T15:27:37Z

PR needs rebase

k8s-github-robot · 2016-02-06T21:02:44Z

PR needs rebase

davidopp · 2016-02-07T18:07:40Z

cc/ @smarterclayton @mikedanese @bgrant0607

davidopp · 2016-02-07T18:12:05Z

pkg/api/v1/types.go

@@ -2229,6 +2229,11 @@ type DeleteOptions struct {
 	// specified type will be used.
 	// Defaults to a per object value if not specified. zero means delete immediately.
 	GracePeriodSeconds *int64 `json:"gracePeriodSeconds"`
+
+	// If true, when deleting the pod managed by a replication controller, scheduler will try to put the replacement


Slightly different wording that might be a bit clearer: "If the object to be deleted is a pod managed by a controller, setting to true requests that the replacement pod that the controller creates (if the controller decides to create such a pod) be scheduled to a different node than the one where it is currently running. Otherwise has no effect."

smarterclayton · 2016-02-07T18:20:30Z

I thought we said in the other thread this needs to be on a new API object and that we have to be able to convert both DeleteOptions and PodDeleteOptions to the internal type?

davidopp · 2016-02-07T19:55:26Z

Here are the two comments I think you're referring to:

#18853 (comment)
#18853 (comment)

I interpreted these comments to mean that anything beyond what @jiangyaoguo was a "nice to have."

Do we foresee other Pod-specific delete options? It doesn't seem so bad to just put in a comment "has no effect when applied to non-Pod" unless there are going to be multiple other scenarios like this.

cc/ @lavalamp

smarterclayton · 2016-02-07T21:16:32Z

We can always change the schema in the future to take away the flag since
it would have no effect, but we wouldn't be able to be stricter in
validating incoming requests, because the field would be accepted by
version 1.2 (even if ignored). So it paints us into a corner a little bit
more than today.

On Sun, Feb 7, 2016 at 2:55 PM, David Oppenheimer notifications@github.com
wrote:

Here are the two comments I think you're referring to:

#18853 (comment)
#18853 (comment)
#18853 (comment)
#18853 (comment)

I interpreted these comments to mean that anything beyond what
@jiangyaoguo https://github.com/jiangyaoguo was a "nice to have."

Do we foresee other Pod-specific delete options? It doesn't seem so bad to
just put in a comment "has no effect when applied to non-Pod" unless there
are going to be multiple other scenarios like this.

cc/ @lavalamp https://github.com/lavalamp

—
Reply to this email directly or view it on GitHub
#20699 (comment)
.

k8s-github-robot · 2016-02-08T04:14:38Z

@jiangyaoguo
You must link to the test flake issue which caused you to request this manual re-test.
Re-test requests should be in the form of: k8s-bot test this issue: #<number>
Here is the list of open test flakes.

k8s-github-robot · 2016-02-08T16:55:40Z

PR needs rebase

lavalamp · 2016-02-09T01:41:12Z

I would really prefer to make a specific PodDeletionOptions object. How urgent is this?

davidopp · 2016-02-10T09:07:01Z

Quick notes from conversation with @lavalamp today:

The fact that AvoidPreviousNode was requested in the api.DeleteOptions needs to be persisted somewhere by the API server, presumably an annotation on the pod that it then deletes. An alternative would be to just have the client who requests the delete add the annotation before doing the delete, and not bother with adding this option to DeleteOptions at all.
More controversial alternatives
- have a new API resource that records the information; it would be available to all controllers (though it's not clear that other controllers than the one that created the pod need the information), and could eventually generalize to a RateLimiter-type object that records pod-machine pairs to avoid
- similar to previous sub-bullet but put the information in the Node object rather than a new object

I'm pretty opposed to the "more controversial alternatives." It seems overly complicated, at least in the sense that associating the "avoid previous machine" information with the pod that is being deleted avoids any concern about garbage collecting the information, as compared to storing it in some global structure (as in either of the two "more controversial alternatives").

I'm on the fence about the first bullet.

Urgency: This is not super-urgent; it blocks deployment of rescheduler, but we haven't even started implementing rescheduler.

jiangyaoguo · 2016-07-05T02:26:49Z

@davidopp rebased.

davidopp · 2016-07-09T22:13:33Z

LGTM

Very sorry for the delay in reviewing. Please rebase one more time, and then I will add LGTM label.

Thanks for your work on this!

@bgrant0607

Automatic merge from submit-queue [WIP/RFC] Rescheduling in Kubernetes design proposal Proposal by @bgrant0607 and @davidopp (and inspired by years of discussion and experience from folks who worked on Borg and Omega). This doc is a proposal for a set of inter-related concepts related to "rescheduling" -- that is, "moving" an already-running pod to a new node in order to improve where it is running. (Specific concepts discussed are priority, preemption, disruption budget, quota, `/evict` subresource, and rescheduler.) Feedback on the proposal is very welcome. For now, please stick to comments about the design, not spelling, punctuation, grammar, broken links, etc., so we can keep the doc uncluttered enough to make it easy for folks to comment on the more important things. ref/ #22054 #18724 #19080 #12611 #20699 #17393 #12140 #22212 @HaiyangDING @mqliang @derekwaynecarr @kubernetes/sig-scheduling @kubernetes/huawei @timothysc @mml @dchen1107

davidopp · 2016-07-12T06:51:04Z

Please squash the commits to one commit.

davidopp · 2016-07-13T08:41:39Z

@jiangyaoguo your new commit has a compiler error:

# k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/priorities
plugin/pkg/scheduler/algorithm/priorities/priorities.go:307: nodes.Items undefined (type []*"k8s.io/kubernetes/pkg/api".Node has no field or method Items)
plugin/pkg/scheduler/algorithm/priorities/priorities.go:314: nodes.Items undefined (type []*"k8s.io/kubernetes/pkg/api".Node has no field or method Items)

I didn't look carefully, but it might be that you need to update your code to incorporate the change made in #28781

jiangyaoguo · 2016-07-13T08:57:59Z

@davidopp I encounter some network problem now. Will update soon.

…fic node. 1. define PreferAvoidPods annotation 2. add PreferAvoidPodsPriority 3. validate AvoidPods in node annotations

k8s-bot · 2016-07-13T09:45:33Z

GCE e2e build/test passed for commit 4e91166.

jiangyaoguo · 2016-07-13T11:24:55Z

@davidopp

wojtek-t · 2016-07-13T11:36:30Z

plugin/pkg/scheduler/algorithm/priorities/priorities.go

+	return nodePreferAvoid.CalculateNodePreferAvoidPodsPriority
+}
+
+func (npa *NodePreferAvoidPod) CalculateNodePreferAvoidPodsPriority(pod *api.Pod, nodeNameToInfo map[string]*schedulercache.NodeInfo, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {


@davidopp @jiangyaoguo - do we know the preformance impact of it? Since you are switching it on by default, it would be good to know...

No, we haven't tested performance impact. What would you think of the following:

for now, at the beginning of the function put a loop over all nodes, that checks to see if

api.GetAvoidPodsFromNodeAnnotations(node.Annotations)

has a value for any of the nodes. If it is not set for any nodes, the function immediately returns a HostPiorityList with all nodes having same priority. This way the significant performance impact only happens if you actually use the feature, and we can warn users about it.

hopefully ownerReference will be implemented for ReplicaSet and ReplicationController by the time of code freeze for 1.4 (RC is already in the final stages of review, will merge very soon), and then we can use it instead of using rcs, err := npa.controllerLister.GetPodControllers(pod) (and similar one for ReplicaSets).

Does that sound reasonable?

So let's do this way:

let's merge as it is now (basically in selector spreading we are still calling all the functions that we have here, so it shouldn't be worse)

Once this is merged, I will do a pass over it and fix some issues that I already fixed for other priorities/predicates

once ControllerRef is ready, I will switch to that (together with selector_spreading)

[Generally, the asymptotical complexity of it is the same as selector spreading, We should probably solve both of them in the same way in the future.]

SGTM. Thanks!

davidopp · 2016-07-14T06:48:35Z

LGTM

k8s-github-robot · 2016-07-14T09:53:13Z

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot · 2016-07-14T09:55:01Z

@k8s-bot test this issue: #IGNORE

Tests have been pending for 24 hours

k8s-bot · 2016-07-14T10:30:14Z

GCE e2e build/test passed for commit 4e91166.

k8s-github-robot · 2016-07-14T10:30:25Z

Automatic merge from submit-queue

k8s-bot · 2016-07-14T10:31:26Z

GCE e2e build/test passed for commit 4e91166.

vishh · 2017-06-21T18:56:59Z

@davidopp Is this feature documented anywhere?

googlebot added the cla: yes label Feb 5, 2016

k8s-github-robot assigned thockin Feb 5, 2016

k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 5, 2016

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from c9bc79d to 449d956 Compare February 5, 2016 09:33

k8s-github-robot added the needs-ok-to-merge label Feb 5, 2016

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 5, 2016

thockin assigned davidopp and unassigned thockin Feb 5, 2016

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from 449d956 to 25e45b4 Compare February 6, 2016 01:29

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2016

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2016

davidopp reviewed Feb 7, 2016
View reviewed changes

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from 25e45b4 to 162c832 Compare February 8, 2016 03:06

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2016

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2016

davidopp changed the title ~~Add AvoidPreviousNodeto api.DeleteOptions~~ Add AvoidPreviousNode to api.DeleteOptions Feb 10, 2016

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from 162c832 to 8dd3bbd Compare February 15, 2016 06:27

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2016

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from d409168 to 1dacdd3 Compare July 4, 2016 12:53

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 4, 2016

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2016

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from 1dacdd3 to abd777f Compare July 12, 2016 05:43

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from abd777f to 658ed48 Compare July 13, 2016 08:16

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2016

Use PreferAvoidPods annotation to avoid pods being scheduled to speci…

4e91166

…fic node. 1. define PreferAvoidPods annotation 2. add PreferAvoidPodsPriority 3. validate AvoidPods in node annotations

jiangyaoguo force-pushed the add-AvoidPreviousNode branch from 658ed48 to 4e91166 Compare July 13, 2016 09:13

wojtek-t reviewed Jul 13, 2016
View reviewed changes

davidopp added release-note Denotes a PR that will be considered when it comes time to generate release notes. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 14, 2016

k8s-github-robot merged commit 06939c5 into kubernetes:master Jul 14, 2016

davidopp mentioned this pull request Mar 18, 2017

Support reason parameter on pod delete #1462

Closed

davidopp mentioned this pull request Jun 21, 2017

Design proposal for pod preemption in Kubernetes kubernetes/community#745

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement alpha version of PreferAvoidPods #20699

Implement alpha version of PreferAvoidPods #20699

jiangyaoguo commented Feb 5, 2016 •

edited by davidopp

Loading

k8s-github-robot commented Feb 5, 2016

k8s-github-robot commented Feb 5, 2016

k8s-github-robot commented Feb 6, 2016

davidopp commented Feb 7, 2016

davidopp Feb 7, 2016

smarterclayton commented Feb 7, 2016 via email

davidopp commented Feb 7, 2016

smarterclayton commented Feb 7, 2016

k8s-github-robot commented Feb 8, 2016

k8s-github-robot commented Feb 8, 2016

lavalamp commented Feb 9, 2016

davidopp commented Feb 10, 2016

jiangyaoguo commented Jul 5, 2016

davidopp commented Jul 9, 2016

davidopp commented Jul 12, 2016

davidopp commented Jul 13, 2016

jiangyaoguo commented Jul 13, 2016

k8s-bot commented Jul 13, 2016

jiangyaoguo commented Jul 13, 2016

wojtek-t Jul 13, 2016

davidopp Jul 13, 2016

wojtek-t Jul 14, 2016

davidopp Jul 14, 2016

davidopp commented Jul 14, 2016

k8s-github-robot commented Jul 14, 2016

k8s-github-robot commented Jul 14, 2016

k8s-bot commented Jul 14, 2016

k8s-github-robot commented Jul 14, 2016

k8s-bot commented Jul 14, 2016

vishh commented Jun 21, 2017

Implement alpha version of PreferAvoidPods #20699

Implement alpha version of PreferAvoidPods #20699

Conversation

jiangyaoguo commented Feb 5, 2016 • edited by davidopp Loading

k8s-github-robot commented Feb 5, 2016

k8s-github-robot commented Feb 5, 2016

k8s-github-robot commented Feb 6, 2016

davidopp commented Feb 7, 2016

davidopp Feb 7, 2016

Choose a reason for hiding this comment

smarterclayton commented Feb 7, 2016 via email

davidopp commented Feb 7, 2016

smarterclayton commented Feb 7, 2016

k8s-github-robot commented Feb 8, 2016

k8s-github-robot commented Feb 8, 2016

lavalamp commented Feb 9, 2016

davidopp commented Feb 10, 2016

jiangyaoguo commented Jul 5, 2016

davidopp commented Jul 9, 2016

davidopp commented Jul 12, 2016

davidopp commented Jul 13, 2016

jiangyaoguo commented Jul 13, 2016

k8s-bot commented Jul 13, 2016

jiangyaoguo commented Jul 13, 2016

wojtek-t Jul 13, 2016

Choose a reason for hiding this comment

davidopp Jul 13, 2016

Choose a reason for hiding this comment

wojtek-t Jul 14, 2016

Choose a reason for hiding this comment

davidopp Jul 14, 2016

Choose a reason for hiding this comment

davidopp commented Jul 14, 2016

k8s-github-robot commented Jul 14, 2016

k8s-github-robot commented Jul 14, 2016

k8s-bot commented Jul 14, 2016

k8s-github-robot commented Jul 14, 2016

k8s-bot commented Jul 14, 2016

vishh commented Jun 21, 2017

jiangyaoguo commented Feb 5, 2016 •

edited by davidopp

Loading