Implement alpha version of PreferAvoidPods #20699
Conversation
Force-pushed from c9bc79d to 449d956
The author of this PR is not in the whitelist for merge; can one of the admins add the 'ok-to-merge' label?
PR needs rebase
Force-pushed from 449d956 to 25e45b4
PR needs rebase
@@ -2229,6 +2229,11 @@ type DeleteOptions struct {
	// specified type will be used.
	// Defaults to a per object value if not specified. zero means delete immediately.
	GracePeriodSeconds *int64 `json:"gracePeriodSeconds"`

	// If true, when deleting the pod managed by a replication controller, scheduler will try to put the replacement
Slightly different wording that might be a bit clearer: "If the object to be deleted is a pod managed by a controller, setting to true requests that the replacement pod that the controller creates (if the controller decides to create such a pod) be scheduled to a different node than the one where it is currently running. Otherwise has no effect."
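For context, here is a minimal sketch of how such a field might sit in DeleteOptions with the wording above as its doc comment. The field name and JSON tag are hypothetical illustrations only; the merged feature ultimately used a node annotation instead (see the later commits in this PR).

// Hypothetical sketch only: field name and JSON tag are illustrative assumptions.
package api

type DeleteOptions struct {
	// specified type will be used.
	// Defaults to a per object value if not specified. zero means delete immediately.
	GracePeriodSeconds *int64 `json:"gracePeriodSeconds"`

	// If the object to be deleted is a pod managed by a controller, setting this to true
	// requests that the replacement pod the controller creates (if it decides to create one)
	// be scheduled to a different node than the one where the pod is currently running.
	// Otherwise it has no effect.
	AvoidPreviousNode *bool `json:"avoidPreviousNode,omitempty"`
}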
I thought we said in the other thread this needs to be on a new API object and that we have to be able to convert both DeleteOptions and PodDeleteOptions to the internal type?
Here are the two comments I think you're referring to: #18853 (comment). I interpreted these comments to mean that anything beyond what @jiangyaoguo did was a "nice to have." Do we foresee other Pod-specific delete options? It doesn't seem so bad to just put in a comment "has no effect when applied to non-Pod objects" unless there are going to be multiple other scenarios like this. cc/ @lavalamp
We can always change the schema in the future to take away the flag.
Force-pushed from 25e45b4 to 162c832
@jiangyaoguo
PR needs rebase
I would really prefer to make a specific PodDeletionOptions object. How urgent is this?
Quick notes from conversation with @lavalamp today:
I'm pretty opposed to the "more controversial alternatives." They seem overly complicated, at least in the sense that associating the "avoid previous machine" information with the pod that is being deleted avoids any concern about garbage collecting the information, whereas storing it in some global structure (as in either of the two "more controversial alternatives") does not. I'm on the fence about the first bullet.
Urgency: this is not super-urgent; it blocks deployment of rescheduler, but we haven't even started implementing rescheduler.
Force-pushed from 162c832 to 8dd3bbd
Force-pushed from d409168 to 1dacdd3
@davidopp rebased.
LGTM. Very sorry for the delay in reviewing. Please rebase one more time, and then I will add the LGTM label. Thanks for your work on this!
Automatic merge from submit-queue
[WIP/RFC] Rescheduling in Kubernetes design proposal
Proposal by @bgrant0607 and @davidopp (and inspired by years of discussion and experience from folks who worked on Borg and Omega).
This doc is a proposal for a set of inter-related concepts related to "rescheduling" -- that is, "moving" an already-running pod to a new node in order to improve where it is running. (Specific concepts discussed are priority, preemption, disruption budget, quota, `/evict` subresource, and rescheduler.)
Feedback on the proposal is very welcome. For now, please stick to comments about the design, not spelling, punctuation, grammar, broken links, etc., so we can keep the doc uncluttered enough to make it easy for folks to comment on the more important things.
ref/ #22054 #18724 #19080 #12611 #20699 #17393 #12140 #22212
@HaiyangDING @mqliang @derekwaynecarr @kubernetes/sig-scheduling @kubernetes/huawei @timothysc @mml @dchen1107
Force-pushed from 1dacdd3 to abd777f
Please squash the commits into one.
Force-pushed from abd777f to 658ed48
@jiangyaoguo your new commit has a compiler error:
I didn't look carefully, but it might be that you need to update your code to incorporate the change made in #28781.
@davidopp I'm encountering some network problems right now. Will update soon.
…fic node.
1. define PreferAvoidPods annotation
2. add PreferAvoidPodsPriority
3. validate AvoidPods in node annotations
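As a rough illustration of what the commit above describes, a node annotation carrying PreferAvoidPods might look like the sketch below. The annotation key and the JSON shape of its value are assumptions for illustration; they mirror the PreferAvoidPods/AvoidPods names used in this PR rather than quoting the merged schema verbatim.

// Sketch only: annotation key and value schema are illustrative assumptions.
package main

import "fmt"

func main() {
	// A node whose annotation asks the scheduler to prefer avoiding pods owned by a
	// particular controller when scoring this node.
	nodeAnnotations := map[string]string{
		"scheduler.alpha.kubernetes.io/preferAvoidPods": `{"preferAvoidPods":[{"podSignature":{"podController":{"kind":"ReplicationController","name":"foo","uid":"abcdef01-2345"}},"reason":"some reason","message":"some message"}]}`,
	}
	fmt.Println(nodeAnnotations["scheduler.alpha.kubernetes.io/preferAvoidPods"])
}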
Force-pushed from 658ed48 to 4e91166
GCE e2e build/test passed for commit 4e91166.
	return nodePreferAvoid.CalculateNodePreferAvoidPodsPriority
}

func (npa *NodePreferAvoidPod) CalculateNodePreferAvoidPodsPriority(pod *api.Pod, nodeNameToInfo map[string]*schedulercache.NodeInfo, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
@davidopp @jiangyaoguo - do we know the performance impact of it? Since you are switching it on by default, it would be good to know...
No, we haven't tested performance impact. What would you think of the following:
- for now, at the beginning of the function put a loop over all nodes that checks to see if
api.GetAvoidPodsFromNodeAnnotations(node.Annotations)
has a value for any of the nodes. If it is not set for any node, the function immediately returns a HostPriorityList with all nodes having the same priority (see the sketch after this comment). This way the significant performance impact only happens if you actually use the feature, and we can warn users about it.
- hopefully ownerReference will be implemented for ReplicaSet and ReplicationController by the time of code freeze for 1.4 (RC is already in the final stages of review, will merge very soon), and then we can use it instead of using
rcs, err := npa.controllerLister.GetPodControllers(pod)
(and similar one for ReplicaSets).
Does that sound reasonable?
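A minimal sketch of the short-circuit check suggested in the first bullet above, assuming the function and types keep the signatures shown in this PR; the uniform score value, the error handling, and the calculateFull helper (a stand-in for the PR's existing per-node logic) are illustrative assumptions only.

// Sketch: if no node carries the avoid-pods annotation, return a uniform
// HostPriorityList and skip the expensive per-node work entirely.
func (npa *NodePreferAvoidPod) CalculateNodePreferAvoidPodsPriority(pod *api.Pod, nodeNameToInfo map[string]*schedulercache.NodeInfo, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
	nodes, err := nodeLister.List()
	if err != nil {
		return nil, err
	}

	annotationSeen := false
	for i := range nodes.Items {
		// GetAvoidPodsFromNodeAnnotations is the helper referenced above; an error or an
		// empty PreferAvoidPods list is treated as "nothing to avoid on this node".
		if avoids, err := api.GetAvoidPodsFromNodeAnnotations(nodes.Items[i].Annotations); err == nil && len(avoids.PreferAvoidPods) > 0 {
			annotationSeen = true
			break
		}
	}

	if !annotationSeen {
		// No node asks to avoid anything: every node gets the same score.
		result := make(schedulerapi.HostPriorityList, 0, len(nodes.Items))
		for i := range nodes.Items {
			result = append(result, schedulerapi.HostPriority{Host: nodes.Items[i].Name, Score: 10})
		}
		return result, nil
	}

	// Otherwise fall through to the full per-node computation from this PR
	// (calculateFull is a hypothetical name for that existing logic).
	return npa.calculateFull(pod, nodeNameToInfo, nodes)
}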
So let's do it this way:
- let's merge as it is now (basically in selector spreading we are still calling all the functions that we have here, so it shouldn't be worse)
- Once this is merged, I will do a pass over it and fix some issues that I already fixed for other priorities/predicates
- once ControllerRef is ready, I will switch to that (together with selector_spreading)
[Generally, the asymptotic complexity of it is the same as selector spreading; we should probably solve both of them in the same way in the future.]
SGTM. Thanks!
LGTM
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
@k8s-bot test this issue: #IGNORE Tests have been pending for 24 hours
GCE e2e build/test passed for commit 4e91166.
Automatic merge from submit-queue
GCE e2e build/test passed for commit 4e91166.
@davidopp Is this feature documented anywhere?
This is part of #18853