-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization of on-wire information sent to scheduler extender interfaces that are stateful #41119
Conversation
Hi @sarat-k. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
[APPROVALNOTIFIER] This PR is NOT APPROVED The following people have approved this PR: sarat-k Needs approval from an approver in each of these OWNERS Files: We suggest the following people: |
I signed the CLA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments.
cc @davidopp for scheduler annotation changes
plugin/pkg/scheduler/api/types.go
Outdated
@@ -136,16 +140,27 @@ type ExtenderArgs struct { | |||
// Pod being scheduled | |||
Pod v1.Pod | |||
// List of candidate nodes where the pod can be scheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this comment too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plugin/pkg/scheduler/api/types.go
Outdated
} | ||
|
||
// FailedNodesMap represents the filtered out nodes, with node names and failure messages | ||
type FailedNodesMap map[string]string | ||
|
||
// Annotations represents the annotations | ||
type Annotations map[string]string | ||
|
||
// ExtenderFilterResult represents the results of a filter call to an extender | ||
type ExtenderFilterResult struct { | ||
// Filtered set of nodes where the pod can be scheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
// ExtenderArgs represents the arguments needed by the extender to filter/prioritize | ||
// nodes for a pod. | ||
type ExtenderArgs struct { | ||
// Pod being scheduled | ||
Pod apiv1.Pod `json:"pod"` | ||
// List of candidate nodes where the pod can be scheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment
plugin/pkg/scheduler/api/v1/types.go
Outdated
} | ||
|
||
// FailedNodesMap represents the filtered out nodes, with node names and failure messages | ||
type FailedNodesMap map[string]string | ||
|
||
// Annotations represents the annotations | ||
type Annotations map[string]string | ||
|
||
// ExtenderFilterResult represents the results of a filter call to an extender | ||
type ExtenderFilterResult struct { | ||
// Filtered set of nodes where the pod can be scheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment here too
plugin/pkg/scheduler/extender.go
Outdated
Pod: *pod, | ||
Nodes: v1.NodeList{Items: nodeItems}, | ||
Pod: *pod, | ||
Nodes: &v1.NodeList{Items: nodeItems}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nodes should be nil when node cache is enabled, rather than an empty list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it.
plugin/pkg/scheduler/extender.go
Outdated
Pod: *pod, | ||
Nodes: v1.NodeList{Items: nodeItems}, | ||
Pod: *pod, | ||
Nodes: &v1.NodeList{Items: nodeItems}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: send nil when cache is enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it
plugin/pkg/scheduler/extender.go
Outdated
filterVerb string | ||
prioritizeVerb string | ||
weight int | ||
apiVersion string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we get rid of the apiVersion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes .. cleaned this up in the latest commit.
@@ -109,15 +109,15 @@ type FakeExtender struct { | |||
weight int | |||
} | |||
|
|||
func (f *FakeExtender) Filter(pod *v1.Pod, nodes []*v1.Node) ([]*v1.Node, schedulerapi.FailedNodesMap, error) { | |||
func (f *FakeExtender) Filter(pod *v1.Pod, nodes []*v1.Node, nodeNameToInfo map[string]*schedulercache.NodeInfo) ([]*v1.Node, schedulerapi.FailedNodesMap, *schedulerapi.Annotations, error) { | |||
filtered := []*v1.Node{} | |||
failedNodesMap := schedulerapi.FailedNodesMap{} | |||
for _, node := range nodes { | |||
fits := true | |||
for _, predicate := range f.predicates { | |||
fit, err := predicate(pod, node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an annotation test for extender?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added one in the latest commit. please check if that is what you are suggesting.
} | ||
} | ||
|
||
func (e *Extender) Prioritize(args *schedulerapi.ExtenderArgs) (schedulerapi.HostPriorityList, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the signature change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not intentional .. fixed it.
plugin/pkg/scheduler/scheduler.go
Outdated
@@ -148,6 +148,7 @@ func (s *Scheduler) Run() { | |||
|
|||
func (s *Scheduler) scheduleOne() { | |||
var podAnnotations schedulerapi.Annotations | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New line
@k8s-bot ok to test |
@ravigadde is this ready for final review? |
plugin/pkg/scheduler/extender.go
Outdated
@@ -104,6 +102,7 @@ func (h *HTTPExtender) Filter(pod *v1.Pod, nodes []*v1.Node, nodeNameToInfo map[ | |||
nodeNames *[]string | |||
nodeResult []*v1.Node | |||
podAnnotations schedulerapi.Annotations | |||
args *schedulerapi.ExtenderArgs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of nodeItems, have a nodeList *v1.NodeList here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plugin/pkg/scheduler/extender.go
Outdated
@@ -123,25 +122,33 @@ func (h *HTTPExtender) Filter(pod *v1.Pod, nodes []*v1.Node, nodeNameToInfo map[ | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Populate nodeList here with nodeItems.
plugin/pkg/scheduler/extender.go
Outdated
Pod: *pod, | ||
Nodes: &v1.NodeList{Items: nodeItems}, | ||
NodeNames: nodeNames, | ||
if h.nodeCacheCapable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assign nodeList to Nodes. You dont need the if check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plugin/pkg/scheduler/extender.go
Outdated
return nil, nil, nil, err | ||
} | ||
if result.Error != "" { | ||
return nil, nil, nil, fmt.Errorf(result.Error) | ||
} | ||
|
||
if h.nodeCacheCapable { | ||
if h.nodeCacheCapable && (result.NodeNames != nil) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lose the ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plugin/pkg/scheduler/extender.go
Outdated
@@ -165,6 +172,7 @@ func (h *HTTPExtender) Prioritize(pod *v1.Pod, nodes []*v1.Node) (*schedulerapi. | |||
result schedulerapi.HostPriorityList | |||
nodeItems []v1.Node | |||
nodeNames *[]string | |||
args *schedulerapi.ExtenderArgs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done .. latest commit addressed all the changes suggested
@sarat-k LGTM other than the minor comments. Please make the changes (and may be squash) for @timstclair to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So several big questions that leaves other parts of the PR in question that we need answers on 1st.
plugin/pkg/scheduler/api/types.go
Outdated
Nodes v1.NodeList | ||
// List of candidate nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're changing the api here. I don't know what type of policy enforcement we have around this, but there would need to be some global notification for anyone currently using it. I know @ravigadde introduced this mechanism, but do we have details on who else may use this?
/cc @davidopp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timothysc these changes are backward compatible. Do you see anything that breaks backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *) (At least I assume that's the implication of changing from Nodes to Nodes * -- I guess we're not using the JSON annotations here that say "omitempty" and such.)
plugin/pkg/scheduler/api/types.go
Outdated
Nodes v1.NodeList | ||
// Filtered set of nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *)
plugin/pkg/scheduler/api/types.go
Outdated
// only if extender is capable of maintaining its own node cache | ||
NodeNames *[]string | ||
// Annotations are the pod annotations returned by the extender, if any | ||
PodAnnotations *Annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are ridding most of these, are there other annotations that you are using, or parsing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't know that. It should be OK to not have this in the extender imo.
plugin/pkg/scheduler/extender.go
Outdated
for _, node := range nodes { | ||
nodeNameSlice = append(nodeNameSlice, node.Name) | ||
} | ||
nodeNames = &nodeNameSlice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So slightly confused, does the extender maintain it's own watch then? and now you're only passing in nodename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes to both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timothysc @ravigadde synced up with Ravi offline ... backed out pod annotations related changes and pushed the updated changes. Thanks for the review.
plugin/pkg/scheduler/extender.go
Outdated
return nodeResult, result.FailedNodes, nil | ||
|
||
if result.PodAnnotations != nil { | ||
podAnnotations = *result.PodAnnotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment re: we are ridding annotations in 1.6...
Are there other things you are passing using this mechanism?
plugin/pkg/scheduler/api/types.go
Outdated
// Filtered set of nodes where the pod can be scheduled; to be populated | ||
// only if extender is capable of maintaining its own node cache | ||
NodeNames *[]string | ||
// Annotations are the pod annotations returned by the extender, if any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs.
to be populated only if extender is capable of maintaining its own node cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plugin/pkg/scheduler/extender.go
Outdated
@@ -160,7 +200,7 @@ func (h *HTTPExtender) send(action string, args interface{}, result interface{}) | |||
return err | |||
} | |||
|
|||
url := h.extenderURL + "/" + h.apiVersion + "/" + action | |||
url := h.extenderURL + "/" + action |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you removed the apiVersion from the endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apiVersion is always empty string and so the URL is coming out to be something like http://127.0.0.1:9999/api/scheduler//schedule with double-slash in between where apiVersion was supposed to be. Due to this double-slash, multiple 301 redirects are seen (sent by go router mostly) and extender doesn't even see these scheduling requests, eventually requests fail.
The test failure is a known issue, same as the one reported in #41419. Can you please let me know if there is anything more that needs to be done for this PR or if it can be merged ? Thanks, |
plugin/pkg/scheduler/api/types.go
Outdated
Nodes v1.NodeList | ||
// List of candidate nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"
plugin/pkg/scheduler/api/types.go
Outdated
// List of candidate nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList | ||
// List of candidate node names where the pod can be scheduled; to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == true"
plugin/pkg/scheduler/api/types.go
Outdated
Nodes v1.NodeList | ||
// List of candidate nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *) (At least I assume that's the implication of changing from Nodes to Nodes * -- I guess we're not using the JSON annotations here that say "omitempty" and such.)
plugin/pkg/scheduler/api/types.go
Outdated
Nodes v1.NodeList | ||
// Filtered set of nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"
plugin/pkg/scheduler/api/types.go
Outdated
Nodes v1.NodeList | ||
// Filtered set of nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *)
plugin/pkg/scheduler/api/types.go
Outdated
// Filtered set of nodes where the pod can be scheduled; to be populated | ||
// only if extender is not capable of maintaining its own node cache | ||
Nodes *v1.NodeList | ||
// Filtered set of nodes where the pod can be scheduled; to be populated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == true"
@@ -128,24 +128,36 @@ type ExtenderConfig struct { | |||
// HTTPTimeout specifies the timeout duration for a call to the extender. Filter timeout fails the scheduling of the pod. Prioritize | |||
// timeout is ignored, k8s/other extenders priorities are used to select the node. | |||
HTTPTimeout time.Duration `json:"httpTimeout,omitempty"` | |||
// NodeCacheCapable specifies that the extender is capable of caching node information, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See all the aforementioned comments for the other types.go file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timothysc @davidopp @ravigadde
I fixed all the comments as suggested and pushed the changes. Also, added omitempty tag for node list in ExtenderArgs. When the node list is empty, older implementations will just see empty list of nodes and they should be able to handle that.
node cache don't need to get all the information about every candidate node. For huge clusters, sending full node information of all nodes in the cluster on the wire every time a pod is scheduled is expensive. If the scheduler is capable of caching node information along with its capabilities, sending node name alone is sufficient. These changes provide that optimization in a backward compatible way - removed the inadvertent signature change of Prioritize() function - added defensive checks as suggested - added annotation specific test case - updated the comments in the scheduler types - got rid of apiVersion thats unused - using *v1.NodeList as suggested - backing out pod annotation update related changes made in the 1st commit - Adjusted the comments in types.go and v1/types.go as suggested in the code review
Seems OK |
@timothysc Can you please help with merging this ? |
I don't know why this is stuck in the submit queue... I can't merge directly. |
@timothysc @sarat-k That looks good to me, it's in the submit-queue. It will be picked-up eventually. |
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562) |
The changes are to address the issue described in #40991
cc @ravigadde