Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization of on-wire information sent to scheduler extender interfaces that are stateful #41119

Merged
merged 1 commit into from
Feb 26, 2017

Conversation

sarat-k
Copy link
Contributor

@sarat-k sarat-k commented Feb 8, 2017

The changes are to address the issue described in #40991

Added support to minimize sending verbose node information to scheduler extender by sending only node names and expecting extenders to cache the rest of the node information

cc @ravigadde

@k8s-ci-robot
Copy link
Contributor

Hi @sarat-k. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 8, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

The following people have approved this PR: sarat-k

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @wojtek-t @madhusudancs @brendandburns
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Feb 8, 2017
@sarat-k
Copy link
Contributor Author

sarat-k commented Feb 8, 2017

I signed the CLA.

Copy link
Contributor

@ravigadde ravigadde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments.

cc @davidopp for scheduler annotation changes

@@ -136,16 +140,27 @@ type ExtenderArgs struct {
// Pod being scheduled
Pod v1.Pod
// List of candidate nodes where the pod can be scheduled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this comment too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// FailedNodesMap represents the filtered out nodes, with node names and failure messages
type FailedNodesMap map[string]string

// Annotations represents the annotations
type Annotations map[string]string

// ExtenderFilterResult represents the results of a filter call to an extender
type ExtenderFilterResult struct {
// Filtered set of nodes where the pod can be scheduled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// ExtenderArgs represents the arguments needed by the extender to filter/prioritize
// nodes for a pod.
type ExtenderArgs struct {
// Pod being scheduled
Pod apiv1.Pod `json:"pod"`
// List of candidate nodes where the pod can be scheduled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment

}

// FailedNodesMap represents the filtered out nodes, with node names and failure messages
type FailedNodesMap map[string]string

// Annotations represents the annotations
type Annotations map[string]string

// ExtenderFilterResult represents the results of a filter call to an extender
type ExtenderFilterResult struct {
// Filtered set of nodes where the pod can be scheduled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment here too

Pod: *pod,
Nodes: v1.NodeList{Items: nodeItems},
Pod: *pod,
Nodes: &v1.NodeList{Items: nodeItems},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes should be nil when node cache is enabled, rather than an empty list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

Pod: *pod,
Nodes: v1.NodeList{Items: nodeItems},
Pod: *pod,
Nodes: &v1.NodeList{Items: nodeItems},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: send nil when cache is enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

filterVerb string
prioritizeVerb string
weight int
apiVersion string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we get rid of the apiVersion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes .. cleaned this up in the latest commit.

@@ -109,15 +109,15 @@ type FakeExtender struct {
weight int
}

func (f *FakeExtender) Filter(pod *v1.Pod, nodes []*v1.Node) ([]*v1.Node, schedulerapi.FailedNodesMap, error) {
func (f *FakeExtender) Filter(pod *v1.Pod, nodes []*v1.Node, nodeNameToInfo map[string]*schedulercache.NodeInfo) ([]*v1.Node, schedulerapi.FailedNodesMap, *schedulerapi.Annotations, error) {
filtered := []*v1.Node{}
failedNodesMap := schedulerapi.FailedNodesMap{}
for _, node := range nodes {
fits := true
for _, predicate := range f.predicates {
fit, err := predicate(pod, node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an annotation test for extender?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added one in the latest commit. please check if that is what you are suggesting.

}
}

func (e *Extender) Prioritize(args *schedulerapi.ExtenderArgs) (schedulerapi.HostPriorityList, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the signature change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not intentional .. fixed it.

@@ -148,6 +148,7 @@ func (s *Scheduler) Run() {

func (s *Scheduler) scheduleOne() {
var podAnnotations schedulerapi.Annotations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line

@rmmh rmmh removed their assignment Feb 8, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 9, 2017
@resouer
Copy link
Contributor

resouer commented Feb 10, 2017

@k8s-bot ok to test

@timothysc
Copy link
Member

timothysc commented Feb 10, 2017

@ravigadde is this ready for final review?

@@ -104,6 +102,7 @@ func (h *HTTPExtender) Filter(pod *v1.Pod, nodes []*v1.Node, nodeNameToInfo map[
nodeNames *[]string
nodeResult []*v1.Node
podAnnotations schedulerapi.Annotations
args *schedulerapi.ExtenderArgs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of nodeItems, have a nodeList *v1.NodeList here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -123,25 +122,33 @@ func (h *HTTPExtender) Filter(pod *v1.Pod, nodes []*v1.Node, nodeNameToInfo map[
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Populate nodeList here with nodeItems.

Pod: *pod,
Nodes: &v1.NodeList{Items: nodeItems},
NodeNames: nodeNames,
if h.nodeCacheCapable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign nodeList to Nodes. You dont need the if check here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return nil, nil, nil, err
}
if result.Error != "" {
return nil, nil, nil, fmt.Errorf(result.Error)
}

if h.nodeCacheCapable {
if h.nodeCacheCapable && (result.NodeNames != nil) {
Copy link
Contributor

@ravigadde ravigadde Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lose the ()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -165,6 +172,7 @@ func (h *HTTPExtender) Prioritize(pod *v1.Pod, nodes []*v1.Node) (*schedulerapi.
result schedulerapi.HostPriorityList
nodeItems []v1.Node
nodeNames *[]string
args *schedulerapi.ExtenderArgs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done .. latest commit addressed all the changes suggested

@ravigadde
Copy link
Contributor

@sarat-k LGTM other than the minor comments. Please make the changes (and may be squash) for @timstclair to review.

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So several big questions that leaves other parts of the PR in question that we need answers on 1st.

Nodes v1.NodeList
// List of candidate nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're changing the api here. I don't know what type of policy enforcement we have around this, but there would need to be some global notification for anyone currently using it. I know @ravigadde introduced this mechanism, but do we have details on who else may use this?

/cc @davidopp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothysc these changes are backward compatible. Do you see anything that breaks backward compatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *) (At least I assume that's the implication of changing from Nodes to Nodes * -- I guess we're not using the JSON annotations here that say "omitempty" and such.)

Nodes v1.NodeList
// Filtered set of nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *)

// only if extender is capable of maintaining its own node cache
NodeNames *[]string
// Annotations are the pod annotations returned by the extender, if any
PodAnnotations *Annotations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are ridding most of these, are there other annotations that you are using, or parsing here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't know that. It should be OK to not have this in the extender imo.

for _, node := range nodes {
nodeNameSlice = append(nodeNameSlice, node.Name)
}
nodeNames = &nodeNameSlice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So slightly confused, does the extender maintain it's own watch then? and now you're only passing in nodename?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothysc @ravigadde synced up with Ravi offline ... backed out pod annotations related changes and pushed the updated changes. Thanks for the review.

return nodeResult, result.FailedNodes, nil

if result.PodAnnotations != nil {
podAnnotations = *result.PodAnnotations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment re: we are ridding annotations in 1.6...

Are there other things you are passing using this mechanism?

// Filtered set of nodes where the pod can be scheduled; to be populated
// only if extender is capable of maintaining its own node cache
NodeNames *[]string
// Annotations are the pod annotations returned by the extender, if any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs.

to be populated only if extender is capable of maintaining its own node cache

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, I don't see anything major.

@sarat-k could you add a release note block on your initial PR comment please. This will get picked up by the bots based on the tagging.

@davidopp did you want to review?

@@ -160,7 +200,7 @@ func (h *HTTPExtender) send(action string, args interface{}, result interface{})
return err
}

url := h.extenderURL + "/" + h.apiVersion + "/" + action
url := h.extenderURL + "/" + action
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you removed the apiVersion from the endpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apiVersion is always empty string and so the URL is coming out to be something like http://127.0.0.1:9999/api/scheduler//schedule with double-slash in between where apiVersion was supposed to be. Due to this double-slash, multiple 301 redirects are seen (sent by go router mostly) and extender doesn't even see these scheduling requests, eventually requests fail.

@timothysc timothysc added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 14, 2017
@k8s-github-robot k8s-github-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Feb 14, 2017
@timothysc timothysc added this to the v1.6 milestone Feb 15, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 17, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2017
@sarat-k
Copy link
Contributor Author

sarat-k commented Feb 21, 2017

@timothysc @davidopp

The test failure is a known issue, same as the one reported in #41419.

Can you please let me know if there is anything more that needs to be done for this PR or if it can be merged ?

Thanks,
Sarat

Nodes v1.NodeList
// List of candidate nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"

// List of candidate nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
// List of candidate node names where the pod can be scheduled; to be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == true"

Nodes v1.NodeList
// List of candidate nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *) (At least I assume that's the implication of changing from Nodes to Nodes * -- I guess we're not using the JSON annotations here that say "omitempty" and such.)

Nodes v1.NodeList
// Filtered set of nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == false"

Nodes v1.NodeList
// Filtered set of nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backward compatibility concern might be that you changed the field from required to optional (Nodes to Nodes *)

// Filtered set of nodes where the pod can be scheduled; to be populated
// only if extender is not capable of maintaining its own node cache
Nodes *v1.NodeList
// Filtered set of nodes where the pod can be scheduled; to be populated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "to be populated only if extender is not capable of maintaining its own node cache" say "to be populated only if ExtenderConfig.NodeCacheCapable == true"

@@ -128,24 +128,36 @@ type ExtenderConfig struct {
// HTTPTimeout specifies the timeout duration for a call to the extender. Filter timeout fails the scheduling of the pod. Prioritize
// timeout is ignored, k8s/other extenders priorities are used to select the node.
HTTPTimeout time.Duration `json:"httpTimeout,omitempty"`
// NodeCacheCapable specifies that the extender is capable of caching node information,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See all the aforementioned comments for the other types.go file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothysc @davidopp @ravigadde
I fixed all the comments as suggested and pushed the changes. Also, added omitempty tag for node list in ExtenderArgs. When the node list is empty, older implementations will just see empty list of nodes and they should be able to handle that.

@timothysc
Copy link
Member

I have no more comments on this PR. I see @davidopp has a couple.

@k8s-bot unit test this

  node cache don't need to get all the information about every
  candidate node. For huge clusters, sending full node information
  of all nodes in the cluster on the wire every time a pod is scheduled
  is expensive. If the scheduler is capable of caching node information
  along with its capabilities, sending node name alone is sufficient.
  These changes provide that optimization in a backward compatible way

- removed the inadvertent signature change of Prioritize() function
- added defensive checks as suggested
-  added annotation specific test case
- updated the comments in the scheduler types
- got rid of apiVersion thats unused
- using *v1.NodeList as suggested
- backing out pod annotation update related changes made in the
  1st commit
- Adjusted the comments in types.go and v1/types.go as suggested
  in the code review
@timothysc timothysc added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Feb 23, 2017
@davidopp
Copy link
Member

Seems OK

@sarat-k
Copy link
Contributor Author

sarat-k commented Feb 24, 2017

@timothysc Can you please help with merging this ?

@timothysc
Copy link
Member

I don't know why this is stuck in the submit queue... I can't merge directly.
/cc @kubernetes/sig-contributor-experience-bugs

@apelisse
Copy link
Member

@timothysc @sarat-k That looks good to me, it's in the submit-queue. It will be picked-up eventually.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)

@k8s-github-robot k8s-github-robot merged commit 11b9a2d into kubernetes:master Feb 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants