
Add support to modify precomputed predicate metadata upon adding/removal of a pod #50805

Merged: 2 commits merged into kubernetes:master on Aug 28, 2017

Conversation

@bsalamat (Member):

What this PR does / why we need it: This PR adds the capability to change precomputed predicate metadata, letting us add/remove pods to/from the precomputed metadata efficiently without recomputing everything upon the addition/removal of a pod. This PR is needed as a part of adding preemption logic to the scheduler.
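For context, a rough sketch of the surface this PR adds. AddPod's signature is taken from the diffs reviewed below; RemovePod's exact signature and the interface grouping are assumptions for illustration:

// Sketch only. AddPod's signature matches the diff reviewed below;
// RemovePod's signature and the interface grouping are assumed.
type predicateMetadataModifier interface {
  // AddPod updates the precomputed metadata as if addedPod were added
  // to the node described by nodeInfo.
  AddPod(addedPod *v1.Pod, nodeInfo *schedulercache.NodeInfo) error
  // RemovePod updates the precomputed metadata as if deletedPod were
  // removed from the system.
  RemovePod(deletedPod *v1.Pod) error
}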

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:
To make the review process a bit easier, there are three commits. The cleanup commit is only moving code and renaming some functions, without logic changes.

Release note:

NONE

ref/ #47604
ref/ #48646

/assign @wojtek-t

@kubernetes/sig-scheduling-pr-reviews @davidopp

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 16, 2017
@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 16, 2017
@bsalamat bsalamat force-pushed the preemption_metacompute branch 2 times, most recently from a8b6194 to 3c59233 on August 17, 2017 02:27
@wojtek-t (Member) left a comment:

I skimmed through the PR and it seems reasonable to me.
I will try to review it in the upcoming days.

podBestEffort bool
podRequest *schedulercache.Resource
podPorts map[int]bool
matchingAntiAffinityTerms map[types.UID][]matchingPodAntiAffinityTerm //key is a pod UID with the anti-affinity rule.
Member:

nit: please avoid such long lines

podRequest *schedulercache.Resource
podPorts map[int]bool
matchingAntiAffinityTerms map[types.UID][]matchingPodAntiAffinityTerm //key is a pod UID with the anti-affinity rule.
serviceAffinityInUse bool
Member:

Why do we need this bool?
If the two slices below are nulls, it should have the same effect as setting this to false. In that case, I would prefer to remove this to avoid growing this structure infinitely - it's already pretty big.

Member Author:

This bool tells us whether the service affinity predicate is actually enabled. The affinity predicate may be enabled, but the two slices may still be empty because the pod matches no services and no other pods with the same labels exist.

deletedPodIndex := -1
for i, pod := range meta.serviceAffinityMatchingPodList {
if pod.GetUID() == deletedPod.GetUID() {
deletedPodIndex = i
Member:

This is overcomplicated. You should simply move the body of the if below up here and get rid of "deletedPodIndex".

Member Author:

Done. I was trying to avoid interfering with the for loop iterator, but it doesn't matter, as we break right after removing the element.
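The simplified loop (shown as it reads after the later switch from pod UID to pod full name) becomes roughly:

for i, pod := range meta.serviceAffinityMatchingPodList {
  if schedutil.GetPodFullName(pod) == deletedPodFullName {
    // Remove the element in place; mutating the slice is safe here
    // because we break immediately afterwards.
    meta.serviceAffinityMatchingPodList = append(
      meta.serviceAffinityMatchingPodList[:i],
      meta.serviceAffinityMatchingPodList[i+1:]...)
    break
  }
}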


// AddPod changes predicateMetadata assuming that `newPod` is added to the
// system.
func (meta *predicateMetadata) AddPod(addedPod *v1.Pod, nodeInfo *schedulercache.NodeInfo) error {
Member:

Do we really need this method?

My personal feeling is that RemovePod is the only thing you need to implement preemption. Why do you want to add pods?

Member Author:

We do need this. Preemption first removes all the lower-priority pods and, after adding the pending pod, tries to reprieve as many of the removed pods as possible. This method is useful when adding the pending pod and when adding those removed pods back.
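A rough sketch of the flow being described; only AddPod and RemovePod are from this PR, while the surrounding helper and its name are hypothetical:

// simulatePreemption is a hypothetical illustration of the described flow.
func simulatePreemption(meta *predicateMetadata, pending *v1.Pod,
  nodeInfo *schedulercache.NodeInfo, victims []*v1.Pod) error {
  // Remove all lower-priority victims from the precomputed metadata.
  for _, p := range victims {
    if err := meta.RemovePod(p); err != nil {
      return err
    }
  }
  // Add the pending pod.
  if err := meta.AddPod(pending, nodeInfo); err != nil {
    return err
  }
  // Then try to reprieve victims one by one, re-running the predicates
  // against the incrementally updated metadata and calling AddPod for
  // each pod that can stay.
  return nil
}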

// Add matching anti-affinity terms of the addedPod to the map.
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil {
if len(podMatchingTerms) > 0 {
meta.matchingAntiAffinityTerms[addedPodUID] = append(meta.matchingAntiAffinityTerms[addedPodUID], podMatchingTerms...)
Member:

This shouldn't be append in my opinion - there should be exactly one pod with a given UID - if there is already something for the uid, this seems like a bug.

Member Author:

There is going to be only one pod, but that one pod may have a few matching anti-affinity terms.

return false
}
}
return true
Member:

This only works for lists without duplicates. It will return true for [1, 1, 2] and [1, 2, 2] (which may or may not be a problem in this case, but it definitely deserves a comment).

Member Author:

Deleted the function. It was unused.

// predicateMetadataEquivalent returns true if the two metadata are equivalent.
// Note: this function does not compare podRequest.
func predicateMetadataEquivalent(meta1, meta2 *predicateMetadata) error {
if meta1.pod != meta2.pod {
Member:

Should we really compare pointers to pods here?

Member:

Shouldn't it be DeepEqual?

Member Author:

In this test environment where we pass the same pod pointer to the metadata producers, this is fine. I changed it to DeepEqual anyway to avoid confusion.

@bsalamat (Member Author) left a comment:

Thanks a lot @wojtek-t for your prompt review.

@bsalamat bsalamat force-pushed the preemption_metacompute branch 5 times, most recently from d466832 to 29a8d2e on August 18, 2017 03:23
@bsalamat (Member Author):

@wojtek-t Other than addressing the comments, I also changed the key of the anti-affinity terms from pod UID to pod fullname, in a separate commit. Pod UID was not consistent with the rest of the codebase and made writing tests a bit more difficult.
Could you please take another look?
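For reference, the scheduler's pod full name is simply the pod name and namespace joined into one string; a sketch of the helper:

// GetPodFullName returns a name that uniquely identifies a pod.
// Underscore is a safe delimiter because it cannot appear in a pod name.
func GetPodFullName(pod *v1.Pod) string {
  return pod.Name + "_" + pod.Namespace
}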

@bsalamat (Member Author):

ping @wojtek-t. Could you please take another look at your earliest convenience? All of my other PRs to add preemption depend on this one.

@wojtek-t (Member):

Yes, sorry. Will take a look at it today.

@wojtek-t (Member) left a comment:

I added some comments.

Also please squash commits - I'm reviewing the whole PR anyway, because I never remember what I did before :P

podPorts map[int]bool
//key is a pod full name with the anti-affinity rules.
matchingAntiAffinityTerms map[string][]matchingPodAntiAffinityTerm
serviceAffinityInUse bool
Member:

This bool tells us whether the service affinity predicate is actually enabled. The affinity predicate may be enabled, but the two slices may still be empty because the pod matches no services and no other pods with the same labels exist.

Correct. But in that case, if both slices are empty, the results will be exactly the same whether this bool is true or false.
So basically, what I'm saying is that removing it doesn't change anything and will make the code a bit simpler :)

Member Author:

Without this bool, we would still need to process a new pod in AddPod even when the two slices are empty; the bool helps avoid that.
Since service affinity is not one of the default predicates and is not enabled in most deployments, we would like to avoid wasting CPU cycles processing the rules unnecessarily.

Member:

OK - the argument with AddPod is reasonable.

SGTM

// So, if the namespace of the first one is not the same as the namespace of the
// deletedPod, we don't need to check the list, as deletedPod isn't in the list.
if meta.serviceAffinityInUse &&
len(meta.serviceAffinityMatchingPodList) > 0 &&
Member:

This condition is not needed - if the length is zero, the for inside the if will simply be a no-op.
Let's remove it.

Member:

[In Go you can safely iterate over nil slices].

Member Author:

The check is there to protect against indexing an empty slice in the same "if" condition.

deletedPod.Namespace == meta.serviceAffinityMatchingPodList[0].Namespace {
for i, pod := range meta.serviceAffinityMatchingPodList {
if schedutil.GetPodFullName(pod) == deletedPodFullName {
meta.serviceAffinityMatchingPodList = append(meta.serviceAffinityMatchingPodList[:i], meta.serviceAffinityMatchingPodList[i+1:]...)
Member:

nit: please split this line - it's too long now.

Member Author:

Done

// Add matching anti-affinity terms of the addedPod to the map.
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil {
if len(podMatchingTerms) > 0 {
meta.matchingAntiAffinityTerms[addedPodFullName] = append(meta.matchingAntiAffinityTerms[addedPodFullName], podMatchingTerms...)
Member:

nit: please split this line

Member Author:

Done

}
// Add matching anti-affinity terms of the addedPod to the map.
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil {
if len(podMatchingTerms) > 0 {
Member:

Please remove this if - it's not needed; you can do it unconditionally:

package main

import (
	"fmt"
)

func main() {
	a := []string{"aa"}
	var b []string
	c := []string{}
	d := []string{"ee"}
	
	x := append(a, b...)
	x = append(x, c...)
	x = append(x, d...)
	fmt.Printf("%#v", x)
}

Member Author:

Done

Member Author:

When running the tests, the entry may not exist in the map, which is why I had added that initial check. I changed the code to make it clearer.

return fmt.Errorf("podPorts are not equal.")
}
}
for k1, v1 := range meta1.matchingAntiAffinityTerms {
Member:

This is a test - we don't need to be super efficient here; simpler code is more important.

So instead of that, we can compare two slices by:

  • sorting both of them
  • and then calling DeepEqual

[Yes - I know you need to write a comparator for sorting, but this method will be so much simpler then.]

The same below for the service-affinity things.

Member Author:

It might be, but the sort interface needs three functions, and it would have to be done for the service affinity as well. We should also be careful that the comparison function almost never reports elements as equal: the order of equal elements in the expected list may not match what the tests generate, which could cause false positives. Given all this, sorting and doing DeepEqual may well need even more code.

allPodLister := schedulertesting.FakePodLister(append(test.existingPods, test.addedPod))
// getMeta creates predicate meta data given the list of pods.
getMeta := func(lister schedulertesting.FakePodLister) (*predicateMetadata, map[string]*schedulercache.NodeInfo) {
nodeInfoMap := map[string]*schedulercache.NodeInfo{}
Member:

Please split this into a separate utility function (in this file).

Member:

I mean splitting out the part computing nodeInfoMap.

I think that we used to have such a function in the past, but I can't find it now...

Member Author:

I found the function and used it.

@@ -1105,7 +1077,8 @@ func getMatchingAntiAffinityTerms(pod *v1.Pod, nodeInfoMap map[string]*scheduler
return
}
if priorityutil.PodMatchesTermsNamespaceAndSelector(pod, namespaces, selector) {
nodeResult = append(nodeResult, matchingPodAntiAffinityTerm{term: &term, node: node})
existingPodFullName := schedutil.GetPodFullName(existingPod)
nodeResult[existingPodFullName] = append(nodeResult[existingPodFullName], matchingPodAntiAffinityTerm{term: &term, node: node})
Member:

nit: please split the line

Member Author:

Done

if predicateMeta, ok := meta.(*predicateMetadata); ok {
matchingTerms = predicateMeta.matchingAntiAffinityTerms
} else {
allPods, err := c.podLister.List(labels.Everything())
filteredPods, err := c.podLister.FilteredList(nodeInfo, labels.Everything())
Member:

those actually are "allPods" - labels.Everything won't filter anything, right?

Member Author:

They are filtered by nodeInfo. Only the pods that exist in the nodeInfo are returned here.

// PodFilter is a simple interface that helps filter out pods.
// This is here only to avoid circular dependency between schedulercache and
// algorithm. Ideally, it should have been in algorithm/types.go.
type PodFilter interface {
Member:

I'm not a fan of this approach. I would prefer defining:

type PodFilter func(*v1.Pod) bool

and passing nodeInfo.Filter explicitly to FilteredList.

Or do you have any strong reason not to do that?

Member Author:

No strong reason. Done as you suggested.
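The resulting shape, per the suggestion (a sketch; the FilteredList call matches the diff further down):

// PodFilter is a function that returns true if the pod should be considered.
type PodFilter func(*v1.Pod) bool

// Callers then pass a concrete filter explicitly, e.g.:
//   filteredPods, err := c.podLister.FilteredList(nodeInfo.Filter, labels.Everything())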

@bsalamat bsalamat force-pushed the preemption_metacompute branch from 29a8d2e to 0b5fcaf on August 22, 2017 19:16
@bsalamat (Member Author) left a comment:

Thanks a lot, @wojtek-t!
Please take another look.

@bsalamat bsalamat force-pushed the preemption_metacompute branch from 0b5fcaf to c818b69 on August 22, 2017 21:02
@bsalamat (Member Author):

/retest

@wojtek-t (Member) left a comment:

I added another portion of comments. Most of them are relatively minor; there is no bigger one.

podPorts map[int]bool
//key is a pod full name with the anti-affinity rules.
matchingAntiAffinityTerms map[string][]matchingPodAntiAffinityTerm
serviceAffinityInUse bool
Member:

OK - the argument with AddPod is reasonable.

SGTM

if found {
existingTerms = append(existingTerms, podMatchingTerms...)
} else {
if len(podMatchingTerms) > 0 {
Member:

This should be checked in the outer if - there is no point in doing all those operations if this is empty.

if len(podMatchingTerms) > 0 {
  if existingTerms, found := meta.matchingAntiAffinityTerms[addedPodFullName]; found {
    existingTerms = append(existingTerms, podMatchingTerms...)
  } else {
    meta.matchingAntiAffinityTerms[addedPodFullName] = podMatchingTerms
  }
}

Member Author:

Done

return fmt.Errorf("Invalid node in nodeInfo.")
}
// Add matching anti-affinity terms of the addedPod to the map.
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil {
Member:

would prefer to limit the number of nested ifs, so that the code is easier to follow (and more standard):

podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node())
if err != nil {
  return err
}
// The rest of the code.

Member Author:

Done

// of matching pods if applicable.
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace {
selector := CreateSelectorFromLabels(meta.pod.Labels)
if selector.Matches(labels.Set(addedPod.Labels)) {
Member:

Yeah - sorry.

This predicate surprises me every time I'm seeing it....

for !reflect.DeepEqual(meta1.podPorts, meta2.podPorts) {
return fmt.Errorf("podPorts are not equal.")
}
for k1, v1 := range meta1.matchingAntiAffinityTerms {
Member:

It might be, but the sort interface needs three functions, and it would have to be done for the service affinity as well. We should also be careful that the comparison function almost never reports elements as equal: the order of equal elements in the expected list may not match what the tests generate, which could cause false positives. Given all this, sorting and doing DeepEqual may well need even more code.

I agree that sort needs 3 functions. But those are super trivial functions and well separated.

And yes - I agree that in total this might be a bit more code, but it will be better structured and easier to follow.
This function now has 70 lines of code and it's hard to follow.
And what if we add a few more fields to predicateMetadata? We may end up with a comparison function hundreds of lines long.

If you really don't want to do the sorting (though I think it's easier and less error prone), at the very least please extract comparing those slices into individual functions, with a TODO to simplify them.

Member Author:

Alright. Done. As expected there is more code now, but I agree that it is probably easier to follow.

func (c *PodAffinityChecker) getMatchingAntiAffinityTerms(pod *v1.Pod, allPods []*v1.Pod) (map[string][]matchingPodAntiAffinityTerm, error) {
result := make(map[string][]matchingPodAntiAffinityTerm)
for _, existingPod := range allPods {
existingPodNode, err := c.info.GetNodeInfo(existingPod.Spec.NodeName)
Member:

This may affect performance: with this change you're computing it for every pod, whereas before we were doing it only for pods with affinity/anti-affinity.

I think we should change it back to how it was.

Member Author:

Good point. Added the check back.

if err != nil {
return nil, err
}
existingPodFullName := schedutil.GetPodFullName(existingPod)
Member:

if len(existingPodMatchingTerms) > 0 {
...
}

Member Author:

Done

podName(pod), node.Name, term.term)
return false
for _, terms := range matchingTerms {
for _, term := range terms {
Member:

This will result in creating a copy of each term.

Instead you should do:

for i := range terms {
  term := &terms[i]
  ...
}

Member Author:

I changed it as you suggested. However, each "term" contains only two pointers, so copying it is more or less as efficient as taking the address; but taking the address is better if the struct grows in the future.
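For reference, a self-contained sketch of the applied pattern; the helper name and callback are illustrative, not from the PR:

// anyTermMatches indexes into the slice so each element's address is
// taken instead of copying the struct on every iteration.
func anyTermMatches(matchingTerms map[string][]matchingPodAntiAffinityTerm,
  matches func(*matchingPodAntiAffinityTerm) bool) bool {
  for _, terms := range matchingTerms {
    for i := range terms {
      if matches(&terms[i]) { // no per-element copy
        return true
      }
    }
  }
  return false
}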

@@ -106,6 +106,20 @@ func (cache *schedulerCache) List(selector labels.Selector) ([]*v1.Pod, error) {
return pods, nil
}

func (cache *schedulerCache) FilteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) {
Member:

Instead of duplicating the code please do:

func (cache *schedulerCache) List(selector labels.Selector) ([]*v1.Pod, error) {
  return cache.filteredList(alwaysTrue, selector)
}

func (cache *schedulerCache) FilteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) {
  return cache.filteredList(podFilter, selector)
}

func (cache *schedulerCache) filteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) {
  // as your FilteredList is now
}

Member Author:

Done, but I didn't add filteredList; I just called FilteredList from List. Please let me know if you think adding filteredList is needed.
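i.e., something like this sketch of what was done (the always-true filter is assumed):

func (cache *schedulerCache) List(selector labels.Selector) ([]*v1.Pod, error) {
  // Delegate to FilteredList with a filter that passes every pod.
  alwaysTrue := func(p *v1.Pod) bool { return true }
  return cache.FilteredList(alwaysTrue, selector)
}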

if predicateMeta, ok := meta.(*predicateMetadata); ok {
matchingTerms = predicateMeta.matchingAntiAffinityTerms
} else {
allPods, err := c.podLister.List(labels.Everything())
filteredPods, err := c.podLister.FilteredList(nodeInfo.Filter, labels.Everything())
Member:

Wait - this is completely changing the logic.

With nodeInfo.Filter you will only get pods from a given node, not all pods. That is not correct in my opinion.

Member Author:

Please note that nodeInfo.Filter "passes" all the pods whose nodeName is not the same as the NodeInfo node name. It only filters out pods that are on this node (according to their nodeName) but are not present in NodeInfo.Pods. I added a comment to clarify that.
Basically, we use this function to simulate removal of pods from the system: we create a copy of the real NodeInfo, remove pods from the copy, and then pass the copy to this function.
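A minimal sketch of the described Filter semantics (field and helper names are assumed):

// Filter returns true for pods that should be considered: pods bound to
// other nodes always pass; pods that claim this node pass only if they
// are still present in this NodeInfo's pod list, so removals made on a
// NodeInfo copy take effect.
func (n *NodeInfo) Filter(pod *v1.Pod) bool {
  if pod.Spec.NodeName != n.node.Name {
    return true
  }
  for _, p := range n.pods {
    if p.Name == pod.Name && p.Namespace == pod.Namespace {
      return true
    }
  }
  return false
}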

@bsalamat (Member Author) left a comment:

Thanks, @wojtek-t! PTAL.

// of matching pods if applicable.
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace {
selector := CreateSelectorFromLabels(meta.pod.Labels)
if selector.Matches(labels.Set(addedPod.Labels)) {
Member Author:

Me too. I was actually telling davidopp the same thing several days ago.

@bsalamat bsalamat force-pushed the preemption_metacompute branch 2 times, most recently from fcf2d0d to dc57ce8 on August 24, 2017 04:59
@wojtek-t (Member) left a comment:

A couple of minor comments. I will most probably LGTM once they are applied. Thanks for bearing with me!

if len(podMatchingTerms) > 0 {
existingTerms, found := meta.matchingAntiAffinityTerms[addedPodFullName]
if found {
meta.matchingAntiAffinityTerms[addedPodFullName] = append(existingTerms, podMatchingTerms...)
Member:

nit: please split the line

Member Author:

Done.

// of matching pods if applicable.
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace {
selector := CreateSelectorFromLabels(meta.pod.Labels)
if selector.Matches(labels.Set(addedPod.Labels)) {
Member:

In fact, I think we should get rid of this predicate. It's unfortunate that we can't, due to backward compatibility reasons...

if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace {
selector := CreateSelectorFromLabels(meta.pod.Labels)
if selector.Matches(labels.Set(addedPod.Labels)) {
meta.serviceAffinityMatchingPodList = append(meta.serviceAffinityMatchingPodList, addedPod)
Member:

nit: please split the line

Member Author:

Do we have any guidelines on the length of lines? Our code-base is full of lines which are way longer than this. I almost never break lines to be consistent with the rest of the code-base.

)

// sortableAntiAffinityTerms lets us sort anti-affinity terms.
type sortableAntiAffinityTerms struct {
Member:

You can simply define this as:

type sortableAntiAffinityTerms []matchingPodAntiAffinityTerm

Will be simpler. Same for pods below.

Member Author:

Yes, done!
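With the slice-based definition, the three sort.Interface methods become near one-liners; a sketch, with the full Less ordering refined in the threads below:

type sortableAntiAffinityTerms []matchingPodAntiAffinityTerm

func (s sortableAntiAffinityTerms) Len() int      { return len(s) }
func (s sortableAntiAffinityTerms) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s sortableAntiAffinityTerms) Less(i, j int) bool {
  // First sort key only; the remaining tie-breakers are discussed below.
  return s[i].node.Name < s[j].node.Name
}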

}

func (s *sortablePods) Less(i, j int) bool {
if s.Items[i].Name < s.Items[j].Name {
Member:

This isn't necessarily a good sorting function.
You can have two pods with the same name but in different namespaces, and those aren't equal.

Should be:

return s[i].Namespace < s[j].Namespace || (s[i].Namespace == s[j].Namespace && s[i].Name < s[j].Name)

Member Author:

Good point. Done

}

func (s *sortableServices) Less(i, j int) bool {
if s.Items[i].Name < s.Items[j].Name {
Member:

Same here.

Member Author:

Done

func (s *sortableAntiAffinityTerms) Less(i, j int) bool {
t1, t2 := s.Items[i], s.Items[j]
return t1.node.Name < t2.node.Name ||
(t1.node.Name == t2.node.Name && len(t1.term.Namespaces) < len(t2.term.Namespaces)) ||
Member:

This is a bit unfortunate.
To make it a bit more readable, we can try changing it to:

if t1.node.Name != t2.node.Name {
  return t1.node.Name < t2.node.Name
}
// and now compare the rest similarly

It's effectively the same, but simpler to follow.

Member Author:

Done

for !reflect.DeepEqual(meta1.podPorts, meta2.podPorts) {
return fmt.Errorf("podPorts are not equal.")
}
sortAntiAffinityTerms(meta1.matchingAntiAffinityTerms)
Member:

Thanks - it's much easier to follow now!

@bsalamat bsalamat force-pushed the preemption_metacompute branch from dc57ce8 to eb5c179 on August 24, 2017 19:11
@bsalamat (Member Author) left a comment:

Thanks, @wojtek-t! PTAL.

// of matching pods if applicable.
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace {
selector := CreateSelectorFromLabels(meta.pod.Labels)
if selector.Matches(labels.Set(addedPod.Labels)) {
Member Author:

Yes. Luckily it is not enabled by default.

@wojtek-t (Member) left a comment:

The last minor nit.

Please squash the last commit and I will approve.

if t1.node.Name != t2.node.Name {
return t1.node.Name < t2.node.Name
}
if t1.node.Name == t2.node.Name && len(t1.term.Namespaces) != len(t2.term.Namespaces) {
Member:

You don't need to check whether the node.Names are equal - they are, otherwise we would have returned in the above if.

Same for everything below.

Member Author:

Done
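After dropping the redundant equality checks, the comparator reads roughly like this sketch (later tie-breakers elided):

func (s sortableAntiAffinityTerms) Less(i, j int) bool {
  t1, t2 := s[i], s[j]
  if t1.node.Name != t2.node.Name {
    return t1.node.Name < t2.node.Name
  }
  if len(t1.term.Namespaces) != len(t2.term.Namespaces) {
    return len(t1.term.Namespaces) < len(t2.term.Namespaces)
  }
  // ...remaining tie-breakers follow the same pattern...
  return false
}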

@bsalamat bsalamat force-pushed the preemption_metacompute branch 2 times, most recently from 357d669 to 0e777e9 on August 25, 2017 15:26
@bsalamat (Member Author):

Thanks a lot @wojtek-t for taking a lot of time to review this PR.

@bsalamat bsalamat force-pushed the preemption_metacompute branch 4 times, most recently from 06ebd08 to 8dad409 on August 25, 2017 20:07
@davidopp (Member):

Thanks a ton for reviewing this @wojtek-t

@bsalamat bsalamat force-pushed the preemption_metacompute branch from 8dad409 to df3796f on August 28, 2017 07:02
@bsalamat bsalamat force-pushed the preemption_metacompute branch from df3796f to 87d4065 on August 28, 2017 07:13
@wojtek-t (Member):

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, wojtek-t

Associated issue: 47604

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2017
@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 4ba2b62 into kubernetes:master Aug 28, 2017
@bsalamat bsalamat deleted the preemption_metacompute branch August 29, 2017 00:49
k8s-github-robot pushed a commit that referenced this pull request Sep 8, 2017
Automatic merge from submit-queue

Add pod preemption to the scheduler

**What this PR does / why we need it**:
This is the last of a series of PRs to add priority-based preemption to the scheduler. This PR connects the preemption logic to the scheduler workflow.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #48646

**Special notes for your reviewer**:
This PR includes other PRs which are under review (#50805, #50405, #50190). All the new code is located in 43627af.

**Release note**:

```release-note
Add priority-based preemption to the scheduler.
```

ref/ #47604

/assign @davidopp 

@kubernetes/sig-scheduling-pr-reviews