-
Notifications
You must be signed in to change notification settings - Fork 40.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to modify precomputed predicate metadata upon adding/removal of a pod #50805
Add support to modify precomputed predicate metadata upon adding/removal of a pod #50805
Conversation
a8b6194
to
3c59233
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed through the PR and it seems reasonable to me.
I will try to review it in the upcoming days.
podBestEffort bool | ||
podRequest *schedulercache.Resource | ||
podPorts map[int]bool | ||
matchingAntiAffinityTerms map[types.UID][]matchingPodAntiAffinityTerm //key is a pod UID with the anti-affinity rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please avoid such long lines
podRequest *schedulercache.Resource | ||
podPorts map[int]bool | ||
matchingAntiAffinityTerms map[types.UID][]matchingPodAntiAffinityTerm //key is a pod UID with the anti-affinity rule. | ||
serviceAffinityInUse bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this bool?
If the two slices below are nulls, it should have the same effect as setting this to false. In that case, I would prefer to remove this to avoid growing this structure infinitely - it's already pretty big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bool tells us whether the service affinity predicate is actually enabled. The affinity predicate may be enabled, but the two slices may still be empty because the pod matches no services and no other pods with the same labels exist.
deletedPodIndex := -1 | ||
for i, pod := range meta.serviceAffinityMatchingPodList { | ||
if pod.GetUID() == deletedPod.GetUID() { | ||
deletedPodIndex = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is overcomplicated. You simply should move the body of the below if here and get rid of "deletedPodIndex".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I was trying to avoid intervening with the for loop iterator, but it doesn't matter as we are breaking right after removing the element.
|
||
// AddPod changes predicateMetadata assuming that `newPod` is added to the | ||
// system. | ||
func (meta *predicateMetadata) AddPod(addedPod *v1.Pod, nodeInfo *schedulercache.NodeInfo) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this method?
My personal feeling is that RemovePod is the only think you need to implement preemption. Why do you want to add pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need this. The preemption first removes all the lower priority pods and after adding the pending pod, tries to reprieve as many removed pods as possible. This method is useful when adding the pending pod and adding those removed pods back.
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { | ||
if len(podMatchingTerms) > 0 { | ||
meta.matchingAntiAffinityTerms[addedPodUID] = append(meta.matchingAntiAffinityTerms[addedPodUID], podMatchingTerms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be append in my opinion - there should be exactly one pod with a given UID - if there is already something for the uid, this seems like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is going to be only one pod, but that one pod may have a few matching anti-affinity terms.
return false | ||
} | ||
} | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works for lists without duplicated. It will return true for [1, 1, 2] and [1, 2, 2] (which may or may not be a problem in that case, but definitely deserves a comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted the function. It was unused.
// predicateMetadataEquivalent returns true if the two metadata are equivalent. | ||
// Note: this function does not compare podRequest. | ||
func predicateMetadataEquivalent(meta1, meta2 *predicateMetadata) error { | ||
if meta1.pod != meta2.pod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we really compare pointers to pods here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be DeepEqual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test environment where we pass the same pod pointer to the metadata producers, this is fine. I changed it to DeepEqual anyway to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @wojtek-t for your prompt review.
podRequest *schedulercache.Resource | ||
podPorts map[int]bool | ||
matchingAntiAffinityTerms map[types.UID][]matchingPodAntiAffinityTerm //key is a pod UID with the anti-affinity rule. | ||
serviceAffinityInUse bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bool tells us whether the service affinity predicate is actually enabled. The affinity predicate may be enabled, but the two slices may still be empty because the pod matches no services and no other pods with the same labels exist.
deletedPodIndex := -1 | ||
for i, pod := range meta.serviceAffinityMatchingPodList { | ||
if pod.GetUID() == deletedPod.GetUID() { | ||
deletedPodIndex = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I was trying to avoid intervening with the for loop iterator, but it doesn't matter as we are breaking right after removing the element.
|
||
// AddPod changes predicateMetadata assuming that `newPod` is added to the | ||
// system. | ||
func (meta *predicateMetadata) AddPod(addedPod *v1.Pod, nodeInfo *schedulercache.NodeInfo) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need this. The preemption first removes all the lower priority pods and after adding the pending pod, tries to reprieve as many removed pods as possible. This method is useful when adding the pending pod and adding those removed pods back.
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { | ||
if len(podMatchingTerms) > 0 { | ||
meta.matchingAntiAffinityTerms[addedPodUID] = append(meta.matchingAntiAffinityTerms[addedPodUID], podMatchingTerms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is going to be only one pod, but that one pod may have a few matching anti-affinity terms.
return false | ||
} | ||
} | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted the function. It was unused.
// predicateMetadataEquivalent returns true if the two metadata are equivalent. | ||
// Note: this function does not compare podRequest. | ||
func predicateMetadataEquivalent(meta1, meta2 *predicateMetadata) error { | ||
if meta1.pod != meta2.pod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test environment where we pass the same pod pointer to the metadata producers, this is fine. I changed it to DeepEqual anyway to avoid confusion.
d466832
to
29a8d2e
Compare
@wojtek-t Other than addressing the comments, I also changed the key of the anti-affinity terms from pod UID to pod fullname, in a separate commit. Pod UID was not consistent with the rest of the codebase and made writing tests a bit more difficult. |
ping @wojtek-t. Could you please take another look at your earliest convenience. All of my other PRs to add preemption depend on this one. |
Yes, sorry. Will take a look into it today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments.
Also pleas squash commits - I'm reviewing the whole PR anyway, because I never remember what I did before :P
podPorts map[int]bool | ||
//key is a pod full name with the anti-affinity rules. | ||
matchingAntiAffinityTerms map[string][]matchingPodAntiAffinityTerm | ||
serviceAffinityInUse bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bool tells us whether the service affinity predicate is actually enabled. The affinity predicate may be enabled, but the two slices may still be empty because the pod matches no services and no other pods with the same labels exist.
Correct. But in that case, I think that if both slices are empty no matter if this bool is true or false, the results will be exactly the same.
So basically, what I'm saying is that removing this doesn't change anything and will make the code a bit easier :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would still need to process a new pod in AddPod
even when the two slices were empty if we didn't have this bool. This bool helps avoid that.
Since service affinity is not one of the default predicates and is not enabled in most deployments, we would like to avoid wasting cpu cycles processing the rules unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - the argument with AddPod is reasonable.
SGTM
// So, if the namespace of the first one is not the same as the namespace of the | ||
// deletedPod, we don't need to check the list, as deletedPod isn't in the list. | ||
if meta.serviceAffinityInUse && | ||
len(meta.serviceAffinityMatchingPodList) > 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is not needed - if the length is zero, the for inside if will simply be no-op.
Let's remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[In Go you can safely iterate over nil slices].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is there to protect against indexing an empty slice in the same "if" condition.
deletedPod.Namespace == meta.serviceAffinityMatchingPodList[0].Namespace { | ||
for i, pod := range meta.serviceAffinityMatchingPodList { | ||
if schedutil.GetPodFullName(pod) == deletedPodFullName { | ||
meta.serviceAffinityMatchingPodList = append(meta.serviceAffinityMatchingPodList[:i], meta.serviceAffinityMatchingPodList[i+1:]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please split this line - it's too long now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { | ||
if len(podMatchingTerms) > 0 { | ||
meta.matchingAntiAffinityTerms[addedPodFullName] = append(meta.matchingAntiAffinityTerms[addedPodFullName], podMatchingTerms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please split this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { | ||
if len(podMatchingTerms) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this if - it's not neded, you can do it unconditionally:
package main
import (
"fmt"
)
func main() {
a := []string{"aa"}
var b []string
c := []string{}
d := []string{"ee"}
x := append(a, b...)
x = append(x, c...)
x = append(x, d...)
fmt.Printf("%#v", x)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running the tests, the entry may not exist in the map and that's why I had added that initial check. I changed the code to make it clear.
return fmt.Errorf("podPorts are not equal.") | ||
} | ||
} | ||
for k1, v1 := range meta1.matchingAntiAffinityTerms { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is test - we don't need to be super efficient here - simpler code is more important here.
So instead of that, we can compare two slices by:
- sorting both of them
- and then calling DeepEqual
[Yes - I know you need to write comparator for sorting, but this method will be so much simpler then].
The same below for service-afffinity things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be, but sort
interface needs three functions. It should be done for the service affinity as well. We should also be careful that the comparison function almost never generates equal
cases, as the order of equal
elements in the expected list may not necessarily be the same as what the tests generate and could cause false positives. Given all this, probably sorting and doing DeepEqual may need even more code.
allPodLister := schedulertesting.FakePodLister(append(test.existingPods, test.addedPod)) | ||
// getMeta creates predicate meta data given the list of pods. | ||
getMeta := func(lister schedulertesting.FakePodLister) (*predicateMetadata, map[string]*schedulercache.NodeInfo) { | ||
nodeInfoMap := map[string]*schedulercache.NodeInfo{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split this into a separate utility function (in this file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean splitting the part computing nodeInfoMap.
I think that we used to have such function in the past, but I can't find it now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the function and used it.
@@ -1105,7 +1077,8 @@ func getMatchingAntiAffinityTerms(pod *v1.Pod, nodeInfoMap map[string]*scheduler | |||
return | |||
} | |||
if priorityutil.PodMatchesTermsNamespaceAndSelector(pod, namespaces, selector) { | |||
nodeResult = append(nodeResult, matchingPodAntiAffinityTerm{term: &term, node: node}) | |||
existingPodFullName := schedutil.GetPodFullName(existingPod) | |||
nodeResult[existingPodFullName] = append(nodeResult[existingPodFullName], matchingPodAntiAffinityTerm{term: &term, node: node}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please split the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if predicateMeta, ok := meta.(*predicateMetadata); ok { | ||
matchingTerms = predicateMeta.matchingAntiAffinityTerms | ||
} else { | ||
allPods, err := c.podLister.List(labels.Everything()) | ||
filteredPods, err := c.podLister.FilteredList(nodeInfo, labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those actually are "allPods" - labels.Everything won't filter anything, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are filtered by nodeInfo. Only the pods that exist in the nodeInfo are returned here.
// PodFilter is a simple interface that helps filter out pods. | ||
// This is here only to avoid circular dependency between schedulercache and | ||
// algorithm. Ideally, it should have been in algorithm/types.go. | ||
type PodFilter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this approach. I would prefer defining:
type func(*v1.Pod) bool PodFilter
and passing explicitly nodeInfo.Filter() to FilteredList.
Or do you have any strong reason of not doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong reason. Done as you suggested.
29a8d2e
to
0b5fcaf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, @wojtek-t!
Please take another look.
podPorts map[int]bool | ||
//key is a pod full name with the anti-affinity rules. | ||
matchingAntiAffinityTerms map[string][]matchingPodAntiAffinityTerm | ||
serviceAffinityInUse bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would still need to process a new pod in AddPod
even when the two slices were empty if we didn't have this bool. This bool helps avoid that.
Since service affinity is not one of the default predicates and is not enabled in most deployments, we would like to avoid wasting cpu cycles processing the rules unnecessarily.
// So, if the namespace of the first one is not the same as the namespace of the | ||
// deletedPod, we don't need to check the list, as deletedPod isn't in the list. | ||
if meta.serviceAffinityInUse && | ||
len(meta.serviceAffinityMatchingPodList) > 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is there to protect against indexing an empty slice in the same "if" condition.
deletedPod.Namespace == meta.serviceAffinityMatchingPodList[0].Namespace { | ||
for i, pod := range meta.serviceAffinityMatchingPodList { | ||
if schedutil.GetPodFullName(pod) == deletedPodFullName { | ||
meta.serviceAffinityMatchingPodList = append(meta.serviceAffinityMatchingPodList[:i], meta.serviceAffinityMatchingPodList[i+1:]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { | ||
if len(podMatchingTerms) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { | ||
if len(podMatchingTerms) > 0 { | ||
meta.matchingAntiAffinityTerms[addedPodFullName] = append(meta.matchingAntiAffinityTerms[addedPodFullName], podMatchingTerms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
allPodLister := schedulertesting.FakePodLister(append(test.existingPods, test.addedPod)) | ||
// getMeta creates predicate meta data given the list of pods. | ||
getMeta := func(lister schedulertesting.FakePodLister) (*predicateMetadata, map[string]*schedulercache.NodeInfo) { | ||
nodeInfoMap := map[string]*schedulercache.NodeInfo{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the function and used it.
@@ -1105,7 +1077,8 @@ func getMatchingAntiAffinityTerms(pod *v1.Pod, nodeInfoMap map[string]*scheduler | |||
return | |||
} | |||
if priorityutil.PodMatchesTermsNamespaceAndSelector(pod, namespaces, selector) { | |||
nodeResult = append(nodeResult, matchingPodAntiAffinityTerm{term: &term, node: node}) | |||
existingPodFullName := schedutil.GetPodFullName(existingPod) | |||
nodeResult[existingPodFullName] = append(nodeResult[existingPodFullName], matchingPodAntiAffinityTerm{term: &term, node: node}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if predicateMeta, ok := meta.(*predicateMetadata); ok { | ||
matchingTerms = predicateMeta.matchingAntiAffinityTerms | ||
} else { | ||
allPods, err := c.podLister.List(labels.Everything()) | ||
filteredPods, err := c.podLister.FilteredList(nodeInfo, labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are filtered by nodeInfo. Only the pods that exist in the nodeInfo are returned here.
return fmt.Errorf("podPorts are not equal.") | ||
} | ||
} | ||
for k1, v1 := range meta1.matchingAntiAffinityTerms { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be, but sort
interface needs three functions. It should be done for the service affinity as well. We should also be careful that the comparison function almost never generates equal
cases, as the order of equal
elements in the expected list may not necessarily be the same as what the tests generate and could cause false positives. Given all this, probably sorting and doing DeepEqual may need even more code.
// PodFilter is a simple interface that helps filter out pods. | ||
// This is here only to avoid circular dependency between schedulercache and | ||
// algorithm. Ideally, it should have been in algorithm/types.go. | ||
type PodFilter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong reason. Done as you suggested.
0b5fcaf
to
c818b69
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added another portion of comments. Most of them are relativley minor, there is no bigger one.
podPorts map[int]bool | ||
//key is a pod full name with the anti-affinity rules. | ||
matchingAntiAffinityTerms map[string][]matchingPodAntiAffinityTerm | ||
serviceAffinityInUse bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - the argument with AddPod is reasonable.
SGTM
if found { | ||
existingTerms = append(existingTerms, podMatchingTerms...) | ||
} else { | ||
if len(podMatchingTerms) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be checked in the outer if - there is no point in doing all those operations if this is empty.
if len(podMatchingTerms) > 0 {
if existingTerms, found := meta.matchingAntiAffinityTerms[addedPodFullName]; found {
existingTerms = append(existingTerms, podMatchingTerms...)
} else {
meta.matchingAntiAffinityTerms[addedPodFullName] = podMatchingTerms
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return fmt.Errorf("Invalid node in nodeInfo.") | ||
} | ||
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would prefer to limit the amount of nested ifs. So it would be easier to follow (and more standard) to do:
podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node());
if err != nil {
return err
}
// The rest of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// of matching pods if applicable. | ||
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace { | ||
selector := CreateSelectorFromLabels(meta.pod.Labels) | ||
if selector.Matches(labels.Set(addedPod.Labels)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - sorry.
This predicate surprises me every time I'm seeing it....
for !reflect.DeepEqual(meta1.podPorts, meta2.podPorts) { | ||
return fmt.Errorf("podPorts are not equal.") | ||
} | ||
for k1, v1 := range meta1.matchingAntiAffinityTerms { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be, but sort interface needs three functions. It should be done for the service affinity as well. We should also be careful that the comparison function almost never generates equal cases, as the order of equal elements in the expected list may not necessarily be the same as what the tests generate and could cause false positives. Given all this, probably sorting and doing DeepEqual may need even more code.
I agree that sort needs 3 function. But those are super trivial function and well separated.
And yes - I agree that in total this might be a bit more code, but it will be better structured and easier to follow.
This function now has 70 lines of code and it's hard to follow it.
And what if we add few more fields to predicateMetadata? We may end up with comparison function having hundreds of line.
If you really don't want to do the sorting (though i think it's easier and less error prone), at the very least please extract comparing those slices to indviidual functions with a TODO to simplify them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Done. As expected there is more code now, but I agree that it is probably easier to follow.
func (c *PodAffinityChecker) getMatchingAntiAffinityTerms(pod *v1.Pod, allPods []*v1.Pod) (map[string][]matchingPodAntiAffinityTerm, error) { | ||
result := make(map[string][]matchingPodAntiAffinityTerm) | ||
for _, existingPod := range allPods { | ||
existingPodNode, err := c.info.GetNodeInfo(existingPod.Spec.NodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may affect performance, because with this change you're computing it for every pod, and before we were doing it only for pods with affinity/anti-affinity.
I think we should change it back to how it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added the check back.
if err != nil { | ||
return nil, err | ||
} | ||
existingPodFullName := schedutil.GetPodFullName(existingPod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(existingPodMatchingTerms) > 0 {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
podName(pod), node.Name, term.term) | ||
return false | ||
for _, terms := range matchingTerms { | ||
for _, term := range terms { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will result in creating a copy of each of term.
Instead you should do:
for i := range terms {
term := &terms[i]
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it as you suggested. However, each "term" contains only two pointers. So copying it is more or less as efficient as getting the address, but getting the address is better if the struct grows in the future.
@@ -106,6 +106,20 @@ func (cache *schedulerCache) List(selector labels.Selector) ([]*v1.Pod, error) { | |||
return pods, nil | |||
} | |||
|
|||
func (cache *schedulerCache) FilteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of duplicating the code please do:
func (cache *schedulerCache) List(selector labels.Selector) ([]*v1.Pod, error) {
return cache.filteredList(selector, alwaysTrue)
}
func (cache *schedulerCache) FilteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) {
return cache.filteredList(selector, podFilter)
}
func (cache *schedulerCache) filteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) {
// as you FilterList is now
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but I didn't add filteredList
. I just called FilteredList
from List
. Please let me know if you think adding filteredList
is needed?
if predicateMeta, ok := meta.(*predicateMetadata); ok { | ||
matchingTerms = predicateMeta.matchingAntiAffinityTerms | ||
} else { | ||
allPods, err := c.podLister.List(labels.Everything()) | ||
filteredPods, err := c.podLister.FilteredList(nodeInfo.Filter, labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait - this is completely changing the logic.
With nodeInfo.Filter you will only get pods from a given node, not all pods. That is not correct in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that the nodeInfo.Filter "passes" all the pods whose nodeName is not the same as the NodeInfo node name. It only filters out the pods that are on this node (according to their nodeName), but are not present in the NodeInfo.Pods. I added a comment to clarify that.
Basically, we use this function to simulate removal of pods from the system by creating a copy of the real NodeInfo and removing the pods from the NodeInfo copy and then passing the copy to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @wojtek-t! PTAL.
if found { | ||
existingTerms = append(existingTerms, podMatchingTerms...) | ||
} else { | ||
if len(podMatchingTerms) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return fmt.Errorf("Invalid node in nodeInfo.") | ||
} | ||
// Add matching anti-affinity terms of the addedPod to the map. | ||
if podMatchingTerms, err := getMatchingAntiAffinityTermsOfExistingPod(meta.pod, addedPod, nodeInfo.Node()); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// of matching pods if applicable. | ||
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace { | ||
selector := CreateSelectorFromLabels(meta.pod.Labels) | ||
if selector.Matches(labels.Set(addedPod.Labels)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too. I was actually telling davidopp the same thing several days ago.
@@ -106,6 +106,20 @@ func (cache *schedulerCache) List(selector labels.Selector) ([]*v1.Pod, error) { | |||
return pods, nil | |||
} | |||
|
|||
func (cache *schedulerCache) FilteredList(podFilter PodFilter, selector labels.Selector) ([]*v1.Pod, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but I didn't add filteredList
. I just called FilteredList
from List
. Please let me know if you think adding filteredList
is needed?
func (c *PodAffinityChecker) getMatchingAntiAffinityTerms(pod *v1.Pod, allPods []*v1.Pod) (map[string][]matchingPodAntiAffinityTerm, error) { | ||
result := make(map[string][]matchingPodAntiAffinityTerm) | ||
for _, existingPod := range allPods { | ||
existingPodNode, err := c.info.GetNodeInfo(existingPod.Spec.NodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added the check back.
if err != nil { | ||
return nil, err | ||
} | ||
existingPodFullName := schedutil.GetPodFullName(existingPod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if predicateMeta, ok := meta.(*predicateMetadata); ok { | ||
matchingTerms = predicateMeta.matchingAntiAffinityTerms | ||
} else { | ||
allPods, err := c.podLister.List(labels.Everything()) | ||
filteredPods, err := c.podLister.FilteredList(nodeInfo.Filter, labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that the nodeInfo.Filter "passes" all the pods whose nodeName is not the same as the NodeInfo node name. It only filters out the pods that are on this node (according to their nodeName), but are not present in the NodeInfo.Pods. I added a comment to clarify that.
Basically, we use this function to simulate removal of pods from the system by creating a copy of the real NodeInfo and removing the pods from the NodeInfo copy and then passing the copy to this function.
podName(pod), node.Name, term.term) | ||
return false | ||
for _, terms := range matchingTerms { | ||
for _, term := range terms { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it as you suggested. However, each "term" contains only two pointers. So copying it is more or less as efficient as getting the address, but getting the address is better if the struct grows in the future.
for !reflect.DeepEqual(meta1.podPorts, meta2.podPorts) { | ||
return fmt.Errorf("podPorts are not equal.") | ||
} | ||
for k1, v1 := range meta1.matchingAntiAffinityTerms { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Done. As expected there is more code now, but I agree that it is probably easier to follow.
fcf2d0d
to
dc57ce8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor comments. I will most probably LGTM once applied. Thanks for bothering with me!
if len(podMatchingTerms) > 0 { | ||
existingTerms, found := meta.matchingAntiAffinityTerms[addedPodFullName] | ||
if found { | ||
meta.matchingAntiAffinityTerms[addedPodFullName] = append(existingTerms, podMatchingTerms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please split the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// of matching pods if applicable. | ||
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace { | ||
selector := CreateSelectorFromLabels(meta.pod.Labels) | ||
if selector.Matches(labels.Set(addedPod.Labels)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I think we should get rid of this predicate. I'ts unfortunate that we can't do it due to backward compatibility reasons...
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace { | ||
selector := CreateSelectorFromLabels(meta.pod.Labels) | ||
if selector.Matches(labels.Set(addedPod.Labels)) { | ||
meta.serviceAffinityMatchingPodList = append(meta.serviceAffinityMatchingPodList, addedPod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please split the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any guidelines on the length of lines? Our code-base is full of lines which are way longer than this. I almost never break lines to be consistent with the rest of the code-base.
) | ||
|
||
// sortableAntiAffinityTerms lets us to sort anti-affinity terms. | ||
type sortableAntiAffinityTerms struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply define this as:
type sortableAntiAffinityTerms []matchinPodAntiAffinityTerm
Will be simpler. Same for pods below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done!
} | ||
|
||
func (s *sortablePods) Less(i, j int) bool { | ||
if s.Items[i].Name < s.Items[j].Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessarily good sorting function.
You can have two pods with the same name but in different namespaces. And those aren't equal.
Should be:
return s[i].Namespace < s[j].Namespace || (s[i].Namespace == s[j].Namespace && s[i].Name < s[j].Name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done
} | ||
|
||
func (s *sortableServices) Less(i, j int) bool { | ||
if s.Items[i].Name < s.Items[j].Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
func (s *sortableAntiAffinityTerms) Less(i, j int) bool { | ||
t1, t2 := s.Items[i], s.Items[j] | ||
return t1.node.Name < t2.node.Name || | ||
(t1.node.Name == t2.node.Name && len(t1.term.Namespaces) < len(t2.term.Namespaces)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit unfortunate.
To make it a bit more readable, we can try changing to:
if t1.node.Name != t2.node.Name {
return t1.node.Name < t2.node.Name
}
// and now compare the rest similarly
It's effectively the same, but simpler to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for !reflect.DeepEqual(meta1.podPorts, meta2.podPorts) { | ||
return fmt.Errorf("podPorts are not equal.") | ||
} | ||
sortAntiAffinityTerms(meta1.matchingAntiAffinityTerms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - it's much easier to follow now!
dc57ce8
to
eb5c179
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @wojtek-t! PTAL.
if len(podMatchingTerms) > 0 { | ||
existingTerms, found := meta.matchingAntiAffinityTerms[addedPodFullName] | ||
if found { | ||
meta.matchingAntiAffinityTerms[addedPodFullName] = append(existingTerms, podMatchingTerms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace { | ||
selector := CreateSelectorFromLabels(meta.pod.Labels) | ||
if selector.Matches(labels.Set(addedPod.Labels)) { | ||
meta.serviceAffinityMatchingPodList = append(meta.serviceAffinityMatchingPodList, addedPod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any guidelines on the length of lines? Our code-base is full of lines which are way longer than this. I almost never break lines to be consistent with the rest of the code-base.
) | ||
|
||
// sortableAntiAffinityTerms lets us to sort anti-affinity terms. | ||
type sortableAntiAffinityTerms struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done!
} | ||
|
||
func (s *sortablePods) Less(i, j int) bool { | ||
if s.Items[i].Name < s.Items[j].Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done
} | ||
|
||
func (s *sortableServices) Less(i, j int) bool { | ||
if s.Items[i].Name < s.Items[j].Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
func (s *sortableAntiAffinityTerms) Less(i, j int) bool { | ||
t1, t2 := s.Items[i], s.Items[j] | ||
return t1.node.Name < t2.node.Name || | ||
(t1.node.Name == t2.node.Name && len(t1.term.Namespaces) < len(t2.term.Namespaces)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// of matching pods if applicable. | ||
if meta.serviceAffinityInUse && addedPod.Namespace == meta.pod.Namespace { | ||
selector := CreateSelectorFromLabels(meta.pod.Labels) | ||
if selector.Matches(labels.Set(addedPod.Labels)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Luckily it is not enabled by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last minor nit.
Please squash the last commit and I will approve.
if t1.node.Name != t2.node.Name { | ||
return t1.node.Name < t2.node.Name | ||
} | ||
if t1.node.Name == t2.node.Name && len(t1.term.Namespaces) != len(t2.term.Namespaces) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to check if node.Names are equal. They are, otherwise we would return in above if.
Same for everything below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
357d669
to
0e777e9
Compare
Thanks a lot @wojtek-t for taking a lot of time to review this PR. |
06ebd08
to
8dad409
Compare
Thanks a ton for reviewing this @wojtek-t |
8dad409
to
df3796f
Compare
df3796f
to
87d4065
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bsalamat, wojtek-t Associated issue: 47604 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest Review the full test history for this PR. |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Automatic merge from submit-queue Add pod preemption to the scheduler **What this PR does / why we need it**: This is the last of a series of PRs to add priority-based preemption to the scheduler. This PR connects the preemption logic to the scheduler workflow. **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #48646 **Special notes for your reviewer**: This PR includes other PRs which are under review (#50805, #50405, #50190). All the new code is located in 43627af. **Release note**: ```release-note Add priority-based preemption to the scheduler. ``` ref/ #47604 /assign @davidopp @kubernetes/sig-scheduling-pr-reviews
What this PR does / why we need it: This PR adds capability to change precomputed predicate metadata and let's us add/remove pods to the precomputed metadata efficiently without the need ot recomputing everything upon addition/removal of pods. This PR is needed as a part of adding preemption logic to the scheduler.
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Special notes for your reviewer:
To make the review process a bit easier, there are three commits. The cleanup commit is only moving code and renaming some functions, without logic changes.
Release note:
ref/ #47604
ref/ #48646
/assign @wojtek-t
@kubernetes/sig-scheduling-pr-reviews @davidopp