Introduce node memory pressure condition to scheduler #25531
Conversation
Force-pushed from 0a0591e to b8f551d
func isPodBestEffort(pod *api.Pod) bool {
	// A Pod is BestEffort iff requests & limits are not specified
	// for any resource across all containers.
	for _, container := range pod.Spec.Containers {
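For readers following along, below is a self-contained sketch of what such a helper could look like under the rule quoted in the comment (no requests or limits on any container). The body is an illustration, not the exact code under review, and the review later settles on reusing the QoS utility instead.

```go
package predicates

import "k8s.io/kubernetes/pkg/api"

// isPodBestEffort reports whether the pod specifies no resource requests or
// limits in any of its containers (the BestEffort rule quoted above).
// Illustrative sketch only; the import path reflects the tree layout of the time.
func isPodBestEffort(pod *api.Pod) bool {
	for _, container := range pod.Spec.Containers {
		if len(container.Resources.Requests) != 0 || len(container.Resources.Limits) != 0 {
			return false
		}
	}
	return true
}
```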
+1
I was pretty sure it had to be implemented already ;-) Thanks for the link. Updated.
/cc @vishh - we will add disk pressure as well once that condition is available. @ingvagabund - I will notify you as I push updates to the referenced PRs. I am hoping that after today it goes down to just the one commit this PR would carry, once I do a final round of cleanups (assuming the merge queue merges stuff).
@ingvagabund - the out-of-resource killing PR is down to 1 commit, so you should be able to rebase your PR relative to that so it's easier for folks to follow. #21274
@@ -111,6 +111,9 @@ func init() {
		Weight: 1,
	},
)

	// Fit is determined by node memory pressure condition.
	factory.RegisterFitPredicate("NodeMemoryPressure", predicates.NodeMemoryPressurePredicate)
You should either put this under defaultPredicates() or GeneralPredicates(). Putting it where you did, it is disabled by default.
Put in defaultPredicates() if you only want it to run in the scheduler; put it in GeneralPredicates() if you want it to run in the scheduler and kubelet admission controller. (Only put it in one or the other, not both.)
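As a rough illustration of that suggestion, the registration could be moved into the default provider roughly as sketched below. The file path, the sets.String shape, and the import paths are assumptions based on the scheduler tree of the time, and the predicate name anticipates the rename suggested later in this review.

```go
// plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go (sketch)
package defaults

import (
	"k8s.io/kubernetes/pkg/util/sets"
	"k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates"
	"k8s.io/kubernetes/plugin/pkg/scheduler/factory"
)

// defaultPredicates returns the set of predicate names the default provider
// enables; registering here makes the predicate run in the scheduler by default.
func defaultPredicates() sets.String {
	return sets.NewString(
		// ...existing default predicates elided...

		// Fit is determined by node memory pressure condition.
		factory.RegisterFitPredicate("CheckNodeMemoryPressure", predicates.CheckNodeMemoryPressurePredicate),
	)
}
```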
I was wondering what the best place to put it would be and whether it should be run by default. Thinking about it a second time, it should be, as the kubelet will evict all pods by default if its node is under memory pressure. @derekwaynecarr, prove me wrong if we need the predicate in the kubelet admission controller as well. The predicate is mainly for excluding the node from scheduling of best-effort pods, so the scheduler should be enough.
scheduler should be enough.
Please add a test for the new scheduler predicate.
Force-pushed from b8f551d to d24a3ef
	return qosutil.GetPodQos(pod) == qosutil.BestEffort
}

// MemoryPressurePredicate checks if a pod can be scheduled on a node
s/MemoryPressurePredicate/CheckNodeMemoryPressurePredicate
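To make the rename concrete, here is a minimal sketch of what the renamed predicate could look like. The FitPredicate signature, the NodeInfo accessor, and the import paths are assumptions based on the scheduler packages of the time, not the exact diff in this PR.

```go
package predicates

import (
	"fmt"

	"k8s.io/kubernetes/pkg/api"
	qosutil "k8s.io/kubernetes/pkg/kubelet/qos/util"
	"k8s.io/kubernetes/plugin/pkg/scheduler/schedulercache"
)

// CheckNodeMemoryPressurePredicate rejects BestEffort pods on nodes that
// currently report the MemoryPressure condition as True (sketch).
func CheckNodeMemoryPressurePredicate(pod *api.Pod, nodeName string, nodeInfo *schedulercache.NodeInfo) (bool, error) {
	node := nodeInfo.Node()
	if node == nil {
		return false, fmt.Errorf("node not found: %q", nodeName)
	}

	// Only BestEffort pods are filtered by this predicate.
	if qosutil.GetPodQos(pod) != qosutil.BestEffort {
		return true, nil
	}

	// Reject the pod while the node reports memory pressure.
	for _, cond := range node.Status.Conditions {
		if cond.Type == api.NodeMemoryPressure && cond.Status == api.ConditionTrue {
			return false, nil
		}
	}
	return true, nil
}
```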
Force-pushed from 63c1b1d to 14a81c9
Force-pushed from 45bbe0d to 3126caa
Force-pushed from 3126caa to af1debb
Force-pushed from af1debb to 2db76a2
@davidopp PTAL
The code lgtm, will let @davidopp take a pass. This needs to get in before code freeze since eviction is now enabled on the node.
@@ -2641,3 +2641,118 @@ func TestPodToleratesTaints(t *testing.T) {
	}
}
}

func makeNonBestEffortResources(milliCPU int64, memory int64) api.ResourceList {
nit: just call this getResourceList?
see an example: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/resource_quota_controller_test.go#L33
There is nothing here that actually guarantees it's not best effort, since you can pass 0 for both values.
The predicate_test.go defines a makeAllocatableResources function which I can use. I don't know why I did not use that before.
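For reference, the getResourceList-style helper suggested above (and used in the linked resource quota controller test) could look roughly like the sketch below; the import paths reflect the tree layout of the time and are assumptions here.

```go
package predicates

import (
	"k8s.io/kubernetes/pkg/api"
	"k8s.io/kubernetes/pkg/api/resource"
)

// getResourceList builds a ResourceList from optional CPU and memory
// quantities; empty strings leave the resource unset, so the empty call
// getResourceList("", "") describes a BestEffort (no requests/limits) pod.
func getResourceList(cpu, memory string) api.ResourceList {
	res := api.ResourceList{}
	if cpu != "" {
		res[api.ResourceCPU] = resource.MustParse(cpu)
	}
	if memory != "" {
		res[api.ResourceMemory] = resource.MustParse(memory)
	}
	return res
}
```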
@ingvagabund - just had two minor nits if you want to fix them up...
Force-pushed from 2db76a2 to 55b92f1
pod: bestEffortPod,
nodeInfo: makeEmptyNodeInfo(memoryPressureNode),
fits: false,
name: "best-effort pod schedulable on node with memory pressure condition on",
s/schedulable/not schedulable/
updated
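As a rough illustration of the requested test, a self-contained table-driven sketch follows; the types and the stand-in predicate are simplified placeholders rather than the scheduler's real API, and only the table shape mirrors the fragment above.

```go
package memorypressure

import "testing"

// Simplified stand-ins for the real scheduler types (illustration only).
type fakePod struct{ bestEffort bool }
type fakeNode struct{ memoryPressure bool }

// checkNodeMemoryPressure mimics the predicate's rule: only BestEffort pods
// are rejected, and only while the node reports memory pressure.
func checkNodeMemoryPressure(p fakePod, n fakeNode) bool {
	return !(p.bestEffort && n.memoryPressure)
}

func TestCheckNodeMemoryPressurePredicate(t *testing.T) {
	tests := []struct {
		name string
		pod  fakePod
		node fakeNode
		fits bool
	}{
		{"best-effort pod not schedulable on node with memory pressure condition on", fakePod{true}, fakeNode{true}, false},
		{"best-effort pod schedulable on node without memory pressure condition", fakePod{true}, fakeNode{false}, true},
		{"non best-effort pod schedulable on node with memory pressure condition on", fakePod{false}, fakeNode{true}, true},
	}
	for _, tc := range tests {
		if fits := checkNodeMemoryPressure(tc.pod, tc.node); fits != tc.fits {
			t.Errorf("%s: expected fits=%v, got %v", tc.name, tc.fits, fits)
		}
	}
}
```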
… pods for nodes that report memory pressure. Introduce unit-test for CheckNodeMemoryPressurePredicate. Following work done in kubernetes#14943
Force-pushed from 55b92f1 to b95b30b
LGTM
GCE e2e build/test passed for commit b95b30b.
Automatic merge from submit-queue
…-pressure-to-scheduler
Automatic merge from submit-queue

Introduce node memory pressure condition to scheduler

Following the work done by @derekwaynecarr in kubernetes#21274, this introduces a memory pressure predicate for the scheduler.

Missing:
* write down unit-test
* test the implementation

At the moment this is a heads-up for further discussion of how the new node memory pressure condition should be handled in the generic scheduler.

**Additional info**
* Based on [1], only best effort pods are subject to filtering.
* Based on [2], best effort pods are those pods "iff requests & limits are not specified for any resource across all containers".

[1] https://github.com/derekwaynecarr/kubernetes/blob/542668cc7998fe0acb315a43731e1f45ecdcc85b/docs/proposals/kubelet-eviction.md#scheduler
[2] kubernetes#14943