-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Critical pod priorityClass addition #58835
Critical pod priorityClass addition #58835
Conversation
/cc @bsalamat |
e444fb1
to
6aaf606
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @ravisantoshgudimetla! Looks very good. Just a couple of minor comments.
pkg/apis/core/v1/helper/helpers.go
Outdated
if ns != kubeapi.NamespaceSystem { | ||
return false | ||
} | ||
if _, ok := schedulerapi.SystemPriorityClasses[priorityClassName]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the design of Priority, we decided that internal components of K8s shouldn't care about the PriorityClassName
and should always look at the value of priority. So, please change the function to receive the integer value of priority (PodSpec.Priority) and change this condition to:
if podPriority >= schedulerapi.SystemCriticalPriority {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -91,9 +92,9 @@ func TestPriorityClassAdmission(t *testing.T) { | |||
Kind: "PriorityClass", | |||
}, | |||
ObjectMeta: metav1.ObjectMeta{ | |||
Name: "system-cluster-critical", | |||
Name: "schedulerapi.SystemClusterCritical", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this an unintended "replace all"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is. Good catch. Thanks.
6aaf606
to
16cf708
Compare
Thanks. My question is more on the lines of how to in enable CI. I already have e2e test case added to preemption.go which you can find in the commit now. PTAL |
pkg/kubelet/types/pod_update.go
Outdated
return IsCritical(pod.Namespace, pod.Annotations) | ||
if IsCritical(pod.Namespace, pod.Annotations) { | ||
return true | ||
} else if pod.Spec.Priority != nil && v1helper.IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You don't need else
here. Also you can write the whole condition as:
return IsCritical(pod.Namespace, pod.Annotations) || (pod.Spec.Priority != nil && v1helper.IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The comment about return statement seems fine but I think having a one liner may cause readability problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to be too pedantic, but isn't it effectively the same one line now? ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is :). I initially wanted to have one more nested if instead of && but thought it will be difficult to read as well 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. 😃 I would still prefer the return
to directly return the condition without the if statement, but it is up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Sorry, I thought I pushed the one liner that you wanted earlier. Just updated it now.
test/e2e/scheduling/preemption.go
Outdated
// This test verifies that when a critical pod is created and no node with | ||
// enough resources is found, scheduler preempts a lower priority pod to schedule | ||
// this critical pod. | ||
It("validates lower pod preemption by critical pod", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/lower pod/lower priority pod/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
16cf708
to
1abfc56
Compare
@bsalamat Thanks for the review. I addressed your comments. PTAL. |
/lgtm |
1abfc56
to
519cef6
Compare
hack/import-restrictions.yaml
Outdated
@@ -7,6 +7,7 @@ | |||
- k8s.io/kubernetes/pkg/fieldpath | |||
- k8s.io/kubernetes/pkg/util | |||
- k8s.io/api/core/v1 | |||
- k8s.io/kubernetes/pkg/scheduler/api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency looks bad.
@sttts Caught one!
52e86cd
to
13ce943
Compare
13ce943
to
11015ac
Compare
reapplying |
@@ -166,6 +155,11 @@ func (p *PriorityPlugin) admitPod(a admission.Attributes) error { | |||
} | |||
if utilfeature.DefaultFeatureGate.Enabled(features.PodPriority) { | |||
var priority int32 | |||
// TODO: @ravig - This is for backwards compatibility to ensure that critical pods with annotations just work fine. | |||
// Remove when no longer needed. | |||
if len(pod.Spec.PriorityClassName) == 0 && kubelettypes.IsCritical(pod.Namespace, pod.Annotations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't trust the pod.Namespace
here. The user could have lied in the object he is creating. You need to check the attributes namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't trust the pod.Namespace here. The user could have lied in the object he is creating. You need to check the attributes namespace.
Since this is commonly confused: we cannot ensure the well-formedness of the object's namespace before mutation admission because a mutating admission plugin can change it. We can only confirm and/or force the namespace after mutation is completed. Another admission plugin could later "make valid" this field (though they shouldn't) and you'd end up having set this for that user.
@kubernetes/sig-api-machinery-bugs an edge we care about? I suppose it is rather unlikely....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awesome explanation @deads2k. I have changed the code to get namepsace using a.GetNamespace()
. As a follow up, since the mutation is not specific to pod's namespace could other subresources of pod like Annotations, Spec, tolerations etc need to be addressed the same way using a.GetSubResource()
11015ac
to
1cc1087
Compare
/retest flake #59426 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like the annotation to not have meaning in clusters that did not previously enable the critical pod annotation feature.
SystemCriticalPriority = 2 * HighestUserDefinablePriority | ||
// NOTE: In order to avoid conflict of names with user-defined priority classes, all the names must | ||
// start with scheduling.SystemPriorityClassPrefix which is by default "system-". | ||
SystemClusterCritical = "system-cluster-critical" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these priorities defined in the API storage? If not, why not? IMHO this should work just like namespaces, where the default namespace and kube-system namespace are always in storage, and therefore there is no special magic namespaces. I am worried that this is blessing magic priorities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created an issue to track this - #60178
func IsCriticalPod(pod *v1.Pod) bool { | ||
return IsCritical(pod.Namespace, pod.Annotations) | ||
return IsCritical(pod.Namespace, pod.Annotations) || (pod.Spec.Priority != nil && IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the IsCriticalPod check should still gate respecting the annotation by enabling the ExperimentalCriticalPodAnnotation feature.
This check right now would make any pod or workload API pod template that had this annotation in a cluster that did not enable this feature prior suddenly elevate the priority of the workload. I think this is a mistake and would represent a surprise during upgrades (since prior clusters that did not enable this feature never respected the annotation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsCriticalPod fn predates my changes, I did a quick search in the codebase where this fn is being called from and in every place there is featuregate enabled for ExperimentalCriticalPodAnnotation except for https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/preemption/preemption.go#L233. I need to look further why the featuregating is not needed there. While the approach may not be elegant, it still does the job of gate respecting.
But the point you raised is valid from priority admission controller standpoint, where we are calling IsCritical fn. I have updated the PR to include a check for ExperimentalCriticalPodAnnotation, PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature gate is also checked there, just at the entry point function for the package: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/preemption/preemption.go#L67. The call will always exit if the feature gate is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the clarification. its a public method so i had thought centralizing the feature gate would have been preferred here as well.
1cc1087
to
7da5a2e
Compare
/retest |
/approve |
once more! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bsalamat, derekwaynecarr, ravisantoshgudimetla, sjenning, yujuhong The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
What this PR does / why we need it:
@bsalamat - Apologies for the delay. This PR is to ensure that all pods with priorityClassName
system-node-critical
andsystem-cluster-critical
will be critical pods while preserving backwards compatibility.Special notes for your reviewer:
xref: Move Priority and Preemption to Beta #57471
Release note: