Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical pod priorityClass addition #58835

Conversation

ravisantoshgudimetla
Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla commented Jan 25, 2018

What this PR does / why we need it:
@bsalamat - Apologies for the delay. This PR is to ensure that all pods with priorityClassName system-node-critical and system-cluster-critical will be critical pods while preserving backwards compatibility.

Special notes for your reviewer:

  • Moved some constants and other data structures to scheduler/api/types.go where other constants are present.
  • An automatic assignment of critical priorities to pods based on critical pod annotation for backwards compatibility including some unit tests.
    xref: Move Priority and Preemption to Beta #57471

Release note:

Critical pods to use priorityClasses.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 25, 2018
@ravisantoshgudimetla
Copy link
Contributor Author

/cc @bsalamat

@k8s-ci-robot k8s-ci-robot requested a review from bsalamat January 25, 2018 22:53
@ravisantoshgudimetla ravisantoshgudimetla changed the title Critical pod priorityClass addition [WIP]Critical pod priorityClass addition Jan 25, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 25, 2018
@ravisantoshgudimetla ravisantoshgudimetla force-pushed the critical-pod-with-priority branch 2 times, most recently from e444fb1 to 6aaf606 Compare January 25, 2018 23:23
Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ravisantoshgudimetla! Looks very good. Just a couple of minor comments.

if ns != kubeapi.NamespaceSystem {
return false
}
if _, ok := schedulerapi.SystemPriorityClasses[priorityClassName]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the design of Priority, we decided that internal components of K8s shouldn't care about the PriorityClassName and should always look at the value of priority. So, please change the function to receive the integer value of priority (PodSpec.Priority) and change this condition to:

if podPriority >= schedulerapi.SystemCriticalPriority {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -91,9 +92,9 @@ func TestPriorityClassAdmission(t *testing.T) {
Kind: "PriorityClass",
},
ObjectMeta: metav1.ObjectMeta{
Name: "system-cluster-critical",
Name: "schedulerapi.SystemClusterCritical",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this an unintended "replace all"?

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is. Good catch. Thanks.

@bsalamat
Copy link
Member

RE e2e tests, since priority and preemption is an alpha feature, if you intend to add e2e test to CI, you will have to do something like this PR. And, here is how you define the feature in the tests.

@ravisantoshgudimetla
Copy link
Contributor Author

ravisantoshgudimetla commented Jan 26, 2018

RE e2e tests, since priority and preemption is an alpha feature, if you intend to add e2e test to CI, you will have to do something like this PR. And, here is how you define the feature in the tests.

Thanks. My question is more on the lines of how to in enable CI. I already have e2e test case added to preemption.go which you can find in the commit now. PTAL

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2018
return IsCritical(pod.Namespace, pod.Annotations)
if IsCritical(pod.Namespace, pod.Annotations) {
return true
} else if pod.Spec.Priority != nil && v1helper.IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You don't need else here. Also you can write the whole condition as:

return IsCritical(pod.Namespace, pod.Annotations) || (pod.Spec.Priority != nil && v1helper.IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority))

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The comment about return statement seems fine but I think having a one liner may cause readability problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to be too pedantic, but isn't it effectively the same one line now? ;-)

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is :). I initially wanted to have one more nested if instead of && but thought it will be difficult to read as well 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. 😃 I would still prefer the return to directly return the condition without the if statement, but it is up to you.

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Sorry, I thought I pushed the one liner that you wanted earlier. Just updated it now.

// This test verifies that when a critical pod is created and no node with
// enough resources is found, scheduler preempts a lower priority pod to schedule
// this critical pod.
It("validates lower pod preemption by critical pod", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/lower pod/lower priority pod/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2018
@ravisantoshgudimetla
Copy link
Contributor Author

@bsalamat Thanks for the review. I addressed your comments. PTAL.

@bsalamat
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 27, 2018
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 27, 2018
@ravisantoshgudimetla
Copy link
Contributor Author

/assign @deads2k @lavalamp @yujuhong

@@ -7,6 +7,7 @@
- k8s.io/kubernetes/pkg/fieldpath
- k8s.io/kubernetes/pkg/util
- k8s.io/api/core/v1
- k8s.io/kubernetes/pkg/scheduler/api
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency looks bad.

@sttts Caught one!

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2018
@ravisantoshgudimetla
Copy link
Contributor Author

@bsalamat I have updated the PR as per @deads2k's comments. Now the IsCriticalPodBasedOnPriority helper fn is in pod_update.go instead of v1helper. PTAL.

@sjenning
Copy link
Contributor

reapplying
/lgtm
/assign derekwaynecarr

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2018
@@ -166,6 +155,11 @@ func (p *PriorityPlugin) admitPod(a admission.Attributes) error {
}
if utilfeature.DefaultFeatureGate.Enabled(features.PodPriority) {
var priority int32
// TODO: @ravig - This is for backwards compatibility to ensure that critical pods with annotations just work fine.
// Remove when no longer needed.
if len(pod.Spec.PriorityClassName) == 0 && kubelettypes.IsCritical(pod.Namespace, pod.Annotations) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't trust the pod.Namespace here. The user could have lied in the object he is creating. You need to check the attributes namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't trust the pod.Namespace here. The user could have lied in the object he is creating. You need to check the attributes namespace.

Since this is commonly confused: we cannot ensure the well-formedness of the object's namespace before mutation admission because a mutating admission plugin can change it. We can only confirm and/or force the namespace after mutation is completed. Another admission plugin could later "make valid" this field (though they shouldn't) and you'd end up having set this for that user.

@kubernetes/sig-api-machinery-bugs an edge we care about? I suppose it is rather unlikely....

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the awesome explanation @deads2k. I have changed the code to get namepsace using a.GetNamespace(). As a follow up, since the mutation is not specific to pod's namespace could other subresources of pod like Annotations, Spec, tolerations etc need to be addressed the same way using a.GetSubResource()

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/bug Categorizes issue or PR as related to a bug. labels Feb 21, 2018
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2018
@ravisantoshgudimetla
Copy link
Contributor Author

ravisantoshgudimetla commented Feb 21, 2018

/retest

flake #59426

Copy link
Member

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like the annotation to not have meaning in clusters that did not previously enable the critical pod annotation feature.

SystemCriticalPriority = 2 * HighestUserDefinablePriority
// NOTE: In order to avoid conflict of names with user-defined priority classes, all the names must
// start with scheduling.SystemPriorityClassPrefix which is by default "system-".
SystemClusterCritical = "system-cluster-critical"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these priorities defined in the API storage? If not, why not? IMHO this should work just like namespaces, where the default namespace and kube-system namespace are always in storage, and therefore there is no special magic namespaces. I am worried that this is blessing magic priorities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created an issue to track this - #60178

func IsCriticalPod(pod *v1.Pod) bool {
return IsCritical(pod.Namespace, pod.Annotations)
return IsCritical(pod.Namespace, pod.Annotations) || (pod.Spec.Priority != nil && IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the IsCriticalPod check should still gate respecting the annotation by enabling the ExperimentalCriticalPodAnnotation feature.

This check right now would make any pod or workload API pod template that had this annotation in a cluster that did not enable this feature prior suddenly elevate the priority of the workload. I think this is a mistake and would represent a surprise during upgrades (since prior clusters that did not enable this feature never respected the annotation).

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsCriticalPod fn predates my changes, I did a quick search in the codebase where this fn is being called from and in every place there is featuregate enabled for ExperimentalCriticalPodAnnotation except for https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/preemption/preemption.go#L233. I need to look further why the featuregating is not needed there. While the approach may not be elegant, it still does the job of gate respecting.

But the point you raised is valid from priority admission controller standpoint, where we are calling IsCritical fn. I have updated the PR to include a check for ExperimentalCriticalPodAnnotation, PTAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature gate is also checked there, just at the entry point function for the package: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/preemption/preemption.go#L67. The call will always exit if the feature gate is disabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the clarification. its a public method so i had thought centralizing the feature gate would have been preferred here as well.

@ravisantoshgudimetla
Copy link
Contributor Author

/retest

@derekwaynecarr
Copy link
Member

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2018
@sjenning
Copy link
Contributor

once more!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 23, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, derekwaynecarr, ravisantoshgudimetla, sjenning, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 755ab97 into kubernetes:master Feb 23, 2018
@ravisantoshgudimetla ravisantoshgudimetla deleted the critical-pod-with-priority branch February 23, 2018 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.