
Ignore pods for quota marked for deletion whose node is unreachable #46542

Conversation

derekwaynecarr
Member

@derekwaynecarr derekwaynecarr commented May 26, 2017

What this PR does / why we need it:
Traditionally, we charge all pods in a non-terminal phase against quota. We have a user report noting the behavior change in kube 1.5 where the node controller no longer force deletes pods whose nodes have been lost. Instead, the pod is marked for deletion, and its status reason is updated to state that the node is unreachable. The user expected the quota to be released. If the user was at their quota limit, their application may not be able to create a new replica under the current behavior. As a result, this PR ignores pods marked for deletion that have exceeded their grace period.
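
For readers skimming the thread, here is a minimal sketch of the check this PR converges on, assuming roughly the external pod types and the apimachinery clock package; the function name, import paths, and exact grace-period handling are illustrative, not the merged code:

```go
package quotasketch

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/clock"
)

// countsAgainstQuota sketches the predicate discussed in this PR: a pod is
// charged to quota unless it is terminal, or it has been marked for deletion
// and its grace period has already expired (for example, it is stuck
// terminating because its node is unreachable). Names are illustrative.
func countsAgainstQuota(pod *v1.Pod, c clock.Clock) bool {
	// Terminal pods never count against quota.
	if pod.Status.Phase == v1.PodFailed || pod.Status.Phase == v1.PodSucceeded {
		return false
	}
	// Pods that should already be gone (deletion timestamp plus grace period
	// has passed) are not charged, so stuck-terminating pods do not block
	// replacement replicas.
	if pod.DeletionTimestamp != nil && pod.DeletionGracePeriodSeconds != nil {
		deadline := pod.DeletionTimestamp.Add(time.Duration(*pod.DeletionGracePeriodSeconds) * time.Second)
		if c.Now().After(deadline) {
			return false
		}
	}
	return true
}
```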

Which issue this PR fixes
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes #52436

Release note:

Ignore pods marked for deletion that exceed their grace period in ResourceQuota

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 26, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 26, 2017
@derekwaynecarr
Member Author

@smarterclayton - I would like your input on this. Another alternative approach is to not charge the user for a pod whose deletion timestamp + grace period has passed? After opening this, I am thinking that may be a better option as it covers all "pod is stuck terminating" scenarios pretty well.

/cc @deads2k @sjenning

@k8s-ci-robot k8s-ci-robot requested review from sjenning and deads2k May 26, 2017 21:04
@derekwaynecarr derekwaynecarr added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label May 26, 2017
@derekwaynecarr derekwaynecarr force-pushed the quota-ignore-pod-whose-node-lost branch from 48f0d45 to 1bee18a Compare May 26, 2017 21:07
@sjenning
Contributor

sjenning commented Jun 3, 2017

FYI, the 1.5 PR that changed the force-delete behavior: #36017

I like the "don't include pods that are beyond their graceful termination time" approach, as that typically indicates the hold-up in pod deletion is a problem on the system side, not the user side. I.e., there is nothing the user can do to create this situation... unless they can create container processes that can resist a SIGKILL (uninterruptible sleep).

@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2017
@derekwaynecarr derekwaynecarr force-pushed the quota-ignore-pod-whose-node-lost branch from 1bee18a to d75f5b9 Compare July 3, 2017 21:01
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 3, 2017
@derekwaynecarr derekwaynecarr force-pushed the quota-ignore-pod-whose-node-lost branch from d75f5b9 to 75074a7 Compare July 3, 2017 21:02
@derekwaynecarr derekwaynecarr removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jul 3, 2017
@derekwaynecarr
Member Author

@sjenning @smarterclayton @deads2k -- I finally got around to updating this; I think it gives a more user-friendly experience. Once the pod status is updated as NodeLost, the quota system will see the pod update and release the pod's charge against quota.
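
As a rough illustration of that flow (the helper names below are hypothetical, not the real replenishment API): quota only needs to be released when a pod update transitions it from "charged" to "not charged".

```go
package quotasketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/clock"
)

// countsAgainstQuota stands in for the predicate sketched earlier in the
// thread (terminal pods, and pods past their deletion grace period, are not
// charged). Stubbed here only so this fragment stands alone.
func countsAgainstQuota(pod *v1.Pod, c clock.Clock) bool {
	return pod.Status.Phase != v1.PodFailed && pod.Status.Phase != v1.PodSucceeded
}

// needsReplenishment captures the idea above: when a pod update arrives (for
// example, its status is updated after the node is lost and the grace period
// has lapsed), quota is released exactly when the pod stops counting.
// Hypothetical helper, not the real PodReplenishmentUpdateFunc.
func needsReplenishment(oldPod, newPod *v1.Pod, c clock.Clock) bool {
	return countsAgainstQuota(oldPod, c) && !countsAgainstQuota(newPod, c)
}
```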

@@ -55,7 +56,7 @@ func TestPodReplenishmentUpdateFunc(t *testing.T) {
 		ObjectMeta: metav1.ObjectMeta{Namespace: "test", Name: "pod"},
 		Status:     v1.PodStatus{Phase: v1.PodFailed},
 	}
-	updateFunc := PodReplenishmentUpdateFunc(&options)
+	updateFunc := PodReplenishmentUpdateFunc(&options, clock.RealClock{})
Contributor

realclock in a test looks weird.
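
One way to address this is to inject a fake clock so the test can pin "now". A small sketch, assuming the apimachinery fake clock; the helper under test is hypothetical and exists only to show the injection:

```go
package quotasketch

import (
	"testing"
	"time"

	"k8s.io/apimachinery/pkg/util/clock"
)

// pastGracePeriod is a hypothetical stand-in for the deletion-grace-period
// check under review; it exists only to show how an injected clock keeps the
// test deterministic.
func pastGracePeriod(deletedAt time.Time, gracePeriod time.Duration, c clock.Clock) bool {
	return c.Now().After(deletedAt.Add(gracePeriod))
}

func TestPastGracePeriodWithFakeClock(t *testing.T) {
	deletedAt := time.Date(2017, time.May, 26, 0, 0, 0, 0, time.UTC)

	// Pin the fake clock just inside the grace period: the pod should still count.
	fake := clock.NewFakeClock(deletedAt.Add(29 * time.Second))
	if pastGracePeriod(deletedAt, 30*time.Second, fake) {
		t.Fatal("expected pod to still be within its grace period")
	}

	// Step the fake clock past the deadline: the pod should now be ignored.
	fake.Step(2 * time.Second)
	if !pastGracePeriod(deletedAt, 30*time.Second, fake) {
		t.Fatal("expected pod to be past its grace period")
	}
}
```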

// - pod has been marked for deletion and grace period has expired.
func QuotaPod(pod *api.Pod, clock clock.Clock) bool {
	// if pod is terminal, ignore it for quota
	if api.PodFailed == pod.Status.Phase || api.PodSucceeded == pod.Status.Phase {
Contributor

pods have conditions now, right? Since you're rewriting the function anyway, can we update to conditions.

Member Author

the pod Phase is still the state machine we must follow to know that a pod has reached the end of its life. the PodCondition is not sufficient for that need.

Contributor

the pod Phase is still the state machine we must follow to know that a pod has reached the end of its life. the PodCondition is not sufficient for that need.

This is really weird. It's been over two years since we decided that conditions were a better choice than phases. When is the kubelet going to update?

Contributor

Never because of backwards API compatibility.

Contributor

Never because of backwards API compatibility.

Eh, it only skews two releases. I'd rather have all our code keep as current as compatibility allows so developers aren't trying to remember: well these conditions match these phases. Cruft like this is super hard to deal with in controllers and utilities.

	if api.PodFailed == pod.Status.Phase || api.PodSucceeded == pod.Status.Phase {
		return false
	}
	// if pods are stuck terminating (for example, a node is lost), we do not want
Contributor

This may be obvious to you, but not to me. A non-nil deletionTimestamp means it's terminating? I thought an update had to come back to recognize it as terminating.

If you mean, "deleted pods that should be gone should not be charged", then this code makes sense to me.

Member Author

It means "deleted pods that should be gone should not be charged"; I will look to clarify the comment.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2017
@jsravn
Contributor

jsravn commented Aug 7, 2017

Any progress on this PR? The existing issue in 1.5+ is really bad in clusters with quotas since replacement replicas can't spin up. We keep getting hit by this in prod. I'd be happy to pick this up and finish it if needed.

@derekwaynecarr
Member Author

I will rebase this and get it in for 1.8

@derekwaynecarr derekwaynecarr force-pushed the quota-ignore-pod-whose-node-lost branch from 75074a7 to 2618611 Compare August 25, 2017 21:09
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, derekwaynecarr

Associated issue: 52436

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2017
@jdumars
Member

jdumars commented Sep 13, 2017

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 13, 2017
@jdumars
Member

jdumars commented Sep 13, 2017

/test pull-kubernetes-unit

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment

@fejta
Contributor

fejta commented Sep 14, 2017

/retest
ref kubernetes/test-infra#4568

@k8s-ci-robot
Contributor

@derekwaynecarr: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-e2e-gce-bazel | Commit: da01c6d | Rerun command: /test pull-kubernetes-e2e-gce-bazel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

@k8s-github-robot k8s-github-robot merged commit 20a4112 into kubernetes:master Sep 15, 2017
@jsravn
Contributor

jsravn commented Sep 15, 2017

Any chance for a 1.7 cherrypick?

@jdumars
Member

jdumars commented Sep 15, 2017

@jsravn please contact @wojtek-t who is the 1.7.x patch manager.

@lavalamp
Member

Why didn't we fix the node controller?

@jsravn
Contributor

jsravn commented Sep 29, 2017

@jdumars What's the best way to contact him? Is it easier if I open a cherrypick PR?

@jdumars
Member

jdumars commented Sep 29, 2017

@jsravn yes, that would be great. The cherrypick tool should get the job done.

@jsravn
Contributor

jsravn commented Oct 2, 2017

Cherry picked @ #53332

@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 3, 2017
k8s-github-robot pushed a commit that referenced this pull request Oct 4, 2017
…-upstream-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #46542

Cherry pick of #46542  on release-1.7.

#53239: Ignore pods for quota marked for deletion whose node is unreachable
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@derekwaynecarr
Member Author

@lavalamp -- the node controller was not broken. It intentionally changed the behavior out of data-safety concerns.

@ehvs

ehvs commented Apr 23, 2018

Was this backported to Kubernetes 1.5 or 1.6?

@jsravn
Contributor

jsravn commented Apr 24, 2018

@songbird159 Nope, only 1.7. I think 1.5 and 1.6 are EOL now.

Development

Successfully merging this pull request may close these issues.

Pods marked for deletion on nodes that are unreachable should not charge to quota