Ignore pods for quota marked for deletion whose node is unreachable #46542
Conversation
@smarterclayton - I would like your input on this. Another alternative is to not charge the user for a pod whose deletion timestamp plus grace period has passed. After opening this, I am thinking that may be the better option, as it covers all "pod is stuck terminating" scenarios pretty well.
FYI, the 1.5 PR that changed the force-delete behavior: #36017. I like the "don't include pods that are beyond their graceful termination time" approach, as that typically indicates the hold-up in pod deletion is a problem on the system side, not the user side, i.e. there is nothing the user can do to create this situation... unless they can create container processes that resist a SIGKILL (uninterruptible sleep).
@sjenning @smarterclayton @deads2k -- finally got around to updating this; I think it gives a more user-friendly experience. Once the pod status is updated as NodeLost, the quota system will see the pod update and release the pod's charge against quota.
@@ -55,7 +56,7 @@ func TestPodReplenishmentUpdateFunc(t *testing.T) {
 		ObjectMeta: metav1.ObjectMeta{Namespace: "test", Name: "pod"},
 		Status:     v1.PodStatus{Phase: v1.PodFailed},
 	}
-	updateFunc := PodReplenishmentUpdateFunc(&options)
+	updateFunc := PodReplenishmentUpdateFunc(&options, clock.RealClock{})
realclock in a test looks weird.
// - pod has been marked for deletion and grace period has expired.
func QuotaPod(pod *api.Pod, clock clock.Clock) bool {
	// if pod is terminal, ignore it for quota
	if api.PodFailed == pod.Status.Phase || api.PodSucceeded == pod.Status.Phase {
pods have conditions now, right? Since you're rewriting the function anyway, can we update it to use conditions?
the pod Phase is still the state machine we must follow to know that a pod has reached the end of its life. the PodCondition is not sufficient for that need.
> the pod Phase is still the state machine we must follow to know that a pod has reached the end of its life. the PodCondition is not sufficient for that need.
This is really weird. It's been over two years since we decided that conditions were a better choice than phases. When is the kubelet going to update?
Never because of backwards API compatibility.
> Never because of backwards API compatibility.
Eh, it only skews two releases. I'd rather have all our code keep as current as compatibility allows so developers aren't trying to remember: well these conditions match these phases. Cruft like this is super hard to deal with in controllers and utilities.
pkg/quota/evaluator/core/pods.go
	if api.PodFailed == pod.Status.Phase || api.PodSucceeded == pod.Status.Phase {
		return false
	}
	// if pods are stuck terminating (for example, a node is lost), we do not want
This may be obvious to you, but not to me. DeletionTimestamp non-nil means it's terminating? I thought an update had to come back to recognize it as terminating.
If you mean, "deleted pods that should be gone should not be charged", then this code makes sense to me.
It means "deleted pods that should be gone should not be charged"; I will look to clarify the comment.
Any progress on this PR? The existing issue in 1.5+ is really bad in clusters with quotas since replacement replicas can't spin up. We keep getting hit by this in prod. I'd be happy to pick this up and finish it if needed.
I will rebase this and get it in for 1.8
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, derekwaynecarr Associated issue: 52436 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
/sig api-machinery
/test pull-kubernetes-unit
/retest Review the full test history for this PR.
/retest
@derekwaynecarr: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781) |
Any chance for a 1.7 cherrypick? |
Why didn't we fix the node controller? |
@jdumars What's the best way to contact him? Is it easier if I open a cherrypick PR? |
@jsravn yes, that would be great. The cherrypick tool should get the job done. |
Cherry picked @ #53332 |
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
@lavalamp -- the node controller was not broken. it intentionally changed the behavior for data safety concerns. |
Was this backported to Kubernetes 1.5 or 1.6?
@songbird159 Nope only 1.7. I think 1.5 and 1.6 are EOL now. |
What this PR does / why we need it:
Traditionally, we charge quota for all pods that are in a non-terminal phase. We have a user report noting the behavior change in kube 1.5: the node controller no longer force-deletes pods whose nodes have been lost. Instead, the pod is marked for deletion, and its reason is updated to state that the node is unreachable. The user expected the quota to be released. If the user was at their quota limit, their application may not be able to create a new replica under the current behavior. As a result, this PR ignores pods marked for deletion that have exceeded their grace period.
Which issue this PR fixes:
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes #52436
Release note: