Fix an issue that a Pod's nominatedNodeName cannot be cleared upon node deletion #91750
…e nominated node is deleted
@@ -1023,6 +1024,93 @@ func TestNominatedNodeCleanUp(t *testing.T) {
}
}

func TestNominatedNodeCleanUpUponNodeDeletion(t *testing.T) {
This test can be merged with the existing TestNominatedNodeCleanUp using table-test style.
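A table-driven layout, as suggested, could look roughly like the sketch below. It is plain Go so that it runs on its own: nominationAfterCleanup and the case fields are hypothetical stand-ins, not the real integration-test helpers, and in the actual test each case would drive a test cluster rather than call a pure function.

```go
package main

import "fmt"

// nominationAfterCleanup is a hypothetical stand-in for the behavior under test:
// given a Pod's nominated node and whether that node still exists, it returns
// the nominatedNodeName the Pod is expected to end up with.
func nominationAfterCleanup(nominated string, nodeExists bool) string {
	if !nodeExists {
		// A nomination pointing at a deleted node should be cleared.
		return ""
	}
	return nominated
}

func main() {
	// Each formerly separate test becomes one named row in the table.
	tests := []struct {
		name       string
		nominated  string
		nodeExists bool
		want       string
	}{
		{name: "nominated node still exists", nominated: "node1", nodeExists: true, want: "node1"},
		{name: "nominated node deleted", nominated: "node1", nodeExists: false, want: ""},
	}

	// A single loop runs every case and reports mismatches.
	for _, tt := range tests {
		if got := nominationAfterCleanup(tt.nominated, tt.nodeExists); got != tt.want {
			fmt.Printf("%s: got %q, want %q\n", tt.name, got, tt.want)
			continue
		}
		fmt.Printf("%s: ok\n", tt.name)
	}
}
```

In a _test.go file the loop body would normally sit inside t.Run(tt.name, func(t *testing.T) { ... }) so that each case is reported as its own subtest.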
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Huang-Wei.
/hold
/assign @ahg-g
One clarification; otherwise, looks good to me.
@@ -483,7 +483,7 @@ func noPodsInNamespace(c clientset.Interface, podNamespace string) wait.Conditio
// cleanupPodsInNamespace deletes the pods in the given namespace and waits for them to
// be actually deleted.
func cleanupPodsInNamespace(cs clientset.Interface, t *testing.T, ns string) {
-	if err := cs.CoreV1().Pods(ns).DeleteCollection(context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
+	if err := cs.CoreV1().Pods(ns).DeleteCollection(context.TODO(), *metav1.NewDeleteOptions(0), metav1.ListOptions{}); err != nil {
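To illustrate what the changed argument amounts to, here is a sketch using a hypothetical, trimmed-down mirror of metav1.DeleteOptions (only GracePeriodSeconds is kept). As far as I know, metav1.NewDeleteOptions(grace) returns a *DeleteOptions whose GracePeriodSeconds points at grace, and DeleteCollection takes the options by value, which is why the call site dereferences it. The key difference is nil (unset, so the server default applies) versus an explicit pointer to 0.

```go
package main

import "fmt"

// DeleteOptions is a hypothetical, trimmed-down mirror of metav1.DeleteOptions.
type DeleteOptions struct {
	// nil means "not specified": the server falls back to the object's default.
	// A pointer to 0 means "delete immediately".
	GracePeriodSeconds *int64
}

// NewDeleteOptions mirrors metav1.NewDeleteOptions: it returns options whose
// GracePeriodSeconds points at the given value.
func NewDeleteOptions(grace int64) *DeleteOptions {
	return &DeleteOptions{GracePeriodSeconds: &grace}
}

// describe reports how the grace period in the options would be interpreted.
func describe(o DeleteOptions) string {
	if o.GracePeriodSeconds == nil {
		return "grace period unset (server default applies)"
	}
	return fmt.Sprintf("grace period %ds", *o.GracePeriodSeconds)
}

func main() {
	// Before the change: the equivalent of metav1.DeleteOptions{}.
	before := DeleteOptions{}
	// After the change: the equivalent of *metav1.NewDeleteOptions(0). The
	// dereference is needed because DeleteCollection takes the options by value.
	after := *NewDeleteOptions(0)

	fmt.Println(describe(before))
	fmt.Println(describe(after))
}
```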
What does this change mean?
Without this, the test would fail at the cleanup step: defer cleanupPodsInNamespace(cs, t, testCtx.NS.Name).
This puzzled me as well, so I investigated. The new test deletes a node and then cleans up all the Pods using DeleteCollection. However, one of the Pods was placed on the deleted node, and I think deleting the Pods right after the node is what caused the failure.
I used curl to simulate this case. Notice that pod2 stays in the Terminating state instead of being deleted:
root@wei-dev:~/manifests/certs# k get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 46s 192.168.192.168 node1 <none> <none>
pod2 1/1 Running 0 3s 192.168.192.168 node2 <none> <none>
root@wei-dev:~/manifests# k delete no node2
node "node2" deleted
root@wei-dev:~/manifests/certs# curl --key key --cert cert --cacert cacert -X DELETE https://localhost:6443/api/v1/namespaces/default/pods
root@wei-dev:~/manifests/certs# k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod2 1/1 Terminating 0 56s 192.168.192.168 node2 <none> <none>
# After 30 seconds
root@wei-dev:~/manifests/certs# k get pod -o wide
No resources found in default namespace.
# In an integration test, however, I suspect the Pod is kept there, which is why the test fails unless a 0-second termination grace period is enforced.
I'm not sure whether this is a bug in DeleteCollection or expected behavior, but since setting an explicit 0 grace period shouldn't cause any side effects, I think we can merge it as is. WDYT?
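One plausible reading of the behavior above, as a simplified sketch loosely modeled on how the API server decides a Pod's effective grace period (the real check covers more cases, such as Pods that have already terminated; the pod type and effectiveGracePeriod are hypothetical): with DeleteOptions{} the grace period is unset, so the Pod's own terminationGracePeriodSeconds (30 by default) applies, and a Pod bound to a node is only marked Terminating until a kubelet confirms the deletion. With the node deleted, no kubelet is left to do that, which would fit the test hanging at cleanup. An explicit 0 skips the graceful path.

```go
package main

import "fmt"

// pod is a hypothetical stand-in holding only the fields that matter here.
type pod struct {
	// nodeName is empty if the Pod was never scheduled.
	nodeName string
	// terminationGracePeriodSeconds is defaulted to 30 by the API server.
	terminationGracePeriodSeconds int64
}

// effectiveGracePeriod sketches how a requested grace period (nil = unset)
// combines with the Pod's own setting. Simplified: Pod phase is ignored.
func effectiveGracePeriod(p pod, requested *int64) int64 {
	period := p.terminationGracePeriodSeconds
	if requested != nil {
		// An explicit value in DeleteOptions takes precedence.
		period = *requested
	}
	if p.nodeName == "" {
		// No node means no kubelet to wait for, so deletion is immediate.
		period = 0
	}
	return period
}

func main() {
	onDeletedNode := pod{nodeName: "node2", terminationGracePeriodSeconds: 30}
	zero := int64(0)

	// DeleteOptions{}: 30s applies, so the Pod is only marked Terminating and
	// waits for a kubelet on node2 that no longer exists.
	fmt.Println("default options:", effectiveGracePeriod(onDeletedNode, nil))
	// *NewDeleteOptions(0): 0s, so the Pod is deleted without waiting.
	fmt.Println("explicit 0:", effectiveGracePeriod(onDeletedNode, &zero))
}
```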
/hold
/hold cancel
What type of PR is this?
/kind bug
/sig scheduling
What this PR does / why we need it:
This PR fixes an issue where a Pod's nominatedNodeName cannot be cleared when its nominated node is deleted. For detailed reproduction steps, see #85677 and the integration test in this PR.
Which issue(s) this PR fixes:
Fixes #85677
Special notes for your reviewer:
A side effect is that if preemption hits an unexpected internal error, the nominated node is returned as "", so the Pod's non-empty nominatedNodeName is cleared. This means the room reserved for the Pod on a node can be taken by other Pods before the Pod is scheduled again.
Compared with the issue of a nominatedNode that can never be cleared, this impact is negligible.
Does this PR introduce a user-facing change?: