Fix an issue that a Pod's nominatedNodeName cannot be cleared upon node deletion #91750

Huang-Wei · 2020-06-04T00:53:14Z

What type of PR is this?

/kind bug
/sig scheduling

What this PR does / why we need it:

This PR fixes an issue where a Pod's nominatedNodeName cannot be cleared upon node deletion. For detailed reproduction steps, see #85677 and the integration test in this PR.

Which issue(s) this PR fixes:

Fixes #85677

Special notes for your reviewer:

A side effect is that when preemption hits an unexpected internal error, the nominated node is returned as "", so the pod's non-empty nominatedNodeName is cleared. This means the room reserved for it on a node can be taken by other pods before it is scheduled next time.
But compared to being unable to clear the nominatedNodeName at all, the impact is negligible.

Does this PR introduce a user-facing change?:

Fixed an issue that a Pod's nominatedNodeName cannot be cleared upon node deletion.

Huang-Wei · 2020-06-04T00:54:25Z

test/integration/scheduler/preemption_test.go

@@ -1023,6 +1024,93 @@ func TestNominatedNodeCleanUp(t *testing.T) {
 	}
 }

+func TestNominatedNodeCleanUpUponNodeDeletion(t *testing.T) {


This test can be merged with existing TestNominatedNodeCleanUp using table-test style.

k8s-ci-robot · 2020-06-04T00:55:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [Huang-Wei]
~~test/integration/scheduler/OWNERS~~ [Huang-Wei]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Huang-Wei · 2020-06-04T01:41:45Z

/hold

Huang-Wei · 2020-06-04T15:54:18Z

/assign @ahg-g

ahg-g

one clarification, otherwise looks good to me.

ahg-g · 2020-06-05T19:18:17Z

test/integration/scheduler/util.go

@@ -483,7 +483,7 @@ func noPodsInNamespace(c clientset.Interface, podNamespace string) wait.Conditio
 // cleanupPodsInNamespace deletes the pods in the given namespace and waits for them to
 // be actually deleted.
 func cleanupPodsInNamespace(cs clientset.Interface, t *testing.T, ns string) {
-	if err := cs.CoreV1().Pods(ns).DeleteCollection(context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
+	if err := cs.CoreV1().Pods(ns).DeleteCollection(context.TODO(), *metav1.NewDeleteOptions(0), metav1.ListOptions{}); err != nil {


what does this change mean?

Without this, this test would fail at the cleanup step: defer cleanupPodsInNamespace(cs, t, testCtx.NS.Name).

This puzzled me as well, so I investigated. The new test deletes a node and then cleans up all the Pods using DeleteCollection. However, one of the Pods had been placed on the deleted node. I think deleting the Pods right after deleting the node caused the failure.

I used curl to simulate this case. Note that pod2 stays in the Terminating state instead of being deleted:

root@wei-dev:~/manifests/certs# k get po -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP                NODE    NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          46s   192.168.192.168   node1   <none>           <none>
pod2   1/1     Running   0          3s    192.168.192.168   node2   <none>           <none>
root@wei-dev:~/manifests# k delete no node2
node "node2" deleted
root@wei-dev:~/manifests/certs# curl --key key --cert cert --cacert cacert -X DELETE https://localhost:6443/api/v1/namespaces/default/pods
root@wei-dev:~/manifests/certs# k get pod -o wide
NAME   READY   STATUS        RESTARTS   AGE   IP                NODE    NOMINATED NODE   READINESS GATES
pod2   1/1     Terminating   0          56s   192.168.192.168   node2   <none>           <none>
# After 30 seconds
root@wei-dev:~/manifests/certs# k get pod -o wide
No resources found in default namespace.
# But in an integration test, I guess the Pod will be kept there, that's why the test would fail without enforcing a 0 termination grace period

I'm not sure whether this is a bug in DeleteCollection or expected behavior, but since setting an explicit 0 grace period has no side effects, I think we can just merge it as is. WDYT?
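
To make the difference between the two DeleteOptions values concrete, here is a minimal, dependency-free sketch. The `deleteOptions` type and `newDeleteOptions` helper are local stand-ins that only mirror the shape of `metav1.DeleteOptions` and `metav1.NewDeleteOptions`; `effectiveGrace` is a hypothetical helper, and the 30-second figure assumes the Pod uses the default termination grace period.

```go
package main

import "fmt"

// deleteOptions is a local stand-in for metav1.DeleteOptions, reduced to the
// single field discussed in this thread.
type deleteOptions struct {
	GracePeriodSeconds *int64 // nil means "use the Pod's own grace period"
}

// newDeleteOptions mirrors the shape of metav1.NewDeleteOptions: it returns
// options with an explicit grace period.
func newDeleteOptions(grace int64) *deleteOptions {
	return &deleteOptions{GracePeriodSeconds: &grace}
}

// effectiveGrace is a hypothetical helper showing which grace period applies:
// the explicit value when set, otherwise the Pod's default.
func effectiveGrace(opts deleteOptions, podDefault int64) int64 {
	if opts.GracePeriodSeconds != nil {
		return *opts.GracePeriodSeconds
	}
	return podDefault
}

func main() {
	const podDefault = 30 // assumed default termination grace period, in seconds

	// Empty options: the Pod's grace period applies, so the Pod lingers in Terminating.
	fmt.Println(effectiveGrace(deleteOptions{}, podDefault))

	// Explicit zero: the Pod is removed without waiting.
	fmt.Println(effectiveGrace(*newDeleteOptions(0), podDefault))
}
```

With empty options the sketch reports 30, matching the roughly 30-second wait seen in the curl experiment; with an explicit zero it reports 0, which is what the changed cleanup call requests.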

ahg-g · 2020-06-05T19:26:26Z

/hold
/lgtm

Huang-Wei · 2020-06-09T00:32:55Z

/hold cancel

Fix an issue that a Pod's nominatedNodeName cannot be cleared when th…

369a900

…e nominated node is deleted

k8s-ci-robot requested review from ahg-g and liu-cong June 4, 2020 00:54

k8s-ci-robot added the do-not-merge/hold label Jun 4, 2020

k8s-ci-robot assigned ahg-g Jun 4, 2020

ahg-g reviewed Jun 5, 2020

k8s-ci-robot added the lgtm label Jun 5, 2020

k8s-ci-robot removed the do-not-merge/hold label Jun 9, 2020

k8s-ci-robot merged commit 5248bef into kubernetes:master Jun 9, 2020

k8s-ci-robot added this to the v1.19 milestone Jun 9, 2020

ahg-g mentioned this pull request Jun 9, 2020

[Failing Test] Conformance - GCE - master (ci-kubernetes-gce-conformance-latest) #91911

Closed

Huang-Wei deleted the clear-nnn branch June 9, 2020 17:09

Huang-Wei mentioned this pull request Jun 10, 2020

[sig-scheduling] SchedulerPreemption [Serial] PreemptionExecutionPath runs ReplicaSets to verify preemption running path [Conformance]Changes #91972

Closed

This was referenced Jun 10, 2020

Revert "Fix an issue that a Pod's nominatedNodeName cannot be cleared… #91973

Merged

Scheduler does not clear nominatedNodeName on Pod when node is no longer valid #85677

Closed

github-actions bot mentioned this pull request Jun 16, 2020

Week Ending June 14, 2020 dev-obs/actus#179

Open

Huang-Wei mentioned this pull request Jun 30, 2020

The Pod is eligible to preempt when previous nominanted node is UnschedulableAndUnresolvable #92604

Merged

Huang-Wei mentioned this pull request Dec 2, 2021

scheuler not always clear a preemptor's nominatedNodeName as expected #106780

Closed
