Fix an issue that a Pod's nominatedNodeName cannot be cleared upon node deletion #91750

Merged
merged 1 commit into from
Jun 9, 2020

Conversation

Huang-Wei
Member

@Huang-Wei Huang-Wei commented Jun 4, 2020

What type of PR is this?

/kind bug
/sig scheduling

What this PR does / why we need it:

This PR fixes an issue where a Pod's nominatedNodeName cannot be cleared when its nominated node is deleted. For detailed reproduction steps, see #85677 and the integration test in this PR.

Which issue(s) this PR fixes:

Fixes #85677

Special notes for your reviewer:

A side effect is that, upon unexpected internal errors during preemption, the nominated node is returned as "", so the Pod's existing (non-empty) nominatedNodeName is cleared. This means the room reserved for the Pod on a node can be occupied by other Pods before it is scheduled next time.
Compared with the issue of a nominatedNodeName that cannot be cleared at all, this impact is negligible.

Does this PR introduce a user-facing change?:

Fixed an issue where a Pod's nominatedNodeName could not be cleared upon node deletion.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 4, 2020
@@ -1023,6 +1024,93 @@ func TestNominatedNodeCleanUp(t *testing.T) {
}
}

func TestNominatedNodeCleanUpUponNodeDeletion(t *testing.T) {
Member Author

This test can be merged with the existing TestNominatedNodeCleanUp using a table-driven test style.
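
As a rough illustration of the table-driven layout mentioned here, the sketch below lists cases in a slice and runs one shared check over them. It is shown as a plain program so it runs on its own; in a `_test.go` file each case would typically run inside `t.Run`. The `cleanupCase` fields and the `nominatedAfter` toy model are hypothetical and are not taken from the PR, whose real tests drive a scheduler against an API server.

```go
package main

import "fmt"

// cleanupCase is a hypothetical table entry; the field names are
// illustrative and not taken from the PR.
type cleanupCase struct {
	name          string
	nominatedNode string // node nominated for the pod before the event
	deletedNode   string // node removed during the case, "" if none
	want          string // expected nominatedNodeName afterwards
}

// nominatedAfter is a toy model of the expected outcome: the nomination
// is cleared only when the nominated node itself is deleted.
func nominatedAfter(nominated, deleted string) string {
	if deleted != "" && nominated == deleted {
		return ""
	}
	return nominated
}

// runCases evaluates every table entry and returns the names of failures.
func runCases(cases []cleanupCase) []string {
	var failed []string
	for _, c := range cases {
		if got := nominatedAfter(c.nominatedNode, c.deletedNode); got != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	cases := []cleanupCase{
		{name: "nominated node deleted", nominatedNode: "node1", deletedNode: "node1", want: ""},
		{name: "unrelated node deleted", nominatedNode: "node1", deletedNode: "node2", want: "node1"},
		{name: "no node deleted", nominatedNode: "node1", deletedNode: "", want: "node1"},
	}
	fmt.Println("failed cases:", runCases(cases))
}
```

Adding a scenario then means adding one entry to the slice rather than writing another test function.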

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and liu-cong June 4, 2020 00:54
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 4, 2020
@Huang-Wei
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2020
@Huang-Wei
Member Author

/assign @ahg-g

Member

@ahg-g ahg-g left a comment

one clarification, otherwise looks good to me.

@@ -483,7 +483,7 @@ func noPodsInNamespace(c clientset.Interface, podNamespace string) wait.Conditio
 // cleanupPodsInNamespace deletes the pods in the given namespace and waits for them to
 // be actually deleted.
 func cleanupPodsInNamespace(cs clientset.Interface, t *testing.T, ns string) {
-	if err := cs.CoreV1().Pods(ns).DeleteCollection(context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
+	if err := cs.CoreV1().Pods(ns).DeleteCollection(context.TODO(), *metav1.NewDeleteOptions(0), metav1.ListOptions{}); err != nil {
Member

what does this change mean?

Member Author

Without this, this test would fail at the cleanup step: defer cleanupPodsInNamespace(cs, t, testCtx.NS.Name).

This puzzled me as well, so here is my investigation. The new test deletes a node and then cleans up all the Pods using DeleteCollection. However, one of the Pods was placed on the deleted node. I think deleting the Pods immediately after the node deletion is what caused the failure.

I used curl to simulate this case. Notice that pod2 stays in the Terminating state instead of being deleted right away:

root@wei-dev:~/manifests/certs# k get po -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP                NODE    NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          46s   192.168.192.168   node1   <none>           <none>
pod2   1/1     Running   0          3s    192.168.192.168   node2   <none>           <none>

root@wei-dev:~/manifests# k delete no node2
node "node2" deleted

root@wei-dev:~/manifests/certs# curl --key key --cert cert --cacert cacert -X DELETE https://localhost:6443/api/v1/namespaces/default/pods

root@wei-dev:~/manifests/certs# k get pod -o wide
NAME   READY   STATUS        RESTARTS   AGE   IP                NODE    NOMINATED NODE   READINESS GATES
pod2   1/1     Terminating   0          56s   192.168.192.168   node2   <none>           <none>

# After 30 seconds
root@wei-dev:~/manifests/certs# k get pod -o wide
No resources found in default namespace.
# But in an integration test, I guess the Pod stays around, which is why the test fails unless a 0 termination grace period is enforced

I'm not sure whether this is a bug in DeleteCollection or expected behavior, but since setting an explicit grace period of 0 won't cause any side effects, I think we can just merge it as is. WDYT?
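
For readers who want to see what an explicit zero grace period looks like at the HTTP level, here is a minimal sketch using only the Go standard library. It builds, but does not send, a DELETE request for the Pod collection with a DeleteOptions body whose `gracePeriodSeconds` is 0, which is the value `metav1.NewDeleteOptions(0)` sets. The `deleteOptions` struct and `newDeleteCollectionRequest` helper are hypothetical names, the base URL is borrowed from the curl session above, and authentication (client certificates) is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// deleteOptions mirrors only the DeleteOptions fields this sketch needs.
// It is a hypothetical local type, not the client-go one.
type deleteOptions struct {
	Kind               string `json:"kind"`
	APIVersion         string `json:"apiVersion"`
	GracePeriodSeconds *int64 `json:"gracePeriodSeconds,omitempty"`
}

// newDeleteCollectionRequest builds (but does not send) a DELETE request
// for all pods in a namespace with an explicit grace period.
func newDeleteCollectionRequest(baseURL, namespace string, grace int64) (*http.Request, []byte, error) {
	body, err := json.Marshal(deleteOptions{
		Kind:               "DeleteOptions",
		APIVersion:         "v1",
		GracePeriodSeconds: &grace, // a non-nil pointer, so 0 is serialized
	})
	if err != nil {
		return nil, nil, err
	}
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/pods", baseURL, namespace)
	req, err := http.NewRequest(http.MethodDelete, url, bytes.NewReader(body))
	if err != nil {
		return nil, nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, body, nil
}

func main() {
	req, body, err := newDeleteCollectionRequest("https://localhost:6443", "default", 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(body))
}
```

The pointer field matters here: with a plain integer and `omitempty`, a value of 0 would be dropped from the body and the server would fall back to its default grace period.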

@ahg-g
Member

ahg-g commented Jun 5, 2020

/hold
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 5, 2020
@Huang-Wei
Member Author

/hold cancel

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Scheduler does not clear nominatedNodeName on Pod when node is no longer valid
3 participants