Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't recreate a mirror pod for static pod when node gets deleted #48339

Conversation

dixudx
Copy link
Member

@dixudx dixudx commented Jun 30, 2017

What this PR does / why we need it:

Currently kubelet will try to recreate a mirror pod for static pod when node gets deleted.

Which issue this PR fixes : fixes #48213

Special notes for your reviewer:

Release note:

don't recreate a mirror pod for static pod when node gets deleted

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 30, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @dixudx. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 30, 2017
@dixudx dixudx changed the title don't recreate static pods when node gets deleted don't recreate a mirror pod for static pod when node gets deleted Jun 30, 2017
@spiffxp
Copy link
Member

spiffxp commented Jun 30, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 30, 2017
@dixudx
Copy link
Member Author

dixudx commented Jul 2, 2017

@yujuhong @pmorie Waiting for your approval. Thanks.

@yujuhong
Copy link
Contributor

I get why this PR may be needed, but it's a bit strange to check whether the node exists in the code. If in the future, kubelet always tries to re-register itself (#9085), this change would not be useful.

@dixudx
Copy link
Member Author

dixudx commented Jul 11, 2017

@yujuhong If the kubelet keeps re-registering itself no matter what happened, then it cannot get drained and deleted any more.

@yujuhong
Copy link
Contributor

@yujuhong If the kubelet keeps re-registering itself no matter what happened, then it cannot get drained and deleted any more.

You can still drain the regular pods without a problem. As for the static pods, they are actually running on the node with or without the mirror pods. Deleting the mirror pods means nothing to the actual state on the node.
To decommission the node, you can delete the VM instance (when running in a cloud), or stops kubelet from running directly. Otherwise, if kubelet gets restarted (failing health check or node reboots), it will still register itself.

@dixudx
Copy link
Member Author

dixudx commented Aug 20, 2017

ping @tallclair @vishh

PTAL. Any comments on @yujuhong's suggestions.

@luxas
Copy link
Member

luxas commented Aug 23, 2017

cc @kubernetes/sig-node-pr-reviews PTAL

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Aug 23, 2017
@dixudx
Copy link
Member Author

dixudx commented Sep 4, 2017

/assign @tallclair @vishh

From @yujuhong

I get why this PR may be needed, but it's a bit strange to check whether the node exists in the code. If in the future, kubelet always tries to re-register itself (#9085), this change would not be useful.

This PR would help fix #48213 from @antoineco. Current kubelet has not enabled auto re-registering (#9085 has been there for a long while) now, shall we take this PR as an alternative?

PTAL. Thanks.

@luxas
Copy link
Member

luxas commented Sep 4, 2017

@dixudx We're subject to code freeze for v1.8 which means only v1.8 bugs can be merged.
IIUC this PR is not a critical bugfix that has to get into v1.8 for it to ship, right?

@dixudx
Copy link
Member Author

dixudx commented Sep 4, 2017

@luxas Yes, this is a bug but not that critical.

@luxas
Copy link
Member

luxas commented Sep 4, 2017

Ok, then lets defer this to later

@@ -1597,7 +1597,14 @@ func (kl *Kubelet) syncPod(o syncPodOptions) error {
if err := kl.podManager.DeleteMirrorPod(podFullName); err != nil {
glog.Errorf("Failed deleting mirror pod %q: %v", format.Pod(mirrorPod), err)
} else {
deleted = true
opts := metav1.GetOptions{}
_, err := kl.kubeClient.Core().Nodes().Get(string(kl.nodeName), opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this check should go next to the CreateMirrorPod call, instead of here?

Also, is there not a cached version of this? Doesn't the Kubelet know whether it is registered as a node with the API server?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this check should go next to the CreateMirrorPod call, instead of here?

It should be here. Because we are going to delete static pods for deleted nodes, not create them.

Also, is there not a cached version of this? Doesn't the Kubelet know whether it is registered as a node with the API server?

We cannot identify from the cached version whether the node has been deleted from API server.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be here. Because we are going to delete static pods for deleted nodes, not create them.

I don't understand this comment. The pod has been deleted, it seems weird to lie that it hasn't. Furthermore, if syncpod is called again it will be recreated since this block will be skipped.

We cannot identify from the cached version whether the node has been deleted from API server.

Use kl.GetNode(). Also do we need to check the DeletionTimestamp?

/cc @yujuhong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tallclair Make sense. Will update it. Thanks.

@dixudx dixudx force-pushed the dont_recreate_static_pod_when_node_deleted branch from 92583fe to 46a194f Compare September 29, 2017 07:31
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 29, 2017
glog.V(3).Infof("Creating a mirror pod for static pod %q", format.Pod(pod))
if err := kl.podManager.CreateMirrorPod(pod); err != nil {
glog.Errorf("Failed creating a mirror pod for %q: %v", format.Pod(pod), err)
if _, err := kl.GetNode(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check the deletion timestamp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should support the graceful deletion. Will add it. Thanks for pointing out.

if _, err := kl.GetNode(); err != nil {
glog.V(3).Infof("No need to create a mirror pod, since node %q has been removed from the cluster", kl.nodeName)
} else {
glog.V(3).Infof("Creating a mirror pod for static pod %q", format.Pod(pod))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this and the above message should be V(4)

@dixudx dixudx force-pushed the dont_recreate_static_pod_when_node_deleted branch from 46a194f to 53c7601 Compare September 30, 2017 07:16
@dixudx
Copy link
Member Author

dixudx commented Sep 30, 2017

ping @tallclair @yujuhong Updated. PTAL. Thanks

if err != nil {
glog.V(4).Infof("No need to create a mirror pod, since node %q has been removed from the cluster", kl.nodeName)
}
if err == nil && node.DeletionTimestamp == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this an else if rather than checking err twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I would move the deletion check to the above if, and just make this the else condition. I.e.

if err != nil || node.DeletionTimestamp != nil {
...
} else {
...
}

@dixudx dixudx force-pushed the dont_recreate_static_pod_when_node_deleted branch from 53c7601 to 32199cb Compare October 3, 2017 02:29
@dixudx
Copy link
Member Author

dixudx commented Oct 3, 2017

@tallclair Updated. PTAL. Thanks

@dixudx
Copy link
Member Author

dixudx commented Oct 3, 2017

/retest

@tallclair
Copy link
Member

/lgtm
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 3, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dixudx, tallclair

Associated issue: 48213

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 53135, 52512, 48339). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 5f64769 into kubernetes:master Oct 3, 2017
@dixudx dixudx deleted the dont_recreate_static_pod_when_node_deleted branch October 4, 2017 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Local pods stay around after node deletion
9 participants