Revert "add node shutdown taint" #59968
No idea why the bot added so many people - sorry for the noise. |
/approve As I wrote in that PR: @aleksandra-malinowska, you're right, it's a bug. Here https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodelifecycle/node_lifecycle_controller.go#L685 we ignore only the cloudprovider.NotImplemented error. However, there can be other errors that should be ignored as well: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/util/node/controller_utils.go#L199. So let's revert this and modify it a little. |
Possible fix: zetaab@499b78a |
/retest |
@zetaab Thanks for the quick response. Let's proceed with the revert then. |
/lgtm |
/approve |
cc @timothysc |
@smarterclayton - can you approve this? |
Only someone with higher cross directory OWNER privs can approve this one. |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aleksandra-malinowska, bsalamat, dims, gmarek, mikedanese, zetaab |
Possibly related to #59994? |
/test all [submit-queue is verifying that this PR is safe to merge] |
/retest |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
Reverts #59323
A node becomes unready but is never removed. I found the following in kube-controller-manager.log from a test run for one such node:
E0216 01:14:27.084923 1 node_lifecycle_controller.go:686] Error determining if node bootstrap-e2e-minion-group-01b1 shutdown in cloud: failed to get instance ID from cloud provider: instance not found
This goes on for the rest of the run (~6h). It looks like the node is stuck in the Unready state because of this check: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodelifecycle/node_lifecycle_controller.go#L684. Previously, there was no such check, and the node was removed.
Reverting as this would affect all users attempting to resize their node groups on GCE.