Fix for detach volume when node is not present/ powered off #40118
Conversation
Hi @BaluDontu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with "@k8s-bot ok to test". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@k8s-bot ok to test
🏆
if err = vs.DetachDisk(volPath, nodeName); err != nil {
    glog.V(1).Infof("Error deleting vsphere volume %s: %v", volPath, err)
    return attached, err
}
I think even if some DetachDisk calls fail, it should continue to detach the others.
Actually, since this function is only for checking the status, maybe we should not put the DetachDisk call inside of it. When the node no longer exists, we should return an error instead of returning false, which indicates the volume is already detached.
After this change, when the node no longer exists because it was powered off, the DisksAreAttached check will not mark the volumes as detached, and a detach request will be issued by the volume controller. Before, the reason detach was not issued is that this function returned false, so the controller marked the volume as already detached.
agreed, detach should be issued to all disks prior to returning.
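A minimal sketch of the reviewers' suggestion, not the actual PR code (detachDisk below is a stand-in for the cloud provider's DetachDisk, and the volume paths are made up): keep detaching the remaining disks even when one call fails, and report the aggregated failures at the end.

package main

import "fmt"

// detachDisk is a placeholder for vs.DetachDisk(volPath, nodeName).
func detachDisk(volPath, nodeName string) error {
	return nil
}

// detachAll tries every volume and does not stop at the first error.
func detachAll(volPaths []string, nodeName string) error {
	var failed []string
	for _, volPath := range volPaths {
		if err := detachDisk(volPath, nodeName); err != nil {
			// Log and continue with the next disk instead of returning early.
			fmt.Printf("error detaching vsphere volume %s: %v\n", volPath, err)
			failed = append(failed, volPath)
		}
	}
	if len(failed) > 0 {
		return fmt.Errorf("failed to detach volumes %v from node %q", failed, nodeName)
	}
	return nil
}

func main() {
	if err := detachAll([]string{"[datastore] kubevols/vol1.vmdk", "[datastore] kubevols/vol2.vmdk"}, "node-1"); err != nil {
		fmt.Println(err)
	}
}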
    vSphereInstance,
    volPath)
if err = vs.DetachDisk(volPath, nodeName); err != nil {
Actually, since this function is only for checking the status, maybe we should not put the DetachDisk call inside of it. When the node no longer exists, we should return an error instead of returning false, which indicates the volume is already detached.
Force-pushed from b3f0d33 to a830e92.
@jingxu97: Can you please review the latest changes? Thanks!
return false, nil
glog.Errorf("Node %q does not exist. DiskIsAttached will throw an error as node doesn't exist.",
    vSphereInstance)
return false, errors.New(fmt.Sprintf("Node %q does not exist. DiskIsAttached will throw an error as node doesn't exist.",
The error log could be "DiskIsAttached failed to determine whether disk is still attached: node %q does not exist".
Force-pushed from a830e92 to 9653f4a.
@BaluDontu lgtm. This change as-is does not fully resolve #33061; the k8s flow should be calling down to detach in the cloud provider when the node is not present.
@k8s-bot verify test this
@dagnello: It is the k8s flow which actually makes the detach request; I am not explicitly making any detach calls in the cloud provider. Please see the latest code out for review.
/approve
@k8s-bot verify test this
@k8s-bot verify test this
glog.Errorf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", | ||
volPath, | ||
vSphereInstance) | ||
return false, errors.New(fmt.Sprintf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", |
I think you can use fmt.Errorf instead of errors.New(fmt.Sprintf(...)).
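For reference, a minimal example of the suggested change (the variable values here are made up): fmt.Errorf is the idiomatic shorthand for errors.New(fmt.Sprintf(...)) and builds the same error.

package main

import (
	"errors"
	"fmt"
)

func main() {
	volPath, vSphereInstance := "[datastore] kubevols/vol1.vmdk", "node-1"
	// Verbose form used in the diff above:
	err1 := errors.New(fmt.Sprintf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", volPath, vSphereInstance))
	// Equivalent, shorter form suggested by the reviewer:
	err2 := fmt.Errorf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", volPath, vSphereInstance)
	fmt.Println(err1)
	fmt.Println(err2)
}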
Force-pushed from 9653f4a to 9fe3cf9.
[APPROVALNOTIFIER] This PR is APPROVED. The following people have approved this PR: BaluDontu, kerneltime. Needs approval from an approver in each of these OWNERS files.
You can indicate your approval by writing /approve in a comment.
@k8s-bot kops aws e2e test this
/lgtm
@k8s-bot kops aws e2e test this
/release-note-none
@k8s-bot test this
Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)
Thanks @jingxu97
…-k8s-release-1.4 Automated cherry pick of #40118
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
Fixes #33061
When a VM is reported as no longer present in the cloud provider and is deleted by the node controller, there are no attempts to detach the respective volumes. For example, if a VM is powered off or paused, its pods are migrated to other nodes. In the case of vSphere, the VM cannot be started again because it still holds mount points to volumes that are now mounted on other VMs.
In order to rejoin this node, you have to manually detach these volumes from the powered-off VM before starting it.
The current fix makes sure the mount points are deleted when the VM is powered off. Since all the mount points are deleted, the VM can be powered on again.
This is a workaround proposal only. I still don't see Kubernetes issuing a detach request to the vSphere cloud provider, which should be the case. (Details in the original issue #33061.)
@luomiao @kerneltime @pdhamdhere @jingxu97 @saad-ali
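A simplified, hypothetical sketch of the behavior discussed above (the real implementation lives in the vSphere cloud provider and uses its own types; nodeExists and the volume paths here are illustrative): when the node no longer exists, the attach check reports an error instead of claiming the disks are detached, so the attach/detach controller issues explicit detach requests rather than dropping the volumes from its state.

package main

import "fmt"

// nodeExists is a stand-in for looking the VM up in vSphere.
func nodeExists(nodeName string) bool {
	return false
}

// disksAreAttached mimics the status check for a powered-off or removed node.
func disksAreAttached(volPaths []string, nodeName string) (map[string]bool, error) {
	attached := make(map[string]bool)
	for _, volPath := range volPaths {
		attached[volPath] = false
	}
	if !nodeExists(nodeName) {
		// Do not report the disks as detached; let the controller retry and detach.
		return attached, fmt.Errorf("DisksAreAttached failed to determine whether disks %v are still attached: node %q does not exist", volPaths, nodeName)
	}
	// Otherwise, query the VM's devices and set attached[volPath] = true where found.
	return attached, nil
}

func main() {
	_, err := disksAreAttached([]string{"[datastore] kubevols/vol1.vmdk"}, "powered-off-node")
	fmt.Println(err)
}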