Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for detach volume when node is not present/ powered off #40118

Merged

Conversation

BaluDontu
Copy link
Contributor

@BaluDontu BaluDontu commented Jan 19, 2017

Fixes #33061
When a vm is reported as no longer present in cloud provider and is deleted by node controller, there are no attempts to detach respective volumes. For example, if a VM is powered off or paused, and pods are migrated to other nodes. In the case of vSphere, the VM cannot be started again because the VM still holds mount points to volumes that are now mounted to other VMs.

In order to re-join this node again, you will have to manually detach these volumes from the powered off vm before starting it.

The current fix will make sure the mount points are deleted when the VM is powered off. Since all the mount points are deleted, the VM can be powered on again.

This is a workaround proposal only. I still don't see the kubernetes issuing a detach request to the vsphere cloud provider which should be the case. (Details in original issue #33061 )

@luomiao @kerneltime @pdhamdhere @jingxu97 @saad-ali

@k8s-ci-robot
Copy link
Contributor

Hi @BaluDontu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 19, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@kerneltime
Copy link

@k8s-bot ok to test

@lsjostro
Copy link

🏆

if err = vs.DetachDisk(volPath, nodeName); err != nil {
glog.V(1).Infof("Error deleting vsphere volume %s: %v", volPath, err)
return attached, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even if some detachDisk fail, it should continue to detach others.

Copy link
Contributor

@jingxu97 jingxu97 Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since this function is only for checking the status, maybe we should put detachdisk function inside of it. When node node no longer exists, we should return an error message instead of returning false indicating volume is already detached.

After this change, when node no longer exists due to powering off, the checking DiskAreAttached will not mark volumes as detached, then a detach request will be issued by the volume controller. Before the reason detach is not issued is because this function return false and controller will mark the volume is already detached.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, detach should be issued to all disks prior to returning.

vSphereInstance,
volPath)
if err = vs.DetachDisk(volPath, nodeName); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since this function is only for checking the status, maybe we should put detachdisk function inside of it. When node node no longer exists, we should return an error message instead of returning false indicating volume is already detached.

@BaluDontu BaluDontu force-pushed the FixdetachVolumeOnNodeOff branch from b3f0d33 to a830e92 Compare January 20, 2017 22:39
@BaluDontu
Copy link
Contributor Author

@jingxu97 : Can you please review the latest changes. Thanks!

return false, nil
glog.Errorf("Node %q does not exist. DiskIsAttached will throw an error as node doesn't exist.",
vSphereInstance)
return false, errors.New(fmt.Sprintf("Node %q does not exist. DiskIsAttached will throw an error as node doesn't exist.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error log could be " DiskIsAttached failed to determine whether disk is still attached: node %q does not exist.

@mikedanese mikedanese assigned jingxu97 and unassigned mikedanese Jan 20, 2017
@BaluDontu BaluDontu force-pushed the FixdetachVolumeOnNodeOff branch from a830e92 to 9653f4a Compare January 23, 2017 21:16
@dagnello
Copy link
Contributor

@BaluDontu lgtm. This change as is does not fully resolve #33061, k8s flow should be calling down to do detach in cloud provider when node is not present.

@jingxu97 jingxu97 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2017
@jingxu97
Copy link
Contributor

@k8s-bot verify test this

@k8s-github-robot k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jan 24, 2017
@BaluDontu
Copy link
Contributor Author

@dagnello : It is k8s flow which actually makes the detach request. I am not explicitly making any detach calls in cloud provider. Please see the latest code out for review.

@kerneltime
Copy link

/approve

@kerneltime
Copy link

@k8s-bot verify test this

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2017
@kerneltime
Copy link

@k8s-bot verify test this

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 25, 2017
glog.Errorf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist",
volPath,
vSphereInstance)
return false, errors.New(fmt.Sprintf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use fmt.Errorf instead of errors.New(fmt.Sprintf...

@BaluDontu BaluDontu force-pushed the FixdetachVolumeOnNodeOff branch from 9653f4a to 9fe3cf9 Compare February 7, 2017 21:37
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: BaluDontu, kerneltime

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2017
@BaluDontu
Copy link
Contributor Author

@k8s-bot kops aws e2e test this

@luomiao
Copy link

luomiao commented Feb 7, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2017
@kerneltime
Copy link

@k8s-bot kops aws e2e test this

@luomiao
Copy link

luomiao commented Feb 8, 2017

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Feb 8, 2017
@jingxu97 jingxu97 removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Feb 9, 2017
@jingxu97
Copy link
Contributor

jingxu97 commented Feb 9, 2017

@k8s-bot test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)

@k8s-github-robot k8s-github-robot merged commit 052f3b9 into kubernetes:master Feb 10, 2017
@kerneltime
Copy link

Thanks @jingxu97

@jessfraz jessfraz added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 10, 2017
jessfraz added a commit that referenced this pull request Feb 10, 2017
@saad-ali saad-ali added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cherrypick-candidate labels Feb 11, 2017
@saad-ali saad-ali added this to the v1.5 milestone Feb 11, 2017
k8s-github-robot pushed a commit that referenced this pull request Feb 11, 2017
…-k8s-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #40118

Cherry pick of #40118 on release-1.5.

#40118: Fix for detach volume when node is not present/ powered off
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vSphere Cloud provider: Volume detach is not issued when VM is not present