Fix for detach volume when node is not present/ powered off #40118
Conversation
Hi @BaluDontu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with "@k8s-bot ok to test". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@k8s-bot ok to test
🏆
if err = vs.DetachDisk(volPath, nodeName); err != nil {
    glog.V(1).Infof("Error deleting vsphere volume %s: %v", volPath, err)
    return attached, err
}
I think even if some DetachDisk calls fail, it should continue to detach the others.
Actually, since this function is only for checking the status, maybe we should not put the DetachDisk call inside of it. When the node no longer exists, we should return an error instead of returning false, which indicates the volume is already detached.
After this change, when the node no longer exists because it was powered off, the DisksAreAttached check will not mark the volumes as detached, and a detach request will be issued by the volume controller. Before, the reason detach was not issued is that this function returned false, so the controller marked the volume as already detached.
agreed, detach should be issued to all disks prior to returning.
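A minimal sketch of the reviewers' suggestion, not the actual PR code (detachDisk below is a stand-in for the cloud provider's DetachDisk, and the volume paths are made up): keep detaching the remaining disks even when one call fails, and report the aggregated failures at the end.

package main

import "fmt"

// detachDisk is a placeholder for vs.DetachDisk(volPath, nodeName).
func detachDisk(volPath, nodeName string) error {
	return nil
}

// detachAll tries every volume and does not stop at the first error.
func detachAll(volPaths []string, nodeName string) error {
	var failed []string
	for _, volPath := range volPaths {
		if err := detachDisk(volPath, nodeName); err != nil {
			// Log and continue with the next disk instead of returning early.
			fmt.Printf("error detaching vsphere volume %s: %v\n", volPath, err)
			failed = append(failed, volPath)
		}
	}
	if len(failed) > 0 {
		return fmt.Errorf("failed to detach volumes %v from node %q", failed, nodeName)
	}
	return nil
}

func main() {
	if err := detachAll([]string{"[datastore] kubevols/vol1.vmdk", "[datastore] kubevols/vol2.vmdk"}, "node-1"); err != nil {
		fmt.Println(err)
	}
}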
    vSphereInstance,
    volPath)
if err = vs.DetachDisk(volPath, nodeName); err != nil {
Actually, since this function is only for checking the status, maybe we should not put the DetachDisk call inside of it. When the node no longer exists, we should return an error instead of returning false, which indicates the volume is already detached.
Force-pushed from b3f0d33 to a830e92.
@jingxu97: Can you please review the latest changes? Thanks!
return false, nil
glog.Errorf("Node %q does not exist. DiskIsAttached will throw an error as node doesn't exist.",
    vSphereInstance)
return false, errors.New(fmt.Sprintf("Node %q does not exist. DiskIsAttached will throw an error as node doesn't exist.",
The error log could be "DiskIsAttached failed to determine whether disk is still attached: node %q does not exist".
Force-pushed from a830e92 to 9653f4a.
@BaluDontu lgtm. This change as-is does not fully resolve #33061; the k8s flow should be calling down to detach in the cloud provider when the node is not present.
@k8s-bot verify test this
@dagnello: It is the k8s flow which actually makes the detach request; I am not explicitly making any detach calls in the cloud provider. Please see the latest code out for review.
/approve
@k8s-bot verify test this
@k8s-bot verify test this
glog.Errorf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", | ||
volPath, | ||
vSphereInstance) | ||
return false, errors.New(fmt.Sprintf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", |
I think you can use fmt.Errorf instead of errors.New(fmt.Sprintf(...)).
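For reference, a minimal example of the suggested change (the variable values here are made up): fmt.Errorf is the idiomatic shorthand for errors.New(fmt.Sprintf(...)) and builds the same error.

package main

import (
	"errors"
	"fmt"
)

func main() {
	volPath, vSphereInstance := "[datastore] kubevols/vol1.vmdk", "node-1"
	// Verbose form used in the diff above:
	err1 := errors.New(fmt.Sprintf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", volPath, vSphereInstance))
	// Equivalent, shorter form suggested by the reviewer:
	err2 := fmt.Errorf("DiskIsAttached failed to determine whether disk %q is still attached: node %q does not exist", volPath, vSphereInstance)
	fmt.Println(err1)
	fmt.Println(err2)
}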
Force-pushed from 9653f4a to 9fe3cf9.
[APPROVALNOTIFIER] This PR is APPROVED. The following people have approved this PR: BaluDontu, kerneltime. Needs approval from an approver in each of these OWNERS files.
You can indicate your approval by writing /approve in a comment.
@k8s-bot kops aws e2e test this
/lgtm
@k8s-bot kops aws e2e test this
/release-note-none
@k8s-bot test this
Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)
Thanks @jingxu97
…-k8s-release-1.4 Automated cherry pick of #40118
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
Fixes #33061
When a VM is reported as no longer present in the cloud provider and is deleted by the node controller, there are no attempts to detach the respective volumes. For example, if a VM is powered off or paused, its pods are migrated to other nodes. In the case of vSphere, the VM cannot be started again because it still holds mount points to volumes that are now mounted on other VMs.
In order to rejoin this node, you have to manually detach these volumes from the powered-off VM before starting it.
The current fix makes sure the mount points are deleted when the VM is powered off. Since all the mount points are deleted, the VM can be powered on again.
This is a workaround proposal only. I still don't see Kubernetes issuing a detach request to the vSphere cloud provider, which should be the case. (Details in the original issue #33061.)
@luomiao @kerneltime @pdhamdhere @jingxu97 @saad-ali
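A simplified, hypothetical sketch of the behavior discussed above (the real implementation lives in the vSphere cloud provider and uses its own types; nodeExists and the volume paths here are illustrative): when the node no longer exists, the attach check reports an error instead of claiming the disks are detached, so the attach/detach controller issues explicit detach requests rather than dropping the volumes from its state.

package main

import "fmt"

// nodeExists is a stand-in for looking the VM up in vSphere.
func nodeExists(nodeName string) bool {
	return false
}

// disksAreAttached mimics the status check for a powered-off or removed node.
func disksAreAttached(volPaths []string, nodeName string) (map[string]bool, error) {
	attached := make(map[string]bool)
	for _, volPath := range volPaths {
		attached[volPath] = false
	}
	if !nodeExists(nodeName) {
		// Do not report the disks as detached; let the controller retry and detach.
		return attached, fmt.Errorf("DisksAreAttached failed to determine whether disks %v are still attached: node %q does not exist", volPaths, nodeName)
	}
	// Otherwise, query the VM's devices and set attached[volPath] = true where found.
	return attached, nil
}

func main() {
	_, err := disksAreAttached([]string{"[datastore] kubevols/vol1.vmdk"}, "powered-off-node")
	fmt.Println(err)
}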