Fix bug in detaching disks when node is powered-off but still in inventory in vsphere CP #63413

Conversation

@w-leads (Contributor) commented May 3, 2018

What this PR does / why we need it:
The vSphere cloud provider (VCP) fails to detach disks from a powered-off node when the node is still in the vCenter inventory. Because of this, the disks have to be detached manually before the node is powered on again.
The issue occurs because the node's information is removed from the node manager when Kubernetes detects that the node is powered off, even though that information is still required when VCP later attempts to detach the disks.
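A minimal standalone sketch of the failure path described above (simplified stand-ins for the VCP types, not the actual provider code):

```go
package main

import (
	"errors"
	"fmt"
)

// errNoVMFound stands in for vclib.ErrNoVMFound.
var errNoVMFound = errors.New("no VM found")

// nodeManager mimics the in-memory registry that forgets powered-off nodes.
type nodeManager struct {
	registered map[string]bool
}

func (nm *nodeManager) getNode(nodeName string) error {
	if !nm.registered[nodeName] {
		return errNoVMFound
	}
	return nil
}

// detachDisk mirrors the control flow quoted later in the review: once the
// node manager no longer knows the node, the volume is assumed to be detached.
func detachDisk(nm *nodeManager, volPath, nodeName string) error {
	if err := nm.getNode(nodeName); err == errNoVMFound {
		// Bug: for a powered-off VM that is still in the vCenter inventory,
		// the disk is in fact still attached and must be detached by hand.
		fmt.Printf("Node %q does not exist, disk %s is already detached from node.\n", nodeName, volPath)
		return nil
	}
	// ... otherwise the real code calls the vSphere API to detach the disk ...
	return nil
}

func main() {
	// Kubernetes has detected the powered-off node, so it is already unregistered.
	nm := &nodeManager{registered: map[string]bool{}}
	_ = detachDisk(nm, "[datastore1] kubevols/pv-0001.vmdk", "node-1")
}
```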

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 3, 2018
@k8s-ci-robot k8s-ci-robot requested review from davidopp and a user May 3, 2018 23:09
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 3, 2018
@w-leads (Contributor, Author) commented May 4, 2018

/sig vmware

@k8s-ci-robot k8s-ci-robot added the area/provider/vmware Issues or PRs related to vmware provider label May 4, 2018
@dims (Member) commented May 7, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 7, 2018
@w-leads (Contributor, Author) commented May 7, 2018

/retest

@w-leads (Contributor, Author) commented May 7, 2018

/assign @abrarshivani

glog.Infof("Node %q does not exist, disk %s is already detached from node.", convertToString(nodeName), volPath)
if err == vclib.ErrNoVMFound {
// If node doesn't exist, discover the node to check if node is still in vcenter inventory
node, err := vs.nodeManager.GetRemovedNode(nodeName)
Review comment (Contributor):

I think this will not work if controller-manager restarts since this information will vanish. Can you please confirm this?

@@ -45,10 +45,12 @@ type NodeManager struct {
nodeInfoMap map[string]*NodeInfo
// Maps node name to node structure
registeredNodes map[string]*v1.Node
removedNodes map[string]*v1.Node
Review comment (Contributor):

The node information will still remain even after all the volumes attached by the cluster have been detached. I would suggest removing this information once it is no longer needed.

@w-leads (Contributor, Author) commented May 11, 2018

Hi @abrarshivani,
Thank you so much for reviewing my code!

To solve the two issues you mentioned, I'm thinking of taking a different approach instead:

  • Add/remove attached-disk information in VSphere.AttachDisk, DetachDisk, and DisksAreAttached
  • The information consists of the node name and volume path, and is stored in the node manager
  • When VSphere.NodeDeleted is called, try to detach disks using that disk info, if any

How do you feel about this approach?
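A rough, self-contained sketch of the bookkeeping described above (the Register/UnRegister method names follow those mentioned later in the review, but the types here are illustrative, not the PR's code):

```go
package main

import (
	"fmt"
	"sync"
)

// NodeManager is a stand-in for the VCP node manager, reduced to the
// attached-disk bookkeeping only.
type NodeManager struct {
	attachedDisks     map[string][]string // node name -> attached volume paths
	attachedDisksLock sync.Mutex
}

// RegisterAttachedDisk records a volume path for a node; it would be called
// from AttachDisk and refreshed from DisksAreAttached.
func (nm *NodeManager) RegisterAttachedDisk(nodeName, volPath string) {
	nm.attachedDisksLock.Lock()
	defer nm.attachedDisksLock.Unlock()
	for _, p := range nm.attachedDisks[nodeName] {
		if p == volPath {
			return // already recorded
		}
	}
	nm.attachedDisks[nodeName] = append(nm.attachedDisks[nodeName], volPath)
}

// UnRegisterAttachedDisk drops a volume path; it would be called from
// DetachDisk, and NodeDeleted would walk the remaining entries to detach
// whatever is still recorded for a removed node.
func (nm *NodeManager) UnRegisterAttachedDisk(nodeName, volPath string) {
	nm.attachedDisksLock.Lock()
	defer nm.attachedDisksLock.Unlock()
	paths := nm.attachedDisks[nodeName]
	for i, p := range paths {
		if p == volPath {
			nm.attachedDisks[nodeName] = append(paths[:i], paths[i+1:]...)
			break
		}
	}
	if len(nm.attachedDisks[nodeName]) == 0 {
		delete(nm.attachedDisks, nodeName)
	}
}

func main() {
	nm := &NodeManager{attachedDisks: make(map[string][]string)}
	nm.RegisterAttachedDisk("node-1", "[datastore1] kubevols/pv-0001.vmdk")
	nm.UnRegisterAttachedDisk("node-1", "[datastore1] kubevols/pv-0001.vmdk")
	fmt.Println(len(nm.attachedDisks)) // 0: empty entries are pruned
}
```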

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 11, 2018
@w-leads (Contributor, Author) commented May 11, 2018

@abrarshivani Would you please take another look at the code?
I reverted the previous commit and took a different approach, keeping attached-disk information instead of removed-node info.
To handle the controller-manager restart case, the attached-disk information is updated in VSphere.DisksAreAttached, which is called roughly every minute.
Thank you,

disks, err := vs.nodeManager.GetAttachedDisks(convertToK8sType(node.ObjectMeta.Name))
if err == nil {
for _, disk := range disks {
if err := vs.DetachDisk(disk, convertToK8sType(node.ObjectMeta.Name)); err != nil {
@abrarshivani (Contributor) commented May 11, 2018:

Here DetachDisk won't be retried if it fails (for example, because of a network issue reaching vCenter), and the volume will still be attached to the powered-off node. When Kubernetes retries detaching the disk from the deleted node, it will fail, since the node will already be unregistered here.

@abrarshivani (Contributor) commented May 11, 2018

@w-leads Thanks for fixing this. I think the right approach would be to report the NodeName as the Virtual Machine UUID. This way the vSphere Cloud Provider will not need to maintain the mapping of nodeName to v1.Node or keep the state of attached and detached disks.

@w-leads (Contributor, Author) commented May 11, 2018

@abrarshivani Thanks a lot for reviewing again!

So what you're saying is the following; is my understanding correct?

  • Register a mapping of nodeName to VM UUID (or maybe the NodeInfo struct?), as done in the 1st commit, but with the UUID instead
  • When k8s tries to detach disks from a powered-off node and gets errors, detach the disks using that info instead
  • Compare nodeVolumes with the info in VSphere.DisksAreAttached, and remove the info if there's a node that is not in nodeVolumes. Or maybe there's a better way to do this...?

In this case, I have no idea how to get the info again if the controller-manager is restarted after the node is powered off, because only VSphere.NodeAdded and NodeDeleted receive the v1.Node that contains the UUID. Please correct me if I'm wrong.

@abrarshivani (Contributor) commented:

@w-leads Currently, the vSphere Cloud Provider uses *v1.Node to get the VM UUID and discovers the nodes with that UUID. kubelet (if given vsphere as the cloud provider) gets the nodeName from the vSphere Cloud Provider. If vsphere.CurrentNodeName is updated to return the Virtual Machine UUID instead of the hostname, then the nodeName will be reported as the VM UUID. In this case, there is no need to store *v1.Node. Also, in the DetachDisk case, if a node is powered off and unregistered from the nodeManager, the node can still be discovered by its UUID.
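A toy sketch of the payoff being described, with the vCenter lookup faked (in the real provider the search would go through the vSphere API, and the function names here are illustrative only):

```go
package main

import (
	"errors"
	"fmt"
)

// findVMByUUID stands in for a vCenter search-by-UUID call; here it is just a
// fake inventory map of VM UUID -> inventory path.
func findVMByUUID(inventory map[string]string, uuid string) (string, error) {
	if path, ok := inventory[uuid]; ok {
		return path, nil
	}
	return "", errors.New("no VM found")
}

// detachDiskByUUID shows the idea: with the node name being the VM UUID, the
// disk can still be detached after the node manager has dropped the node,
// because the VM is looked up directly in vCenter by UUID.
func detachDiskByUUID(inventory map[string]string, volPath, nodeUUID string) error {
	vmPath, err := findVMByUUID(inventory, nodeUUID)
	if err != nil {
		return err
	}
	fmt.Printf("detaching %s from %s\n", volPath, vmPath)
	return nil
}

func main() {
	inventory := map[string]string{
		// powered off, but still present in the vCenter inventory
		"4237a0a0-1111-2222-3333-444455556666": "/dc1/vm/worker-1",
	}
	_ = detachDiskByUUID(inventory, "[datastore1] kubevols/pv-0001.vmdk",
		"4237a0a0-1111-2222-3333-444455556666")
}
```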

@w-leads (Contributor, Author) commented May 14, 2018

@abrarshivani Thanks for the info. That makes sense; the code will be simpler with the VM UUID.
I suppose that change would affect many functions and make this pull request much bigger, so I'm wondering if we could fix this bug by keeping the attached-disk information for now. Once VCP is modified to use the VM UUID for nodeName instead of the hostname, we can then update the change from this pull request appropriately as well.

@w-leads force-pushed the bugfix/vsphere-detach-from-removed-node branch 3 times, most recently from 5ec4340 to 9c3b9cc on May 14, 2018 at 23:01
@w-leads (Contributor, Author) commented May 14, 2018

@abrarshivani I made a commit, and now detaching disks from a removed node will be retried when it fails. If this attached-disks-info method is okay for now, would you please take another look?

@abrarshivani (Contributor) commented:

@w-leads Sounds good to me. Will take another look. Can you please create an issue to track Virtual Machine UUID as Nodename approach?

if len(nm.attachedDisks[convertToString(nodeName)]) == 0 {
delete(nm.attachedDisks, convertToString(nodeName))
}
nm.attachedDisksLock.Unlock()
Review comment (Contributor):

I would suggest changing this to
defer nm.attachedDisksLock.Unlock()

func (nm *NodeManager) GetAttachedDisks(nodeName k8stypes.NodeName) []string {
nm.attachedDisksLock.Lock()
attachedDisks := nm.attachedDisks[convertToString(nodeName)]
nm.attachedDisksLock.Unlock()
Review comment (Contributor):

I think attachedDisks needs to be protected with locks, since it is the same object that will be accessed by RegisterAttachedDisk and UnRegisterAttachedDisk.
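One possible shape for this, assuming the concern is the returned slice aliasing the map entry that Register/UnRegisterAttachedDisk mutate (this continues the hypothetical NodeManager sketch from the earlier comment, not the PR's actual code):

```go
// GetAttachedDisks holds the lock for the whole call (released via defer, as
// suggested above) and returns a copy, so callers never share the slice that
// RegisterAttachedDisk and UnRegisterAttachedDisk mutate concurrently.
func (nm *NodeManager) GetAttachedDisks(nodeName string) []string {
	nm.attachedDisksLock.Lock()
	defer nm.attachedDisksLock.Unlock()
	return append([]string(nil), nm.attachedDisks[nodeName]...)
}
```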

if err != nil {
return nil, err
}
func (vs *VSphere) convertVolPathsToDevicePaths(ctx context.Context, nodeName k8stypes.NodeName, volPaths []string) ([]string, error) {
Review comment (Contributor):

Can we go with the earlier way, where we accept nodeVolumes here? The vclib.ErrNoVMFound error can be handled here.

Reply from author (Contributor):

I suppose this change is necessary.
Before this change, the function received the volumes on all nodes, and if any of the nodes was not found, the function and VSphere.DisksAreAttached returned an error. However, the attached-disks information needs to be re-registered in DisksAreAttached in case of a controller-manager restart. That's why I changed this function to handle the volumes on a single node, so that DisksAreAttached can simply ignore removed nodes instead of returning an error before re-registration.
Also, DisksAreAttached has a comment that says "// 5b. If VMs are removed from vSphere inventory they are ignored.", so I suppose DisksAreAttached shouldn't return an error at this point.
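A simplified standalone sketch of that per-node behaviour (illustrative names only, not the PR's actual convertVolPathsToDevicePaths or DisksAreAttached):

```go
package main

import (
	"errors"
	"fmt"
)

var errNoVMFound = errors.New("no VM found") // stand-in for vclib.ErrNoVMFound

// convertVolPathsForNode stands in for the per-node conversion; the real
// function talks to vCenter and can fail with ErrNoVMFound.
func convertVolPathsForNode(known map[string]bool, nodeName string, volPaths []string) ([]string, error) {
	if !known[nodeName] {
		return nil, errNoVMFound
	}
	return volPaths, nil
}

// disksAreAttached checks nodes one by one; a node missing from the vCenter
// inventory is skipped rather than failing the whole call, matching the
// "removed VMs are ignored" comment quoted above.
func disksAreAttached(known map[string]bool, nodeVolumes map[string][]string) (map[string]map[string]bool, error) {
	attached := make(map[string]map[string]bool)
	for nodeName, volPaths := range nodeVolumes {
		attached[nodeName] = make(map[string]bool)
		converted, err := convertVolPathsForNode(known, nodeName, volPaths)
		if err == errNoVMFound {
			continue // removed node: leave its volumes reported as not attached
		}
		if err != nil {
			return nil, err // other errors are still propagated
		}
		for _, p := range converted {
			attached[nodeName][p] = true
		}
	}
	return attached, nil
}

func main() {
	known := map[string]bool{"node-1": true} // node-2 has been removed from inventory
	nodeVolumes := map[string][]string{
		"node-1": {"[ds1] kubevols/pv-1.vmdk"},
		"node-2": {"[ds1] kubevols/pv-2.vmdk"},
	}
	attached, err := disksAreAttached(known, nodeVolumes)
	fmt.Println(attached, err)
}
```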

w-leads added 9 commits July 23, 2018 14:32
VCP fails to detach disk from powered-off node which is still in
vcenter inventory. That's caused because VCP tries to detach disks
after nodemanager has unregistered the node information.
This reverts commit 680ab089ce491f27c3602202761d2dd2625bec90.
This reverts commit 0738258143024a28917c4626f6474ffbee552a55.
This reverts commit eea9efdd61a25c3e74dbc54e11aae711ca499c85.
VCP fails to detach disk from powered-off node which is still in
vcenter inventory. That's caused because VCP tries to detach disks
after nodemanager has unregistered the node information.
@w-leads force-pushed the bugfix/vsphere-detach-from-removed-node branch from 673f44d to 7e323cf on July 23, 2018 at 21:43
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: w-leads
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: abrarshivani

If they are not already assigned, you can assign the PR to them by writing /assign @abrarshivani in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor) commented:

@w-leads: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 3, 2018
@@ -233,6 +236,19 @@ func init() {

// Initialize passes a Kubernetes clientBuilder interface to the cloud provider
func (vs *VSphere) Initialize(clientBuilder controller.ControllerClientBuilder) {
Review comment (Contributor):

Is this method getting invoked in kubelet? If not, then we should fail the initialization if vs.informerFactory is not set.

Can you confirm that Initialize is invoked after SetInformers?

You are also introducing a vsphere-cloud-provider service account; wouldn't this break Kubernetes upgrades? How do you plan to address upgrades?

Review comment (Member):

I think we will have to install the corresponding RBAC policies as defaults.

Review comment (Member):

This does not get called from kubelet. Also, Initialize is called after SetInformers, so I think we should be safe. Having said that, I think the code should also work when it can't find the configmap, and revert to the old behaviour in that case.
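A small standalone sketch of that fallback, with illustrative names (the real code would check a NotFound error from the API server, e.g. via apierrors.IsNotFound, rather than a local sentinel):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("configmap not found")

// loadVMUUIDCache tries to seed the nodeName -> VM UUID cache from a
// configmap; if the configmap cannot be found (missing, or RBAC not yet in
// place), it returns an empty cache and no error, i.e. the old behaviour.
func loadVMUUIDCache(getConfigMap func() (map[string]string, error)) (map[string]string, error) {
	data, err := getConfigMap()
	if err == errNotFound {
		return make(map[string]string), nil // fall back: no persisted mapping yet
	}
	if err != nil {
		return nil, err
	}
	if data == nil {
		data = make(map[string]string)
	}
	return data, nil
}

func main() {
	cache, err := loadVMUUIDCache(func() (map[string]string, error) { return nil, errNotFound })
	fmt.Println(len(cache), err) // 0 <nil>: initialization still succeeds
}
```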

return nil, err
}
if configMap.Data == nil {
nm.vmUUIDConfigMapCache = make(map[string]string)
Review comment (Contributor):

Granting configmap read and write permissions to the vSphere Cloud Provider after the k8s cluster has been launched will make the volume operations fail. This happens because node events occur before the permissions are granted, so the nodeName-to-vmUUID mapping is not cached at that time. Can you fix this?

Review comment (Member):

Installing the RBAC permissions via bootstrap policies should fix that, right?

@gnufied (Member) commented Oct 10, 2018

@w-leads does this PR really need to be against master? IIRC master isn't affected by this bug.

@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2019
@fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 7, 2019
@fejta-bot commented:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor) commented:

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
area/provider/vmware - Issues or PRs related to vmware provider
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
release-note-none - Denotes a PR that doesn't merit a release note.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants