Mark the volumes as detached when node does not exist #50239
Conversation
Hi @FengyunPan. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/assign @jingxu97
Force-pushed from 2715ef8 to 076d858.
/unassign @justinsb
Force-pushed from 076d858 to 82b87cf.
/ok-to-test
/test pull-kubernetes-e2e-kops-aws
/approve
/approve no-issue
/lgtm
@FengyunPan, could you follow the same pattern as GCE PD or AWS and put the check in the DiskIsAttached function? In that function, if the instance no longer exists, return false without error.
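For context, the pattern being referenced looks roughly like this (a minimal sketch, not the actual GCE PD or AWS code; every name except the general shape is illustrative):

package sketch

import "errors"

// errInstanceNotFound stands in for cloudprovider.InstanceNotFound.
var errInstanceNotFound = errors.New("instance not found")

// diskIsAttached sketches the pattern: resolve the instance first; if it no
// longer exists in the cloud provider, treat the volume as detached and
// return false with a nil error instead of surfacing the lookup failure.
func diskIsAttached(getInstance func(id string) error, attachedTo func(volumeID string) string, instanceID, volumeID string) (bool, error) {
	if err := getInstance(instanceID); err != nil {
		if errors.Is(err, errInstanceNotFound) {
			return false, nil // instance is gone: nothing can be attached to it
		}
		return false, err
	}
	return attachedTo(volumeID) == instanceID, nil
}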
@jingxu97 Ok, I will check it.
/approve cancel
@jingxu97 can you please remove the do-not-merge?
@rootfs, I think the fix is: when the node no longer exists (deleted by the cloud provider), we can say the status of the volume is detached. Currently GCE PD and AWS have this logic. Is that the case for cinder?
@jingxu97 there is a long list of volume statuses.
pkg/volume/cinder/attacher.go
Outdated
// So mark them detached.
if err == cloudprovider.InstanceNotFound {
	for _, spec := range specs {
		volumesAttachedCheck[spec] = false
mark it but don't detach it yet.
Cannot comment on line 216. Consider something like the following:
for volumeID, attached := range attachedResult {
	if !attached {
		spec := volumeSpecMap[volumeID]
		volumesAttachedCheck[spec] = false
		glog.V(2).Infof("VolumesAreAttached: check volume %q (specName: %q) is no longer attached", volumeID, spec.Name())
	} else {
		// volume attached, check if the instance is alive here
	}
}
Why do we need to check whether the instance is alive when the volume is attached?
Can you explain in more detail, or give me an example? Thank you.
Checking whether the instance is alive is not sufficient: the cinder volume could already have been moved to another node, or be in a different state.
We need both the attachment info and the node info to ensure that the cinder volume has not moved elsewhere and that the node the volume was attached to is dead; only then can we safely detach the volume.
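A minimal sketch of the condition described here (all names illustrative): the volume may only be treated as safely detached when its attachment record has not moved to another instance and the original node is gone.

// safeToMarkDetached: attachedServerID is the instance the cinder volume's
// attachment record currently points at ("" if unattached); originalID is
// the node being checked; nodeExists reports whether that node is still
// known to the cloud provider.
func safeToMarkDetached(attachedServerID, originalID string, nodeExists bool) bool {
	movedElsewhere := attachedServerID != "" && attachedServerID != originalID
	return !movedElsewhere && !nodeExists
}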
Agree with @rootfs
Thanks @rootfs
Sorry, I don't understand. Can a volume be attached to an instance that doesn't exist? I think it can't, so there is no need to check for that in this case.
Force-pushed from 82b87cf to 8d0d995.
pkg/volume/cinder/attacher.go
Outdated
@@ -185,7 +186,11 @@ func (attacher *cinderDiskAttacher) VolumesAreAttached(specs []*volume.Spec, nod
	}

	instanceID, err := attacher.nodeInstanceID(nodeName)
	if err != nil {
		if err == cloudprovider.InstanceNotFound {
			glog.V(2).Infof("VolumesAreAttached: node %q does not exist.", nodeName)
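The hunk is truncated above; judging from the earlier snippet in this thread, the branch presumably continues by marking every requested volume detached and returning without an error (a hedged reconstruction, not the verbatim patch):

			// Node is gone: report all its volumes as detached.
			for _, spec := range specs {
				volumesAttachedCheck[spec] = false
			}
			return volumesAttachedCheck, nil
		}
		return volumesAttachedCheck, err
	}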
print both nodeName and instanceID for better diagnostics.
@jingxu97 should we raise the severity to glog.Warningf too?
@rootfs if the instance is deleted, the OpenStack cloud provider can't get its instanceID, so attacher.nodeInstanceID() returns ("", err). Printing nodeName is enough.
Yeah, it should use glog.Warningf, thank you.
Force-pushed from 8d0d995 to b011571.
@anguslees - any thoughts on this?
@@ -412,6 +412,14 @@ func (os *OpenStack) DiskIsAttached(instanceID, volumeID string) (bool, error) {
		return false, err
	}

	if instanceID == "" {
		if volume.AttachedServerId != "" {
			return true, nil
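The diff is cut off here; based on the discussion that follows, the empty-instanceID branch presumably continues roughly like this (a hedged guess, not the verbatim patch):

		}
		// No attachment record at all: the volume is not attached to
		// anything, so report it as detached.
		return false, nil
	}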
I don't think the logic here is right. The function checks whether the volume is attached to the instance passed in as a parameter. Here, if the instanceID is empty, it means that the instance has been deleted from the cloud provider. If volume.AttachedServerId is not empty, that means the volume is attached to a different instance, so the function should return false, because the volume is not attached to the original instance.
For most volume plugins, if the node no longer exists in the cloud provider, we can safely assume the volume is detached from the node. But for cinder, could you please double-check the behavior if you have the environment? Thanks!
@jingxu97 IMO, if instanceID is empty, DiskIsAttached() is checking whether the volume is attached at all. If the volume is attached, it returns true; otherwise it returns false.
"But the function should return false because the volume is not attached to the original instance."
We can't get the instance ID for the original instance, which no longer exists. If DiskIsAttached() returns true when the instance does not exist, that means the volume is attached to another instance and we should not mark the volume detached (as @rootfs says). If DiskIsAttached() returns false when the instance does not exist, that means the volume is available, and we can mark the volume detached. The code from line 207 already does this. That is also my double check: I have tested it.
"But for cinder, could you please double check the behavior if you have the environment?"
I am sorry, I don't understand what you mean by "double check"; can you give me more detail? Thank you again.
"We can't get the instance id for the original instance which does not exist. If DiskIsAttached() return true when instance does not exist that means the volume is attached to another instance and we should not mark the volume detached"
Your description is not correct.
DiskIsAttached does not just check disk is attached to ANY node, it checks whether it is attached to the Node that you passed in that function. Suppose disk is currently attached to node B, the function check of DiskIsAttached (disk, nodeA) should return false because it is not attached to nodeA, it is attached to node B. Please let me know if this is clear.
Double check means that I want to confirm that node not found means volume is detached automatically. Seems like this is true for most of cloud provider. But I don't have the cinder volume setup to test it.
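In other words (an illustrative call; suppose the volume is currently attached to node B):

// DiskIsAttached answers "is this volume attached to *this* instance?",
// not "is this volume attached to anything?".
attached, err := os.DiskIsAttached(nodeAInstanceID, volumeID)
// attached == false, err == nil: the volume is attached to node B, not
// node A, so the check against node A reports "not attached".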
Without my changes, DiskIsAttached() just checks whether the disk is attached to the instanceID that you passed into the function, and that instanceID can't be empty, right?
With the code I've added, DiskIsAttached() can check whether the disk is attached to the (non-empty) instanceID that you passed in, and it can also check whether the disk is attached at all when the instanceID is empty.
Yes, the volume will be detached when the instance is not found in OpenStack cinder.
Is that confusing? Do we need to add another function to check whether the disk is attached?
If we are sure that node-not-found in OpenStack cinder indicates that the volume is already detached, then I suggest the following change in cinder/attacher.go:
if err == cloudprovider.InstanceNotFound {
	// return false with nil error
	...
}
You might have had this change in the very first version. Sorry for the confusion.
@rootfs mentioned that cinder might have a more complicated status, and node-not-exist might not mean the volume has already been detached successfully from the node. If that is true, we might not be able to return false directly. But I cannot think of a good way to check it. Passing an empty string for the node instance seems wrong, because our goal is to check whether the volume is still attached to that specific node (not any other node). What does it mean if the node ID is empty for this function?
@jingxu97 @rootfs Yeah, the volume will be detached when the instance is not found in OpenStack cinder. I am sorry that I can't find the doc from OpenStack, but I found the code:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L899
Nova will detach all devices from the instance when the user deletes the instance:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L983
Detach volume:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1387
If the instance does not exist, nova will detach the volume too:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1389
So we can do this: if the node instance no longer exists (cloudprovider.InstanceNotFound is returned), we can safely assume that the volume is no longer attached to the node, and DiskIsAttached() returns "not attached" without error.
Force-pushed from b011571 to 12ec68a.
If the node doesn't exist, OpenStack Nova will assume its volumes are not attached to it. So mark the volumes as detached and return false without error. Fix: kubernetes#50200
Force-pushed from 12ec68a to 63725e3.
Since we are using the V2 block storage API, it is safe to say the attachment should be updated with a Nova instance ID. This could change in v3, per the info here. We'll revisit this issue when we pick up the v3 block storage SDK. @kubernetes/sig-openstack-pr-reviews
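For background, a rough sketch of how attachment info comes back from the v2 block storage API using gophercloud (assuming an already-authenticated service client; this is illustrative, not the cloud provider's exact code, and fields like AttachedServerId are presumably derived from this data):

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/blockstorage/v2/volumes"
)

func printAttachments(client *gophercloud.ServiceClient, volumeID string) error {
	vol, err := volumes.Get(client, volumeID).Extract()
	if err != nil {
		return err
	}
	for _, att := range vol.Attachments {
		// In v2, each attachment record carries the Nova server (instance)
		// the volume is attached to.
		fmt.Printf("volume %s is attached to server %s\n", vol.ID, att.ServerID)
	}
	return nil
}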
@rootfs Yes, I agree. Cinder V3 is different.
/approve
@FengyunPan yes that's my idea too :) pass to @jingxu97 for lgtm
@rootfs Yeah, thank you, I will focus on OpenStack cinder v3 later.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: FengyunPan, dims, rootfs
Associated issue: 50200
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing /approve in a comment when ready.
/test all
Tests are more than 96 hours old. Re-running tests.
Automatic merge from submit-queue (batch tested with PRs 38947, 50239, 51115, 51094, 51116)

Mark the volumes as detached when node does not exist

If the node does not exist, the node's volumes will be detached automatically and become available. So mark them detached and do not return an error.

Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes kubernetes#50200

Release note:
```release-note
NONE
```