Have Cinder mount tasks use new kubelet functions #49786
Conversation
Hi @jamiehannaford. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/sig openstack
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: jamiehannaford. Assign the PR to them by writing /assign in a comment. Associated issue: 47548. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing /approve in a comment.
/ok-to-test
@msau42 My understanding (taken from here) is that:
The /dev/vdX disk is symlinked in all the usual places in /dev/disk/*. So to all intents and purposes, the kubelet will see it as a dir, I think. /cc @coreypobrien
@jamiehannaford: GitHub didn't allow me to request PR reviews from the following users: coreypobrien. Note that only kubernetes members can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@jamiehannaford: The following test failed; say /retest to rerun it.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Does /dev/sdX already have the filesystem mounted, or do you need to do an additional mount for the filesystem? Other volume plugins generally have a flow like: attach the block device to the node, mount the device's filesystem at the attacher's global mount path, and then bind mount that global path into each pod's volume directory.
@msau42 AFAIK the /dev/sdX is already there on the compute host and looks like another block device (this is handled by Nova and doesn't concern kubernetes). The flow you described is how the cinder plugin works too. KVM mounts the disk to the VM under /dev/X, and then kubelet binds the symlinks in /dev/disk/by-id to the pod's mount path. I'm not too familiar with the "attacher's instructed global mount path", so I'm not sure about the second step.
I think I'm missing the step between /dev/X, which is a block device, and getting to some /path, which is the filesystem mount. |
@msau42 I was under the assumption that the Cinder plugin handled that step: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/cinder/attacher.go#L264
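For anyone following along, the linked step is where the attacher mounts the filesystem from the attached block device onto the plugin's global mount directory. Below is a minimal standalone sketch of that flow, not the actual attacher code: it shells out to mount(8) and mountpoint(1) instead of using the kubelet's mounter, and the device path and volume name are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// mountDevice loosely mirrors the attacher's MountDevice step: make sure the
// global mount directory exists, skip if something is already mounted there,
// otherwise mount the block device's filesystem onto it.
func mountDevice(devicePath, deviceMountPath, fsType string) error {
	// Create the global mount directory if it does not exist yet.
	if err := os.MkdirAll(deviceMountPath, 0750); err != nil {
		return err
	}
	// Naive "already mounted?" check using mountpoint(1); exit code 0 means
	// the path is currently a mount point.
	if err := exec.Command("mountpoint", "-q", deviceMountPath).Run(); err == nil {
		return nil
	}
	// Mount the filesystem from the attached block device. The real attacher
	// also formats the device on first use.
	out, err := exec.Command("mount", "-t", fsType, devicePath, deviceMountPath).CombinedOutput()
	if err != nil {
		return fmt.Errorf("mount failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Example values only; /dev/vdb and the volume name are hypothetical.
	err := mountDevice("/dev/vdb", "/var/lib/kubelet/plugins/kubernetes.io/cinder/mounts/example-volume", "ext4")
	fmt.Println(err)
}
```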
Ah ok, so then I don't think you should be hitting the issue with bind mounting directories, because this should be doing a real mount of the filesystem from the disk.
@msau42 We're definitely running into issues with this. If you reboot a node and a pod is rescheduled to a different node, the volume manager applies the attach/detach logic twice to the new node (this is a separate bug tracked in #49503). This means that in quick succession, there will be different device paths for the same volume. The kubelet, however, never updates its mount to point to the newest device path.
/sig storage Maybe some Cinder experts here understand this better than I do.
@jungleboyj @stmcginnis Can you please take a look?
/cc @jsafrane @anguslees
@msau42 is correct that /dev/vd* are device paths, not mount points. The expected flow described above appears to be correct from my understanding. I think you would need to use IsNotMountPoint, but I do not have the full background on these commands, so that is just based on my <5 minute read-through of the history here.
To echo @stmcginnis, the /dev/vdX paths are the raw block devices, which means a mount-point check on them should always report "not a mount point". Looking at https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/cinder/attacher.go#L264 and assuming that the deviceMountPath passed into the IsNotMountPoint method is the plugin's global mount directory, I'd be curious to see exactly what paths are being used, and what the system looks like when this bug occurs. It might be that this change is "fixing" the issue by covering up some other underlying problem.
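To make the distinction concrete, here is a standalone sketch of the two checks under discussion (assumptions: Linux only, no symlink resolution, no octal unescaping of /proc/mounts entries; this is not the kubelet's real implementation). The heuristic check only compares device numbers with the parent directory, while the stricter check also scans the mount table, which is what catches bind mounts and similar cases.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"syscall"
)

// isLikelyNotMountPoint mirrors the old heuristic: compare the device number
// of the path with that of its parent directory. Bind mounts of directories
// on the same filesystem look identical to their parent, so they are missed.
func isLikelyNotMountPoint(path string) (bool, error) {
	stat, err := os.Stat(path)
	if err != nil {
		return true, err
	}
	parent, err := os.Stat(filepath.Dir(path))
	if err != nil {
		return true, err
	}
	return stat.Sys().(*syscall.Stat_t).Dev == parent.Sys().(*syscall.Stat_t).Dev, nil
}

// isNotMountPoint additionally consults the mount table, so entries the
// device-number heuristic misses are still detected.
func isNotMountPoint(path string) (bool, error) {
	notMnt, err := isLikelyNotMountPoint(path)
	if err != nil || !notMnt {
		return notMnt, err
	}
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return true, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[1] == path {
			// The path shows up as a mount target after all.
			return false, nil
		}
	}
	return true, scanner.Err()
}

func main() {
	for _, p := range os.Args[1:] {
		notMnt, err := isNotMountPoint(p)
		fmt.Printf("%s: notMnt=%v err=%v\n", p, notMnt, err)
	}
}
```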
Also, to clarify: /dev/vdX is the internal libvirt device (the device as seen from the instance). The attached device will in fact be assigned to a new /dev/sdX device on the compute node. You can verify/correlate these things by looking in /dev/disk/by-path on your compute node. For LVM, the device files when attached (in disk/by-path) are built from the iSCSI target name. Note that the target IQN is set in your cinder.conf with a default of iqn.2010-10.org.openstack; we then form the rest of the string with the tgt-ip and the volume-<id>. The initiator IQN is going to come from /etc/iscsi/initiatorname.iscsi on the compute node.
@stmcginnis @patrick-east Apologies for the confusion. I meant to say that the kubelet doesn't seem to be recognizing the existing mount paths. To rephrase the problem, this is what's typically happening:
Here are the full logs of what we're seeing. You'll notice some discrepancies between what's actually mounted by Nova (vdf) and what the kubelet is pointing to (vdc):
That made me think it's not handling unmounts correctly, possibly due to not handling the "is a mount" logic well for Cinder volumes.
One more thing: the /dev/vdX is completely arbitrary; it can't be forced or controlled. Libvirt will always just grab the next letter available (a, b, c, d...) in the instance. This is an old annoyance, due to which we typically discourage using the device specification in the attach argument altogether. You can put whatever you like there, but Nova/libvirt will always just grab the next available path and use it.
Could the double attach be causing the initial mount to go stale, and then the kubelet continues serving the pod from the stale mount?
Seems the main discussion point is Nova's two attach calls via the attach/detach controller. FYI: from the following logs, the kubelet on worker1 mounts Cinder disk "0328a7b2-4b0a-4e58-a396-b87247d51b18" under the deviceMountPath dir (*1) before the reboot, and the kubelet on worker0 mounts the same Cinder disk under the same deviceMountPath dir after worker1 is rebooted. This seems like normal kubelet behavior. (*1) deviceMountPath: /var/lib/kubelet/plugins/kubernetes.io/cinder/mounts/0328a7b2-4b0a-4e58-a396-b87247d51b18
The persistent volume pvc-1514cd73-7064-11e7-8eca-fa163e1096d9 has a volumeID parameter, and its value is 0328a7b2-4b0a-4e58-a396-b87247d51b18. Thus the mount path is "/var/lib/kubelet/plugins/kubernetes.io/cinder/mounts/" + volumeID.
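In other words, the global deviceMountPath is just the plugin's mounts directory joined with the Cinder volume ID. A tiny sketch using the layout quoted above:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// globalCinderMountPath builds the deviceMountPath quoted above:
// the plugin's mounts directory joined with the Cinder volume ID.
func globalCinderMountPath(volumeID string) string {
	return filepath.Join("/var/lib/kubelet/plugins/kubernetes.io/cinder/mounts", volumeID)
}

func main() {
	fmt.Println(globalCinderMountPath("0328a7b2-4b0a-4e58-a396-b87247d51b18"))
	// Output: /var/lib/kubelet/plugins/kubernetes.io/cinder/mounts/0328a7b2-4b0a-4e58-a396-b87247d51b18
}
```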
@msau42 Yeah, that's what I'm suspecting too. The initial mount goes stale (becomes RO), but the kubelet does not update its mount to point to the newest device path. We thought this might be happening because of the logic in https://github.com/kubernetes/kubernetes/blob/v1.6.7/pkg/volume/cinder/cinder_util.go#L74-L93, which is also worth a closer look.
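For context on how the node-side lookup works: the kubelet resolves the volume to a device by following a /dev/disk/by-id symlink that embeds the Cinder volume ID, which in turn points at whatever /dev/vdX name the disk currently has. A hypothetical sketch of that resolution (the candidate prefixes and the 20-character truncation are assumptions, not the exact upstream logic):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findDevicePath resolves a Cinder volume ID to its current block device by
// following the by-id symlinks. The candidate names below are assumptions:
// libvirt exposes the volume ID as the virtio serial, truncated to 20 chars.
func findDevicePath(volumeID string) (string, error) {
	short := volumeID
	if len(short) > 20 {
		short = short[:20]
	}
	candidates := []string{
		"/dev/disk/by-id/virtio-" + volumeID,
		"/dev/disk/by-id/virtio-" + short,
		"/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_" + short,
	}
	for _, c := range candidates {
		if _, err := os.Lstat(c); err == nil {
			// Resolve the symlink to the underlying /dev/vdX device node.
			return filepath.EvalSymlinks(c)
		}
	}
	return "", fmt.Errorf("no device found for volume %s", volumeID)
}

func main() {
	fmt.Println(findDevicePath("0328a7b2-4b0a-4e58-a396-b87247d51b18"))
}
```

If the volume is re-attached and comes up under a new letter, re-resolving the symlink yields the new path, but an existing mount on the old device node does not move with it, which would match the stale-mount behaviour being described.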
If you look at a healthy Cinder volume, the device doesn't match the parent dir either:
Is this expected behaviour? Due to the buggy logic in the volume manager, we know there's a high chance of race conditions, so I'm wondering whether the detach calls are not properly cleaning up the old mount points.
Yes, it is definitely expected that the mount point device not match the parent. It sounds like the real underlying issue is the 2nd attach causing the 1st mount to go stale. Can we detect if the volume has already been attached and not trigger a second attach?
I think the stale mount can be avoided if the Cinder plugin could detect that the volume is already attached to this node.
@msau42 Yeah, it looks like the AWS plugin does exactly that: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1595 Another problem area is that the OpenStack plugin's AttachDisk tries to detach the disk from other nodes if it thinks the state is wrong. We should remove that, because that logic should be delegated to the volume manager and is probably contributing to the race conditions.
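A sketch of the kind of guard being suggested: before issuing a new attach, ask the cloud provider whether the volume is already attached to this node and reuse the existing attachment if so. The interface and method names below are stand-ins modelled on the cited AWS behaviour, not the OpenStack provider's actual API.

```go
// Package cinder is a hypothetical package name for this sketch.
package cinder

// CloudProvider is a minimal stand-in interface for the calls this sketch
// needs; the method names are assumptions for illustration only.
type CloudProvider interface {
	DiskIsAttached(instanceID, volumeID string) (bool, error)
	GetAttachmentDiskPath(instanceID, volumeID string) (string, error)
	AttachDisk(instanceID, volumeID string) (string, error)
}

// attachDiskIdempotently skips the second attach when the volume is already
// attached to this node, so an in-use mount is never invalidated.
func attachDiskIdempotently(cloud CloudProvider, instanceID, volumeID string) (string, error) {
	attached, err := cloud.DiskIsAttached(instanceID, volumeID)
	if err != nil {
		return "", err
	}
	if attached {
		// Reuse the existing attachment instead of re-attaching. Detaching
		// from other nodes is deliberately not done here; that decision
		// belongs to the attach/detach controller.
		return cloud.GetAttachmentDiskPath(instanceID, volumeID)
	}
	return cloud.AttachDisk(instanceID, volumeID)
}
```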
What this PR does / why we need it:
Updates Cinder mount and unmount tasks to use new kubelet checks introduced in #48402.
Since the current implementation relies on the old IsLikelyNotMountPoint, we're seeing a lot of problems: the kubelet ignores existing mounts (it thinks they're not mount points) and therefore fails to unmount/remount them. This means that when nodes are rebooted, pods with PVs enter a crash loop because they have the wrong device paths. I also added more logging to help with debugging mount/unmount.
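To illustrate the unmount side of the change, here is a standalone sketch (not the PR's actual diff) in which the decision to unmount is driven by what the mount table says rather than by the device-number heuristic, so stale or bind-mounted directories still get torn down. The example path is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"syscall"
)

// mountedAt reports whether anything shows up in the mount table at path.
// It stands in for the stricter IsNotMountPoint-style check this PR switches
// to (sketch only: no octal unescaping or symlink resolution).
func mountedAt(path string) (bool, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return false, err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) >= 2 && fields[1] == path {
			return true, nil
		}
	}
	return false, s.Err()
}

// unmountDeviceIfMounted tears down the global mount for a volume, unmounting
// only when the mount table really lists the path, and then removing the
// directory.
func unmountDeviceIfMounted(deviceMountPath string) error {
	mounted, err := mountedAt(deviceMountPath)
	if err != nil {
		return err
	}
	if mounted {
		fmt.Printf("unmounting %s\n", deviceMountPath)
		if err := syscall.Unmount(deviceMountPath, 0); err != nil {
			return err
		}
	}
	return os.Remove(deviceMountPath)
}

func main() {
	fmt.Println(unmountDeviceIfMounted("/var/lib/kubelet/plugins/kubernetes.io/cinder/mounts/example-volume"))
}
```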
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): Related to #47548 and #49503.
Release note: