
Fix race between MountVolume and UnmountDevice #71074

Merged
merged 1 commit into kubernetes:master on Nov 17, 2018

Conversation

jsafrane
Member

@jsafrane jsafrane commented Nov 15, 2018

When kubelet receives a new pod while UnmountDevice is in progress, it should not enqueue a
MountVolume operation - the volume's device is being unmounted right now.

In addition, don't clear devicePath after UnmountDevice, because a subsequent MountDevice may need it.

/kind bug
/sig storage
Fixes #65246

Does this PR introduce a user-facing change?:

Fixed kubelet reporting "resource name may not be empty" when mounting a volume very quickly after unmount.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Nov 15, 2018
@jsafrane
Member Author

@msau42 @gnufied @jingxu97 @saad-ali @pohly PTAL.

@jsafrane
Member Author

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 15, 2018
@@ -166,6 +166,10 @@ func (rc *reconciler) reconcile() {
	// Ensure volumes that should be unmounted are unmounted.
	for _, mountedVolume := range rc.actualStateOfWorld.GetMountedVolumes() {
		if !rc.desiredStateOfWorld.PodExistsInVolume(mountedVolume.PodName, mountedVolume.VolumeName) {
			if rc.operationExecutor.IsOperationPending(mountedVolume.VolumeName, nestedpendingoperations.EmptyUniquePodName) {
				klog.V(5).Infof("Skipping UnmountVolume, device operation is in progress")
				continue
Contributor

How is this different from the check that the executor will do internally? See here:

func (grm *nestedPendingOperations) Run(
	volumeName v1.UniqueVolumeName,
	podName types.UniquePodName,
	generatedOperations types.GeneratedOperations) error {
	grm.lock.Lock()
	defer grm.lock.Unlock()
	opExists, previousOpIndex := grm.isOperationExists(volumeName, podName)
	if opExists {
		previousOp := grm.operations[previousOpIndex]
		// Operation already exists
		if previousOp.operationPending {
			// Operation is pending
			operationName := getOperationName(volumeName, podName)
			return NewAlreadyExistsError(operationName)
		}
		// ... (rest of the function elided)

When I asked about it in #70319 and later in a CSI WG meeting, @saad-ali said that the executor will prevent running multiple operations against the same volume in parallel.

It wasn't obvious how that works, and I was still worried that the executor would merely serialize operations. But after looking at the code above, my interpretation is that it will discard new operations instead of queuing them, so it should do something similar to what you propose here.

But I am not entirely sure - perhaps your check is stricter, or I misunderstood something.

Member Author

The difference is the podName: the (Un)MountDevice operation uses an empty pod name (it is global for all pods), while (Un)MountVolume uses pod names. IMO, (volumeName, podName) is the primary key in the operation map, so multiple MountVolumes can run in parallel on the same volume.
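
For illustration, a minimal toy model of the keying described here (not the real nestedpendingoperations API; opKey, pendingOps and the volume/pod names are made up, and see the correction further down in the thread):

package main

import "fmt"

// opKey is a made-up stand-in for how pending operations are keyed:
// effectively (volumeName, podName), where an empty podName marks a
// device-level operation such as MountDevice/UnmountDevice.
type opKey struct {
	volumeName string
	podName    string
}

func main() {
	pendingOps := map[opKey]bool{}

	// UnmountDevice is in flight: device-level, empty pod name.
	pendingOps[opKey{volumeName: "vol-1", podName: ""}] = true

	// A per-pod MountVolume for the same volume would use a different key,
	// so a purely (volumeName, podName)-based check does not see a conflict.
	if pendingOps[opKey{volumeName: "vol-1", podName: "pod-a"}] {
		fmt.Println("MountVolume rejected: operation pending")
	} else {
		fmt.Println("MountVolume accepted: no conflict on this key")
	}

	// An explicit device-level check (empty pod name), as the first version
	// of this PR added to the reconciler, does see the pending UnmountDevice.
	if pendingOps[opKey{volumeName: "vol-1", podName: ""}] {
		fmt.Println("device-level check: UnmountDevice still pending, skip")
	}
}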

@jsafrane
Member Author

/retest

@pohly
Contributor

pohly commented Nov 15, 2018 via email

	volumeObj.devicePath = devicePath // removed
	if devicePath != "" {             // added
		volumeObj.devicePath = devicePath
	}
Member

Do we really need this? If we prevent concurrent operations on the same volume, won't a pod that uses the same volume pretty much cause another MountDevice operation?

Member Author

Yes, we need it. Consider: UnmountDevice is in progress and a new pod arrives.

  1. UnmountDevice finishes. This calls MarkDeviceAsUnmounted, which clears the devicePath, i.e. calls SetVolumeGloballyMounted(..., devicePath="", ...)
  2. A new MountDevice (WaitForAttach) starts and sees an empty devicePath -> problems.
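
A minimal sketch of what the guarded assignment above buys us, with a simplified stand-in for the actual-state-of-world volume entry (volumeState and setGloballyMounted are illustrative names, not the real API):

package main

import "fmt"

// volumeState is a simplified stand-in for the volume entry kept in the
// actual state of the world.
type volumeState struct {
	devicePath string
}

// setGloballyMounted mimics the guarded update: an empty devicePath (as
// passed when the device is marked unmounted) no longer wipes the
// previously recorded path, so a MountDevice/WaitForAttach that starts
// right afterwards still sees a usable devicePath.
func (v *volumeState) setGloballyMounted(devicePath string) {
	if devicePath != "" {
		v.devicePath = devicePath
	}
}

func main() {
	v := &volumeState{devicePath: "/dev/xvdba"} // recorded at attach time

	v.setGloballyMounted("") // UnmountDevice finished, empty path passed

	fmt.Println(v.devicePath) // still "/dev/xvdba" instead of ""
}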

Contributor

Sorry to comment this late. Very good catch of this bug. I think that to avoid such bugs, it is better not to reuse SetVolumeGloballyMounted for both MarkDeviceAsMounted and MarkDeviceAsUnmounted (the unmount path only needs to set the globallyMounted bool to false, so there is no need to pass devicePath or deviceMountPath).

@jsafrane
Member Author

is there ever a situation where there should be more than one
operation in flight per volume? Note that I meant "should be" as in
"should be allowed to be".

I am not sure. However, in this case we did not see two operations running in parallel; we saw UnmountDevice running and MountVolume enqueued. MountVolume was started after UnmountDevice finished, and that is wrong - the device is not mounted, so MountVolume has nothing to bind-mount. We can:

  • Either not enqueue MountVolume operation when UnmountDevice is in progress (that's this PR)
  • Or update MountVolume operation to check if MountDevice was completed.

@pohly
Contributor

pohly commented Nov 16, 2018 via email

UnmountDevice must not clear devicePath, because such a devicePath
may come from node.status (e.g. on AWS) and a subsequent MountDevice
operation (that may already be enqueued) needs it.
@jsafrane jsafrane force-pushed the volume-manager-races branch from fe6a419 to de9689b Compare November 16, 2018 12:32
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 16, 2018
@jsafrane
Member Author

It seems I was wrong. Only one MountVolume() operation runs when multiple pods use the same volume, because the pod name is not used in the operation name:

podName := nestedpendingoperations.EmptyUniquePodName

So the operation checks in this PR are useless and only the devicePath check is needed. I reworked the PR; now it's a single if.
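
Continuing the earlier toy model under this corrected assumption (still not the real API; the empty string stands in for nestedpendingoperations.EmptyUniquePodName):

package main

import "fmt"

type opKey struct {
	volumeName string
	podName    string // "" stands in for nestedpendingoperations.EmptyUniquePodName
}

func main() {
	pendingOps := map[opKey]bool{}

	// UnmountDevice in flight for vol-1 (device-level, empty pod name).
	pendingOps[opKey{volumeName: "vol-1", podName: ""}] = true

	// If MountVolume for an attachable volume also runs under the empty pod
	// name, every pod sharing vol-1 maps onto the same key as the pending
	// UnmountDevice, so the operation executor itself rejects the new
	// operation and no extra reconciler-side check is needed.
	for _, pod := range []string{"pod-a", "pod-b"} {
		if pendingOps[opKey{volumeName: "vol-1", podName: ""}] {
			fmt.Printf("MountVolume for %s rejected: operation already pending\n", pod)
		}
	}
}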

@pohly
Contributor

pohly commented Nov 16, 2018 via email

@msau42
Member

msau42 commented Nov 16, 2018

Please add unit tests to this function and reconciler like I suggested here: #70746 (comment)

@jsafrane jsafrane force-pushed the volume-manager-races branch from de9689b to 5283537 Compare November 16, 2018 16:52
@jsafrane
Member Author

Unit test added. It fails without the changes in actual_state_of_world.go and passes with them.

@gnufied
Member

gnufied commented Nov 16, 2018

/lgtm

but I also think we should merge #71095: while we have fixed one race here, not relying on the attachID value present in the actual state of the world when calling WaitForAttach makes sense. Many other drivers also discover the actual device path in the WaitForAttach call itself.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2018
@saad-ali
Member

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2018
@AishSundar
Contributor

@saad-ali @msau42 looks like this is ready to merge. Let me know if you want help stamping it with the 1.13 milestone.

@AishSundar
Contributor

/milestone v1.13
/priority critical-urgent
/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Nov 17, 2018
@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Nov 17, 2018
@k8s-ci-robot k8s-ci-robot removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 17, 2018
@k8s-ci-robot k8s-ci-robot merged commit f877b22 into kubernetes:master Nov 17, 2018
@gnufied
Member

gnufied commented Dec 14, 2018

Let's cherry-pick this to 1.12 and 1.11. @jsafrane @msau42 any objections? I think this will also fix the long-standing bug #67342.

@msau42
Member

msau42 commented Dec 17, 2018

sgtm

k8s-ci-robot added a commit that referenced this pull request Dec 20, 2018
…4-upstream-release-1.12

Automated cherry pick of #71074: Fixed clearing of devicePath after UnmountDevice
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jan 2, 2019
k8s-ci-robot added a commit that referenced this pull request Jan 4, 2019
…4-upstream-release-1.11

Automated cherry pick of #71074: Fixed clearing of devicePath after UnmountDevice
@dennis-benzinger-hybris

Is there any chance of getting this merged into 1.10?

@jsafrane
Member Author

Is there any chance of getting this merged into 1.10?

No, Kubernetes supports only 3 releases at a time (1.11 - 1.13).

@dennis-benzinger-hybris

Thanks @jsafrane. That confirms what I expected from reading the docs. I thought I'd ask nevertheless because apparently #72856 was merged into 1.10.

honkiko pushed a commit to honkiko/kubernetes that referenced this pull request Dec 5, 2019
Fix race between MountVolume and UnmountDevice (non-official backport to 1.10.5)

FYI: this issue is fixed by kubernetes#71074.

k8s version	fixed version
v1.10	        no fix
v1.11	        1.11.7
v1.12	        1.12.5
v1.13	        no such issue


See merge request !55