-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(vmclone): delete snapshot and restore after pvc bound #10888
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just a bit concerned about testing
Should we start talking about a periodic with ceph-wffc so we don't miss this kind of stuff in the future? ofc this would be effort outside of this PR and owned by sig-storage
Thanks @akalenyu for the review! |
a6852f6
to
c16db41
Compare
/retest |
1 similar comment
/retest |
expectEvent(TargetVMCreated) | ||
}) | ||
|
||
It("if not all the PVCs are bound, nothing should happen", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fossedihelm Hi! just want to make sure, if you run this test w/o the changes in the product code, do you get the issue reproduction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
Thanks @fossedihelm !
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enp0s3 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a functional test?
The clone process involves `vmsnapshot` and `vmrestore`. Currently, these are both deleted as soon as the restore is finished. If the source vm has a pvc the `vmrestore` process creates another pvc, pointing to the volumesnapshot previously created by the `vmsnapshot`. With `Immediate` binding mode the pvc(s) created are immediately bound. When `WaitForFirstConsumer` binding mode is used the pvc will stay in "Pending" state until a pod is created(the virt-launcher pod). In the latter case, when the cloned vm is started, it is stuck in "WaitingForVolumeBinding" becuse the related PVC binding is failing, due to the snapshot source has being deleted. In other words, we cannot delete the snapshot resources until all the pvc are Bound. This patch aim to delay the snapshot and the restore deletion until we are sure that all the pvc are bound. Signed-off-by: fossedihelm <ffossemo@redhat.com>
If a pvc associated to a vm clone is using a storage class with wffc binding mode, the vmrestore and the vm snapshot resources should be deleted only after the target pvc are bound. This commit cover this case with an e2e test. Signed-off-by: fossedihelm <ffossemo@redhat.com>
c16db41
to
40100b8
Compare
@xpivarc I added the functional test and rebased. PTAL Thanks! |
/retest |
@@ -526,6 +528,10 @@ var _ = Describe("[Serial]VirtualMachineClone Tests", Serial, func() { | |||
vm, err := virtClient.VirtualMachine(vm.Namespace).Create(context.Background(), vm) | |||
Expect(err).ToNot(HaveOccurred()) | |||
|
|||
if !running && libstorage.IsStorageClassBindingModeWaitForFirstConsumer(storageClass) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should not extend the HaveSucceeded
, "hiding" the wffc binding mode behavior.
The createVMWithStorageClass
function should address the different behavior between the storage classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the more verbose variant here:
844e69b#diff-847cd06c1bf6ef597ea96894f8b32ecb4d074ad1bc785b39871e11fb6ab71b3dR538
(if statement can be dropped)
Consistently(func(g Gomega) { | ||
vmSnapshot, err := virtClient.VirtualMachineSnapshot(vmClone.Namespace).Get(context.Background(), *vmSnapshotName, v1.GetOptions{}) | ||
g.Expect(err).ShouldNot(HaveOccurred()) | ||
g.Expect(vmSnapshot).ShouldNot(BeNil()) | ||
vmRestore, err := virtClient.VirtualMachineRestore(vmClone.Namespace).Get(context.Background(), *vmRestoreName, v1.GetOptions{}) | ||
g.Expect(err).ShouldNot(HaveOccurred()) | ||
g.Expect(vmRestore).ShouldNot(BeNil()) | ||
}, 30*time.Second).Should(Succeed(), "vmsnapshot and vmrestore should not be deleted until the pvc is bound") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if we need this. Isn't it enough to check a phase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I want to be sure the vmrestore and the vmsnapshot are not deleted until the pvc is bound (bu run the vm).
We can check the phase, but I should do it Consistenly
IMO. WDYT? Thanks
Eventually(func(g Gomega) { | ||
_, err := virtClient.VirtualMachineSnapshot(vmClone.Namespace).Get(context.Background(), *vmSnapshotName, v1.GetOptions{}) | ||
g.Expect(errors.IsNotFound(err)).Should(BeTrue()) | ||
_, err = virtClient.VirtualMachineRestore(vmClone.Namespace).Get(context.Background(), *vmRestoreName, v1.GetOptions{}) | ||
g.Expect(errors.IsNotFound(err)).Should(BeTrue()) | ||
}, 1*time.Minute).Should(Succeed(), "vmsnapshot and vmrestore should be deleted once the pvc is bound") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaced by using BeGone
but you do provide the description so I think this is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have several follow-up PRs, I will adjust it later if you agree. Thanks
/lgtm |
Required labels detected, running phase 2 presubmits: |
/retest-required |
1 similar comment
/retest-required |
/cherrypick release-1.1 |
@fossedihelm: once the present PR merges, I will cherry-pick it on top of release-1.1 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
2 similar comments
/retest-required |
/retest-required |
/retest |
/override pull-kubevirt-e2e-k8s-1.29-sig-monitoring This lane was triggered by prow statusreconciler but is not required for this PR |
@brianmcarey: Overrode contexts on behalf of brianmcarey: pull-kubevirt-e2e-k8s-1.29-sig-monitoring In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
/retest-required |
@fossedihelm: new pull request created: #11085 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The clone process involves
vmsnapshot
andvmrestore
.Currently, these are both deleted as soon as the restore is finished. If the source vm has a pvc the
vmrestore
process creates another pvc, pointing to the volumesnapshot previously created by thevmsnapshot
.With
Immediate
binding mode the pvc(s) created are immediately bound.When
WaitForFirstConsumer
binding mode is used the pvc will stay in "Pending" state until a pod is created(the virt-launcher pod).In the latter case, when the cloned vm is started, it is stuck in "WaitingForVolumeBinding" becuse the related PVC binding is failing, due to the snapshot source has being deleted. In other words, we cannot delete the snapshot resources until all the pvc are Bound.
This patch aim to delay the snapshot and the restore deletion until we are sure that all the pvc are bound.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #10832
jira-ticket: https://issues.redhat.com/browse/CNV-32667
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: