[BUG] allow volume migration when volume is degraded (harvester vm) #2805

guangbochen · 2021-07-17T00:48:42Z

Is your feature request related to a problem? Please describe.
During Harvester VM live migration on a cluster that consists of 2 nodes, all Longhorn (LH) volumes are degraded because the Longhorn replica count is 3, so the target pod fails to start correctly.

unable to attach volume pvc-83b575fa-4229-46be-a6a0-186561a903cc to bm-harv2: volume must be healthy to start migration, code=Server Error, detail=] from [http://longhorn-backend:9500/v1/volumes/pvc-83b575fa-4229-46be-a6a0-186561a903cc?action=attach

NAME                                     READY   STATUS              RESTARTS   AGE
virt-launcher-lawr-1-j75dc               1/1     Running             0          116m
virt-launcher-lawr-1-tcxvq               0/1     ContainerCreating   0          8m47s


Warning  FailedAttachVolume  3s (x3 over 9s)    attachdetach-controller  AttachVolume.Attach failed for volume "pvc-83b575fa-4229-46be-a6a0-186561a903cc" : rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [message=unable to attach volume pvc-83b575fa-4229-46be-a6a0-186561a903cc to bm-harv2: volume must be healthy to start migration, code=Server Error, detail=] from [http://longhorn-backend:9500/v1/volumes/pvc-83b575fa-4229-46be-a6a0-186561a903cc?action=attach]
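
To make the failure mode concrete, here is a minimal sketch of why every volume ends up degraded on this cluster. It assumes at most one replica of a volume is placed per node, which is an assumption about the scheduling policy and is not stated in this issue. The function name `volume_robustness` and the state strings are hypothetical and only for illustration; they are not Longhorn API.

```python
def volume_robustness(desired_replicas: int, schedulable_nodes: int) -> str:
    """Classify a volume, assuming at most one replica per node (assumption)."""
    # Only as many replicas can run as there are nodes to host them.
    running = min(desired_replicas, schedulable_nodes)
    if running == 0:
        return "faulted"
    if running < desired_replicas:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    # The setup from this issue: replica count 3 on a 2-node cluster.
    print(volume_robustness(desired_replicas=3, schedulable_nodes=2))
```

Under that assumption, a 2-node cluster with a replica count of 3 can never report the volume as healthy, so a check that requires a healthy volume blocks every migration attempt.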

Describe the solution you'd like
Longhorn needs a feasible way to relax the restriction that a volume must be healthy before migration can start, so that VM migration still works when the volume is degraded.

Additional context
harvester/harvester#798

yasker · 2021-08-01T14:59:45Z

Raising the priority because this has a high impact when the user has only 3 nodes and loses one of them.

yasker · 2021-08-17T00:51:27Z

We can fix this in v1.2.1.

longhorn-io-github-bot · 2021-09-09T12:49:50Z

Pre Ready-For-Testing Checklist

Where are the reproduce steps/test steps documented?
The reproduce steps/test steps are at: the e2e test skeleton & the above description
Is there a workaround for the issue? If so, where is it documented?
The workaround is at:
~~Does the PR include the explanation for the fix or the feature?~~
Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
The PR for the YAML change is at:
The PR for the chart change is at:
Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
The PR is at
Allow migration for degraded volumes longhorn-manager#1033
Fix regressions introduced by migration & rebuilding changes longhorn-manager#1039
[Backport][v1.2.x] Migration fix and concurrent rebuilding feature introducing longhorn-manager#1049
[Backport][v1.1.3] Fix migration issue and enhance the failed replica reusage & cleanup longhorn-manager#1048
Which areas/issues this PR might have potential impacts on?
Area: Volume live migration & volume live upgrade.
Issues
If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
The LEP PR is at
If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
The UI issue/PR is at
If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
The documentation issue/PR is at
If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
The automation skeleton PR is at
integration: Add test skeletons for degraded volume live migration longhorn-tests#704
[Backport][v1.2.x] Migration and concurrent rebuilding feature test skeletons longhorn-tests#719
[Backport][v1.1.3] Migration test skeleton & engine image wait function fix longhorn-tests#720
The automation test case PR is at
The issue of automation test case implementation is at (please create by the template)
If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
The engine automation PR is at
If labeled: require/manual-test-plan Has the manual test plan been documented?
The updated manual test plan is at
If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at

shuo-wu · 2021-09-13T12:42:08Z

The live upgrade feature doesn't work correctly. Will have a PR fixing this regression later.

shuo-wu · 2021-09-16T11:50:53Z

The regression is fixed. Please verify it during testing.

khushboo-rancher · 2021-09-22T05:49:16Z

Validated with v1.2.1-rc1 and Longhorn master (09/21/2021)

Validation - Pass

Validated the following scenarios on a Harvester setup with an upgraded Longhorn:

Migration works with a degraded volume.
Migration while rebuilding is in progress - The rebuilding completes first, and only then does the migration start and complete.
Migration with a failed replica - Only the healthy replicas get created for the migration.
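
The validated scenarios can be summarised as a small decision function. This is only a sketch of the observed behavior, not the longhorn-manager implementation: the `Replica` type, the state labels, the `plan_migration` name, and the returned action strings are all hypothetical. Rejecting a volume that has no healthy replica at all is an added assumption, since that case is not covered by the validation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    name: str
    state: str  # illustrative labels: "healthy", "rebuilding", or "failed"


def plan_migration(replicas: List[Replica]) -> Tuple[str, List[str]]:
    """Return (action, replicas to recreate for the migration target)."""
    healthy = [r.name for r in replicas if r.state == "healthy"]
    if not healthy:
        # Assumption: nothing to migrate from, so the request is rejected.
        return ("reject", [])
    if any(r.state == "rebuilding" for r in replicas):
        # Observed: the rebuild finishes first, then the migration starts.
        return ("wait_for_rebuild", [])
    # Observed: degraded volumes may migrate; failed replicas are skipped.
    return ("start", healthy)


if __name__ == "__main__":
    degraded = [Replica("r1", "healthy"), Replica("r2", "healthy")]
    print(plan_migration(degraded))
```

The point of the sketch is that the precondition changes from "every replica is healthy" to "at least one healthy replica exists and no rebuild is running".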

Note: Observed the live upgrade failure on this setup, as mentioned in #2805 (comment). @shuo-wu, do we have an issue to track it?

khushboo-rancher · 2021-09-22T19:10:22Z

Created #3052 for the upgrade problem.
Observed another problem: #3053

guangbochen added the kind/feature Feature request, new feature label Jul 17, 2021

PhanLe1010 removed the kind/feature Feature request, new feature label Jul 17, 2021

yasker added this to the v1.2.0 milestone Jul 17, 2021

yasker added priority/1 Highly recommended to implement or fix in this release (managed by PO) component/longhorn-manager Longhorn manager (control plane) kind/bug labels Jul 17, 2021

innobead assigned shuo-wu Jul 19, 2021

yasker added priority/0 Must be implement or fixed in this release (managed by PO) and removed priority/1 Highly recommended to implement or fix in this release (managed by PO) labels Aug 1, 2021

innobead assigned PhanLe1010 and unassigned shuo-wu Aug 2, 2021

joshimoo changed the title ~~[FEATURE] feasible way to relax the restriction on VM migration~~ [FEATURE] allow volume migration when volume is degraded (harvester vm) Aug 2, 2021

innobead assigned shuo-wu and unassigned PhanLe1010 Aug 3, 2021

innobead changed the title ~~[FEATURE] allow volume migration when volume is degraded (harvester vm)~~ [BUG] allow volume migration when volume is degraded (harvester vm) Aug 11, 2021

yasker modified the milestones: v1.2.0, v1.2.1 Aug 17, 2021

yasker unassigned shuo-wu Aug 17, 2021

guangbochen mentioned this issue Aug 18, 2021

Cannot migrate VMs when the cluster consists of 2 nodes harvester/harvester#798

Closed

innobead assigned shuo-wu Aug 30, 2021

This was referenced Sep 8, 2021

Allow migration for degraded volumes longhorn/longhorn-manager#1033

Merged

integration: Add test skeletons for degraded volume live migration longhorn/longhorn-tests#704

Merged

shuo-wu added require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated backport/1.2.1 Require to backport to 1.2.1 release branch labels Sep 9, 2021

yasker modified the milestones: v1.2.1, v1.3.0 Sep 10, 2021

innobead added the backport/1.1.3 Require to backport to 1.1.3 release branch label Sep 16, 2021

khushboo-rancher self-assigned this Sep 20, 2021

khushboo-rancher closed this as completed Sep 22, 2021

innobead added backport-needed/1.1.x and removed backport/1.1.3 Require to backport to 1.1.3 release branch labels Oct 11, 2021

innobead added the backport/1.1.3 Require to backport to 1.1.3 release branch label Dec 10, 2021

derekbit added this to Longhorn Sprint Aug 4, 2024

github-project-automation bot moved this to New Issues in Longhorn Sprint Aug 4, 2024

derekbit moved this from New Issues to Closed in Longhorn Sprint Aug 4, 2024
