Add LEP for Live Upgrade For Data Engine of V2 Volumes #8814

derekbit · 2024-06-24T07:18:27Z

Which issue(s) this PR fixes:

Issue #6001

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

derekbit · 2024-06-26T16:07:09Z

@DamiaSan @shuo-wu @innobead Could you take a look at the data plane design? Thank you.

derekbit · 2024-06-26T16:07:12Z

@DamiaSan @shuo-wu @innobead Can you take a look at the data plane design? Thank you.

DamiaSan

Just a few questions, but the general concept looks correct to me.

enhancements/20240624-live-upgrade-for-data-engine-of-v2-volumes.md

shuo-wu · 2024-06-27T23:49:21Z

enhancements/20240624-live-upgrade-for-data-engine-of-v2-volumes.md

+
+### Non-goals [optional]
+
+- Support live upgrades of the data engine for a v2 volume with a single replica.


Is there a plan for this case in the next version?

No. It is a hard limit.

shuo-wu · 2024-06-27T23:58:42Z

enhancements/20240624-live-upgrade-for-data-engine-of-v2-volumes.md

+      - Delete the old target for the volume on the upgrading node.
+        ```
+        InstanceDeleteTarget()
+        |-> EngineDeleteTarget()
+        ```
+      - Resume the linear device mapper and continue IO processing.


Will Longhorn revert to the old target if there is something wrong with the temporary target? Due to this case, it's better to resume the dm device IO before deleting the old target.

> due to this case, it's better to resume the dm device IO before deleting the old target.

Yes, but I found that NVMe somehow crashes if the old target is deleted after IO is resumed. I'm still investigating the root cause.

> due to this case, it's better to resume the dm device IO before deleting the old target.

Ideally, yes. But we need to overcome #8814 (comment) first.

shuo-wu · 2024-06-28T00:01:10Z

enhancements/20240624-live-upgrade-for-data-engine-of-v2-volumes.md

+      - If the existing instance-manager pod does not have any running engines with targets, the instance-manager and its pod will be deleted by the node controller.
+      - Replicas managed by the deleted instance-manager are marked as ERROR, causing any volume with replicas on the upgrading node to become degraded.
+      - A new instance-manager is then created and starts running.
+  - Switch Over Target Back


Similarly, will Longhorn revert to the temporary target if there is something wrong with the new target?

Same issue for now: #8814 (comment)
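The instance-manager handling quoted in this thread can be sketched as two small checks: the instance-manager may be deleted only when no running engine still has a target, and deleting it marks the replicas it managed as ERROR. This is a simplified illustration under assumed types; `Engine`, `Replica`, `CanDeleteInstanceManager`, and `MarkReplicasError` are hypothetical names, not the node controller's actual code.

```go
package main

// Engine and Replica are simplified, hypothetical stand-ins for the
// corresponding Longhorn objects.
type Engine struct {
	Running   bool
	HasTarget bool
}

type Replica struct {
	Name  string
	State string
}

// ReplicaStateError mirrors the ERROR state named in the LEP text.
const ReplicaStateError = "ERROR"

// CanDeleteInstanceManager reports whether the instance-manager has no
// running engine that still owns a target.
func CanDeleteInstanceManager(engines []Engine) bool {
	for _, e := range engines {
		if e.Running && e.HasTarget {
			return false
		}
	}
	return true
}

// MarkReplicasError sets every replica to ERROR in place and returns how
// many replicas changed state.
func MarkReplicasError(replicas []Replica) int {
	changed := 0
	for i := range replicas {
		if replicas[i].State != ReplicaStateError {
			replicas[i].State = ReplicaStateError
			changed++
		}
	}
	return changed
}
```

Any volume with a replica in the slice passed to `MarkReplicasError` would then be reported as degraded until the new instance-manager is running.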

derekbit · 2024-11-14T08:48:57Z

Replaced by #9807

derekbit force-pushed the v2-volume-upgrade-lep branch 4 times, most recently from 19d04f5 to e9ce49e Compare June 25, 2024 06:12

derekbit mentioned this pull request Jun 25, 2024

[FEATURE] v2 volume supports live upgrade for data plane #6001

Closed

4 tasks

derekbit force-pushed the v2-volume-upgrade-lep branch from e9ce49e to 6701dd8 Compare June 25, 2024 08:31

derekbit self-assigned this Jun 25, 2024

DamiaSan reviewed Jun 27, 2024


derekbit force-pushed the v2-volume-upgrade-lep branch from 6701dd8 to f0982df Compare June 27, 2024 23:56

shuo-wu reviewed Jun 28, 2024

derekbit force-pushed the v2-volume-upgrade-lep branch from f0982df to ecd14ad Compare June 28, 2024 00:38

feat(lep): add live upgrade for data engine of v2 volumes

f06da81

Longhorn 6001 Signed-off-by: Derek Su <derek.su@suse.com>

derekbit force-pushed the v2-volume-upgrade-lep branch from ecd14ad to f06da81 Compare June 28, 2024 00:39

derekbit closed this Nov 14, 2024
