Add LEP for Live Upgrade For Data Engine of V2 Volumes #8814
Conversation
Force-pushed 19d04f5 to e9ce49e
Force-pushed e9ce49e to 6701dd8
Just a few questions, but the general concept looks correct to me.
Force-pushed 6701dd8 to f0982df
### Non-goals [optional]

- Support live upgrades of the data engine for a v2 volume with a single replica.
Is there a plan to support this case in a future version?
No. It is a hard limit.
enhancements/20240624-live-upgrade-for-data-engine-of-v2-volumes.md (outdated)
- Delete the old target for the volume on the upgrading node.

  ```
  InstanceDeleteTarget()
  |-> EngineDeleteTarget()
  ```

- Resume the linear device mapper and continue IO processing.
Will Longhorn revert to the old target if something goes wrong with the temporary target? And given that case, it would be better to resume the dm device IO before deleting the old target.
> due to this case, it's better to resume the dm device IO before deleting the old target.

Yes, but I found that NVMe will somehow crash if the old target is deleted after resuming IO. I'm still investigating the root cause.
> due to this case, it's better to resume the dm device IO before deleting the old target.

Ideally, yes. But we need to overcome #8814 (comment) first.
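The ordering being discussed can be sketched roughly as follows. This is a minimal illustration with hypothetical helper names, not the actual Longhorn instance-manager API; it only shows the sequencing constraint described above (delete the old target before resuming the dm device, because the reverse order was observed to crash NVMe):

```go
package main

import "fmt"

// Stubbed operations standing in for the real instance-manager calls.
// The actual Longhorn gRPC methods and signatures differ.
func suspendLinearDM(volume string) error  { fmt.Println("suspend dm:", volume); return nil }
func switchOverTarget(volume string) error { fmt.Println("switch over target:", volume); return nil }
func deleteOldTarget(volume string) error  { fmt.Println("delete old target:", volume); return nil }
func resumeLinearDM(volume string) error   { fmt.Println("resume dm:", volume); return nil }

// upgradeTargetStep mirrors the ordering from the discussion above:
// the old target is deleted BEFORE the device-mapper device is resumed,
// since deleting it after resuming IO was observed to crash NVMe.
func upgradeTargetStep(volume string) error {
	if err := suspendLinearDM(volume); err != nil {
		return err
	}
	if err := switchOverTarget(volume); err != nil {
		return err
	}
	if err := deleteOldTarget(volume); err != nil {
		return err
	}
	return resumeLinearDM(volume)
}

func main() {
	if err := upgradeTargetStep("vol-1"); err != nil {
		fmt.Println("upgrade failed:", err)
	}
}
```

Reverting on failure would mean undoing these steps in reverse, which is exactly the open question in this thread.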
- If the existing instance-manager pod does not have any running engines with targets, the instance-manager and its pod will be deleted by the node controller.
- Replicas managed by the deleted instance-manager are marked as ERROR, causing any volume with replicas on the upgrading node to become degraded.
- A new instance-manager is then created and starts running.
- Switch Over Target Back
Similarly, will Longhorn revert to the temporary target if something goes wrong with the new target?
Same issue for now: #8814 (comment).
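The node-controller decision described above (delete the old instance-manager pod only once it hosts no running engines with targets) can be sketched as follows. The type and function names here are hypothetical stand-ins, not the actual Longhorn types:

```go
package main

import "fmt"

// Engine is a minimal stand-in for an engine instance tracked by an
// instance-manager; the real Longhorn CRD types carry much more state.
type Engine struct {
	Name      string
	Running   bool
	HasTarget bool
}

// shouldDeleteInstanceManager mirrors the rule above: the old
// instance-manager pod is only removed once it no longer hosts any
// running engine that still owns a target.
func shouldDeleteInstanceManager(engines []Engine) bool {
	for _, e := range engines {
		if e.Running && e.HasTarget {
			return false
		}
	}
	return true
}

func main() {
	engines := []Engine{{Name: "eng-1", Running: true, HasTarget: false}}
	fmt.Println(shouldDeleteInstanceManager(engines)) // prints true
}
```

Once this returns true and the pod is deleted, its replicas are marked ERROR and the affected volumes become degraded until a new instance-manager takes over, as the LEP text describes.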
Force-pushed f0982df to ecd14ad
Longhorn 6001

Signed-off-by: Derek Su <derek.su@suse.com>
Force-pushed ecd14ad to f06da81
Replaced by #9807
Which issue(s) this PR fixes:
Issue #6001
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context