Add LEP for Live Upgrade For Data Engine of V2 Volumes #8814

Closed
wants to merge 1 commit into from

derekbit
Member

Which issue(s) this PR fixes:

Issue #6001

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

@derekbit force-pushed the v2-volume-upgrade-lep branch 4 times, most recently from 19d04f5 to e9ce49e, on June 25, 2024 06:12
@derekbit force-pushed the v2-volume-upgrade-lep branch from e9ce49e to 6701dd8 on June 25, 2024 08:31
@derekbit self-assigned this on Jun 25, 2024
@derekbit
Member Author

@DamiaSan @shuo-wu @innobead Could you take a look at the data plane design? Thank you.

@derekbit
Member Author

@DamiaSan @shuo-wu @innobead Can you take a look at the data plane design? Thank you.

Contributor

@DamiaSan DamiaSan left a comment

Just a few questions, but the general concept looks correct to me.

@derekbit force-pushed the v2-volume-upgrade-lep branch from 6701dd8 to f0982df on June 27, 2024 23:56

### Non-goals [optional]

- Support live upgrades of the data engine for a v2 volume with a single replica.
Contributor

Is there a plan for this case in the next version?

Member Author

@derekbit derekbit Jun 28, 2024

No. It is a hard limit.

Comment on lines +57 to +68
- Delete the old target for the volume on the upgrading node.
```
InstanceDeleteTarget()
|-> EngineDeleteTarget()
```
- Resume the linear device mapper and continue IO processing.
Contributor

Will Longhorn revert to the old target if something goes wrong with the temporary target? If so, it's better to resume the dm device IO before deleting the old target.

Member Author

> it's better to resume the dm device IO before deleting the old target.

Yes, but I found that nvme somehow crashes if the old target is deleted after IO is resumed. I'm still investigating the root cause.

Member Author

> it's better to resume the dm device IO before deleting the old target.

Ideally, yes. But we need to overcome #8814 (comment) first.
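
The ordering trade-off discussed in this thread can be illustrated with a small toy model. This is only a sketch: `Volume`, `switch_over`, and the target names `old` and `temp` are hypothetical and are not Longhorn APIs, and the model does not reproduce the nvme crash reported above. It only shows that resuming IO before deleting the old target leaves something to revert to, while the order quoted from the LEP does not.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy model of a volume frontend; every name here is hypothetical."""
    active_target: str = "old"
    targets: set = field(default_factory=lambda: {"old"})
    io_suspended: bool = False
    log: list = field(default_factory=list)


def switch_over(vol, temp_target_healthy, resume_before_delete):
    """Move IO to a temporary target and return the target that ends up active."""
    vol.io_suspended = True           # suspend the linear device mapper
    vol.log.append("suspend")
    vol.targets.add("temp")           # bring up the temporary target
    vol.active_target = "temp"
    vol.log.append("switch:temp")

    if resume_before_delete:
        # Order suggested in the review: resume IO first, delete the old target last.
        vol.io_suspended = False
        vol.log.append("resume")
        if not temp_target_healthy:
            # The old target still exists, so a revert is possible.
            vol.active_target = "old"
            vol.targets.discard("temp")
            vol.log.append("revert:old")
            return vol.active_target
        vol.targets.discard("old")
        vol.log.append("delete:old")
    else:
        # Order quoted from the LEP: delete the old target, then resume IO.
        vol.targets.discard("old")
        vol.log.append("delete:old")
        vol.io_suspended = False
        vol.log.append("resume")
        # If the temporary target is unhealthy here, nothing is left to revert to.
    return vol.active_target
```

In this model, `switch_over(Volume(), temp_target_healthy=False, resume_before_delete=True)` ends back on `old`, whereas the same call with `resume_before_delete=False` stays on `temp` with the old target already gone.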

- If the existing instance-manager pod does not have any running engines with targets, the instance-manager and its pod will be deleted by the node controller.
- Replicas managed by the deleted instance-manager are marked as ERROR, causing any volume with replicas on the upgrading node to become degraded.
- A new instance-manager is then created and starts running.
- Switch Over Target Back
Contributor

Similarly, will Longhorn revert to the temporary target if there is something wrong with the new target?

Member Author

Same issue for now: #8814 (comment)

@derekbit force-pushed the v2-volume-upgrade-lep branch from f0982df to ecd14ad on June 28, 2024 00:38
Longhorn 6001

Signed-off-by: Derek Su <derek.su@suse.com>
@derekbit force-pushed the v2-volume-upgrade-lep branch from ecd14ad to f06da81 on June 28, 2024 00:39
@derekbit
Member Author

Replaced by #9807

@derekbit closed this on Nov 14, 2024