[BUG] During volume live engine upgrade, delete replica with old engine image will make volume degraded forever #7012
Comments
@chriscchien Is this a regression from 1.5.1/1.4.3, or an existing issue?
@PhanLe1010 Please help check this.
It's a regression; we had a manual test case that includes this scenario.
It looks like this is not a regression, as I am able to reproduce it in v1.4.3 by:
In the current implementation, we don't continue the live engine upgrade when the volume is unhealthy (degraded), see https://github.com/longhorn/longhorn-manager/blob/b810121b33789d145f220bfd0e41102a7801a354/controller/volume_controller.go#L2735C1-L2739C1. The user would need to detach and reattach the volume to get out of this situation. Maybe we can keep this ticket open to see if we can make an improvement, but I don't think this one is a regression or a release blocker.
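The gating behavior described above can be pictured with a minimal Go sketch. It is hypothetical and not the actual longhorn-manager code: the `Volume` type, the `Robustness` values, and `shouldContinueLiveUpgrade` are illustrative assumptions. It shows why deleting an old-image replica stalls the upgrade: once the volume is degraded, the check keeps returning false, so the upgrade does not resume on its own.

```go
package main

import "fmt"

// Robustness models a volume health state (illustrative names only).
type Robustness string

const (
	RobustnessHealthy  Robustness = "healthy"
	RobustnessDegraded Robustness = "degraded"
)

// Volume is a hypothetical, trimmed-down view of a volume.
type Volume struct {
	Name         string
	Robustness   Robustness
	CurrentImage string // engine image the running engine uses
	DesiredImage string // engine image requested by the upgrade
}

// shouldContinueLiveUpgrade is a hypothetical gate: the live upgrade
// only proceeds while an upgrade is pending and the volume is healthy.
func shouldContinueLiveUpgrade(v Volume) bool {
	if v.CurrentImage == v.DesiredImage {
		return false // nothing left to upgrade
	}
	return v.Robustness == RobustnessHealthy
}

func main() {
	v := Volume{
		Name:         "vol-1",
		Robustness:   RobustnessHealthy,
		CurrentImage: "longhornio/longhorn-engine:v1.5.1",
		DesiredImage: "longhornio/longhorn-engine:master-head",
	}
	fmt.Println(shouldContinueLiveUpgrade(v)) // true

	// Deleting an old-image replica mid-upgrade degrades the volume,
	// so the gate stays closed and the upgrade stalls.
	v.Robustness = RobustnessDegraded
	fmt.Println(shouldContinueLiveUpgrade(v)) // false
}
```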
Regarding the error:
After instance-manager successfully replaced the engine process on the old port with one on a new port, it looks like the engine controller tried to resync the engine CR and retry the upgrade, but it wasn't aware that the engine had already moved to a new port (the port
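The suspected retry problem can be illustrated by contrasting a retry decision based only on the controller's cached state with one that first checks the engine's observed state. This is a hypothetical Go sketch: the `EngineState` type, `naiveNeedsRetry`, `awareNeedsRetry`, and the port numbers are illustrative assumptions, not longhorn-manager's API or the actual fix.

```go
package main

import "fmt"

// EngineState is a hypothetical snapshot of where an engine process runs.
type EngineState struct {
	Image string
	Port  int
}

// naiveNeedsRetry decides from the controller's cached state only.
// If the cache still holds the pre-upgrade image and port, it retries
// the upgrade even though the replacement has already finished.
func naiveNeedsRetry(cached EngineState, desiredImage string) bool {
	return cached.Image != desiredImage
}

// awareNeedsRetry first consults the state observed from the instance
// manager. If the engine already runs the desired image (on whatever
// port), the upgrade is treated as complete and the cache is refreshed.
func awareNeedsRetry(cached *EngineState, observed EngineState, desiredImage string) bool {
	if observed.Image == desiredImage {
		*cached = observed // adopt the new port instead of retrying
		return false
	}
	return true // the engine has not been replaced yet
}

func main() {
	desired := "longhornio/longhorn-engine:master-head"
	// Hypothetical port numbers, for illustration only.
	cached := EngineState{Image: "longhornio/longhorn-engine:v1.5.1", Port: 10000}
	observed := EngineState{Image: desired, Port: 10010}

	fmt.Println(naiveNeedsRetry(cached, desired))            // true: stale retry
	fmt.Println(awareNeedsRetry(&cached, observed, desired)) // false
	fmt.Println(cached.Port)                                 // 10010
}
```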
Test plan:
Pre Ready-For-Testing Checklist
Recommend backporting to v1.5.4.
Hi @chriscchien, this one depends on the new issue #7396. Let's wait for that one to be merged first, to fix a regression.
Verified pass on longhorn master(longhorn-manager
During a volume live engine upgrade, after deleting a replica with the old engine image, the engine upgrade succeeds and the volume becomes healthy. In addition, the data in the volume is correct.
Describe the bug (🐛 if you encounter this issue)
While performing a volume live engine upgrade, if any old replica (a replica with the old engine image) is deleted immediately, the volume stays degraded forever and is stuck in the upgrading process.
Longhorn cannot perform a volume engine upgrade while the volume is degraded, yet it allows deleting a replica during a live engine upgrade. This is a corner case, and a developer may need to clarify whether this is expected. Thanks.
To Reproduce
longhornio/longhorn-engine:v1.5.1
(Or upgrade Longhorn from the previous stable version, with a volume attached, to master-head instead of following the previous steps.)
longhornio/longhorn-engine:master-head
longhornio/longhorn-engine:v1.5.1
Expected behavior
Either prevent replica deletion during an engine upgrade, or have the volume become healthy after performing the reproduction steps.
Support bundle for troubleshooting
Replica status: 3 replicas with the new engine image and 2 with the old engine image (1 was deleted earlier); all are in the running state.
longhorn-manager log (shows the info message for Engine has been upgraded)
supportbundle_e6e3e73a-e898-4617-81ad-e21ad5fa3be4_2023-10-31T09-53-33Z.zip
Environment
Additional context
Can reproduce on v1.5.x-head