[BUG] Instance manager may not update instance status for a minute after starting #5809
Comments
@ejweber Please remember to move the issue to the right pipeline to reflect the status. I assume this is in review?
Pre Ready-For-Testing Checklist
@ejweber move this to ready-for-testing and provide the testing info for @longhorn/qa. Thanks. Also, update the backport issues accordingly.
Verified in longhorn master (longhorn-manager). The instance-manager pod no longer waits 1 minute to update its status after restarting.
Describe the bug (🐛 if you encounter this issue)
When a node is cold rebooted, Longhorn eventually notices. After a little while, Longhorn decides all instance processes on that node should be stopped before they can be started again. However, the instance manager controller monitor is no longer running, so InstanceProcessStatuses are not updated.
When the instance manager comes back online, the replica controller tries to kill replica processes. However, for a full minute, InstanceProcessStatuses are still not updated, and Longhorn is not aware that the replica processes have been killed. Full recovery can't really begin until this minute has elapsed.
To Reproduce
Steps to reproduce the behavior:
reboot now
Expected behavior
Statuses are correct almost immediately after instance-manager-r starts, and processes actually start soon after that.
Log or Support bundle
See above.
Environment
Additional context
Ran across this while trying to solve #5709.