[BUG] DR volume gets stuck if there is only a rebuilding replica running #2753
Open
Description
Describe the bug
To Reproduce
Steps to reproduce the behavior:
- Launch a DR volume with 2 replicas
- While the restore is in progress, crash one replica
- After Longhorn restarts the crashed replica and begins rebuilding it, but before the restore has started for that replica, crash the other replica, which is still restoring. (This is a race.)
- After the rebuilding replica finishes the restore, Longhorn cannot find another RW replica to verify the snapshot chain of the rebuilding replica, so the DR volume gets stuck.
Expected behavior
The restarted replica should become the healthy one even if there is no other RW replica in the volume.
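
A minimal sketch of the expected decision, written as standalone Go. This is not the Longhorn implementation: the `Replica` type, the `WO` and `ERR` mode names, the `RestoreDone` field, and the `canPromote` function are all hypothetical and exist only to illustrate the behavior described above. The idea is that a rebuilding replica that has finished restoring is promoted either after verification against an RW peer or, when no RW peer exists, without that verification.

```go
package main

import "fmt"

// Mode is a simplified stand-in for a replica mode. Only "RW" appears in
// this issue; the other names are assumptions made for the sketch.
type Mode string

const (
	ModeRW  Mode = "RW"  // healthy replica
	ModeWO  Mode = "WO"  // rebuilding replica (assumed name)
	ModeERR Mode = "ERR" // crashed replica (assumed name)
)

// Replica is a hypothetical, minimal view of one replica of a DR volume.
type Replica struct {
	Name        string
	Mode        Mode
	RestoreDone bool // true once the restore has finished on this replica
}

// canPromote reports whether replicas[i] may be marked healthy.
// peer is the name of an RW replica to verify the snapshot chain against,
// or "" when no RW replica exists and verification has to be skipped.
func canPromote(replicas []Replica, i int) (ok bool, peer string) {
	r := replicas[i]
	if r.Mode != ModeWO || !r.RestoreDone {
		return false, ""
	}
	for j, other := range replicas {
		if j != i && other.Mode == ModeRW {
			return true, other.Name
		}
	}
	// No RW peer is left: promote anyway instead of waiting forever.
	return true, ""
}

func main() {
	// State from the reproduction steps: one replica crashed, the other
	// one was rebuilt and has finished restoring.
	replicas := []Replica{
		{Name: "replica-a", Mode: ModeERR},
		{Name: "replica-b", Mode: ModeWO, RestoreDone: true},
	}
	ok, peer := canPromote(replicas, 1)
	fmt.Printf("promote=%v verifyAgainst=%q\n", ok, peer)
}
```

In the stuck case from the steps above, the loop finds no RW peer, so the sketch falls through to the final return rather than blocking on a verification that can never happen.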
Log
Environment:
- Longhorn version: master-06/30/2021
Additional context