[BUG] DR volume gets stuck if there is only a rebuilding replica running #2753

Open
@shuo-wu

Description

Describe the bug

To Reproduce
Steps to reproduce the behavior:

  1. Launch a DR volume with 2 replicas.
  2. While the restore is in progress, crash one replica.
  3. After Longhorn restarts the crashed replica for rebuilding, but before it starts the restore on that replica, crash the other restoring replica. (This is the race.)
  4. After the rebuilding replica finishes its restore, Longhorn cannot find another RW replica to verify the snapshot chain of the rebuilding replica, so the DR volume gets stuck.
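
The race in the steps above can be modeled with a small state sketch. This is not Longhorn code: the `Replica` class, the mode labels `WO` and `ERR`, and the helper `find_verification_peer` are assumptions made for illustration; only the `RW` mode name comes from this report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    mode: str  # assumed labels: "RW" healthy, "WO" rebuilding, "ERR" crashed


def find_verification_peer(replicas: List[Replica], target: Replica) -> Optional[Replica]:
    """Return another RW replica to compare the snapshot chain against, if any."""
    for r in replicas:
        if r is not target and r.mode == "RW":
            return r
    return None


# Step 1: DR volume with 2 replicas, both restoring.
a = Replica("replica-a", "RW")
b = Replica("replica-b", "RW")
replicas = [a, b]

# Step 2: replica-a crashes during the restore.
a.mode = "ERR"

# Step 3: replica-a is restarted for rebuilding, then replica-b crashes
# before replica-a has started its restore.
a.mode = "WO"
b.mode = "ERR"

# Step 4: replica-a finishes restoring, but no RW peer is left to verify against.
peer = find_verification_peer(replicas, a)
stuck = peer is None and a.mode == "WO"
print(f"peer={peer}, stuck={stuck}")  # prints "peer=None, stuck=True"
```

In this model the volume ends with one rebuilding replica and one crashed replica, which matches the stuck state described in step 4.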

Expected behavior
The restarted replica should become the healthy one even if there are no other RW replicas in the volume.
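
One way to read this expectation is as a fallback rule applied when a rebuild finishes. The sketch below is a hypothetical illustration, not the fix adopted in Longhorn: `Replica`, `finish_rebuild`, the `chains_match` flag, the mode labels `WO` and `ERR`, and the choice to mark a mismatching replica as `ERR` are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    mode: str  # assumed labels: "RW" healthy, "WO" rebuilding, "ERR" crashed


def finish_rebuild(replicas: List[Replica], target: Replica, chains_match: bool = True) -> str:
    """Decide the mode of a replica that has just finished restoring."""
    has_rw_peer = any(r is not target and r.mode == "RW" for r in replicas)
    if has_rw_peer:
        # Normal path: promote only if the snapshot chain matches the RW peer.
        # Marking a mismatch as "ERR" is an assumption of this sketch.
        target.mode = "RW" if chains_match else "ERR"
    else:
        # Fallback requested in this report: with nothing to compare against,
        # the restarted replica becomes the healthy one.
        target.mode = "RW"
    return target.mode


# The stuck scenario: one rebuilding replica, one crashed replica.
replicas = [Replica("replica-a", "WO"), Replica("replica-b", "ERR")]
result = finish_rebuild(replicas, replicas[0])
print(result)  # prints "RW"
```

The fallback branch skips verification only when no RW replica exists, so volumes that still have a healthy peer keep the existing snapshot chain check.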

Log

Environment:

  • Longhorn version: master-06/30/2021

Additional context
