[BUG] Replica Auto Balance repeatedly delete the local replica and trigger rebuilding #4761

Closed
@PhanLe1010

Description

Describe the bug

Replica Auto Balance repeatedly deletes the local replica and triggers rebuilding.

To Reproduce

Steps to reproduce the behavior:

  1. Install Longhorn 1.3.2
  2. Set Replica Auto Balance to best-effort
  3. Create a volume with 1 replica and data-locality set to best-effort
  4. Attach the volume to a node that doesn't have any replica
  5. Observe that Longhorn starts rebuilding a local replica
  6. If we are lucky, the new local replica has a name that sorts alphabetically before the old replica's name, so Longhorn deletes it
  7. The cycle repeats
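
The failure mode in step 6 can be illustrated with a minimal Go sketch. This is not the longhorn-manager code: the `Replica` type, the `pickByName` function, and the replica and node names are all hypothetical, and the "delete the name that sorts first" rule is an assumption taken from the description in step 6.

```go
package main

import (
	"fmt"
	"sort"
)

// Replica is a simplified, hypothetical stand-in for a Longhorn replica.
type Replica struct {
	Name   string
	NodeID string
}

// pickByName returns the replica whose name sorts first, ignoring which
// node it lives on. This mirrors the selection rule assumed in step 6.
// The boolean is false when there is nothing to pick from.
func pickByName(replicas []Replica) (Replica, bool) {
	if len(replicas) == 0 {
		return Replica{}, false
	}
	// Copy so the caller's slice order is left untouched.
	sorted := append([]Replica(nil), replicas...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	return sorted[0], true
}

func main() {
	const attachedNode = "node-2" // hypothetical node the volume is attached to
	replicas := []Replica{
		{Name: "vol-r-bbb", NodeID: "node-1"}, // old, non-local replica
		{Name: "vol-r-aaa", NodeID: "node-2"}, // newly rebuilt local replica
	}
	victim, _ := pickByName(replicas)
	// Prints: delete vol-r-aaa (local: true)
	fmt.Printf("delete %s (local: %v)\n", victim.Name, victim.NodeID == attachedNode)
}
```

With these example names the local replica is the one chosen for deletion, after which data locality would presumably trigger another rebuild on the attached node.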

Expected behavior

Longhorn shouldn't repeatedly delete the newly rebuilt local replica.

Log or Support bundle

longhorn-support-bundle_7761b134-b8e6-4ce4-b178-1048f894641c_2022-10-21T00-14-13Z.zip

Environment

  • Longhorn version: Longhorn v1.3.2

Additional context

It seems that we don't respect the local replica here: https://github.com/longhorn/longhorn-manager/blob/c4e7942684cc1f8ece900854d09126a7b1f8c0b6/controller/volume_controller.go#L931-L958
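
One possible way to respect the local replica when choosing which extra replica to remove is sketched below. This is only an illustration under assumptions, not the fix applied in longhorn-manager: `Replica`, `pickNonLocalFirst`, and the example names are hypothetical, and the sketch assumes that a replica is "local" when its node matches the node the volume is attached to.

```go
package main

import (
	"fmt"
	"sort"
)

// Replica is a simplified, hypothetical stand-in for a Longhorn replica.
type Replica struct {
	Name   string
	NodeID string
}

// pickNonLocalFirst chooses a replica to delete, preferring replicas that
// are NOT on localNode. Name order is only used as a tie-breaker, so a
// local replica is picked only when every replica is local.
func pickNonLocalFirst(replicas []Replica, localNode string) (Replica, bool) {
	if len(replicas) == 0 {
		return Replica{}, false
	}
	// Copy so the caller's slice order is left untouched.
	sorted := append([]Replica(nil), replicas...)
	sort.SliceStable(sorted, func(i, j int) bool {
		iLocal := sorted[i].NodeID == localNode
		jLocal := sorted[j].NodeID == localNode
		if iLocal != jLocal {
			return !iLocal // non-local replicas sort first
		}
		return sorted[i].Name < sorted[j].Name
	})
	return sorted[0], true
}

func main() {
	const attachedNode = "node-2" // hypothetical node the volume is attached to
	replicas := []Replica{
		{Name: "vol-r-bbb", NodeID: "node-1"}, // old, non-local replica
		{Name: "vol-r-aaa", NodeID: "node-2"}, // newly rebuilt local replica
	}
	victim, _ := pickNonLocalFirst(replicas, attachedNode)
	// Prints: delete vol-r-bbb (local: false)
	fmt.Printf("delete %s (local: %v)\n", victim.Name, victim.NodeID == attachedNode)
}
```

Here the non-local replica is selected even though the local replica's name sorts first, so the rebuild-and-delete cycle described in the reproduction steps would not start in this simplified model.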

Metadata

Labels

  • area/volume-replica-scheduling: Volume replica scheduling related
  • backport/1.2.6
  • backport/1.3.3
  • component/longhorn-manager: Longhorn manager (control plane)
  • kind/bug
  • priority/0: Must be implement or fixed in this release (managed by PO)
  • require/auto-e2e-test: Require adding/updating auto e2e test cases if they can be automated
  • severity/1: Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade)