[BUG] Enabling replica-auto-balance tries to replicate to disabled nodes causing lots of errors in the logs and in the UI #6508

jsalatiel · 2023-08-11T21:41:19Z

Describe the bug (🐛 if you encounter this issue)

I have a 3-zone cluster with create-default-disk-labeled-nodes set to true.
Zone 1 and Zone 2 each have 2 untainted nodes labeled node.longhorn.io/create-default-disk=true, and each of those nodes stores replicas.
Zone 3 has a single node that stores no replicas (so the label is not set), but it can still mount volumes from the longhorn storage class.
It looks like this in the UI.

All the volumes are in healthy state:

The moment I set replica-auto-balance to best-effort, I start getting a "volume cannot be scheduled" error on all volumes.

I suppose that, since node0 is in another zone, best-effort tries to schedule a replica there even though that node is disabled, which it should not do.
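The suspected behaviour can be illustrated with a small model. This is only a sketch under stated assumptions, not Longhorn's actual scheduler code: the `Node` class, the `zones_missing_replica` helper, the zone names, and every node name other than node0 are hypothetical. It shows how a zone-balancing step that ignores disk availability would pick zone 3 as a target, while one that first filters out nodes without a schedulable disk would not.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    zone: str
    schedulable_disks: int  # 0 means no usable Longhorn disk on this node


# Hypothetical model of the cluster described above (2 + 2 + 1 workers).
NODES = [
    Node("node1", "zone1", 1),
    Node("node2", "zone1", 1),
    Node("node3", "zone2", 1),
    Node("node4", "zone2", 1),
    Node("node0", "zone3", 0),  # unlabeled, so no default disk exists here
]


def zones_missing_replica(nodes, replica_nodes, require_disk):
    """Return the zones that hold no replica yet and could be balanced into.

    With require_disk=False, every zone counts as a candidate (the suspected bug).
    With require_disk=True, zones whose nodes have no schedulable disk are skipped.
    """
    by_name = {n.name: n for n in nodes}
    used_zones = {by_name[name].zone for name in replica_nodes}
    candidate_zones = set()
    for n in nodes:
        if require_disk and n.schedulable_disks == 0:
            continue
        candidate_zones.add(n.zone)
    return sorted(candidate_zones - used_zones)


# A volume with one replica in zone 1 and one in zone 2.
replicas = ["node1", "node3"]
naive = zones_missing_replica(NODES, replicas, require_disk=False)
filtered = zones_missing_replica(NODES, replicas, require_disk=True)
print(naive)     # ['zone3']
print(filtered)  # []
```

In the unfiltered case the balancer keeps seeing zone 3 as "missing a replica" and keeps trying to place one on a node that cannot hold it, which would be consistent with the repeated scheduling errors reported here.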

Support bundle for troubleshooting

Environment

Longhorn version: v1.5.1
Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: kubespray
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 5

Additional context

jsalatiel · 2023-09-12T19:55:07Z

@c3y1huang any progress on this?

c3y1huang · 2023-12-06T05:28:28Z

> @c3y1huang any progress on this?

@jsalatiel I've had a few other tasks on my plate, but I will have the fix out soon.

longhorn-io-github-bot · 2023-12-07T03:38:11Z

chriscchien · 2023-12-08T00:50:49Z

Verified: passes on longhorn-master (longhorn-manager a074cd) with the test steps.

On master-head, setting replica-auto-balance to best-effort no longer schedules replicas on nodes that have no schedulable disk.

jsalatiel added kind/bug require/qa-review-coverage Require QA to review coverage labels Aug 11, 2023

longhorn-io-github-bot added this to Community Review Sprint Aug 11, 2023

github-project-automation bot moved this to New in Community Review Sprint Aug 11, 2023

jsalatiel changed the title ~~[BUG] Enabling replica-auto-balance try to replicated for disabled nodes causing lots of errors in the logs and in the UI~~ [BUG] Enabling replica-auto-balance tries to replicate to disabled nodes causing lots of errors in the logs and in the UI Aug 11, 2023

innobead assigned c3y1huang Aug 11, 2023

innobead added this to the v1.6.0 milestone Sep 12, 2023

innobead added area/volume-replica-scheduling Volume replica scheduling related priority/0 Must be implement or fixed in this release (managed by PO) labels Sep 12, 2023

c3y1huang moved this from New to Resolved/Scheduled in Community Review Sprint Sep 26, 2023

This was referenced Dec 6, 2023

test(integration): replica-auto-balance when disabled disk scheduling in zone longhorn/longhorn-tests#1615

Merged

fix(replica-auto-balance): loop when node has no schedulable disk longhorn/longhorn-manager#2336

Merged

c3y1huang added the require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated label Dec 7, 2023

innobead added backport/1.4.5 backport/1.5.4 labels Dec 7, 2023

chriscchien self-assigned this Dec 7, 2023

chriscchien closed this as completed Dec 8, 2023

derekbit added this to Longhorn Sprint Aug 3, 2024

derekbit moved this to Closed in Longhorn Sprint Aug 3, 2024
