
[BUG] Longhorn Helm Chart doesn't have tolerations for CSI plugin and Engine image daemonsets #6606

Open
KostLinux opened this issue Aug 29, 2023 · 6 comments
Labels: area/install-uninstall-upgrade, area/setting, investigation-needed, kind/bug, priority/0, require/qa-review-coverage, severity/3

@KostLinux

Describe the bug (🐛 if you encounter this issue)

After upgrading Longhorn via the Helm chart, the engine-image and CSI plugin daemonsets do not have tolerations, so those components are not scheduled onto the tainted storage nodes.

To Reproduce

  1. Create a few workers, with three of them labeled as Longhorn nodes.
  2. Apply taints on the three Longhorn nodes.
  3. Upgrade Longhorn using the Helm chart, ensuring to include tolerations.
  4. Go to the Longhorn UI and navigate to Settings > Engine Image tab.
  5. Click on the engine image.
  6. Observe that the engine image is only deployed on untainted nodes.
  7. Add tolerations to the engine image using kubectl edit (see the command sketch after this list).
  8. Verify that the engine image is now deployed on the storage nodes.
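
A minimal sketch of steps 2 and 7, assuming a Storage=true:NoSchedule taint; the node names and the engine-image daemonset suffix are placeholders, since the suffix varies per install:

# Step 2: taint the three Longhorn nodes (hypothetical node names).
kubectl taint nodes worker-1 worker-2 worker-3 Storage=true:NoSchedule

# Step 7: look up the engine-image daemonset name, then add the missing
# toleration by hand (ei-xxxxxxxx stands in for your install's hash).
kubectl -n longhorn-system get daemonsets
kubectl -n longhorn-system patch daemonset engine-image-ei-xxxxxxxx --type merge -p '
spec:
  template:
    spec:
      tolerations:
      - key: Storage
        operator: Equal
        value: "true"
        effect: NoSchedule'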

Expected behavior

When upgrading Longhorn via the Helm chart, the engine-image and CSI plugin daemonsets should automatically receive the configured tolerations and be deployed on the storage nodes.

Support bundle for troubleshooting

Environment

  • Longhorn version: v1.4.2
  • Installation method: Rancher Catalog App
  • Kubernetes distro and version: RKE / Kubernetes v1.24.8
  • Number of management nodes in the cluster: 3
  • Number of worker nodes in the cluster: 8
  • Node Configuration:
    • OS type and version: Oracle Linux 8
    • CPU per node: 8
    • Memory per node: 16 GB
    • Disk type: 200 GB
  • Network bandwidth between the nodes: 1 Gbps
  • Underlying Infrastructure: VMware
  • Number of Longhorn volumes in the cluster: 500

Additional context

It is important to note that some PVCs may come from an older version of Longhorn. In such cases, the engine images for those PVCs will not be updated until the daemonsets are manually edited. Ensure that the engine image automatic update option is enabled.
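
For reference, a values sketch of enabling that option; the key name below is an assumption based on the chart mapping camelCase defaultSettings keys onto Longhorn's concurrent-automatic-engine-upgrade-per-node-limit setting, so verify it against your chart version's values.yaml:

defaultSettings:
  # Assumed key for the automatic engine image upgrade option mentioned
  # above; a value greater than 0 enables automatic upgrades (0 disables).
  concurrentAutomaticEngineUpgradePerNodeLimit: 3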

Also, the reason I've created another bug issue is that the previous one, #6103, was ignored.

@innobead
Member

Thanks for reporting this. I closed the original, so we can keep this.

This part should not have had any recent changes. cc @ChanYiLin to follow up.

c3y1huang added the area/install-uninstall-upgrade label on Sep 26, 2023
@sybnex

sybnex commented Jan 29, 2024

I ran into this issue today ... is this still on the roadmap?

@kreeger

kreeger commented Sep 2, 2024

I ran into this today as well with a fresh Helm install. I'm setting defaultSettings.taintToleration: Storage=true:NoSchedule in my values (with Storage=true:NoSchedule as the taint applied to my 3 Longhorn-only nodes), and it doesn't seem to get applied to the engine-image-ei-04c05bf8 or longhorn-csi-plugin DaemonSets.

defaultSettings:
  createDefaultDiskLabeledNodes: true
  defaultReplicaCount: 2
  taintToleration: Storage=true:NoSchedule
  removeSnapshotsDuringFilesystemTrim: enabled

I can see it clear as day in my longhorn-default-setting ConfigMap, for what it's worth, but I don't think Longhorn actually picks up this ConfigMap: if I browse to the Longhorn UI's settings, they all have their default values (including an empty toleration in the Danger Zone), and if I add a toleration there, it works. All of the Setting objects in my longhorn-system namespace likewise have their default values. Here's my ConfigMap:

create-default-disk-labeled-nodes: true
default-replica-count: 2
taint-toleration: Storage=true:NoSchedule
priority-class: longhorn-critical
disable-revision-counter: true
remove-snapshots-during-filesystem-trim: enabled

Happy to help troubleshoot; I'm on v1.30.4+rke2r1. One would think I've hit #2562, but I'm using only settings that are defined in the v1.7.0 values.yaml.
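
A quick way to compare what the chart rendered against what Longhorn actually applied (a sketch assuming the default longhorn-system namespace and the stock ConfigMap name from the chart):

# ConfigMap the chart renders from defaultSettings:
kubectl -n longhorn-system get configmap longhorn-default-setting -o yaml
# Value Longhorn actually applied for the toleration setting:
kubectl -n longhorn-system get settings.longhorn.io taint-toleration -o jsonpath='{.value}'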


  • Longhorn version: v1.7.0
  • Installation method: Helm
  • Kubernetes distro and version: v1.30.4+rke2r1
  • Number of management nodes in the cluster: 3
  • Number of worker nodes in the cluster: 6
  • Node Configuration:
    • OS type and version: Debian 12 Bookworm
    • CPU per node: 8
    • Memory per node: workers 18 GB, management 8 GB
    • Disk type: 128 GB
  • Network bandwidth between the nodes: 1 Gbps
  • Underlying Infrastructure: Proxmox
  • Number of Longhorn volumes in the cluster: 0

innobead added this to the v1.8.0 milestone on Sep 2, 2024
innobead added the area/setting, investigation-needed, priority/0, and severity/3 labels on Sep 2, 2024
@kreeger

kreeger commented Sep 2, 2024

Ah-hah: it was my Helm value for removeSnapshotsDuringFilesystemTrim causing this problem; it needed to be true or false, not enabled (which is an acceptable value in the persistence section of the values chart, but not in the defaultSettings section). I discovered this while writing a loop of Ansible patch operations to modify the Setting objects after an initial install. With the updated section in my values:

defaultSettings:
  createDefaultDiskLabeledNodes: true
  defaultReplicaCount: '2'
  taintToleration: Storage=true:NoSchedule
  removeSnapshotsDuringFilesystemTrim: true # Note this line!

…my Setting objects are now properly populated, including my taint-toleration Setting.

NAME                                      VALUE                     APPLIED   AGE
create-default-disk-labeled-nodes         true                      true      5m13s
default-replica-count                     2                         true      5m13s
taint-toleration                          Storage=true:NoSchedule   true      5m13s
remove-snapshots-during-filesystem-trim   true                      true      5m12s

So, to those hitting this issue when installing from Helm: check the values you have set for all of your defaultSettings keys in your Helm chart; some of them may not map to acceptable Longhorn values or value types.
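
One quick way to spot a rejected value (a sketch that keys off the APPLIED column in the output above, so it assumes setting values without embedded spaces):

# List any Setting that Longhorn refused to apply:
kubectl -n longhorn-system get settings.longhorn.io | awk '$3 == "false"'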

To the Longhorn maintainers: since these defaultSettings values are picked up and applied the way they are, each key could use additional specific documentation about acceptable values. Many do, but not all!

@KostLinux
Author

@mantissahz will this be fixed?
This issue has already been open for a year :D

derekbit modified the milestones: v1.8.0 → v1.9.0 on Nov 25, 2024
mantissahz moved this from Analysis and Design to New Issues in Longhorn Sprint on Nov 28, 2024
@opethema

opethema commented Jan 6, 2025

I can only see the issue when doing helm upgrade; with a fresh helm install, the tolerations are added as expected.
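
Until the chart handles this on upgrade, one possible workaround (a sketch, assuming the Setting object accepts direct edits as the earlier comments suggest; note this is a Danger Zone setting, so Longhorn may require volumes to be detached) is to patch the toleration setting after each helm upgrade:

# Hypothetical post-upgrade step; the value matches the example taint above.
kubectl -n longhorn-system patch settings.longhorn.io taint-toleration \
  --type merge -p '{"value": "Storage=true:NoSchedule"}'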
