[BUG] Volumes Stuck in Attach/Detach Loop when running on OpenShift/OKD #4988
Description
I see a few active issues relating to "attaching/detaching loop with error", so this issue may be a duplicate, but I am not seeing the same errors in my logs.
This might be related to #2241, but iSCSI appears to be installed and functional.
I don't necessarily think this is a Longhorn issue.
Describe the bug (🐛 if you encounter this issue)
With the block device frontend:
- Regular attach mode: volumes are stuck in an attach/detach loop (all volumes, new and existing).
- Maintenance attach mode: no issues.
With the iSCSI frontend, volumes seem to attach fine in both maintenance and regular mode.
To Reproduce
Upgrade from 4.11.0-0.okd-2022-11-19-050030 to 4.11.0-0.okd-2022-12-03-100655, or create a new cluster at the following release or newer:
https://amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.11.0-0.okd-2022-12-02-145640
Install Longhorn and apply the RBAC tweaks for OCP/OKD.
Steps to reproduce the behavior:
- Attach a new or existing volume.
Log or Support bundle
Support bundle attached:
longhorn-support-bundle_00e45f14-1554-4313-bb2b-27dbf74ab4c7_2022-12-03T18-25-20Z.zip
The logs should be relatively clean; "grafana" was the volume I was testing after rebooting everything.
Environment
- Longhorn version: v1.3.2
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
- Note: RBAC modifications for OpenShift/OKD: https://github.com/ArthurVardevanyan/HomeLab/tree/production/kubernetes/longhorn/components/okd/rbac.yaml
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: 4.11.0-0.okd-2022-12-03-100655 (v1.24.6+5658434)
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 6
- Node config
- OS type and version: Fedora CoreOS 36 (Fedora CoreOS 411.36.4)
- CPU per node: 4
- Memory per node: ~13GB
- Disk type (e.g. SSD/NVMe): NVMe
- Network bandwidth between the nodes: 1 Gbit/s across "zones"; NVMe disk speed within the same "zone"
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM
- Number of Longhorn volumes in the cluster: 50
Additional context
Previous working version:
https://amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.11.0-0.okd-2022-11-19-050030
I have also tried upgrading Longhorn to the latest commit on the master branch, but this didn't help either.
I was able to reproduce the issue on a new sandbox cluster.
However, the following combination appears to work fine:
- OKD 4.12.0-0.okd-2022-12-04-090858
- Kubernetes 1.25.2
- Fedora CoreOS 37.20221116.10
- Longhorn 1.4.0-dev
Upgrading to this combination may be an alternative solution once Longhorn 1.4 is released.
Metadata
Status: Closed (resolved)