[BUG] Backing image manager fails when SELinux is enabled #6108
Closed
Description
Describe the bug (🐛 if you encounter this issue)
When SELinux is enabled, backing-image-manager pods end up in a crash loop.
To Reproduce
- Deploy an RKE2 cluster on Rocky 9.2. Do not disable SELinux (`getenforce` returns `Enforcing`).
- Run the Longhorn integration tests. (There is no doubt a much simpler recreate, but I haven't investigated this much yet.)
- Observe that tests are marked as ERROR instead of FAILED. This indicates something is broken in the cluster itself (not just a test case failure).
- Observe the below symptoms:
With `kubectl` access:
[rocky@ip-10-0-1-71 ~]$ kubectl get pod -n longhorn-system | grep backing
backing-image-manager-1050-1130 0/1 Error 0 3s
backing-image-manager-1050-237b 0/1 Error 0 3s
backing-image-manager-1050-b9da 0/1 ContainerCreating 0 1s
[rocky@ip-10-0-1-71 ~]$ kubectl logs -n longhorn-system backing-image-manager-1050-1130
time="2023-06-12T21:55:45Z" level=fatal msg="Error running start command" error="cannot find disk config file /data/longhorn-disk.cfg: open /data/longhorn-disk.cfg: permission denied"
On a worker node:
[rocky@ip-10-0-2-113 ~]$ sudo ausearch -m AVC -ts recent
----
time->Mon Jun 12 21:56:24 2023
type=PROCTITLE msg=audit(1686606984.732:9363): proctitle=6261636B696E672D696D6167652D6D616E61676572002D2D6465627567006461656D6F6E002D2D6C697374656E00302E302E302E303A38303030002D2D73796E632D6C697374656E00302E302E302E303A38303031
type=SYSCALL msg=audit(1686606984.732:9363): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0001a07b0 a2=80000 a3=0 items=0 ppid=192769 pid=192885 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="backing-image-m" exe="/usr/local/bin/backing-image-manager" subj=system_u:system_r:container_t:s0:c576,c979 key=(null)
type=AVC msg=audit(1686606984.732:9363): avc: denied { read } for pid=192885 comm="backing-image-m" name="longhorn-disk.cfg" dev="xvda5" ino=134481951 scontext=system_u:system_r:container_t:s0:c576,c979 tcontext=system_u:object_r:container_var_lib_t:s0 tclass=file permissive=0
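For anyone triaging the audit records above: the `proctitle` field is hex-encoded with NUL-separated argv entries, and `exit=-13` in the SYSCALL record is `-EACCES`, matching the "permission denied" in the pod log (syscall 257 is `openat` on x86_64). A quick decode sketch using only the standard library:

```python
import errno
import os

# Hex-encoded proctitle from the PROCTITLE audit record above;
# NUL bytes separate the argv entries.
PROCTITLE_HEX = (
    "6261636B696E672D696D6167652D6D616E61676572002D2D646562756700"
    "6461656D6F6E002D2D6C697374656E00302E302E302E303A3830303000"
    "2D2D73796E632D6C697374656E00302E302E302E303A38303031"
)

cmdline = bytes.fromhex(PROCTITLE_HEX).replace(b"\x00", b" ").decode()
print(cmdline)
# → backing-image-manager --debug daemon --listen 0.0.0.0:8000 --sync-listen 0.0.0.0:8001

# The SYSCALL record's exit=-13 corresponds to:
print(errno.errorcode[13], "->", os.strerror(13))
# → EACCES -> Permission denied
```

So the denial is the backing-image-manager process (running as `container_t`) being blocked from reading `longhorn-disk.cfg`, which carries the `container_var_lib_t` label on the host.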
Expected behavior
Backing image manager works fine, even with SELinux enabled.
Log or Support bundle
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/4086
Environment
- Longhorn version: v1.5.0-rc1
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.27.2+rke2
- Number of management node in the cluster: 1
- Number of worker node in the cluster: 3
- Node config
- OS type and version: Rocky v9.2
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS
- Number of Longhorn volumes in the cluster:
Additional context
I THINK this is how https://ci.longhorn.io/job/private/job/longhorn-tests-regression/4086/console is failing / failed. I will try to confirm when it is complete and I have a support bundle.