[BUG] Backing image manager fails when SELinux is enabled #6108

Closed
@ejweber

Description

Describe the bug (🐛 if you encounter this issue)

When SELinux is enabled, backing-image-manager pods end up in a crash loop.

To Reproduce

  1. Deploy an RKE2 cluster on Rocky Linux 9.2 and leave SELinux enabled (getenforce should return Enforcing).
  2. Run the Longhorn integration tests. (There is no doubt a much simpler way to reproduce this, but I haven't investigated it much yet.)
  3. Observe that instead of being marked as FAILED, tests are marked as ERROR. This indicates something broken in the cluster itself (not just a test case failure).
  4. Observe the below symptoms:

With kubectl access:

[rocky@ip-10-0-1-71 ~]$ kubectl get pod -n longhorn-system | grep backing
backing-image-manager-1050-1130                     0/1     Error               0          3s
backing-image-manager-1050-237b                     0/1     Error               0          3s
backing-image-manager-1050-b9da                     0/1     ContainerCreating   0          1s

[rocky@ip-10-0-1-71 ~]$ kubectl logs -n longhorn-system backing-image-manager-1050-1130
time="2023-06-12T21:55:45Z" level=fatal msg="Error running start command" error="cannot find disk config file /data/longhorn-disk.cfg: open /data/longhorn-disk.cfg: permission denied"

On a worker node:

[rocky@ip-10-0-2-113 ~]$ sudo ausearch -m AVC -ts recent
----
time->Mon Jun 12 21:56:24 2023
type=PROCTITLE msg=audit(1686606984.732:9363): proctitle=6261636B696E672D696D6167652D6D616E61676572002D2D6465627567006461656D6F6E002D2D6C697374656E00302E302E302E303A38303030002D2D73796E632D6C697374656E00302E302E302E303A38303031
type=SYSCALL msg=audit(1686606984.732:9363): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0001a07b0 a2=80000 a3=0 items=0 ppid=192769 pid=192885 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="backing-image-m" exe="/usr/local/bin/backing-image-manager" subj=system_u:system_r:container_t:s0:c576,c979 key=(null)
type=AVC msg=audit(1686606984.732:9363): avc:  denied  { read } for  pid=192885 comm="backing-image-m" name="longhorn-disk.cfg" dev="xvda5" ino=134481951 scontext=system_u:system_r:container_t:s0:c576,c979 tcontext=system_u:object_r:container_var_lib_t:s0 tclass=file permissive=0
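
The audit records above are easier to read once the hex-encoded proctitle is decoded and the AVC fields are split out. Below is a minimal sketch in Python that does this for the values copied from this report. The helper names (decode_proctitle, parse_avc, selinux_type) are hypothetical and exist only for illustration; they are not part of Longhorn or the audit tooling.

```python
import errno
import re

# Values copied verbatim from the audit records in this report.
PROCTITLE_HEX = (
    "6261636B696E672D696D6167652D6D616E61676572002D2D6465627567006461656D6F6E"
    "002D2D6C697374656E00302E302E302E303A38303030002D2D73796E632D6C697374656E"
    "00302E302E302E303A38303031"
)
AVC_LINE = (
    'avc:  denied  { read } for  pid=192885 comm="backing-image-m" '
    'name="longhorn-disk.cfg" dev="xvda5" ino=134481951 '
    "scontext=system_u:system_r:container_t:s0:c576,c979 "
    "tcontext=system_u:object_r:container_var_lib_t:s0 tclass=file permissive=0"
)
SYSCALL_EXIT = -13  # exit= field of the SYSCALL record


def decode_proctitle(hex_string):
    """Decode a hex-encoded, NUL-separated command line into an argv list."""
    raw = bytes.fromhex(hex_string)
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\x00")]


def parse_avc(line):
    """Return (denied permissions, key=value fields) from an AVC message."""
    perms = re.search(r"\{([^}]*)\}", line).group(1).split()
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    fields = {key: value.strip('"') for key, value in pairs}
    return perms, fields


def selinux_type(context):
    """Return the type field of a user:role:type:level SELinux context."""
    return context.split(":")[2]


if __name__ == "__main__":
    print("command line:", " ".join(decode_proctitle(PROCTITLE_HEX)))
    print("syscall error:", errno.errorcode[-SYSCALL_EXIT])
    perms, fields = parse_avc(AVC_LINE)
    print("denied:", perms, "on", fields["tclass"], fields["name"])
    print("source type:", selinux_type(fields["scontext"]))
    print("target type:", selinux_type(fields["tcontext"]))
```

Decoded, the proctitle is the command line backing-image-manager --debug daemon --listen 0.0.0.0:8000 --sync-listen 0.0.0.0:8001, and exit=-13 corresponds to EACCES, which matches the "permission denied" error in the pod log. The AVC record shows a process running as container_t being denied read on a file labeled container_var_lib_t.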

Expected behavior

The backing-image-manager pods start and run normally with SELinux in enforcing mode.

Log or Support bundle

https://ci.longhorn.io/job/private/job/longhorn-tests-regression/4086

Environment

  • Longhorn version: v1.5.0-rc1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.27.2+rke2
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 3
  • Node config
    • OS type and version: Rocky v9.2
    • CPU per node:
    • Memory per node:
    • Disk type(e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS
  • Number of Longhorn volumes in the cluster:

Additional context

I think this is why https://ci.longhorn.io/job/private/job/longhorn-tests-regression/4086/console is failing. I will try to confirm once the run completes and I have a support bundle.
