
error setting cgroup config for procHooks process: ... cpu.weight: no such file or directory #1395

Closed
tijmenvandenbrink opened this issue Mar 18, 2024 · 8 comments
Labels
kind/bug Something isn't working

Comments

@tijmenvandenbrink

Description

We're experiencing issues with Flatcar versions that run kernel 6.x and Docker 24.x (i.e. the latest stable (3815.2.0) and latest beta (3850.1.0)). Containers fail to start because the cgroup config can't be set (specifically cpu.weight). See the following error:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod23bac652_98ba_4c80_a7a6_3e979420831f.slice/docker-47487ca594daa6ae9887f4781a115fc886a392efe4d83876bd95f8c4d8b6ca01.scope/cpu.weight: no such file or directory: unknown\"" pod="somenamespace/k8s-event-logger-bb99597b5-5cl2g" podUID=23bac652-98ba-4c80-a7a6-3e979420831f

Context

Our setup is as follows:

  • Rancher 2.8.2
  • RKE 1.5.5
  • Kubernetes 1.26 / 1.27
  • Flatcar 3602.2.3 trying to upgrade to latest stable (3815.2.0) or latest beta (3850.1.0)

Some things to mention:

  • Rancher uses cri-dockerd in its kubelet container to communicate with Docker.
  • We verified we're using cgroup v2 (see the check below).
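
A quick way to confirm this (a minimal check, not part of the original report; on cgroup v2 the filesystem type of /sys/fs/cgroup is cgroup2fs):

stat -fc %T /sys/fs/cgroup/
# prints "cgroup2fs" on a unified (cgroup v2) hierarchy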

Impact

Because of this issue, the kubelet is not able to start containers on the node and the node ends up in a faulty state.

Environment and steps to reproduce

  1. Set-up:

Our setup is as follows:

  • Rancher 2.8.2
  • RKE 1.5.5
  • Kubernetes 1.26 and 1.27
  • Flatcar 3850.1.0 (Beta)
  2. Task: After upgrading Flatcar 3602.2.3 to the latest stable (3815.2.0) or latest beta (3850.1.0), the node ends up in a faulty state and is unable to schedule pods.

  3. Action(s): see the steps below.

  4. Error: see the error message quoted under Description above.

  1. Start Rancher:

$ sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
  2. Get the password:
docker logs <container> 2>&1 | grep "Bootstrap Password:"
  3. Log in to the UI and provide the URL you want Rancher to listen on. I used https://rancher. You will later need to add an entry to /etc/hosts to point rancher to the IP address of the container running Rancher.
  4. Toggle the RKE1 vs RKE2/K3s switch so an RKE1 cluster is created.
  5. Go to Cluster Management and click Create.
  6. Select Custom (use existing nodes and create a cluster using RKE).
  7. Provide a name and click Next.
  8. Tick all node roles (etcd, controlplane, worker).
  9. Copy the generated command, which you'll need later in the Ignition config.
  10. Click Done. The generated command looks like this:
sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.8.2 --server https://rancher --token n8pvx9t9qwfnjdg7jxhzl2m4qcbxqrjrs2kt2m4zgb4768kh8sfs49 --ca-checksum 83e459403bf410e1bebe2eb69d0d9abdae75b8253de8de86b5b6b6e3f632566a --etcd --controlplane --worker
  11. Put the command from the previous step into the rancher_agent.service unit in the Ignition config below, and use that config.json to start a Flatcar node:
{
    "ignition": {
        "config": {},
        "security": {
            "tls": {}
        },
        "timeouts": {},
        "version": "2.3.0"
    },
    "passwd": {
        "users": [
            {
                "name": "core",
                "passwordHash": "provide-some-password-hash"
            },
            {
                "groups": [
                    "wheel",
                    "docker",
                    "sudo"
                ],
                "homeDir": "/home/rke",
                "name": "rke",
                "sshAuthorizedKeys": [],
                "shell": "/bin/bash"
            }
        ]
    },
    "storage": {
        "directories": [
            {
                "filesystem": "root",
                "path": "/etc/systemd/system/docker.service.d",
                "mode": 493
            },
            {
                "filesystem": "root",
                "path": "/etc/modprobe.d",
                "mode": 493
            }
        ],
        "files": [
            {
                "filesystem": "oem",
                "path": "/grub.cfg",
                "contents": {
                    "source": "data:,set%20oem_id%3D%22vmware%22%0Aset%20linux_append%3D%22%22%0A",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/hostname",
                "contents": {
                    "source": "data:,node-01",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/docker/daemon.json",
                "contents": {
                    "source": "data:,%7B%0A%20%20%22log-driver%22:%20%22json-file%22,%0A%20%20%22log-opts%22:%20%7B%0A%20%20%20%20%22max-size%22:%20%2210m%22,%0A%20%20%20%20%22max-file%22:%20%223%22%0A%20%20%7D,%0A%20%20%22default-ulimits%22:%20%7B%0A%20%20%20%20%22nofile%22:%20%7B%0A%20%20%20%20%20%20%22Name%22:%20%22nofile%22,%0A%20%20%20%20%20%20%22Hard%22:%2065536,%0A%20%20%20%20%20%20%22Soft%22:%2065536%0A%20%20%20%20%7D,%0A%20%20%20%20%22nproc%22:%20%7B%0A%20%20%20%20%20%20%22Name%22:%20%22nproc%22,%0A%20%20%20%20%20%20%22Hard%22:%204096,%0A%20%20%20%20%20%20%22Soft%22:%204096%0A%20%20%20%20%7D%0A%20%20%7D,%0A%20%20%22bip%22:%20%22172.31.0.1/16%22%0A%7D",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/sysctl.d/10-ipv6-disable.conf",
                "contents": {
                    "source": "data:,net.ipv6.conf.all.disable_ipv6%20%3D%201%0Anet.ipv6.conf.default.disable_ipv6%20%3D%201%0Anet.ipv6.conf.lo.disable_ipv6%20%3D%201%0A",
                    "verification": {}
                },
                "mode": 416
            },
            {
                "filesystem": "root",
                "path": "/boot/flatcar/hardening.sh",
                "contents": {
                    "source": "data:,%23!%2Fbin%2Fsh%0A%0A%23%20etcd%20user%2Fgroup%20fix%0Agrep%20-q%20%22%5Eetcd%3A%22%20%2Fetc%2Fgroup%20%7C%7C%20echo%20%22etcd%3Ax%3A52034%3A%22%20%3E%3E%20%2Fetc%2Fgroup%0Agrep%20-q%20%22%5Eetcd%3A%22%20%2Fetc%2Fpasswd%20%7C%7C%20echo%20%22etcd%3Ax%3A52034%3A52034%3A%3A%2Fdev%2Fnull%3A%2Fsbin%2Fnologin%22%20%3E%3E%20%2Fetc%2Fpasswd%0A%5B%5B%20-d%20%2Fopt%2Frke%2Fvar%2Flib%2Fetcd%20%5D%5D%20%26%26%20chown%20-R%20etcd%3Aetcd%20%2Fopt%2Frke%2Fvar%2Flib%2Fetcd%0Aexit%200%0A",
                    "verification": {}
                },
                "mode": 448
            },
            {
                "filesystem": "root",
                "path": "/etc/systemd/system/docker.service.d/override.conf",
                "contents": {
                    "source": "data:,%5BService%5D%0AEnvironment%3DTORCX_IMAGEDIR%3D%2Fdocker%20DOCKER_SELINUX%3D--selinux-enabled%3Dfalse%0AExecStartPre%3D%2Fbin%2Fbash%20-c%20'%2Fusr%2Fbin%2Fecho%20N%20%3E%20%2Fsys%2Fmodule%2Foverlay%2Fparameters%2Fredirect_dir'%0AExecStartPre%3D%2Fbin%2Fbash%20-c%20'%2Fusr%2Fbin%2Fecho%20N%20%3E%20%2Fsys%2Fmodule%2Foverlay%2Fparameters%2Fmetacopy'%0A",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/profile.d/auto-logout.sh",
                "contents": {
                    "source": "data:,export%20TMOUT%3D600%0A",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/multipath.conf",
                "contents": {
                    "source": "data:,defaults%20%7B%0A%20%20user_friendly_names%20yes%0A%20%20find_multipaths%20no%0A%7D%0A",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/modprobe.d/disable_overlay_redirect_dir.conf",
                "contents": {
                    "source": "data:,options%20overlay%20redirect_dir%3Doff",
                    "verification": {}
                },
                "mode": 420
            },
            {
                "filesystem": "root",
                "path": "/etc/coreos/update.conf",
                "contents": {
                    "source": "data:,GROUP%3Dbeta%0A",
                    "verification": {}
                },
                "mode": 420
            }
        ],
        "filesystems": [
            {
                "mount": {
                    "device": "/dev/disk/by-label/OEM",
                    "format": "ext4",
                    "label": "OEM"
                },
                "name": "oem"
            }
        ],
        "links": [
            {
                "filesystem": "root",
                "path": "/etc/localtime",
                "target": "/usr/share/zoneinfo/Europe/Amsterdam"
            },
            {
                "filesystem": "root",
                "path": "/etc/systemd/system/multi-user.target.wants/docker.service",
                "target": "/run/systemd/system/docker.service"
            }
        ]
    },
    "systemd": {
        "units": [
            {
                "enabled": false,
                "name": "sshd.socket"
            },
            {
                "enabled": true,
                "name": "ntpd.service"
            },
            {
                "enabled": true,
                "name": "docker.service"
            },
            {
                "enabled": true,
                "name": "update-engine.service"
            },
            {
                "enabled": true,
                "name": "multipathd.service"
            },
            {
                "enabled": true,
                "name": "iscsid.service"
            },
            {
                "mask": true,
                "name": "locksmithd.service"
            },
            {
                "contents": "[Unit]\nDescription=Start Rancher Agent\nConditionPathExists=!/boot/flatcar/rancher_agent_firstboot\nAfter=network.target\n\n[Service]\nType=simple\nExecStart=/usr/bin/docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.8.2 --server https://rancher --token lmn9cjlq5g7tl5c6rr5pwk6rhtsmbxhk99gphkf9q7vh7nf5s2spv7 --ca-checksum 85e6882c37f6bbc4994c9d523231fe59d19b48adbe99b802919b8a67efddbf51 --etcd --controlplane --worker \nRemainAfterExit=true\nExecStartPost=touch /boot/flatcar/rancher_agent_firstboot\nRestart=on-failure\nRestartSec=30\n\n[Install]\nWantedBy=multi-user.target\n",
                "enabled": true,
                "name": "rancher_agent.service"
            },
            {
                "contents": "[Unit]\nDescription=Hardening flatcar\nAfter=rancher_agent.service\n\n[Service]\nType=simple\nExecStart=/boot/flatcar/hardening.sh\n\n[Install]\nWantedBy=multi-user.target\n",
                "enabled": true,
                "name": "hardening.service"
            }
        ]
    }
}
  12. I'm using QEMU to start the node: ./flatcar_production_qemu.sh -i config.json -nographic -smp 4 -m 4096

  13. When the node has booted, alter /etc/hosts so the URL provided in step 3 resolves correctly, for example as shown below.
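
A hypothetical entry (substitute the IP address of the host running the rancher container):

echo '192.0.2.10 rancher' | sudo tee -a /etc/hosts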

  14. This part takes a while (depending on the resources you allocated to the machine). You can follow the bootstrap progress with journalctl -u rancher_agent.service -f, and later by looking at the docker logs of the rancher_agent container on the Flatcar node and of the rancher container; see the commands sketched below.
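
A sketch of the commands involved (container names are placeholders for the actual names on your hosts):

journalctl -u rancher_agent.service -f        # on the Flatcar node
docker logs -f <rancher_agent-container>      # on the Flatcar node, once the agent container is running
docker logs -f <rancher-container>            # on the host running the rancher container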

  15. After a while you should have a functioning one-node cluster. (Note that, due to what I think were resource constraints, my node reached the Active state but an issue prevented it from being 100% usable. The next couple of steps are therefore not tested locally, but they are tested in our environment.)

  16. Verify that pods can be scheduled and that you don't see messages like this: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod23bac652_98ba_4c80_a7a6_3e979420831f.slice/docker-47487ca594daa6ae9887f4781a115fc886a392efe4d83876bd95f8c4d8b6ca01.scope/cpu.weight: no such file or directory: unknown\"" pod="somenamespace/k8s-event-logger-bb99597b5-5cl2g" podUID=23bac652-98ba-4c80-a7a6-3e979420831f

  17. Upgrade the Flatcar node. If you set GROUP=beta in /etc/flatcar/update.conf and run update_engine_client -update, the upgrade will start; a sketch follows below.
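
A minimal sketch, assuming the node should follow the Beta channel:

echo 'GROUP=beta' | sudo tee /etc/flatcar/update.conf
sudo update_engine_client -update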

  18. Reboot the node.

  19. After the reboot you will start to see the errors. You can verify that cpu.weight is not present in any of the slices under /sys/fs/cgroup/* and that containers won't start; see the check sketched below.
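
One way to check (a minimal sketch; adjust the glob to the slices you care about):

for d in /sys/fs/cgroup/*.slice; do
  test -f "$d/cpu.weight" || echo "cpu.weight missing in $d"
done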

I'm not sure this is related, but as of Linux kernel 6.6, CFS seems to have been replaced by EEVDF. See here:

Among the core changes introduced in this release, one of particular interest is the replacement of the CFS scheduler with the earliest eligible virtual deadline first (EEVDF) CPU scheduler. EEVDF is also a virtual-time scheduler, but in contrast to CFS, which only uses one weight parameter, it employs two parameters: relative deadline and weight (see [LWN article](https://lwn.net/Articles/925371/) for more details). EEVDF has a better-defined scheduling policy; it removes a lot of the CFS heuristics and results in fewer knobs. Even though this scheduler offers improved performance and fairness, rare performance regressions are expected with some adversarial workloads; efforts to address regressions are ongoing and will continue post-release. This kernel version also significantly improves the memory efficiency of the tracing subsystem, with eventfs now assigning inodes and dentries structures needed for tracepoints only when tracing is actually used.
@ader1990

Hello,

This issue looks interesting, as cpu.weight seems to be present in my environment with Flatcar 3908 and 3850 on ARM64/AMD64.

 cat config |grep -i CONFIG_SCHED_
# CONFIG_SCHED_CORE is not set
CONFIG_SCHED_MM_CID=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SCHED_MC is not set
# CONFIG_SCHED_CLUSTER is not set
CONFIG_SCHED_SMT=y
CONFIG_SCHED_HRTICK=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
# CONFIG_SCHED_TRACER is not set
sh-5.2# ls -la /sys/fs/cgroup/*.slice/ |grep -i weight
-rw-r--r--.  1 root root 0 Mar 14 15:31 cpu.weight
-rw-r--r--.  1 root root 0 Mar 14 15:31 cpu.weight.nice
-rw-r--r--.  1 root root 0 Mar 14 15:31 io.bfq.weight
-rw-r--r--.  1 root root 0 Mar 14 15:30 cpu.weight
-rw-r--r--.  1 root root 0 Mar 14 15:31 cpu.weight.nice
-rw-r--r--.  1 root root 0 Mar 14 15:30 io.bfq.weight
-rw-r--r--.  1 root root 0 Mar 14 15:30 cpu.weight
-rw-r--r--.  1 root root 0 Mar 14 15:30 cpu.weight.nice
-rw-r--r--.  1 root root 0 Mar 14 15:30 io.bfq.weight
sh-5.2#
sh-5.2# uname -a
Linux sut01-altra 6.6.17-flatcar #1 SMP PREEMPT Thu Mar 14 13:20:23 -00 2024 aarch64 GNU/Linux
sh-5.2# cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3908.0.0+nightly-20240313-2100-50-g11449d2458
VERSION_ID=3908.0.0
BUILD_ID=nightly-20240313-2100-50-g11449d2458
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3908.0.0+nightly-20240313-2100-50-g11449d2458 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="arm64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3908.0.0+nightly-20240313-2100-50-g11449d2458:*:*:*:*:*:*:*"

I'll try to reproduce your workflow to see if the issue reproduces in my local environment.

@jepio
Member

jepio commented Mar 18, 2024

Thanks for these instructions, super helpful.

This is an insane issue that I can't figure out yet. The following Ignition JSON reproduces it on 3850.1.0, but I can't figure out why:

{
    "ignition": {
        "config": {},
        "security": {
            "tls": {}
        },
        "timeouts": {},
        "version": "2.3.0"
    },
    "systemd": {
        "units": [
            {
                "enabled": true,
                "name": "multipathd.service"
            },
            {
                "mask": true,
                "name": "locksmithd.service"
            }
        ]
    }
}

@ader1990

ader1990 commented Mar 18, 2024

I reproduced the full environment with Rancher. Following up on @jepio's findings, I found by trial and error that if you stop the multipathd systemd unit, even for just a moment, everything works.

To reproduce the issue, you can just enable and start the multipathd service and run:

echo '+cpu' >> /sys/fs/cgroup/cgroup.subtree_control
-bash: echo: write error: Invalid argument

From https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931243, it looks like the cpu cgroup v2 controller cannot be enabled while any real-time priority process is running.
I then checked the multipathd process and, voilà, it is running with real-time priority:

ps -eo command,rtprio|grep -i multipathd
/sbin/multipathd -d -s          99
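
To double-check the causal link (my own sketch, based on the observation above that stopping multipathd briefly is enough):

systemctl stop multipathd.service
echo '+cpu' >> /sys/fs/cgroup/cgroup.subtree_control   # succeeds once no real-time process is running
systemctl start multipathd.service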

@jepio
Member

jepio commented Mar 18, 2024

I need to verify what the scheduler config was in previous Flatcar versions, but the best workaround I can think of is to disable realtime priority for multipathd (assuming you depend on it) by adding a drop-in:

# /etc/systemd/system/multipathd.service.d/override.conf
[Service]
RestrictRealtime=yes
Nice=-20

@jepio
Member

jepio commented Mar 18, 2024

This will likely be the fix: flatcar/scripts#1771

@jepio
Member

jepio commented Mar 19, 2024

I cherry-picked the fix to all branches; it won't be part of this week's release (#1391), only the one after.

@jepio jepio closed this as completed Mar 19, 2024
@tijmenvandenbrink
Author

@jepio, would it be possible to get it into this release? This prevents users of multipathd from upgrading, so they are also missing the runc CVE fix. It would be much appreciated.

@jepio
Member

jepio commented Mar 20, 2024

Unfortunately not - the release is already in progress and delayed a week from when it should have happened.

Sorry that you're blocked from upgrading. You can apply the fix to your nodes manually before the upgrade: create /etc/systemd/system/multipathd.service.d/override.conf with the contents:

[Service]
RestrictRealtime=yes
Nice=-20
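
One way to apply it (a sketch using the path given above):

sudo mkdir -p /etc/systemd/system/multipathd.service.d
sudo tee /etc/systemd/system/multipathd.service.d/override.conf <<'EOF'
[Service]
RestrictRealtime=yes
Nice=-20
EOF
sudo systemctl daemon-reload
sudo systemctl restart multipathd.service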
