error setting cgroup config for procHooks process: ... cpu.weight: no such file or directory #1395
Comments
Hello, This issue looks interesting, as the cpu.weight in my environment with Flatcar
I'll try to reproduce your workflow to see if the issue reproduces on my local environment.
Thanks for these instructions, super helpful. This is an insane issue that I can't figure out yet. The following Ignition JSON reproduces it on 3850.1.0, but I can't figure out why:
|
I reproduced the full environment with Rancher. Following up on @jepio's findings, I found by trial and error that if you stop the systemd unit To reproduce the issue, you can just enable and start the service
From https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931243, it looks like the cpu cgroup2 controller cannot be enabled if there is any real-time priority process running.
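One way to check whether this applies on a given node is to list threads running under a real-time scheduling class and see which cgroup each one lives in. The sketch below is only a diagnostic aid, not something from the original thread: the helper names `filter_rt_threads` and `cgroup_v2_path` are made up for illustration, and it assumes procps `ps` and a unified cgroup v2 hierarchy. As far as the kernel's cgroup v2 documentation goes, the restriction applies when the kernel is built with `CONFIG_RT_GROUP_SCHED`: the cpu controller can then only be enabled while all real-time processes sit in the root cgroup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from the issue.

# Keep only rows whose scheduling class column (3rd field) is FF (SCHED_FIFO)
# or RR (SCHED_RR). Expects "pid tid cls rtprio comm" rows on stdin.
filter_rt_threads() {
  awk '$3 == "FF" || $3 == "RR"'
}

# Print the cgroup v2 path from /proc/<pid>/cgroup content given on stdin.
# The unified hierarchy entry has the form "0::<path>".
cgroup_v2_path() {
  sed -n 's/^0:://p'
}

# Usage on a live node (commented out so the block only defines helpers):
#   ps -eLo pid,tid,cls,rtprio,comm --no-headers | filter_rt_threads |
#   while read -r pid _; do
#     printf '%s %s\n' "$pid" "$(cgroup_v2_path < "/proc/$pid/cgroup")"
#   done | sort -u
```

Any real-time process reported with a path other than `/` would be a candidate for what blocks the cpu controller, assuming the kernel config mentioned above.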
|
I need to verify what the scheduler config was in previous Flatcar versions, but the best workaround I can think of is to disable real-time priority for multipathd (assuming you depend on it) by adding a drop-in:
|
This will likely be the fix: flatcar/scripts#1771
I cherry-picked the fix to all branches, it won't be part of this week's release (#1391), only the one after.
@jepio would it be possible to get it into this release? This prevents users of multipathd from upgrading, so they are missing the runc CVE fix. This would be much appreciated.
Unfortunately not - the release is already in progress and delayed a week from when it should have happened. Sorry that you're blocked from upgrading. You can apply the fix to your nodes manually before the upgrade, create:
|
Description
We're experiencing issues with Flatcar versions that run kernel version 6.x and docker 24.x (i.e. latest stable (3815.2.0) and latest beta (3850.1.0)). Containers fail to start because cgroup config can't be set (specifically cpu.weight). See following error:
Context
Our setup is as follows:
Some things to mention:
Impact
Because of this issue, Kubelet is not able to start containers on the node and the node ends up in a faulty state.
Environment and steps to reproduce
Our setup is as follows:
Task: After upgrading Flatcar 3602.2.3 to latest stable (3815.2.0) or latest beta (3850.1.0), the node ends up in a faulty state and is not able to schedule pods.
Action(s): See below:
Error: [describe the error that was triggered]
Start Rancher
https://rancher. You will later need to add an entry to /etc/hosts to point rancher to the IP address of the container running rancher.
RKE1 vs RKE2/K3s box so RKE1 is created
Cluster Management -> Click Create
Custom (Use existing nodes and create a cluster using RKE)
Next
Done
rancher_agent.service with the command in the previous step. And use that config.json to start a flatcar node.
I'm using qemu to start the node:
./flatcar_production_qemu.sh -i config.json -nographic -smp 4 -m 4096
When the node has booted, alter /etc/hosts so the URL provided in step 3 resolves correctly.
This part takes a while (depending on the resources you allocated to the machine). You can see the progress of bootstrapping by running:
journalctl -u rancher_agent.service -f
and later by looking at the docker logs of the rancher_agent container on the Flatcar node, and the docker logs of the rancher container.
After a while you should have a functioning one-node cluster. (Note that, I think due to resource issues, in my case the node reached the active state but there was an issue preventing it from being 100% usable.) The next couple of steps are therefore not tested locally but are tested in our environment.
Verify that pods can be scheduled and you don't see messages like this:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod23bac652_98ba_4c80_a7a6_3e979420831f.slice/docker-47487ca594daa6ae9887f4781a115fc886a392efe4d83876bd95f8c4d8b6ca01.scope/cpu.weight: no such file or directory: unknown\"" pod="somenamespace/k8s-event-logger-bb99597b5-5cl2g" podUID=23bac652-98ba-4c80-a7a6-3e979420831f
Upgrade the Flatcar node. If you set GROUP=beta in /etc/flatcar/update.conf and run the following command, it will start the upgrade: update_engine_client -update
Reboot the node
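The channel switch in the upgrade step above can be sketched as a small helper. This is an illustration under assumptions, not the reporter's exact procedure: the function name `set_update_group` is hypothetical, it relies on GNU `sed -i`, and it assumes the channel key in update.conf is the uppercase `GROUP`, which is how Flatcar's documentation spells it as far as I know.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name is illustrative and not from the issue.
# Sets the update channel in an update.conf style file: rewrites an
# existing GROUP= line, or appends one if the file has none.
set_update_group() {
  local conf=$1 group=$2
  touch "$conf"
  if grep -q '^GROUP=' "$conf"; then
    sed -i "s/^GROUP=.*/GROUP=$group/" "$conf"
  else
    printf 'GROUP=%s\n' "$group" >> "$conf"
  fi
}

# Usage on a node (commented out so the block only defines the helper):
#   sudo bash -c "$(declare -f set_update_group); set_update_group /etc/flatcar/update.conf beta"
#   sudo update_engine_client -update
```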
When the node is rebooted you will start to see the errors. You can verify that cpu.weight is not present in any of the slices under /sys/fs/cgroup/* and that containers won't start.
Not sure this is related, but as of Linux kernel 6.6 CFS seems to be replaced by EEVDF. See here
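The check for a missing cpu.weight can be scripted. In cgroup v2, a controller's interface files such as cpu.weight only appear in a cgroup when the parent lists that controller in its cgroup.subtree_control, so walking down the path from the error message shows where the cpu controller stops being delegated. The helper names below (`first_cgroup_without_cpu`, `count_cpu_weight`) are hypothetical and this is a rough diagnostic sketch, not a fix.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from the issue.

# Walk from the cgroup root towards a target cgroup (path relative to the
# root) and print the first ancestor whose cgroup.subtree_control does not
# list "cpu". Children of that ancestor will not have cpu.weight.
# Returns 0 if such an ancestor is found, 1 if every ancestor delegates cpu.
first_cgroup_without_cpu() {
  local root=${1%/} rel=${2#/} dir i n
  local -a parts
  IFS=/ read -r -a parts <<< "$rel"
  n=${#parts[@]}
  dir=$root
  for (( i = 0; i < n; i++ )); do
    # At this point dir is an ancestor of the target cgroup.
    if ! grep -qw cpu "$dir/cgroup.subtree_control" 2>/dev/null; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir=$dir/${parts[i]}
  done
  return 1
}

# Count cpu.weight files anywhere below the given directory.
count_cpu_weight() {
  find "$1" -name cpu.weight 2>/dev/null | wc -l
}

# Usage on a node (commented out so the block only defines helpers):
#   count_cpu_weight /sys/fs/cgroup
#   first_cgroup_without_cpu /sys/fs/cgroup kubepods.slice/kubepods-burstable.slice
```

If the first function prints /sys/fs/cgroup itself, the cpu controller is not enabled at the root, which would be consistent with a real-time process blocking it.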