
[FG:InPlacePodVerticalScaling] Pod CPU limit is not configured to cgroups as calculated #129357

Open
hshiina opened this issue Dec 21, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@hshiina
Contributor

hshiina commented Dec 21, 2024

What happened?

As a result of #124216, which was introduced in v1.32, the pod-level CPU limit calculated in ResourceConfigForPod() is rounded up to the nearest 10ms (1% of a CPU) by libcontainer when the pod is resized:

  • Resize a pod:
    $ kubectl patch pod resize-pod --subresource=resize --patch '{"spec":{"containers":[{"name":"resize-container", "resources":{"limits":{"cpu":"417m"}}}]}}'
    pod/resize-pod patched
    
  • The container cgroup value is set with 1ms precision:
    $ kubectl exec resize-pod -- cat /sys/fs/cgroup/cpu.max
    41700 100000
    
  • The pod cgroup value is rounded up:
    $ cat /sys/fs/cgroup/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod68a17b59_0d31_40b2_ba86_ea43f3b2f05c.slice/cpu.max
    42000 100000
    

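For context, the 41700 written to the container cgroup is what the kubelet's milliCPU-to-quota conversion produces for 417m. A minimal sketch of that conversion, assuming the default 100ms CFS period and mirroring the formula of MilliCPUToQuota in pkg/kubelet/cm/helpers_linux.go (not the actual kubelet code):

package main

import "fmt"

const (
	quotaPeriod   = 100000 // CFS period in microseconds (100ms)
	milliCPUToCPU = 1000   // millicores per CPU
)

// milliCPUToQuota returns the CFS quota in microseconds per period for a CPU
// limit given in millicores.
func milliCPUToQuota(milliCPU int64) int64 {
	return milliCPU * quotaPeriod / milliCPUToCPU
}

func main() {
	fmt.Println(milliCPUToQuota(417)) // 41700, written to cpu.max as "41700 100000"
}
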
When the systemd cgroup driver is used, libcontainer rounds the CPU quota up before passing it to systemd:

// systemd converts CPUQuotaPerSecUSec (microseconds per CPU second) to CPUQuota
// (integer percentage of CPU) internally. This means that if a fractional percent of
// CPU is indicated by Resources.CpuQuota, we need to round up to the nearest
// 10ms (1% of a second) such that child cgroups can set the cpu.cfs_quota_us they expect.
cpuQuotaPerSecUSec = uint64(quota*1000000) / period
if cpuQuotaPerSecUSec%10000 != 0 {
	cpuQuotaPerSecUSec = ((cpuQuotaPerSecUSec / 10000) + 1) * 10000
}
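
Applying this rounding to the 417m example explains the 42000 in the pod cgroup. A worked sketch of the quoted arithmetic, assuming the same 100ms period (the conversion back to cpu.max reflects my reading of what systemd does and is not libcontainer code):

package main

import "fmt"

func main() {
	// 417m CPU -> quota = 41700us per 100000us period (from the kubelet calculation).
	quota, period := int64(41700), uint64(100000)

	// Same round-up as the quoted libcontainer code.
	cpuQuotaPerSecUSec := uint64(quota*1000000) / period // 417000
	if cpuQuotaPerSecUSec%10000 != 0 {
		cpuQuotaPerSecUSec = ((cpuQuotaPerSecUSec / 10000) + 1) * 10000 // 420000
	}

	// systemd converts the per-second value back into a quota for the period,
	// which is what ends up in the pod's cpu.max.
	fmt.Println(cpuQuotaPerSecUSec*period/1000000, period) // 42000 100000
}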

In addition, there seems to be a race in libcontainer: after passing the rounded value to systemd, it writes the unrounded value directly to the cgroup file:

if err := setUnitProperties(m.dbus, getUnitName(m.cgroups), properties...); err != nil {
	return fmt.Errorf("unable to set unit properties: %w", err)
}
return m.fsMgr.Set(r)

So there is also a case where the cgroup value ends up set exactly as calculated. In my testing, decreasing the CPU limit usually hits this case, though I'm not sure why:

  • Decrease the CPU limits:

    $ kubectl patch pod resize-pod --subresource=resize --patch '{"spec":{"containers":[{"name":"resize-container", "resources":{"limits":{"cpu":"365m"}}}]}}'
    pod/resize-pod patched
    
  • Both the container and the pod cgroup values are set with 1ms precision:

    $ kubectl exec resize-pod -- cat /sys/fs/cgroup/cpu.max
    36500 100000
    $ cat /sys/fs/cgroup/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod68a17b59_0d31_40b2_ba86_ea43f3b2f05c.slice/cpu.max
    36500 100000
    
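In other words, a single resize has two writers of the pod's cpu.max: systemd writes the rounded quota, while the direct cgroupfs write uses the exact quota, so the final contents appear to depend on which write lands last. A sketch of the two candidate values for the 365m resize above (same arithmetic as before, not libcontainer code):

package main

import "fmt"

func main() {
	// 365m CPU -> quota = 36500us per 100000us period.
	quota, period := int64(36500), uint64(100000)

	// Candidate written via systemd after the round-up in the quoted code.
	perSec := uint64(quota*1000000) / period // 365000
	if perSec%10000 != 0 {
		perSec = ((perSec / 10000) + 1) * 10000 // 370000
	}
	rounded := perSec * period / 1000000 // 37000

	// Candidate written directly by m.fsMgr.Set(r), with no rounding.
	exact := quota // 36500

	// The 36500 observed above suggests the direct write landed last here.
	fmt.Println(rounded, exact) // 37000 36500
}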

What did you expect to happen?

This tiny gap can practically be ignored. However, we might need to confirm that it doesn't cause an issue similar to #128769.

How can we reproduce it (as minimally and precisely as possible)?

  1. Use the systemd cgroup driver and enable InPlacePodVerticalScaling.
  2. Resize a pod's CPU limit with 1ms precision (for example, 417m).

Anything else we need to know?

No response

Kubernetes version

v1.32

$ kubectl version
Client Version: v1.31.4
Kustomize Version: v5.4.2
Server Version: v1.32.0

Cloud provider

N/A

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@hshiina hshiina added the kind/bug Categorizes issue or PR as related to a bug. label Dec 21, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 21, 2024
@hshiina
Contributor Author

hshiina commented Dec 21, 2024

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 21, 2024
@pacoxu
Member

pacoxu commented Dec 23, 2024

cc @iholder101 @tallclair

@kannon92 kannon92 moved this from Triage to Triaged in SIG Node Bugs Dec 23, 2024
@kannon92
Contributor

I am going to assume this is valid.

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 23, 2024