
Allow gathering of stats for root cgroup on v2 #2801

Merged (1 commit, Mar 2, 2021)

Conversation

@odinuge (Contributor) commented Feb 7, 2021

Some of the root stats are used by kubernetes metrics endpoints, and recent kernel versions expose a lot of the stats on the root cgroup as well.
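
For context, a minimal sketch (not part of this PR's diff) that probes which stat files a recent kernel exposes at the cgroup v2 root, assuming the default unified mount at /sys/fs/cgroup:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Root-cgroup stat files that recent kernels populate on cgroup v2.
	// Which of these exist depends on the kernel version and enabled controllers.
	for _, name := range []string{"cpu.stat", "memory.stat", "io.stat"} {
		p := filepath.Join("/sys/fs/cgroup", name)
		if _, err := os.Stat(p); err == nil {
			fmt.Printf("%s is present\n", p)
		} else {
			fmt.Printf("%s is not available: %v\n", p, err)
		}
	}
}
```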

@k8s-ci-robot (Collaborator)

Hi @odinuge. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@iwankgb (Collaborator) commented Feb 8, 2021

/ok-to-test

@iwankgb (Collaborator) left a review comment:

LGTM

if cgroups.IsCgroup2UnifiedMode() {
// On cgroup v2 there are no stats at the root cgroup
// so check whether it is the root cgroup
// On cgroup v2 the root cgroup stats have been introduced in recent kernel versions,
Collaborator:

Do you know from which kernel versions root cgroup stats are supported? That would be good to note.

@odinuge (Author):

I think there has been a gradual transition, with various stats added to the root cgroup in different kernel versions. I am running mainline right now, and some stats are still missing.

I can do a bit of digging, but I guess it will take some time (unless I find the data in a structured format somewhere)... :)

Collaborator:

No biggie if it's hard to find, I was just curious :)

@odinuge (Author):

Will take a look anyway! Some background info is always nice.
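
For readers following along, a rough sketch of the shape of the guard under discussion, based only on the snippet shown above (the actual diff may differ); the ignoreStatsError flag is the one used further down:

```go
// Sketch only, inferred from the review context above; not the exact diff.
ignoreStatsError := false
if cgroups.IsCgroup2UnifiedMode() {
	// On cgroup v2 the root cgroup stats have been introduced in recent kernel
	// versions, so collection may only partially succeed on older kernels;
	// tolerate errors instead of discarding all stats for the root cgroup.
	ignoreStatsError = true
}
```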

}

cgroupStats, err := h.cgroupManager.GetStats()
if err != nil && !ignoreStatsError {
@bobbypage (Collaborator) commented Feb 12, 2021:

Does it make sense to log that we've ignored this error? Otherwise we're kind of silencing it.

@odinuge (Author):

Ahh, yeah, that is smart!

Do you think we should use klog.V(4) to avoid log spam, or should we always display it?

Collaborator:

I see, a klog.V(4) sounds reasonable to avoid spam.

@odinuge (Author):

Thanks! Will fix!

@odinuge (Author):

Added a V(4) now. Still not sure what to write in order to avoid misunderstanding... 🤔
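
A sketch of what the agreed logging could look like (the exact wording and placement in the final commit may differ), combining the GetStats call from the snippet above with the klog.V(4) suggestion:

```go
cgroupStats, err := h.cgroupManager.GetStats()
if err != nil {
	if !ignoreStatsError {
		return nil, err // hypothetical return; depends on the surrounding function
	}
	// Log at V(4) so the ignored error is still visible when debugging,
	// without spamming the logs on every stats collection.
	klog.V(4).Infof("Ignoring errors when gathering stats for root cgroup since some controllers don't have stats on the root cgroup: %v", err)
}
```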

@harche (Contributor) commented Feb 17, 2021

@odinuge Thanks for this PR. I am looking into the failures of the prow job pull-kubernetes-node-crio-cgrpv2-e2e, which runs the node e2e NodeConformance and NodeFeatures tests.

Currently on cgroup v2, I see failures in retrieving the /stats/summary endpoint of the kubelet. A sample failure: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/99110/pull-kubernetes-node-crio-cgrpv2-e2e/1361561789094957056/

I was wondering if your PR is trying to address the issues we are seeing with the /stats/summary test on cgroup v2.

@harche (Contributor) commented Feb 17, 2021

I ran the Summary API test on a cgroup v2 host using this change.

How to run the tests
[harshal@localhost kubernetes]$ git diff 
diff --git a/test/e2e/framework/test_context.go b/test/e2e/framework/test_context.go
index 474b61cf996..f395b95ade3 100644
--- a/test/e2e/framework/test_context.go
+++ b/test/e2e/framework/test_context.go
@@ -296,7 +296,7 @@ func RegisterCommonFlags(flags *flag.FlagSet) {
      flags.StringVar(&TestContext.ReportPrefix, "report-prefix", "", "Optional prefix for JUnit XML reports. Default is empty, which doesn't prepend anything to the default name.")
      flags.StringVar(&TestContext.ReportDir, "report-dir", "", "Path to the directory where the JUnit XML reports should be saved. Default is empty, which doesn't generate these reports.")
      flags.Var(cliflag.NewMapStringBool(&TestContext.FeatureGates), "feature-gates", "A set of key=value pairs that describe feature gates for alpha/experimental features.")
-       flags.StringVar(&TestContext.ContainerRuntime, "container-runtime", "docker", "The container runtime of cluster VM instances (docker/remote).")
+       flags.StringVar(&TestContext.ContainerRuntime, "container-runtime", "remote", "The container runtime of cluster VM instances (docker/remote).")
      flags.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "unix:///var/run/dockershim.sock", "The container runtime endpoint of cluster VM instances.")
      flags.StringVar(&TestContext.ContainerRuntimeProcessName, "container-runtime-process-name", "dockerd", "The name of the container runtime process.")
      flags.StringVar(&TestContext.ContainerRuntimePidFile, "container-runtime-pid-file", "/var/run/docker.pid", "The pid file of the container runtime.")

[harshal@localhost kubernetes]$ export KUBE_SSH_USER=core && make test-e2e-node REMOTE=true RUNTIME=remote IMAGE_CONFIG_FILE="/home/harshal/Downloads/image-config-cgrpv2.yaml" INSTANCE_PREFIX="harpatil" CLEANUP=false CONTAINER_RUNTIME_ENDPOINT="unix:///var/run/crio/crio.sock" FOCUS="Summary API" SKIP="\[Flaky\]|\[Slow\]|\[Serial\]" TEST_ARGS='--kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --non-masquerade-cidr=0.0.0.0/0" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'

Without this change, the failure output looks like this:

Failed /stats/summary without this change
Failure [114.390 seconds]
[k8s.io] Summary API [NodeConformance]
/home/harshal/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:635
  when querying /stats/summary
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/summary_test.go:43
    should report resource usage through the stats api [It]
    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/summary_test.go:54

    Timed out after 90.000s.
    Expected
        <string>: Summary
    to match fields: {
    [[.Node.SystemContainers[kubelet].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc00133a978>: 3848781824
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	, [.Node.SystemContainers[runtime].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287254964Z, .Node.SystemContainers[runtime].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc00133aa10>: 3760959488
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	], [.Node.SystemContainers[pods].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287232835Z, .Node.SystemContainers[pods].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	.MajorPageFaults:
    		Expected
    		    <uint64>: 15
    		to be <=
    		    <int>: 10
    	}
    	]], .Node.CPU:
    	Expected
    	    <string>: CPUStats
    	to match fields: {
    	.UsageNanoCores:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <float64>: 100000
    	.UsageCoreNanoSeconds:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <float64>: 1e+09
    	}
    	, .Node.Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.UsageBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.WorkingSetBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	.PageFaults:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int>: 1000
    	}
    	]
    [[[.Pods[summary-test-8220::stats-busybox-1].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287296172Z, .Pods[summary-test-8220::stats-busybox-1].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850932224
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-8220::stats-busybox-1].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], [[.Pods[summary-test-8220::stats-busybox-0].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287296172Z, .Pods[summary-test-8220::stats-busybox-0].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850903552
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-8220::stats-busybox-0].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	]]
    }

And with this change, the failure output looks like this:

Failed /stats/summary with this change
   Timed out after 90.000s.
    Expected
        <string>: Summary
    to match fields: {
    [[[.Node.SystemContainers[pods].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688369582Z, .Node.SystemContainers[pods].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Node.SystemContainers[kubelet].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc000b29ef8>: 3849154560
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	, [.Node.SystemContainers[runtime].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688392137Z, .Node.SystemContainers[runtime].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc000b72230>: 3739369472
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	]], .Node.Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.UsageBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.WorkingSetBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	]
    [[[.Pods[summary-test-2584::stats-busybox-0].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688442456Z, .Pods[summary-test-2584::stats-busybox-0].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850919936
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-2584::stats-busybox-0].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], [[.Pods[summary-test-2584::stats-busybox-1].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688442456Z, .Pods[summary-test-2584::stats-busybox-1].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850899456
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-2584::stats-busybox-1].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	]]
    }

Notably, with this change we no longer see the failure in getting Node.CPU stats, but we still see failures in getting Node.Memory stats.

@odinuge (Contributor, Author) commented Feb 17, 2021

> I was wondering if your PR is trying to address the issues we are seeing with the /stats/summary test on cgroup v2.

I am trying to address general issues with cgroup v2, so I guess a side effect is that some tests in k8s will start working for cgroup v2. 😄

> Notably, with this change we no longer see the failure in getting Node.CPU stats, but we still see failures in getting Node.Memory stats.

Yeah, that is true. cgroup v2 does not have all stats available for the root cgroup, so this is "the best" we can do.

I think this is a broader discussion about what we should do with stuff supported in cgroup v1 but not in v2. We can say "this is the best we can do", or we can gather the stats via /proc/ ourselves. Not sure what the best solution is. We can implement all/most stats for the root cgroup in v2 in the kernel, but it will still take ages before users actually start running such a kernel version... Any thoughts?

@iwankgb (Collaborator) commented Feb 17, 2021

I opt for relying on cgroups rather than procfs. Interested users will update eventually, and the silent majority will be using v1 for a while (this is just an assumption and I might be wrong).

@harche (Contributor) commented Feb 18, 2021

I think we should not break the existing API. The kubelet's /stats/summary API is expected to return certain information, and if we think we now have to change it, we should not do so without going through the required approvals and a version change.

So if cgroup v2 does not have those CPU stats for the root cgroup, then either cadvisor or the kubelet will still have to fetch that information from /proc, if it comes to that.
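
For illustration of the /proc fallback idea (not something this PR implements), a minimal sketch that derives busy (non-idle) CPU time from the aggregate "cpu" line in /proc/stat; it assumes the common USER_HZ value of 100 ticks per second:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

const userHZ = 100 // assumed clock tick rate; the real value is sysconf(_SC_CLK_TCK)

// busyCPUNanoseconds sums the non-idle jiffies of the aggregate "cpu" line in
// /proc/stat and converts them to nanoseconds.
func busyCPUNanoseconds() (uint64, error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Fields: cpu user nice system idle iowait irq softirq steal guest guest_nice
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] != "cpu" {
			continue
		}
		var jiffies uint64
		for i, v := range fields[1:] {
			if i == 3 || i == 4 {
				continue // skip idle and iowait; they are not "usage"
			}
			n, err := strconv.ParseUint(v, 10, 64)
			if err != nil {
				return 0, err
			}
			jiffies += n
		}
		return jiffies * (1000000000 / userHZ), nil
	}
	return 0, fmt.Errorf("no aggregate cpu line found in /proc/stat")
}

func main() {
	ns, err := busyCPUNanoseconds()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("total busy CPU time: %d ns\n", ns)
}
```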

@iwankgb (Collaborator) commented Feb 18, 2021

This is a valid point, but I'm not sure if we can get the same values from procfs.

@bobbypage (Collaborator) commented Feb 19, 2021

I think it may be worth opening a separate issue on k/k regarding the summary API and cgroup v2 :)

@harche I'd be interested to understand what specific things are missing from the summary API when using cAdvisor on cgroup v2. Definitely agree we need to aim not to break the data returned in the summary API on cgroup v2.

@harche (Contributor) commented Feb 19, 2021

@bobbypage I created an issue on k/k to track the failures we are seeing in the NodeConformance and NodeFeatures tests: kubernetes/kubernetes#99230

The issue links to the failed job run, which shows the missing fields in the failed tests' output.

@harche (Contributor) commented Feb 19, 2021

> I was wondering if your PR is trying to address the issues we are seeing with the /stats/summary test on cgroup v2.
>
> I am trying to address general issues with cgroup v2, so I guess a side effect is that some tests in k8s will start working for cgroup v2.
>
> Notably, with this change we no longer see the failure in getting Node.CPU stats, but we still see failures in getting Node.Memory stats.
>
> Yeah, that is true. cgroup v2 does not have all stats available for the root cgroup, so this is "the best" we can do.
>
> I think this is a broader discussion about what we should do with stuff supported in cgroup v1 but not in v2. We can say "this is the best we can do", or we can gather the stats via /proc/ ourselves. Not sure what the best solution is. We can implement all/most stats for the root cgroup in v2 in the kernel, but it will still take ages before users actually start running such a kernel version... Any thoughts?

@odinuge I came to know from @giuseppe that on newer kernel versions, 5.10 and above, we have the /sys/fs/cgroup/cpu.stat file, which can be used for this purpose.
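
For reference, a small sketch of reading the root cpu.stat file mentioned above, assuming the default mount at /sys/fs/cgroup and the usage_usec key documented for cgroup v2:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// On kernels that populate root-cgroup CPU stats, /sys/fs/cgroup/cpu.stat
	// contains lines such as "usage_usec 1234567".
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, "root cpu.stat not available:", err)
		os.Exit(1)
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "usage_usec" {
			if usec, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
				fmt.Printf("root cgroup CPU usage: %d usec\n", usec)
			}
		}
	}
}
```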

@odinuge force-pushed the root_cgroup_stats_v2 branch from 823e8c2 to 748deaa on February 22, 2021 07:54
@odinuge force-pushed the root_cgroup_stats_v2 branch from 748deaa to 6ba10a3 on February 22, 2021 07:56
@bobbypage (Collaborator)

LGTM, thanks!

chenchun pushed a commit to chenchun/kubernetes that referenced this pull request Mar 20, 2024
Port the upstream kubelet and cadvisor changes to fix metric collection issues when using cgroup v2:
1. port cadvisor pr google/cadvisor#2839 reading cpu stats on cgroup v2
2. port cadvisor pr google/cadvisor#2837 read "max" value for cgroup v2
3. port cadvisor pr google/cadvisor#2801 gathering of stats for root cgroup on v2
4. port cadvisor pr google/cadvisor#2800: Update heuristic for container creation time
5. Fix cgroup handling for systemd with cgroup v2
6. test: adjust summary test for cgroup v2