
Allow gathering of stats for root cgroup on v2 #2801

Merged (1 commit, Mar 2, 2021)

Conversation

@odinuge (Contributor) commented Feb 7, 2021

Some of the root stats are used by kubernetes metrics endpoints, and recent kernel versions expose a lot of the stats on the root cgroup as well.
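
For context, a minimal sketch (not part of this PR's diff) that probes which stat files a recent kernel exposes at the cgroup v2 root, assuming the default unified mount at /sys/fs/cgroup:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Root-cgroup stat files that recent kernels populate on cgroup v2.
	// Which of these exist depends on the kernel version and enabled controllers.
	for _, name := range []string{"cpu.stat", "memory.stat", "io.stat"} {
		p := filepath.Join("/sys/fs/cgroup", name)
		if _, err := os.Stat(p); err == nil {
			fmt.Printf("%s is present\n", p)
		} else {
			fmt.Printf("%s is not available: %v\n", p, err)
		}
	}
}
```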

@k8s-ci-robot (Collaborator)

Hi @odinuge. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@iwankgb (Collaborator) commented Feb 8, 2021

/ok-to-test

@iwankgb (Collaborator) left a review comment:

LGTM

if cgroups.IsCgroup2UnifiedMode() {
// On cgroup v2 there are no stats at the root cgroup
// so check whether it is the root cgroup
// On cgroup v2 the root cgroup stats have been introduced in recent kernel versions,
Collaborator:

Do you know from which kernel versions root cgroup stats are supported? That would be good to note.

@odinuge (Author):

I think there has been a gradual transition, with various stats added to the root cgroup in different kernel versions. I am running mainline right now, and some stats are still missing.

I can do a bit of digging, but I guess it will take some time (unless I find the data in a structured format somewhere)... :)

Collaborator:

No biggie if it's hard to find, I was just curious :)

@odinuge (Author):

Will take a look anyway! Some background info is always nice.
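
For readers following along, a rough sketch of the shape of the guard under discussion, based only on the snippet shown above (the actual diff may differ); the ignoreStatsError flag is the one used further down:

```go
// Sketch only, inferred from the review context above; not the exact diff.
ignoreStatsError := false
if cgroups.IsCgroup2UnifiedMode() {
	// On cgroup v2 the root cgroup stats have been introduced in recent kernel
	// versions, so collection may only partially succeed on older kernels;
	// tolerate errors instead of discarding all stats for the root cgroup.
	ignoreStatsError = true
}
```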

}

cgroupStats, err := h.cgroupManager.GetStats()
if err != nil && !ignoreStatsError {
@bobbypage (Collaborator) commented Feb 12, 2021:

Does it make sense to log that we've ignored this error? Otherwise we're kind of silencing it.

@odinuge (Author):

Ahh, yeah, that is smart!

Do you think we should use klog.V(4) to avoid log spam, or should we always display it?

Collaborator:

I see, a klog.V(4) sounds reasonable to avoid spam.

@odinuge (Author):

Thanks! Will fix!

@odinuge (Author):

Added a V(4) now. Still not sure what to write in order to avoid misunderstanding... 🤔
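
A sketch of what the agreed logging could look like (the exact wording and placement in the final commit may differ), combining the GetStats call from the snippet above with the klog.V(4) suggestion:

```go
cgroupStats, err := h.cgroupManager.GetStats()
if err != nil {
	if !ignoreStatsError {
		return nil, err // hypothetical return; depends on the surrounding function
	}
	// Log at V(4) so the ignored error is still visible when debugging,
	// without spamming the logs on every stats collection.
	klog.V(4).Infof("Ignoring errors when gathering stats for root cgroup since some controllers don't have stats on the root cgroup: %v", err)
}
```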

@harche (Contributor) commented Feb 17, 2021

@odinuge Thanks for this PR. I am looking into the failures of the prow job pull-kubernetes-node-crio-cgrpv2-e2e, which runs the node e2e NodeConformance and NodeFeatures tests.

Currently on cgroup v2, I see failures in retrieving the /stats/summary endpoint of the kubelet. A sample failure: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/99110/pull-kubernetes-node-crio-cgrpv2-e2e/1361561789094957056/

I was wondering if your PR is trying to address the issues we are seeing with the /stats/summary test on cgroup v2.

@harche (Contributor) commented Feb 17, 2021

I ran the Summary API test on a cgroup v2 host using this change.

How to run the tests
[harshal@localhost kubernetes]$ git diff 
diff --git a/test/e2e/framework/test_context.go b/test/e2e/framework/test_context.go
index 474b61cf996..f395b95ade3 100644
--- a/test/e2e/framework/test_context.go
+++ b/test/e2e/framework/test_context.go
@@ -296,7 +296,7 @@ func RegisterCommonFlags(flags *flag.FlagSet) {
      flags.StringVar(&TestContext.ReportPrefix, "report-prefix", "", "Optional prefix for JUnit XML reports. Default is empty, which doesn't prepend anything to the default name.")
      flags.StringVar(&TestContext.ReportDir, "report-dir", "", "Path to the directory where the JUnit XML reports should be saved. Default is empty, which doesn't generate these reports.")
      flags.Var(cliflag.NewMapStringBool(&TestContext.FeatureGates), "feature-gates", "A set of key=value pairs that describe feature gates for alpha/experimental features.")
-       flags.StringVar(&TestContext.ContainerRuntime, "container-runtime", "docker", "The container runtime of cluster VM instances (docker/remote).")
+       flags.StringVar(&TestContext.ContainerRuntime, "container-runtime", "remote", "The container runtime of cluster VM instances (docker/remote).")
      flags.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "unix:///var/run/dockershim.sock", "The container runtime endpoint of cluster VM instances.")
      flags.StringVar(&TestContext.ContainerRuntimeProcessName, "container-runtime-process-name", "dockerd", "The name of the container runtime process.")
      flags.StringVar(&TestContext.ContainerRuntimePidFile, "container-runtime-pid-file", "/var/run/docker.pid", "The pid file of the container runtime.")

[harshal@localhost kubernetes]$ export KUBE_SSH_USER=core && make test-e2e-node REMOTE=true RUNTIME=remote IMAGE_CONFIG_FILE="/home/harshal/Downloads/image-config-cgrpv2.yaml" INSTANCE_PREFIX="harpatil" CLEANUP=false CONTAINER_RUNTIME_ENDPOINT="unix:///var/run/crio/crio.sock" FOCUS="Summary API" SKIP="\[Flaky\]|\[Slow\]|\[Serial\]" TEST_ARGS='--kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --non-masquerade-cidr=0.0.0.0/0" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'

Without this change, the failure output looks like this:

Failed /stats/summary without this change
Failure [114.390 seconds]
[k8s.io] Summary API [NodeConformance]
/home/harshal/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:635
  when querying /stats/summary
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/summary_test.go:43
    should report resource usage through the stats api [It]
    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/summary_test.go:54

    Timed out after 90.000s.
    Expected
        <string>: Summary
    to match fields: {
    [[.Node.SystemContainers[kubelet].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc00133a978>: 3848781824
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	, [.Node.SystemContainers[runtime].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287254964Z, .Node.SystemContainers[runtime].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc00133aa10>: 3760959488
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	], [.Node.SystemContainers[pods].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287232835Z, .Node.SystemContainers[pods].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	.MajorPageFaults:
    		Expected
    		    <uint64>: 15
    		to be <=
    		    <int>: 10
    	}
    	]], .Node.CPU:
    	Expected
    	    <string>: CPUStats
    	to match fields: {
    	.UsageNanoCores:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <float64>: 100000
    	.UsageCoreNanoSeconds:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <float64>: 1e+09
    	}
    	, .Node.Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.UsageBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.WorkingSetBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	.PageFaults:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int>: 1000
    	}
    	]
    [[[.Pods[summary-test-8220::stats-busybox-1].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287296172Z, .Pods[summary-test-8220::stats-busybox-1].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850932224
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-8220::stats-busybox-1].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], [[.Pods[summary-test-8220::stats-busybox-0].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:59:47.287296172Z, .Pods[summary-test-8220::stats-busybox-0].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850903552
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-8220::stats-busybox-0].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	]]
    }

And with this change, the failure output looks like this:

Failed /stats/summary with this change
   Timed out after 90.000s.
    Expected
        <string>: Summary
    to match fields: {
    [[[.Node.SystemContainers[pods].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688369582Z, .Node.SystemContainers[pods].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Node.SystemContainers[kubelet].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc000b29ef8>: 3849154560
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	, [.Node.SystemContainers[runtime].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688392137Z, .Node.SystemContainers[runtime].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <*uint64 | 0xc000b72230>: 3739369472
    		to be nil
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000000
    	}
    	]], .Node.Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.UsageBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.WorkingSetBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 10000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	]
    [[[.Pods[summary-test-2584::stats-busybox-0].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688442456Z, .Pods[summary-test-2584::stats-busybox-0].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850919936
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-2584::stats-busybox-0].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], [[.Pods[summary-test-2584::stats-busybox-1].Containers[busybox-container].StartTime:
    	Expected
    	    <time.Time>: 0001-01-01T00:00:00Z
    	to be >=
    	    <time.Time>: 2020-02-18T07:53:03.688442456Z, .Pods[summary-test-2584::stats-busybox-1].Containers[busybox-container].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.AvailableBytes:
    		Expected
    		    <uint64>: 3850899456
    		to be <=
    		    <int64>: 80000000
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	], .Pods[summary-test-2584::stats-busybox-1].Memory:
    	Expected
    	    <string>: MemoryStats
    	to match fields: {
    	.RSSBytes:
    		Expected
    		    <uint64>: 0
    		to be >=
    		    <int64>: 1000
    	}
    	]]
    }

Notably, with this change we no longer see the failure in getting Node.CPU stats, but we still see failures in getting Node.Memory stats.

@odinuge (Contributor, Author) commented Feb 17, 2021

> I was wondering if your PR is trying to address the issues we are seeing with the /stats/summary test on cgroup v2.

I am trying to address general issues with cgroup v2, so I guess a side effect is that some tests in k8s will start working for cgroup v2. 😄

> Notably, with this change we no longer see the failure in getting Node.CPU stats, but we still see failures in getting Node.Memory stats.

Yeah, that is true. cgroup v2 does not have all stats available for the root cgroup, so this is "the best" we can do.

I think this is a broader discussion about what we should do with stuff supported in cgroup v1 but not in v2. We can say "this is the best we can do", or we can gather the stats via /proc/ ourselves. Not sure what the best solution is. We can implement all/most stats for the root cgroup in v2 in the kernel, but it will still take ages before users actually start running such a kernel version... Any thoughts?

@iwankgb (Collaborator) commented Feb 17, 2021

I opt for relying on cgroups rather than procfs. Interested users will update eventually, and the silent majority will be using v1 for a while (this is just an assumption and I might be wrong).

@harche (Contributor) commented Feb 18, 2021

I think we should not break the existing API. The kubelet's /stats/summary API is expected to return certain information, and if we think we now have to change it, we should not do so without going through the required approvals and a version change.

So if cgroup v2 does not have those CPU stats for the root cgroup, then either cadvisor or the kubelet will still have to fetch that information from /proc, if it comes to that.
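
For illustration of the /proc fallback idea (not something this PR implements), a minimal sketch that derives busy (non-idle) CPU time from the aggregate "cpu" line in /proc/stat; it assumes the common USER_HZ value of 100 ticks per second:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

const userHZ = 100 // assumed clock tick rate; the real value is sysconf(_SC_CLK_TCK)

// busyCPUNanoseconds sums the non-idle jiffies of the aggregate "cpu" line in
// /proc/stat and converts them to nanoseconds.
func busyCPUNanoseconds() (uint64, error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Fields: cpu user nice system idle iowait irq softirq steal guest guest_nice
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] != "cpu" {
			continue
		}
		var jiffies uint64
		for i, v := range fields[1:] {
			if i == 3 || i == 4 {
				continue // skip idle and iowait; they are not "usage"
			}
			n, err := strconv.ParseUint(v, 10, 64)
			if err != nil {
				return 0, err
			}
			jiffies += n
		}
		return jiffies * (1000000000 / userHZ), nil
	}
	return 0, fmt.Errorf("no aggregate cpu line found in /proc/stat")
}

func main() {
	ns, err := busyCPUNanoseconds()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("total busy CPU time: %d ns\n", ns)
}
```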

@iwankgb (Collaborator) commented Feb 18, 2021

This is a valid point, but I'm not sure if we can get the same values from procfs.

@bobbypage (Collaborator) commented Feb 19, 2021

I think it may be worth opening a separate issue on k/k regarding the summary API and cgroup v2 :)

@harche I'd be interested to understand what specific things are missing from the summary API when using cAdvisor on cgroup v2. Definitely agree we need to aim not to break the data returned in the summary API on cgroup v2.

@harche (Contributor) commented Feb 19, 2021

@bobbypage I created an issue on k/k to track the failures we are seeing in the NodeConformance and NodeFeatures tests: kubernetes/kubernetes#99230

The issue links to the failed job run, which shows the missing fields in the failed tests' output.

@harche (Contributor) commented Feb 19, 2021

> I was wondering if your PR is trying to address the issues we are seeing with the /stats/summary test on cgroup v2.
>
> I am trying to address general issues with cgroup v2, so I guess a side effect is that some tests in k8s will start working for cgroup v2.
>
> Notably, with this change we no longer see the failure in getting Node.CPU stats, but we still see failures in getting Node.Memory stats.
>
> Yeah, that is true. cgroup v2 does not have all stats available for the root cgroup, so this is "the best" we can do.
>
> I think this is a broader discussion about what we should do with stuff supported in cgroup v1 but not in v2. We can say "this is the best we can do", or we can gather the stats via /proc/ ourselves. Not sure what the best solution is. We can implement all/most stats for the root cgroup in v2 in the kernel, but it will still take ages before users actually start running such a kernel version... Any thoughts?

@odinuge I came to know from @giuseppe that on newer kernel versions, 5.10 and above, we have the /sys/fs/cgroup/cpu.stat file, which can be used for this purpose.
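
For reference, a small sketch of reading the root cpu.stat file mentioned above, assuming the default mount at /sys/fs/cgroup and the usage_usec key documented for cgroup v2:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// On kernels that populate root-cgroup CPU stats, /sys/fs/cgroup/cpu.stat
	// contains lines such as "usage_usec 1234567".
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, "root cpu.stat not available:", err)
		os.Exit(1)
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "usage_usec" {
			if usec, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
				fmt.Printf("root cgroup CPU usage: %d usec\n", usec)
			}
		}
	}
}
```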

@odinuge force-pushed the root_cgroup_stats_v2 branch from 823e8c2 to 748deaa on February 22, 2021 07:54
@odinuge force-pushed the root_cgroup_stats_v2 branch from 748deaa to 6ba10a3 on February 22, 2021 07:56
@bobbypage (Collaborator)

LGTM, thanks!

chenchun pushed a commit to chenchun/kubernetes that referenced this pull request Mar 20, 2024
Port the upstream kubelet and cadvisor changes to fix metric collection issues when using cgroup v2:
1. port cadvisor pr google/cadvisor#2839 reading cpu stats on cgroup v2
2. port cadvisor pr google/cadvisor#2837 read "max" value for cgroup v2
3. port cadvisor pr google/cadvisor#2801 gathering of stats for root cgroup on v2
4. port cadvisor pr google/cadvisor#2800: Update heuristic for container creation time
5. Fix cgroup handling for systemd with cgroup v2
6. test: adjust summary test for cgroup v2