Standalone cAdvisor for monitoring #18770
cc @kubernetes/rh-cluster-infra
If we're going forward with collecting the non-core metrics using a separate cAdvisor pod, how will the metrics be exposed to collectors such as Heapster? Will kubelet provide a directory service that helps Heapster find where the cAdvisor pod can be accessed, or will kubelet act as a middleman (similar to the way the code works today) that gets metrics from cAdvisor and serves them out to Heapster?
Heapster will have to get metrics from cAdvisor directly. Heapster can …
So are you expecting the cAdvisor pod to be a DaemonSet?
Yes.
How will Heapster find the port that cAdvisor listens on? If each cAdvisor is running in a separate pod, it could be mapped to a different host port on each host.
Host port is not required. Remember, pods can talk to each other directly ;)
OK, I just saw that you're expecting the cAdvisor pods to run as a DaemonSet, so the host IPs are known to the Heapster pod collecting the metrics. Reading about how to communicate with a DaemonSet, I found some information here: https://github.com/kubernetes/kubernetes/blob/master/docs/admin/daemons.md#communicating-with-daemonset-pods.
I take it this means cAdvisor will need to listen on the same port on every host, e.g. 4194, so whoever is communicating with it knows how to find it, or it has to register with DNS.
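To make the fixed-port idea above concrete, here is a minimal sketch of a collector scraping cAdvisor pods on a well-known port. This is an editor's illustration rather than anything proposed in the thread: the port 4194, the /api/v1.3/machine path, and the hardcoded pod IPs are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchCAdvisor pulls raw metrics JSON from a cAdvisor pod reachable at podIP.
// The port and API path are illustrative; a real collector would take them
// from configuration rather than hardcoding them.
func fetchCAdvisor(podIP string) ([]byte, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	url := fmt.Sprintf("http://%s:4194/api/v1.3/machine", podIP)
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("cAdvisor at %s returned %s", podIP, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Pod IPs would come from the API server (see the pod-listing sketch below);
	// these values are placeholders.
	for _, ip := range []string{"10.244.1.5", "10.244.2.7"} {
		body, err := fetchCAdvisor(ip)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("got %d bytes of metrics from %s\n", len(body), ip)
	}
}
```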
I was not suggesting using services. I was instead suggesting using the …
How will the consumer of such metrics get the list of pod IPs of the cAdvisor pods? Sorry if I missed this being described somewhere already.
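One way a consumer could do this is to list the DaemonSet's pods through the API server and read their pod IPs. The sketch below uses modern client-go (which postdates this discussion); the kube-system namespace and the app=cadvisor label selector are assumptions.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// cadvisorPodIPs lists the pods of the cAdvisor DaemonSet (selected here by a
// hypothetical "app=cadvisor" label) and returns their pod IPs.
func cadvisorPodIPs(clientset kubernetes.Interface) ([]string, error) {
	pods, err := clientset.CoreV1().Pods("kube-system").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "app=cadvisor", // label name is an assumption
	})
	if err != nil {
		return nil, err
	}
	var ips []string
	for _, p := range pods.Items {
		if p.Status.PodIP != "" {
			ips = append(ips, p.Status.PodIP)
		}
	}
	return ips, nil
}

func main() {
	// Assumes the collector itself runs in a pod with RBAC permission to list pods.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	ips, err := cadvisorPodIPs(kubernetes.NewForConfigOrDie(cfg))
	if err != nil {
		panic(err)
	}
	fmt.Println("cAdvisor pod IPs:", ips)
}
```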
I like the idea of making cAdvisor standalone, but I have a few questions:
Hi @feiskyer, some of my opinions:
In my opinion, cAdvisor running as a pod (maybe with a DaemonSet) is better than running standalone. If a daemon pod is killed or stopped, the DaemonSet will create a new replica of the daemon pod on the node, and this pod can be managed by Kubernetes.
Maybe we can get the cAdvisor list with the "get" and "describe" commands for the DaemonSet.
@huangyuqi For some container runtimes, e.g. Hyper, cAdvisor must be running outside of a pod.
IMHO I would like to do this, if only to cap the CPU usage of the pod for maximum density.
In #18769 I'm trying to make the metrics/health collector (currently using a built-in cAdvisor) a pluggable interface, which allows users to choose whether k8s should use cAdvisor running in built-in mode, pod mode, or standalone mode. Furthermore, one could also plug in one's favorite monitoring tool other than cAdvisor. That should complement this really well.
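As an illustration only, a pluggable boundary of the kind described above could look roughly like this. It is an editor's sketch, not the interface actually proposed in #18769, and all names are hypothetical.

```go
package metrics

import "time"

// CoreStats holds the few metrics the kubelet itself needs (CPU, memory and
// filesystem usage); the field names here are illustrative.
type CoreStats struct {
	Timestamp        time.Time
	CPUUsageNanoSec  uint64
	MemoryUsageBytes uint64
	FsUsageBytes     uint64
	FsCapacityBytes  uint64
}

// Collector abstracts where stats come from, so the kubelet could be wired to
// a built-in cAdvisor, a cAdvisor pod, a standalone daemon, or another agent.
type Collector interface {
	// NodeStats returns core stats for the node.
	NodeStats() (CoreStats, error)
	// PodStats returns core stats keyed by pod UID.
	PodStats() (map[string]CoreStats, error)
}
```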
Yeah. The idea is to run it as an optional DaemonSet.
cAdvisor's APIs are not discoverable in the way the Kubernetes APIs are. We intend to fix that. Until then, Heapster will be hardcoded to query specific cAdvisor APIs.
@timothysc: I assume that cAdvisor will be run by default on all OpenShift clusters. If that's true, it is important to optimize not only the kubelet but cAdvisor as well. Separating them will help us with performance measurements, but otherwise the work that needs to go in to optimize resource consumption has to happen anyway.
For Hyper's use case, we have to provide a way to run cAdvisor as a daemon on the host node, similar to the kubelet.
@dchen1107 To be sure I understand: you're saying that we should make it possible to run it directly on the host node, but it's still okay if the default way to run it is as a pod, as discussed above in this issue? As an aside, I suspect that …
We could avoid making Heapster handle two sources if we say that the non-core monitoring agent must supply a superset of the metrics provided by core cAdvisor (either by fetching from core cAdvisor and merging with its own metrics, or something else).
xref: #27097 (comment). Re @davidopp's comment: yes, the non-core monitoring agent should supply a superset of the metrics provided by core cAdvisor, which are defined at https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go
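For context, here is an abridged, editor-trimmed sketch of the kind of structures that file defines. The field sets are deliberately incomplete; see the linked types.go for the authoritative definitions.

```go
// Abridged paraphrase of the node/pod summary shapes in
// pkg/kubelet/api/v1alpha1/stats/types.go; fields are trimmed for illustration.
package stats

import "time"

type Summary struct {
	Node NodeStats  `json:"node"`
	Pods []PodStats `json:"pods"`
}

type NodeStats struct {
	NodeName string       `json:"nodeName"`
	CPU      *CPUStats    `json:"cpu,omitempty"`
	Memory   *MemoryStats `json:"memory,omitempty"`
	Fs       *FsStats     `json:"fs,omitempty"`
}

type PodStats struct {
	PodRef     PodReference     `json:"podRef"`
	Containers []ContainerStats `json:"containers"`
}

type ContainerStats struct {
	Name   string       `json:"name"`
	CPU    *CPUStats    `json:"cpu,omitempty"`
	Memory *MemoryStats `json:"memory,omitempty"`
}

type CPUStats struct {
	Time                 time.Time `json:"time"`
	UsageNanoCores       *uint64   `json:"usageNanoCores,omitempty"`
	UsageCoreNanoSeconds *uint64   `json:"usageCoreNanoSeconds,omitempty"`
}

type MemoryStats struct {
	Time            time.Time `json:"time"`
	UsageBytes      *uint64   `json:"usageBytes,omitempty"`
	WorkingSetBytes *uint64   `json:"workingSetBytes,omitempty"`
}

type FsStats struct {
	AvailableBytes *uint64 `json:"availableBytes,omitempty"`
	CapacityBytes  *uint64 `json:"capacityBytes,omitempty"`
	UsedBytes      *uint64 `json:"usedBytes,omitempty"`
}

type PodReference struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	UID       string `json:"uid"`
}
```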
Re @euank's comment (#18770 (comment)): what I mentioned above about running cAdvisor directly on the host node was an intermediate workaround to support HyperContainer before we prioritized today's container runtime effort. But now the story has changed; in #27097 (comment) I tried to redefine what the core metrics are: node-level and pod-level. If we agree on that (I think we are largely convergent already), then whether containers run within a Hyper pod or as a native pod on the host node is punted to the Hyper runtime implementation, and is neither part of the kubelet core system nor part of the cAdvisor config.
As of Kubernetes v1.1, and possibly v1.2 as well, cAdvisor is linked into kubelet.
Kubelet groups detailed cAdvisor metrics into pods and nodes and exposes them via REST endpoints (/stats).
Monitoring is resource intensive. There are currently several issues filed against kubelet around resource usage, mostly related to cAdvisor's resource consumption.
Support for custom metrics adds significant resource overhead to kubelet.
Not all users of Kubernetes need cAdvisor for monitoring.
Kubelet will need just a few metrics for its functionality: specifically, CPU usage, memory usage, and filesystem usage for now. These metrics are needed for scheduling and other cluster functionality.
By separating the cAdvisor monitoring agent out of kubelet, users can choose to run cAdvisor as a standalone container and get much more detailed metrics.
This split will let us optimize the resource impact of kubelet and cAdvisor independently.
cAdvisor can expose potentially expensive metrics as well.
For the basic metrics that kubelet needs, it will embed specific libraries from cAdvisor and compute those metrics internally.
cAdvisor needs to be enhanced to support pods. To make this easier, kubelet should expose the pods that it is currently managing via some API (#9473 talks about one such API; a rough sketch of what such an endpoint could look like follows below).
cc @dchen1107 @timstclair @kubernetes/goog-node @jimmidyson @jszczepkowski
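To make that last point concrete, here is a rough sketch of a kubelet endpoint that exposes the pods it is managing. This is an editor's illustration, not the API discussed in #9473: the PodRecord type, the /pods path, and the port are all assumptions.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// PodRecord is an illustrative, minimal description of a pod the kubelet is
// managing; a standalone cAdvisor could use this to group container metrics
// into pods.
type PodRecord struct {
	Name       string   `json:"name"`
	Namespace  string   `json:"namespace"`
	UID        string   `json:"uid"`
	Containers []string `json:"containers"` // container IDs or cgroup paths
}

// podLister stands in for whatever kubelet component tracks the pods bound to
// this node.
type podLister interface {
	ManagedPods() []PodRecord
}

// servePods exposes the kubelet's current pod set over HTTP so an external
// monitoring agent can map raw container stats back to pods.
func servePods(lister podLister, addr string) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/pods", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(lister.ManagedPods()); err != nil {
			log.Printf("encoding pod list: %v", err)
		}
	})
	return http.ListenAndServe(addr, mux)
}

// staticLister is a stand-in for the kubelet's real pod manager.
type staticLister struct{ pods []PodRecord }

func (s staticLister) ManagedPods() []PodRecord { return s.pods }

func main() {
	lister := staticLister{pods: []PodRecord{
		{Name: "web-1", Namespace: "default", UID: "1234", Containers: []string{"web"}},
	}}
	log.Fatal(servePods(lister, ":10255")) // port is illustrative
}
```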