
[kube-prometheus] cAdvisor metrics are unavailable with Kubeadm default deploy at v1.7.3+ #633

Closed
lorenzo-biava opened this issue Sep 21, 2017 · 15 comments


@lorenzo-biava

What did you do?

Successfully installed kube-prometheus in a Kubeadm cluster v1.7.5.

What did you expect to see?

The cAdvisor endpoints in the Prometheus kubelet job working correctly.

What did you see instead? Under which circumstances?

Several metrics are gathered correctly, but not the cAdvisor ones in the kubelet job.

Environment

  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T08:56:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Kubeadm at v1.7.5.

  • Manifests:

kube-prometheus/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml

[...]
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  - port: cadvisor
    interval: 30s
    honorLabels: true

Basically, since the cAdvisor metrics have been moved, that configuration no longer works.
The official Prometheus Kubernetes configuration example has already been updated for the change.

A similar configuration should be applied to the prometheus-k8s-service-monitor-kubelet.yaml manifest too, e.g.

[...]
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  # This is for cAdvisor in K8s 1.7.3+
  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

It worked in my environment, but of course it might not be backward-compatible.
Perhaps changing it in the kube-prometheus Helm chart (I'm assuming it works the same way, but I haven't tested it yet) and adding a configuration property to the chart to switch the behavior would be a better option.

However, I couldn't manage to find a way to express the cAdvisor metrics endpoint through the API-server proxy, as done in the official Prometheus example (https://kubernetes.default.svc:443/api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor instead of http://<NODE_IP>:10255/metrics/cadvisor, which is what this configuration change results in).
Since it would also be useful to access the metrics endpoint of the kube-scheduler pod (which is only reachable through the master proxy in my setup), I was wondering whether it is possible to build such an endpoint.
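
For reference, the official example does this with plain Prometheus relabeling, roughly along the following lines (just a sketch, assuming the in-cluster service-account token and CA; the job name is illustrative and I haven't found a ServiceMonitor equivalent):

- job_name: kubernetes-nodes-cadvisor
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor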

@brancz
Contributor

brancz commented Sep 21, 2017

I'm not super familiar with kubeadm, but the change in 1.7.3 was unrelated to the cAdvisor metrics endpoint exposed on port 4194; the metrics were only removed from the kubelet's /metrics endpoint and moved to /metrics/cadvisor.

I think that, in terms of this issue, this is actually a firewall problem: the worker that runs the Prometheus pod doesn't have network access to port 4194 on the kubelet.
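
A quick way to check from the node that runs the Prometheus pod (just a sketch; <NODE_IP> is a placeholder, and the second command assumes the kubelet's read-only port 10255 is enabled):

curl http://<NODE_IP>:4194/metrics            # standalone cAdvisor port; fails or times out if blocked or disabled
curl http://<NODE_IP>:10255/metrics/cadvisor  # cAdvisor metrics served by the kubelet itself (1.7.3+)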

@lorenzo-biava
Author

@brancz I've dug into the actual kubelet configuration and you're right: the default kubeadm configuration disables the cAdvisor port (with the --cadvisor-port=0 flag).

Actually I've just found the original thread on the Prometheus repository, which already covered this explanation.

Personally, I'm not very keen to change the kubeadm settings and redeploy the cluster just to enable that port when there is already another working endpoint (and changing a manifest is enough to use it). But that's just me 😄
Feel free to decide whether the alternative is worth including or not; I can use the workaround for now (and perhaps other people will find it useful too).

@lorenzo-biava lorenzo-biava changed the title [kube-prometheus] cAdvisor metrics are unavailable with Kubernetes v1.7.3+ [kube-prometheus] cAdvisor metrics are unavailable with Kubeadm default deploy at v1.7.3+ Sep 21, 2017
edrex added a commit to PlainsightAI/prometheus-operator that referenced this issue Oct 4, 2017
@edrex

edrex commented Oct 4, 2017

Me too, with kube-aws. It seems like scraping the /metrics/cadvisor endpoint will eventually be preferred? We'll maintain a fork until this is resolved (see the commit pingback).

edrex added a commit to PlainsightAI/prometheus-operator that referenced this issue Oct 4, 2017
@brancz
Contributor

brancz commented Oct 5, 2017

We'll probably move to the /metrics/cadvisor endpoint when we upgrade everything to target 1.8.0+.

@ghostflame

Is it meant to add a kubelet:4194 target by default?

We have: k8s 1.7.7 built with bootkube,
prometheus-operator 0.14.0 (we haven't meddled with the versions, so Prometheus 2.0.0-rc1).

As the original poster described, we get kubelet metrics but no cAdvisor metrics. Checking Prometheus' targets, it's scraping :10250 as normal.

I don't really mind whether it would get cadvisor metrics from 4194/metrics or 10250/metrics/cadvisor - but how do we get it to do... either?

It's clear from the Kubernetes issues that they don't consider breaking the metrics out into a separate endpoint a big deal, so they're unlikely to put them back into 10250/metrics. For people hand-rolling their Prometheus config it's not too difficult, but when the operator is generating it, we lack the control to add things in (one day, Prometheus will get the hang of multiple config sources).

So, what have people who use prometheus-operator been doing to get those metrics back?

@brancz
Contributor

brancz commented Oct 26, 2017

Can you explain more what you think is not working today? I’m saying this because I think all combinations are possible today, but possibly not well enough documented.

If one wants to use the 4194/metrics endpoint then that’s already reflected in the kube-prometheus manifests.

If one wants to use the 10250/metrics/cadvisor metrics then one has to modify the servicemonitor endpoint provided in kube-prometheus and set the explicit metrics endpoint.

Does that clear things up?

@lorenzo-biava
Author

@ghostflame To use the 10250/metrics/cadvisor endpoint, it should be enough to edit the kube-prometheus/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml manifest the way I reported it in the issue description:

change

  - port: cadvisor
    interval: 30s
    honorLabels: true

to

  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

and redeploy.
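
(Redeploying can be as simple as something like kubectl apply -f kube-prometheus/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml; the operator should then pick up the updated ServiceMonitor and regenerate the Prometheus configuration.)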

Hope this helps...

@KeithTt

KeithTt commented Jan 3, 2018

In my case, I just let port 4194 be listened on, and the metrics worked.

--cadvisor-port=0 disables cAdvisor from listening to 0.0.0.0:4194 by default. cAdvisor will still be run inside of the kubelet and its API can be accessed at https://{node-ip}:10250/stats/. If you want to enable cAdvisor to listen on a wide-open port, run:

sed -e "/cadvisor-port=0/d" -i /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
systemctl restart kubelet

@brancz
Contributor

brancz commented Jan 3, 2018

Actually, CIS requires cadvisor-port=0, so I recommend that everyone use the /metrics/cadvisor path. I'll leave this open, as we need to fix it for kube-prometheus.
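
For reference, an endpoint entry along these lines should do it against the secure port (a sketch, assuming the kubelet Service in kube-system that kube-prometheus creates and the service-account token already mounted into the Prometheus pods):

  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token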

@lzbgt

lzbgt commented Feb 22, 2018

@lorenzo-biava
I applied the ServiceMonitor configs you mentioned as correct:

  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

or

spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  # This is for cAdvisor in K8s 1.7.3+
  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

I did not get it to work; the Alertmanager still reports k8skubeletdown.
Could you give some hints?

My setup is k8s v1.9.2 and master branch of prometheus-operator as of today.

[root@master1 kube-prometheus]# k describe svc kubelet
Name:              kubelet
Namespace:         kube-system
Labels:            k8s-app=kubelet
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                None
Port:              https-metrics  10250/TCP
TargetPort:        10250/TCP
Endpoints:         192.168.50.57:10250,192.168.50.58:10250,192.168.50.59:10250 + 8 more...
Session Affinity:  None
Events:            <none>

[root@master1 kube-prometheus]# k get svc kubelet -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-02-09T10:09:39Z
  labels:
    k8s-app: kubelet
  name: kubelet
  namespace: kube-system
  resourceVersion: "446305"
  selfLink: /api/v1/namespaces/kube-system/services/kubelet
  uid: 57005c64-0d81-11e8-91f9-005056a3367f
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10250
    protocol: TCP
    targetPort: 10250
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

[root@master1 kube-prometheus]# cat ./manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  endpoints:
  #- port: https-metrics
  #  scheme: https
  #  interval: 30s
  #  tlsConfig:
  #    insecureSkipVerify: true
  #  bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system




@lzbgt

lzbgt commented Mar 6, 2018

To resolve the issue in my case above, just run the commands below on all nodes, according to https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/docs/kube-prometheus-on-kubeadm.md:

sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/" -i /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload; systemctl restart kubelet

@shenshouer

@lzbgt thanks

@lorenzo-biava
Author

@lzbgt Sorry I've missed this; I've not been working with Kubernetes and Prometheus Operator lately, but it's definitely good to know that there's a dedicated doc for kubeadm configurations now. Thanks for pointing that out 😉

@brancz
Contributor

brancz commented Apr 12, 2018

Closing, as sample configurations are available, as well as specific documentation for certain platforms.

@brancz brancz closed this as completed Apr 12, 2018
@pbaezab

pbaezab commented Aug 3, 2018

Thanks @lorenzo-biava, this worked for me in an RKE cluster.

taylord0ng added a commit to taylord0ng/helm-charts that referenced this issue Feb 9, 2023
      - job_name: 'kubernetes-nodes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        scrape_interval: 60s
        scrape_timeout: 30s

        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        kubernetes_sd_configs:
          - role: node

        # This configuration will work only on kubelet 1.7.3+
        # As the scrape endpoints for cAdvisor have changed
        # if you are using older version you need to change the replacement to
        # replacement: /api/v1/nodes/$1:4194/proxy/metrics
        # more info here prometheus-operator/prometheus-operator#633
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor

Signed-off-by: taylord0ng <hibase123@gmail.com>