
[kube-prometheus] cAdvisor metrics are unavailable with Kubeadm default deploy at v1.7.3+ #633

Closed
lorenzo-biava opened this issue Sep 21, 2017 · 15 comments


@lorenzo-biava

What did you do?

Successfully installed kube-prometheus in a Kubeadm cluster v1.7.5.

What did you expect to see?

The cAdvisor endpoints in the Prometheus kubelet job working correctly.

What did you see instead? Under which circumstances?

Several metrics are gathered correctly, but not the cAdvisor ones in the kubelet job.

Environment

  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T08:56:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Kubeadm at v1.7.5.

  • Manifests:

kube-prometheus/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml

[...]
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  - port: cadvisor
    interval: 30s
    honorLabels: true

Basically, since the cAdvisor metrics have been moved, that configuration no longer works.
The official Prometheus Kubernetes configuration example has already been updated for the change.

A similar configuration should be applied to the prometheus-k8s-service-monitor-kubelet.yaml manifest too, e.g.

[...]
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  # This is for cAdvisor in K8s 1.7.3+
  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

It worked in my environment, but of course it might not be backward-compatible.
Perhaps changing it in the kube-prometheus Helm chart (I'm assuming it works the same way, but I haven't tested it yet) and adding a configuration property to the chart to switch the behavior would be a better option.

However, I couldn't manage to find a way to express the cAdvisor metrics endpoint through the API-server proxy, as done in the official Prometheus example (https://kubernetes.default.svc:443/api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor instead of http://<NODE_IP>:10255/metrics/cadvisor, which is what this configuration change results in).
Since it would also be useful to access the metrics endpoint of the kube-scheduler pod (which is only reachable through the master proxy in my setup), I was wondering whether it is possible to build such an endpoint.
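
For reference, the official example does this with plain Prometheus relabeling, roughly along the following lines (just a sketch, assuming the in-cluster service-account token and CA; the job name is illustrative and I haven't found a ServiceMonitor equivalent):

- job_name: kubernetes-nodes-cadvisor
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor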

@brancz
Contributor

brancz commented Sep 21, 2017

I'm not super familiar with kubeadm, but the change in 1.7.3 was unrelated to the cAdvisor metrics endpoint exposed on port 4194; the metrics were only removed from the kubelet's /metrics endpoint and moved to /metrics/cadvisor.

I think that, in terms of this issue, this is actually a firewall problem: the worker that runs the Prometheus pod doesn't have network access to port 4194 on the kubelet.
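
A quick way to check from the node that runs the Prometheus pod (just a sketch; <NODE_IP> is a placeholder, and the second command assumes the kubelet's read-only port 10255 is enabled):

curl http://<NODE_IP>:4194/metrics            # standalone cAdvisor port; fails or times out if blocked or disabled
curl http://<NODE_IP>:10255/metrics/cadvisor  # cAdvisor metrics served by the kubelet itself (1.7.3+)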

@lorenzo-biava
Author

@brancz I've dug into the actual kubelet configuration and you're right: the default kubeadm configuration disables the cAdvisor port (with the --cadvisor-port=0 flag).

Actually I've just found the original thread on the Prometheus repository, which already covered this explanation.

Personally, I'm not very keen to change the kubeadm settings and redeploy the cluster just to enable that port when there is already another working endpoint (and changing a manifest is enough to use it). But that's just me 😄
Feel free to decide whether the alternative is worth including or not; I can use the workaround for now (and perhaps other people will find it useful too).

@lorenzo-biava lorenzo-biava changed the title [kube-prometheus] cAdvisor metrics are unavailable with Kubernetes v1.7.3+ [kube-prometheus] cAdvisor metrics are unavailable with Kubeadm default deploy at v1.7.3+ Sep 21, 2017
edrex added a commit to PlainsightAI/prometheus-operator that referenced this issue Oct 4, 2017
@edrex

edrex commented Oct 4, 2017

Me too, with kube-aws. It seems like scraping the /metrics/cadvisor endpoint will eventually be preferred? We'll maintain a fork until this is resolved (see the commit pingback).

edrex added a commit to PlainsightAI/prometheus-operator that referenced this issue Oct 4, 2017
@brancz
Contributor

brancz commented Oct 5, 2017

We'll probably move to the /metrics/cadvisor endpoint when we upgrade everything to target 1.8.0+.

@ghostflame

Is it meant to add a kubelet:4194 target by default?

We have: k8s 1.7.7 built with bootkube,
prometheus-operator 0.14.0 (we haven't meddled with the versions, so Prometheus 2.0.0-rc1).

As the original poster described, we get kubelet metrics but no cAdvisor metrics. Checking Prometheus' targets, it's scraping :10250 as normal.

I don't really mind whether it would get cadvisor metrics from 4194/metrics or 10250/metrics/cadvisor - but how do we get it to do... either?

It's clear from the Kubernetes issues that they don't consider breaking the metrics out into a separate endpoint a big deal, so they're unlikely to put them back into 10250/metrics. For people hand-rolling their Prometheus config it's not too difficult, but when the operator is generating it, we lack the control to add things in (one day, Prometheus will get the hang of multiple config sources).

So, what have people who use prometheus-operator been doing to get those metrics back?

@brancz
Contributor

brancz commented Oct 26, 2017

Can you explain more what you think is not working today? I’m saying this because I think all combinations are possible today, but possibly not well enough documented.

If one wants to use the 4194/metrics endpoint then that’s already reflected in the kube-prometheus manifests.

If one wants to use the 10250/metrics/cadvisor metrics then one has to modify the servicemonitor endpoint provided in kube-prometheus and set the explicit metrics endpoint.

Does that clear things up?

@lorenzo-biava
Author

@ghostflame To use the 10250/metrics/cadvisor endpoint, it should be enough to edit the kube-prometheus/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml manifest the way I reported it in the issue description:

change

  - port: cadvisor
    interval: 30s
    honorLabels: true

to

  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

and redeploy.
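
(Redeploying can be as simple as something like kubectl apply -f kube-prometheus/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml; the operator should then pick up the updated ServiceMonitor and regenerate the Prometheus configuration.)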

Hope this helps...

@KeithTt

KeithTt commented Jan 3, 2018

In my case, I just let port 4194 be listened on, and the metrics worked.

--cadvisor-port=0 disables cAdvisor from listening to 0.0.0.0:4194 by default. cAdvisor will still be run inside of the kubelet and its API can be accessed at https://{node-ip}:10250/stats/. If you want to enable cAdvisor to listen on a wide-open port, run:

sed -e "/cadvisor-port=0/d" -i /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
systemctl restart kubelet

@brancz
Contributor

brancz commented Jan 3, 2018

Actually, CIS requires cadvisor-port=0, so I recommend that everyone use the /metrics/cadvisor path. I'll leave this open, as we need to fix it for kube-prometheus.
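
For reference, an endpoint entry along these lines should do it against the secure port (a sketch, assuming the kubelet Service in kube-system that kube-prometheus creates and the service-account token already mounted into the Prometheus pods):

  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token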

@lzbgt

lzbgt commented Feb 22, 2018

@lorenzo-biava
I applied the ServiceMonitor configs you mentioned as correct:

  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

or

spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  # This is for cAdvisor in K8s 1.7.3+
  - path: /metrics/cadvisor
    port: http-metrics
    interval: 30s
    honorLabels: true

I did not get it to work; the Alertmanager still reports k8skubeletdown.
Could you give some hints?

My setup is k8s v1.9.2 and master branch of prometheus-operator as of today.

[root@master1 kube-prometheus]# k describe svc kubelet
Name:              kubelet
Namespace:         kube-system
Labels:            k8s-app=kubelet
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                None
Port:              https-metrics  10250/TCP
TargetPort:        10250/TCP
Endpoints:         192.168.50.57:10250,192.168.50.58:10250,192.168.50.59:10250 + 8 more...
Session Affinity:  None
Events:            <none>

[root@master1 kube-prometheus]# k get svc kubelet -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-02-09T10:09:39Z
  labels:
    k8s-app: kubelet
  name: kubelet
  namespace: kube-system
  resourceVersion: "446305"
  selfLink: /api/v1/namespaces/kube-system/services/kubelet
  uid: 57005c64-0d81-11e8-91f9-005056a3367f
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10250
    protocol: TCP
    targetPort: 10250
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

[root@master1 kube-prometheus]# cat ./manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  endpoints:
  #- port: https-metrics
  #  scheme: https
  #  interval: 30s
  #  tlsConfig:
  #    insecureSkipVerify: true
  #  bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system




@lzbgt

lzbgt commented Mar 6, 2018

To resolve the issue in my case above, just run the commands below on all nodes, according to https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/docs/kube-prometheus-on-kubeadm.md:

sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/" -i /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload; systemctl restart kubelet

@shenshouer

@lzbgt thanks

@lorenzo-biava
Author

@lzbgt Sorry I've missed this; I've not been working with Kubernetes and Prometheus Operator lately, but it's definitely good to know that there's a dedicated doc for kubeadm configurations now. Thanks for pointing that out 😉

@brancz
Contributor

brancz commented Apr 12, 2018

Closing, as sample configurations are available, as well as specific documentation for certain platforms.

@brancz brancz closed this as completed Apr 12, 2018
@pbaezab

pbaezab commented Aug 3, 2018

Thanks @lorenzo-biava, this worked for me in an RKE cluster.

taylord0ng added a commit to taylord0ng/helm-charts that referenced this issue Feb 9, 2023
      - job_name: 'kubernetes-nodes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        scrape_interval: 60s
        scrape_timeout: 30s

        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        kubernetes_sd_configs:
          - role: node

        # This configuration will work only on kubelet 1.7.3+
        # As the scrape endpoints for cAdvisor have changed
        # if you are using older version you need to change the replacement to
        # replacement: /api/v1/nodes/$1:4194/proxy/metrics
        # more info here prometheus-operator/prometheus-operator#633
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor

Signed-off-by: taylord0ng <hibase123@gmail.com>