
Unable to start monitoring-influxdb-grafana-v3-0 using petset #28591

Closed
ddysher opened this issue Jul 7, 2016 · 15 comments
@ddysher
Contributor

ddysher commented Jul 7, 2016

monitoring-influxdb-grafana-v3-0 stays in ContainerCreating

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                                 READY     STATUS              RESTARTS   AGE
default       web-0                                                1/1       Running             0          2h
default       web-1                                                1/1       Running             0          2h
kube-system   etcd-server-events-kubernetes-master                 1/1       Running             0          2h
kube-system   etcd-server-kubernetes-master                        1/1       Running             0          2h
kube-system   fluentd-cloud-logging-kubernetes-master              1/1       Running             0          2h
kube-system   fluentd-cloud-logging-kubernetes-minion-group-k1j0   1/1       Running             0          2h
kube-system   fluentd-cloud-logging-kubernetes-minion-group-o9be   1/1       Running             0          2h
kube-system   fluentd-cloud-logging-kubernetes-minion-group-tu28   1/1       Running             0          2h
kube-system   heapster-v1.1.0-527143062-rephf                      4/4       Running             0          2h
kube-system   kube-addon-manager-kubernetes-master                 1/1       Running             0          2h
kube-system   kube-apiserver-kubernetes-master                     1/1       Running             2          2h
kube-system   kube-controller-manager-kubernetes-master            1/1       Running             2          2h
kube-system   kube-dns-v18-en195                                   3/3       Running             0          2h
kube-system   kube-proxy-kubernetes-minion-group-k1j0              1/1       Running             0          2h
kube-system   kube-proxy-kubernetes-minion-group-o9be              1/1       Running             0          2h
kube-system   kube-proxy-kubernetes-minion-group-tu28              1/1       Running             0          2h
kube-system   kube-scheduler-kubernetes-master                     1/1       Running             0          2h
kube-system   kubernetes-dashboard-v1.1.0-w2p8u                    1/1       Running             0          2h
kube-system   l7-default-backend-v1.0-wh3we                        1/1       Running             0          2h
kube-system   l7-lb-controller-v0.7.0-kubernetes-master            1/1       Running             0          2h
kube-system   monitoring-influxdb-grafana-v3-0                     0/2       ContainerCreating   0          2h
kube-system   node-problem-detector-v0.1-0q1w2                     1/1       Running             0          2h
kube-system   node-problem-detector-v0.1-1ivpq                     1/1       Running             0          2h
kube-system   node-problem-detector-v0.1-710wc                     1/1       Running             0          2h
kube-system   node-problem-detector-v0.1-xyc7u                     1/1       Running             0          2h

logs from the petset controller

I0707 10:16:25.408349       5 pet_set.go:317] Syncing PetSet kube-system/monitoring-influxdb-grafana-v3 with 1 pets
I0707 10:16:25.410101       5 pet_set.go:325] PetSet monitoring-influxdb-grafana-v3 blocked from scaling on pet monitoring-influxdb-grafana-v3-0
I0707 10:16:25.412982       5 pet.go:101] PetSet monitoring-influxdb-grafana-v3 waiting on unhealthy pet monitoring-influxdb-grafana-v3-0
I0707 10:16:55.398695       5 pet_set.go:317] Syncing PetSet kube-system/monitoring-influxdb-grafana-v3 with 1 pets
I0707 10:16:55.403028       5 pet_set.go:325] PetSet monitoring-influxdb-grafana-v3 blocked from scaling on pet monitoring-influxdb-grafana-v3-0
I0707 10:16:55.407318       5 pet.go:101] PetSet monitoring-influxdb-grafana-v3 waiting on unhealthy pet monitoring-influxdb-grafana-v3-0

kubernetes version

$ kubectl version                     
Client Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-alpha.0.1284+1c56319b79e3a8", GitCommit:"1c56319b79e3a80c380e3b92683daba8042dcbea", GitTreeState:"clean", BuildDate:"2016-07-07T07:18:20Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-alpha.0.1284+1c56319b79e3a8", GitCommit:"1c56319b79e3a80c380e3b92683daba8042dcbea", GitTreeState:"clean", BuildDate:"2016-07-07T07:12:17Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

The cluster is brought up using cluster/kube-up.sh
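
For reference, a typical bring-up with the influxdb monitoring addon enabled looks roughly like this (the provider and env var shown are the usual GCE defaults, assumed here rather than copied from the exact invocation):

$ KUBERNETES_PROVIDER=gce KUBE_ENABLE_CLUSTER_MONITORING=influxdb ./cluster/kube-up.sh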

@jszczepkowski @bprashanth

@jszczepkowski
Contributor

@ddysher
Can you tell us what the events are for the monitoring-influxdb-grafana pod? Events can be returned by executing:
kubectl describe pod [pod name]

@ddysher
Contributor Author

ddysher commented Jul 7, 2016

Here's the result:

deyuan@sugarcane:~/code/tool/kubernetes$ kubectl --kubeconfig=$HOME/Downloads/config-gce describe pods monitoring-influxdb-grafana-v3-0 --namespace=kube-system
Name:       monitoring-influxdb-grafana-v3-0
Namespace:  kube-system
Node:       kubernetes-minion-group-k1j0/10.240.0.4
Start Time: Thu, 07 Jul 2016 15:30:06 +0800
Labels:     k8s-app=influxGrafana,kubernetes.io/cluster-service=true,version=v3
Status:     Pending
IP:     
Controllers:    PetSet/monitoring-influxdb-grafana-v3
Containers:
  influxdb:
    Container ID:   
    Image:      gcr.io/google_containers/heapster_influxdb:v0.5
    Image ID:       
    Ports:      8083/TCP, 8086/TCP
    QoS Tier:
      cpu:  Guaranteed
      memory:   Guaranteed
    Limits:
      cpu:  100m
      memory:   500Mi
    Requests:
      cpu:      100m
      memory:       500Mi
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment Variables:
  grafana:
    Container ID:   
    Image:      gcr.io/google_containers/heapster_grafana:v2.6.0-2
    Image ID:       
    Port:       
    QoS Tier:
      memory:   Guaranteed
      cpu:  Guaranteed
    Limits:
      cpu:  100m
      memory:   100Mi
    Requests:
      cpu:      100m
      memory:       100Mi
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment Variables:
      INFLUXDB_SERVICE_URL:     http://monitoring-influxdb:8086
      GF_AUTH_BASIC_ENABLED:        false
      GF_AUTH_ANONYMOUS_ENABLED:    true
      GF_AUTH_ANONYMOUS_ORG_ROLE:   Admin
      GF_SERVER_ROOT_URL:       /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  influxdb-persistent-storage:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  influxdb-claim
    ReadOnly:   false
  grafana-persistent-storage:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  default-token-byczb:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-byczb
Events:
  FirstSeen LastSeen    Count   From                    SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----                    -------------   --------    ------      -------
  4h        33s     121 {kubelet kubernetes-minion-group-k1j0}          Warning     FailedMount Unable to mount volumes for pod "monitoring-influxdb-grafana-v3-0_kube-system(9fbd0dbb-4414-11e6-8d7e-42010af00002)": timeout expired waiting for volumes to attach/mount for pod "monitoring-influxdb-grafana-v3-0"/"kube-system". list of unattached/unmounted volumes=[influxdb-persistent-storage]
  4h        33s     121 {kubelet kubernetes-minion-group-k1j0}          Warning     FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "monitoring-influxdb-grafana-v3-0"/"kube-system". list of unattached/unmounted volumes=[influxdb-persistent-storage]
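
The kubelet is timing out waiting for the influxdb-persistent-storage volume to attach/mount. A quick way to narrow that down (illustrative commands) is to check whether the claim is bound and whether the attach/detach logic in the controller manager logged anything:

$ kubectl get pvc influxdb-claim --namespace=kube-system
$ kubectl get pv
$ kubectl logs kube-controller-manager-kubernetes-master --namespace=kube-system | grep -i attach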

@chrislovecnm
Contributor

I need to see your PetSet yaml. The message says the PV is missing. Did you mean to have the PV auto-provisioned?
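
For context, in the alpha provisioning model a claim requests auto-provisioning via an annotation on the PVC. A minimal sketch of what that would look like for this addon (the claim name and size are taken from the cluster above; the annotation is the same alpha hook used in the petset suggestion further down):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: influxdb-claim
  namespace: kube-system
  annotations:
    # alpha hook asking the provisioner to create a volume for this claim
    volume.alpha.kubernetes.io/storage-class: anything
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 10Gi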

@ddysher
Contributor Author

ddysher commented Jul 10, 2016

I believe so. I'm walking through petset, and this is what I got when running kubectl get pv; they are all auto-provisioned.

$ kubectl get pv
NAME                                       CAPACITY   ACCESSMODES   STATUS    CLAIM                        REASON    AGE
influxdb-pv                                10Gi       RWO,ROX       Bound     kube-system/influxdb-claim             3d
pvc-35bd78aa-4416-11e6-8d7e-42010af00002   1Gi        RWO           Bound     default/www-web-0                      3d
pvc-35c345a1-4416-11e6-8d7e-42010af00002   1Gi        RWO           Bound     default/www-web-1                      3d

For monitoring, I didn't do any customization, so I believe the pet set yaml comes from here?
https://github.com/kubernetes/kubernetes/blob/a261776f3e0d9f0f3dede72a0e389d40b5117cce/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-petset.yaml

@bprashanth
Contributor

@jszczepkowski @piosz suggest modifying the petset to:

apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: monitoring-influxdb-grafana-v4
  # Note: Modified namespace
  namespace: default
  labels:
    name: grafana
    version: v4
spec:
  # This service must exist
  serviceName: monitoring-influxdb
  replicas: 1
  template:
    metadata:
      labels:
        # Note: Modified labels.
        name: grafana
        version: v4
      annotations:
        # This is an alpha safety hook for quorum databases. If it's false the petset won't scale.
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
        - image: gcr.io/google_containers/heapster_influxdb:v0.5
          name: influxdb
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 500Mi
          ports:
            - containerPort: 8083
            - containerPort: 8086
          volumeMounts:
          - name: influxdb-persistent-storage
            mountPath: /data
        - image: gcr.io/google_containers/heapster_grafana:v2.6.0-2
          name: grafana
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 100m
              memory: 100Mi
          env:
            # This variable is required to setup templates in Grafana.
            - name: INFLUXDB_SERVICE_URL
              value: http://monitoring-influxdb:8086
              # The following env variables are required to make Grafana accessible via
              # the kubernetes api-server proxy. On production clusters, we recommend
              # removing these env variables, setup auth for grafana, and expose the grafana
              # service using a LoadBalancer or a public IP.
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/
          volumeMounts:
          - name: grafana-persistent-storage
            mountPath: /var
      volumes:
      # This is a list of hardcoded claims; specifying a pv here means the admin
      # will pre-provision it.
      - name: grafana-persistent-storage
        emptyDir: {}
  # This is a list of petset volumes. A new one will get created for each pet.
  volumeClaimTemplates:
  - metadata:
      # This name must match a mounted volume on a pet
      name: influxdb-persistent-storage
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

See the inline comments and the docs mentioned in #260 (comment) for more context. Also note that both petset and volume provisioning are in alpha.
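
For completeness, the serviceName above has to point at a service that already exists. A minimal sketch of a matching headless service (the namespace, selector, and port names are assumptions based on the modified petset above, not the shipped addon manifest):

apiVersion: v1
kind: Service
metadata:
  name: monitoring-influxdb
  # assumption: same namespace as the modified petset
  namespace: default
spec:
  # headless, so pets get stable DNS entries under this service
  clusterIP: None
  selector:
    name: grafana
  ports:
  - name: admin
    port: 8083
  - name: api
    port: 8086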

@bprashanth
Contributor

But if you're provisioning the volume yourself, as part of kube-up, then why do you need petset? It doesn't look like you're using its DNS property either.

@ddysher
Contributor Author

ddysher commented Jul 11, 2016

@bprashanth Same thoughts here. What are the benefits of using petset for monitoring? It doesn't seem like the addon is stateful.

@jszczepkowski
Contributor

@bprashanth @ddysher
We need influxdb to write to a persistent volume so that monitoring history is not lost after a pod failure. The persistent volume is mounted in read-write mode, so we want a single instance (one pod) attached to it. In my opinion this is one of pet sets' use cases; see the examples in the pet set doc.
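
As a quick illustration of that point: because the pet keeps the same claim across restarts, deleting the pod should bring back a replacement that reattaches the same volume, so the monitoring history survives. For example:

$ kubectl delete pod monitoring-influxdb-grafana-v3-0 --namespace=kube-system
$ kubectl get pod monitoring-influxdb-grafana-v3-0 --namespace=kube-system -w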

@jszczepkowski
Contributor

@bprashanth
So, you are proposing the following modifications to petset.yaml:

  • add serviceName to spec,
  • move pvc from separate file to volumeClaimTemplates section

Am I right?

@bprashanth
Contributor

I think serviceName is already in the spec https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-petset.yaml#L72
Today you hand-create a pd/pv: https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/cluster-monitoring/influxdb/influxdb-pv.yaml

That makes scaling the rc hard.
If you want the petset to create this pd as a unique disk per pet, you should use the volumeClaimTemplates method described above. This won't matter if you're never going to scale the petset.

With petset you also get a DNS name, if you require it to cluster your scaled influx instances (most cluster software needs it): https://github.com/kubernetes/kubernetes.github.io/blob/master/docs/user-guide/petset.md#network-identity. If you can add a clustered influx db example using petset, and an e2e that uses petset (we already have a few: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/petset.go#L223) that would be great.
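
For example, assuming the governing monitoring-influxdb service is headless and the cluster uses the default cluster.local domain, pet 0 of this addon gets a stable name along the lines of:

$ nslookup monitoring-influxdb-grafana-v3-0.monitoring-influxdb.kube-system.svc.cluster.local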

@jszczepkowski
Contributor

@bprashanth
Please see the follow-up in #28840.

@zmerlynn
Member

zmerlynn commented Aug 3, 2016

The kubernetes-e2e-gce-master-on-cvm build is broken because of this bug. Please prioritize this.

@bprashanth
Contributor

Suggest not using petset till we have cluster shutdown sorted out, or meticulously deleting all resources in kube-down without relying on the shutdown order of controllers/namespaces etc. (you need to continue deleting the pd like we previously did).
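
Roughly, a meticulous kube-down cleanup would delete the petset, its claims and volumes, and the underlying disk explicitly instead of relying on shutdown ordering (a sketch only; the disk name and zone below are placeholders):

$ kubectl delete petset monitoring-influxdb-grafana-v3 --namespace=kube-system
$ kubectl delete pvc influxdb-claim --namespace=kube-system
$ kubectl delete pv influxdb-pv
$ gcloud compute disks delete influxdb-pd --zone=us-central1-b   # placeholder disk/zone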

@zmerlynn
Member

zmerlynn commented Aug 3, 2016

If PetSet is broken without a shutdown API, can we back out the influxdb change?

@piosz
Member

piosz commented Aug 25, 2016

This was fixed by #30080. Please reopen in case of more problems.

@piosz piosz closed this as completed Aug 25, 2016