
0.10.1 and a lot of pods in Unknown state #4415

Closed
pires opened this issue Feb 13, 2015 · 15 comments
Labels
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@pires
Contributor

pires commented Feb 13, 2015

One of my best test cases for each Kubernetes release is to assemble and query an Elasticsearch cluster (https://github.com/pires/kubernetes-elasticsearch-cluster). Right now, with 0.10.1, I see a lot of pods in Unknown state but assigned to minions. Curiously enough, when I ssh into the minions and look at the Docker logs, I see some containers that ran once but shouldn't have run there in the first place, since the kube API states the pod was assigned to a different host.

With 0.9.2 it works flawlessly. Can't try 0.9.3 because the binaries weren't released (#4277).
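
For reference, the inspection flow is roughly this (the minion address is a placeholder; docker ps -a also lists exited containers):

$ kubectl get pods        # several pods report STATUS Unknown yet show an assigned HOST
$ ssh core@<minion-ip>    # log into one of the minions
$ docker ps -a            # exited containers show up for pods the API says were scheduled on a different host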

@brendandburns
Contributor

Will dig into this and try to repro. 0.10.1 passes our e2e tests... though we have found and fixed some bugs in pod status recently.

If you want the 0.9.3 binaries, I can definitely push them today; let me know.

Brendan

@pires
Contributor Author

pires commented Feb 13, 2015

@brendandburns just for the sake of testing with 0.9.3 and limiting the changelog window for future debugging, yes, please do.

@saad-ali added the priority/important-soon and sig/cluster-lifecycle labels Feb 13, 2015
@brendandburns
Contributor

v0.9.3 is pushed to the usual locations.

@pires
Contributor Author

pires commented Feb 13, 2015

Thanks. Will try and let you know.

@pires
Contributor Author

pires commented Feb 13, 2015

0.9.3 works as well.

@pires
Contributor Author

pires commented Feb 13, 2015

Once again, tried 0.10.1 and the issue is present.

@brendandburns
Contributor

If I send you a release tarball at head, can you test that?

Brendan

@pires
Contributor Author

pires commented Feb 13, 2015

I could build my own but for the sake of Friday laziness, please do.


@brendandburns
Contributor

@pires
Contributor Author

pires commented Feb 14, 2015

With this build the error doesn't show up, but now I have no env vars in my containers, which means no access to the API. I probably need some sleep and am messing something up... but all I did was recreate the cluster with the provided binaries.

$ kubectl get pods
POD                          IP                  CONTAINER(S)           IMAGE(S)                     HOST                        LABELS                                STATUS
elasticsearch-master-fplln   10.244.56.2         elasticsearch-master   pires/elasticsearch:master   172.17.8.103/172.17.8.103   component=elasticsearch,role=master   Running
$ kubectl get service elasticsearch
NAME                LABELS              SELECTOR                                     IP                  PORT
elasticsearch       <none>              component=elasticsearch,role=load-balancer   10.100.15.178       9200
$ docker ps
CONTAINER ID        IMAGE                        COMMAND                CREATED             STATUS              PORTS               NAMES
6ed53093e989        pires/elasticsearch:master   "/usr/bin/runsvdir -   5 minutes ago       Up 5 minutes                            k8s_elasticsearch-master.6d831f7e_elasticsearch-master-fplln.default.etcd_49bcabb5-b3dc-11e4-a8b4-08002714726a_948a3cf0
d01a608c6b75        kubernetes/pause:go          "/pause"               16 minutes ago      Up 16 minutes                           k8s_POD.8f3eed67_elasticsearch-master-fplln.default.etcd_49bcabb5-b3dc-11e4-a8b4-08002714726a_61589a5a
$ docker exec 6ed53093e989 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=elasticsearch-master-fplln
HOME=/root
JAVA_HOME=/usr/lib/jvm/java-8-oracle
ES_PKG_NAME=elasticsearch-1.4.2
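
For comparison, a minimal sketch of what I'd expect in a healthy container, assuming the usual Docker-links-style service variables the kubelet injects ({SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT; host and port taken from the elasticsearch service above, the exact set varies by release):

$ docker exec 6ed53093e989 env | grep SERVICE
ELASTICSEARCH_SERVICE_HOST=10.100.15.178
ELASTICSEARCH_SERVICE_PORT=9200
KUBERNETES_SERVICE_HOST=...   # without something like this, the container can't reach the API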

@pires
Contributor Author

pires commented Feb 14, 2015

core@master ~ $ /opt/bin/kube-apiserver --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@master ~ $ /opt/bin/kube-controller-manager --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@master ~ $ /opt/bin/kube-scheduler --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@node-02 ~ $ /opt/bin/kube-proxy --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@node-02 ~ $ /opt/bin/kubelet --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty

@pires
Contributor Author

pires commented Feb 17, 2015

Related to #4462?

@dchen1107
Member

I think my PR #4376 should fix most of the Unknown states seen here, unless the node status is unreachable.
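
As a quick sanity check (a sketch; nodes were still called minions in releases of this era, so the resource name depends on version):

$ kubectl get minions    # lists the nodes the master knows about (v0.10-era name)
$ kubectl get nodes      # equivalent in later releases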

@roberthbailey
Contributor

@pires Is there anything left here or can I mark this as closed?

@pires
Contributor Author

pires commented Mar 2, 2015

Works with 0.11.0.

@pires closed this as completed Mar 2, 2015