
0.10.1 and a lot of pods in Unknown state #4415

Closed
pires opened this issue Feb 13, 2015 · 15 comments
Labels
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@pires
Contributor

pires commented Feb 13, 2015

One of my best test cases for each Kubernetes release is to assemble and query an Elasticsearch cluster (https://github.com/pires/kubernetes-elasticsearch-cluster). Right now, with 0.10.1, I see a lot of pods in Unknown state but assigned to minions. Curiously enough, when I ssh into the minions and look at the Docker logs, I see some containers that ran once but shouldn't have run there in the first place, since the kube API states the pod was assigned to a different host.

With 0.9.2 it works flawlessly. Can't try 0.9.3 because the binaries weren't released (#4277).
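
For reference, the inspection flow is roughly this (the minion address is a placeholder; docker ps -a also lists exited containers):

$ kubectl get pods        # several pods report STATUS Unknown yet show an assigned HOST
$ ssh core@<minion-ip>    # log into one of the minions
$ docker ps -a            # exited containers show up for pods the API says were scheduled on a different host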

@brendandburns
Contributor

Will dig into this and try to repro. 0.10.1 passes our e2e tests... though we have found and fixed some bugs in pod status recently.

If you want the 0.9.3 binaries, I can definitely push them today; let me know.

Brendan

@pires
Contributor Author

pires commented Feb 13, 2015

@brendandburns just for the sake of testing with 0.9.3 and limiting the changelog window for future debugging, yes, please do.

@saad-ali added the priority/important-soon and sig/cluster-lifecycle labels Feb 13, 2015
@brendandburns
Contributor

v0.9.3 is pushed to the usual locations.

@pires
Contributor Author

pires commented Feb 13, 2015

Thanks. Will try and let you know.

@pires
Contributor Author

pires commented Feb 13, 2015

0.9.3 works as well.

@pires
Contributor Author

pires commented Feb 13, 2015

Once again, tried 0.10.1 and the issue is present.

@brendandburns
Contributor

If I send you a release tarball at head, can you test that?

Brendan

@pires
Contributor Author

pires commented Feb 13, 2015

I could build my own but for the sake of Friday laziness, please do.


@brendandburns
Contributor

@pires
Contributor Author

pires commented Feb 14, 2015

With this build the error doesn't show up, but now I have no env vars in my containers, which means no access to the API. I probably need some sleep and am messing something up... but all I did was recreate the cluster with the provided binaries.

$ kubectl get pods
POD                          IP                  CONTAINER(S)           IMAGE(S)                     HOST                        LABELS                                STATUS
elasticsearch-master-fplln   10.244.56.2         elasticsearch-master   pires/elasticsearch:master   172.17.8.103/172.17.8.103   component=elasticsearch,role=master   Running
$ kubectl get service elasticsearch
NAME                LABELS              SELECTOR                                     IP                  PORT
elasticsearch       <none>              component=elasticsearch,role=load-balancer   10.100.15.178       9200
$ docker ps
CONTAINER ID        IMAGE                        COMMAND                CREATED             STATUS              PORTS               NAMES
6ed53093e989        pires/elasticsearch:master   "/usr/bin/runsvdir -   5 minutes ago       Up 5 minutes                            k8s_elasticsearch-master.6d831f7e_elasticsearch-master-fplln.default.etcd_49bcabb5-b3dc-11e4-a8b4-08002714726a_948a3cf0
d01a608c6b75        kubernetes/pause:go          "/pause"               16 minutes ago      Up 16 minutes                           k8s_POD.8f3eed67_elasticsearch-master-fplln.default.etcd_49bcabb5-b3dc-11e4-a8b4-08002714726a_61589a5a
$ docker exec 6ed53093e989 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=elasticsearch-master-fplln
HOME=/root
JAVA_HOME=/usr/lib/jvm/java-8-oracle
ES_PKG_NAME=elasticsearch-1.4.2
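
For comparison, a minimal sketch of what I'd expect in a healthy container, assuming the usual Docker-links-style service variables the kubelet injects ({SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT; host and port taken from the elasticsearch service above, the exact set varies by release):

$ docker exec 6ed53093e989 env | grep SERVICE
ELASTICSEARCH_SERVICE_HOST=10.100.15.178
ELASTICSEARCH_SERVICE_PORT=9200
KUBERNETES_SERVICE_HOST=...   # without something like this, the container can't reach the API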

@pires
Contributor Author

pires commented Feb 14, 2015

core@master ~ $ /opt/bin/kube-apiserver --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@master ~ $ /opt/bin/kube-controller-manager --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@master ~ $ /opt/bin/kube-scheduler --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@node-02 ~ $ /opt/bin/kube-proxy --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty
core@node-02 ~ $ /opt/bin/kubelet --version
Kubernetes v0.10.0-506-gb23230e616ac56-dirty

@pires
Contributor Author

pires commented Feb 17, 2015

Related to #4462?

@dchen1107
Member

I think my PR #4376 should fix most of the Unknown states seen here, unless the node status is unreachable.
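
As a quick sanity check (a sketch; nodes were still called minions in releases of this era, so the resource name depends on version):

$ kubectl get minions    # lists the nodes the master knows about (v0.10-era name)
$ kubectl get nodes      # equivalent in later releases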

@roberthbailey
Contributor

@pires Is there anything left here or can I mark this as closed?

@pires
Contributor Author

pires commented Mar 2, 2015

Works with 0.11.0.

@pires closed this as completed Mar 2, 2015