Flannel not starting on master on Azure #4309

Closed
chanezon opened this issue Feb 11, 2015 · 2 comments · Fixed by #4425
Labels
kind/support Categorizes issue or PR as a support question. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@chanezon

I am following the guide at https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/coreos_multinode_cluster.md on Azure, using CoreOS alpha (584.0.0).

Flannel fails to start on the master node, which is initialized with cloud-init from https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/cloud-configs/master.yaml
Which versions of CoreOS, Flannel, and Docker has this guide been tested with?

Flannel starts fine on the regular nodes.

core@pat-coreos-kube12-coreos-0 ~ $ sudo systemctl status flannel
● flannel.service - flannel is an etcd backed overlay network for containers
Loaded: loaded (/etc/systemd/system/flannel.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2015-02-11 06:52:27 UTC; 1min 22s ago
Process: 671 ExecStartPre=/usr/bin/etcdctl mk /coreos.com/network/config {"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}} (code=exited, status=4)
Process: 668 ExecStartPre=/usr/bin/chmod +x /opt/bin/flanneld (code=exited, status=0/SUCCESS)
Process: 665 ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/k8s/flanneld (code=exited, status=0/SUCCESS)
Process: 662 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)

Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Resolving storage.googleapis.com... 74.125.239.139, 74.125.239.140, 74.125.239.138, ...
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Connecting to storage.googleapis.com|74.125.239.139|:443... connected.
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: HTTP request sent, awaiting response... 200 OK
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Length: 7784547 (7.4M) [binary/octet-stream]
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Server file no newer than local file '/opt/bin/flanneld' -- not retrieving.
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 etcdctl[671]: Error: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: flannel.service: control process exited, code=exited status=4
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: Failed to start flannel is an etcd backed overlay network for containers.
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: Unit flannel.service entered failed state.

Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: flannel.service failed.

/usr/bin/etcdctl ls --recursive /
/coreos.com
/coreos.com/network
/coreos.com/network/config
/coreos.com/network/subnets
/coreos.com/network/subnets/10.244.33.0-24
/coreos.com/network/subnets/10.244.71.0-24
/coreos.com/updateengine
/coreos.com/updateengine/rebootlock
/coreos.com/updateengine/rebootlock/semaphore
/registry
/registry/controllers
/registry/controllers/default
/registry/controllers/default/my-nginx
/registry/events
/registry/events/default
/registry/minions
/registry/minions/100.69.136.59
/registry/minions/100.69.92.58
/registry/nodes
/registry/nodes/100.69.92.58
/registry/nodes/100.69.92.58/boundpods
/registry/nodes/100.69.136.59
/registry/nodes/100.69.136.59/boundpods
/registry/pods
/registry/pods/default
/registry/pods/default/6068d0b5-ad6c-11e4-986b-00155da9d98d
/registry/pods/default/60692a88-ad6c-11e4-986b-00155da9d98d
/registry/pods/default/redis-master
/registry/services
/registry/services/endpoints
/registry/services/endpoints/default
/registry/services/endpoints/default/kubernetes
/registry/services/endpoints/default/kubernetes-ro
/registry/services/specs
/registry/services/specs/default
/registry/services/specs/default/kubernetes
/registry/services/specs/default/kubernetes-ro


When I run a container on the master node:
docker run -i -t ubuntu /bin/bash

root@e3af57412e4f:/#
root@e3af57412e4f:/# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:ac:11:00:02
inet addr:172.17.0.2 Bcast:0.0.0.0 Mask:255.255.0.0

The IP is in the 172.17.x.x range; I'm not sure where that comes from.
On a minion node I get the expected IP in the 10.244.x.x range:

docker run -i -t ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
ubuntu:latest: The image you are pulling has been verified
511136ea3c5a: Pull complete
27d47432a69b: Pull complete
5f92234dcf1e: Pull complete
51a9c7c1f8bb: Pull complete
5ba9dab47459: Pull complete
Status: Downloaded newer image for ubuntu:latest
root@a11bbd902bf5:/# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0a:f4:2e:02
inet addr:10.244.46.2 Bcast:0.0.0.0 Mask:255.255.255.0

@roberthbailey
Contributor

/cc @jeffmendoza

@roberthbailey added the kind/support, priority/support, and sig/cluster-lifecycle labels on Feb 11, 2015
@chanezon
Author

#4362 seems to fix it. This was an issue on AWS as well: after a reboot, flannel does not start because the ExecStartPre task that creates the etcd key is not idempotent.
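
For illustration, one way to make the key-creation step safe to repeat is to check for the key before creating it. This is only a sketch and not necessarily what #4362 does: the function name ensure_flannel_config and the ETCDCTL override variable are hypothetical, and it assumes the etcd v2 etcdctl commands get and mk, where mk exits non-zero if the key already exists. It does not help when etcd itself is unreachable, as in the "501: All the given peers are not reachable" error in the log above; in that case both commands fail and the unit still fails.

```shell
#!/bin/sh
# Sketch: create the flannel network config only when it is missing.
# ETCDCTL is a hypothetical override so the helper can be pointed at a stub.
ETCDCTL="${ETCDCTL:-/usr/bin/etcdctl}"
KEY="/coreos.com/network/config"
VALUE='{"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}}'

ensure_flannel_config() {
    # "get" exits non-zero when the key is absent or etcd cannot be reached.
    if "$ETCDCTL" get "$KEY" >/dev/null 2>&1; then
        # Key already present (for example after a reboot): nothing to do.
        return 0
    fi
    # First boot: create the key. If etcd is down this still fails,
    # and the non-zero exit status is passed back to the caller.
    "$ETCDCTL" mk "$KEY" "$VALUE"
}
```

Saved as a script that ends with a call to ensure_flannel_config, this could replace the bare etcdctl mk line in the unit's ExecStartPre; running it twice in a row should succeed both times.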


3 participants