Flannel not starting on master on Azure #4309

chanezon · 2015-02-11T07:04:45Z

I followed the CoreOS multi-node cluster guide at https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/coreos_multinode_cluster.md
on Azure, using CoreOS alpha (584.0.0).

Flannel fails to start on the master node, which is initialized with this cloud-init file: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/cloud-configs/master.yaml
Which versions of CoreOS, Flannel and Docker has this guide been tested with?

Flannel starts well on regular nodes.

core@pat-coreos-kube12-coreos-0 ~ $ sudo systemctl status flannel
● flannel.service - flannel is an etcd backed overlay network for containers
Loaded: loaded (/etc/systemd/system/flannel.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2015-02-11 06:52:27 UTC; 1min 22s ago
Process: 671 ExecStartPre=/usr/bin/etcdctl mk /coreos.com/network/config {"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}} (code=exited, status=4)
Process: 668 ExecStartPre=/usr/bin/chmod +x /opt/bin/flanneld (code=exited, status=0/SUCCESS)
Process: 665 ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/k8s/flanneld (code=exited, status=0/SUCCESS)
Process: 662 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)

Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Resolving storage.googleapis.com... 74.125.239.139, 74.125.239.140, 74.125.239.138, ...
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Connecting to storage.googleapis.com|74.125.239.139|:443... connected.
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: HTTP request sent, awaiting response... 200 OK
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Length: 7784547 (7.4M) [binary/octet-stream]
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 wget[665]: Server file no newer than local file '/opt/bin/flanneld' -- not retrieving.
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 etcdctl[671]: Error: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: flannel.service: control process exited, code=exited status=4
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: Failed to start flannel is an etcd backed overlay network for containers.
Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: Unit flannel.service entered failed state.

Feb 11 06:52:27 pat-coreos-kube12-coreos-0 systemd[1]: flannel.service failed.

/usr/bin/etcdctl ls --recursive /
/coreos.com
/coreos.com/network
/coreos.com/network/config
/coreos.com/network/subnets
/coreos.com/network/subnets/10.244.33.0-24
/coreos.com/network/subnets/10.244.71.0-24
/coreos.com/updateengine
/coreos.com/updateengine/rebootlock
/coreos.com/updateengine/rebootlock/semaphore
/registry
/registry/controllers
/registry/controllers/default
/registry/controllers/default/my-nginx
/registry/events
/registry/events/default
/registry/minions
/registry/minions/100.69.136.59
/registry/minions/100.69.92.58
/registry/nodes
/registry/nodes/100.69.92.58
/registry/nodes/100.69.92.58/boundpods
/registry/nodes/100.69.136.59
/registry/nodes/100.69.136.59/boundpods
/registry/pods
/registry/pods/default
/registry/pods/default/6068d0b5-ad6c-11e4-986b-00155da9d98d
/registry/pods/default/60692a88-ad6c-11e4-986b-00155da9d98d
/registry/pods/default/redis-master
/registry/services
/registry/services/endpoints
/registry/services/endpoints/default
/registry/services/endpoints/default/kubernetes
/registry/services/endpoints/default/kubernetes-ro
/registry/services/specs
/registry/services/specs/default
/registry/services/specs/default/kubernetes
/registry/services/specs/default/kubernetes-ro

When I run a container on the master:
docker run -i -t ubuntu /bin/bash

root@e3af57412e4f:/#
root@e3af57412e4f:/# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:ac:11:00:02
inet addr:172.17.0.2 Bcast:0.0.0.0 Mask:255.255.0.0

The IP is in the 172.17.x.x range, and I am not sure where that comes from.
On a minion node I get the expected IP in the 10.244.x.x range:

docker run -i -t ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
ubuntu:latest: The image you are pulling has been verified
511136ea3c5a: Pull complete
27d47432a69b: Pull complete
5f92234dcf1e: Pull complete
51a9c7c1f8bb: Pull complete
5ba9dab47459: Pull complete
Status: Downloaded newer image for ubuntu:latest
root@a11bbd902bf5:/# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0a:f4:2e:02
inet addr:10.244.46.2 Bcast:0.0.0.0 Mask:255.255.255.0
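
For reference, 172.17.0.0/16 is the range Docker's default bridge uses when it is not given a subnet, so the master's containers are most likely falling back to that default because flannel never came up there. A minimal sketch, in plain POSIX sh, for checking whether a container address falls inside the flannel network from the unit file (`10.244.0.0/16`); the helper names `ip_to_int` and `in_cidr` are made up for this illustration:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when address $1 lies inside CIDR block $2 (e.g. 10.244.0.0/16).
in_cidr() {
    addr=$(ip_to_int "$1")
    base=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $addr & $mask )) -eq $(( $base & $mask )) ]
}

# Addresses taken from the two ifconfig outputs in this report.
for ip in 172.17.0.2 10.244.46.2; do
    if in_cidr "$ip" 10.244.0.0/16; then
        echo "$ip is inside the flannel network"
    else
        echo "$ip is outside the flannel network"
    fi
done
```

With the two addresses above, the master's container address is reported as outside the flannel network and the minion's as inside.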

roberthbailey · 2015-02-11T18:02:01Z

/cc @jeffmendoza

chanezon · 2015-02-12T00:18:06Z

#4362 seems to fix it. It was an issue on AWS as well: if you reboot, flannel does not start because the ExecStartPre task that creates the etcd key is not idempotent.
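
To illustrate what "idempotent" would mean here: `etcdctl mk` only creates a key that does not exist yet and fails otherwise, so a second boot trips over the key written by the first. The sketch below is one possible way to make that step safe to repeat. It is not necessarily what #4362 does, and the function name `ensure_flannel_config` and the `ETCDCTL` override are assumptions for this example. It also does nothing about the "peers are not reachable" error in the log above: if etcd is down, both calls fail and the error is still reported.

```shell
#!/bin/sh
# Hypothetical helper: create the flannel network config key only when it
# is missing, so the step can run on every boot without failing.
# ETCDCTL may be overridden; the default is the path from the unit file.
ETCDCTL=${ETCDCTL:-/usr/bin/etcdctl}
KEY=/coreos.com/network/config
VALUE='{"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}}'

ensure_flannel_config() {
    # `etcdctl get` exits non-zero when the key cannot be read.
    if "$ETCDCTL" get "$KEY" >/dev/null 2>&1; then
        echo "$KEY already present, leaving it untouched"
        return 0
    fi
    # Key absent (or etcd unreachable, in which case mk fails as well
    # and its exit status is returned to the caller).
    "$ETCDCTL" mk "$KEY" "$VALUE"
}

# When saved as a script for ExecStartPre, end the file with:
#   ensure_flannel_config
```

The existing value is deliberately left alone rather than overwritten, so a reboot cannot change the network configuration that the other nodes are already using.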

Fixes #4309 #4362

roberthbailey added the kind/support, priority/support, and sig/cluster-lifecycle labels Feb 11, 2015

roberthbailey assigned jeffmendoza Feb 12, 2015

pires added a commit to pires/kubernetes that referenced this issue Feb 13, 2015

Fixes kubernetes#4309 kubernetes#4362

4c2819c

pires mentioned this issue Feb 13, 2015

Fixes #4309 #4362 #4425

Merged

bgrant0607 closed this as completed in #4425 Feb 17, 2015

bgrant0607 added a commit that referenced this issue Feb 17, 2015

Merge pull request #4425 from pires/fix_flannel_key_etcd

76ed22a

Fixes #4309 #4362

chanezon unassigned jeffmendoza Aug 12, 2015
