
Mitigate impact of docker 1.8 ErrNoAvailableIPs bug #19477

Closed
2opremio opened this issue Jan 11, 2016 · 7 comments
Labels
area/docker sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@2opremio

I am using a Kubernetes 1.1 cluster.

It seems that errors creating containers cause kubelet to leak IPs.

See the following excerpt from kubectl describe pod <podname>, taken after creating a replication controller whose pod exposed a host port already in use on the host.

27m       27m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id d818e55df2df
27m       27m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id d818e55df2df with error: API error (500): Cannot start container d818e55df2dfa752465c29d6f17dba04ff7e861ca9fdb3561708ca2e68f00310: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

27m       27m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container d818e55df2dfa752465c29d6f17dba04ff7e861ca9fdb3561708ca2e68f00310: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id cf9868377897
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id cf9868377897 with error: API error (500): Cannot start container cf986837789706360b24a37d0a83da051a37600a1a3551561a8afe0ad6291fb5: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container cf986837789706360b24a37d0a83da051a37600a1a3551561a8afe0ad6291fb5: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id a0c2b96923f1
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id a0c2b96923f1 with error: API error (500): Cannot start container a0c2b96923f14a6986c272529ba5736d898f29db4f8a31ef797760e7e6bbf485: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container a0c2b96923f14a6986c272529ba5736d898f29db4f8a31ef797760e7e6bbf485: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id b173f175fc1f with error: API error (500): Cannot start container b173f175fc1f8c63c26e14e8bd6198af5ed85f01fbec8b0b472b078e7e02f8ed: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created      {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id b173f175fc1f
26m       26m       1         weave-scope-app-ae4nl   Pod                                           FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container b173f175fc1f8c63c26e14e8bd6198af5ed85f01fbec8b0b472b078e7e02f8ed: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id 8ac5fab798d8
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id 8ac5fab798d8 with error: API error (500): Cannot start container 8ac5fab798d862a8e35b400aaa7e453c23e289b2c3c444741e218756e774c747: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container 8ac5fab798d862a8e35b400aaa7e453c23e289b2c3c444741e218756e774c747: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id 747245aa5c0a
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id 747245aa5c0a with error: API error (500): Cannot start container 747245aa5c0a0322deec8ff882e4f93ce45529d98b40177dc9bbf201ac393aed: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container 747245aa5c0a0322deec8ff882e4f93ce45529d98b40177dc9bbf201ac393aed: no available ip addresses on network

The first error is legitimate (Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use), because the host was in fact using port 4040.

However, after some time of retrying (during which I was investigating why that address was in use and forgot to delete the replication controller), the error changed to no available ip addresses on network.

This is wrong, since no other pod was being started in the meantime and there were plenty of IPs available (the host was only running 10 pods and I am using a /24 CIDR for cbr0, which should allow for roughly 254 pods).

I waited a few minutes to see if kubelet somehow garbage collected the IPs of the failed containers, but the situation was only resolved after I manually removed all the Exited docker containers with sudo docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs sudo docker rm.

My guess is that kubelet reserves IPs for the failing containers but never deallocates them (at least not in a reasonable amount of time).

@2opremio 2opremio changed the title Kubelet leaks POD IPs on container creation failures Kubelet leaks pod IPs on container creation failures Jan 11, 2016
@bgrant0607-nocc bgrant0607-nocc added sig/node and team/cluster labels Jan 11, 2016
@dchen1107
Member

cc/ @ArtfulCoder This looks like the issue I ran into a while back, but never could reproduce it. cc/ @thockin

@2opremio Can you run docker info and copy & paste the result here? Can you also run docker ps -a and check how many containers are running and how many have exited? One possible kubelet issue that could cause this leakage is that kubelet failed to recycle the POD infra container that holds the network namespace for each pod.

If there is no POD infra container leakage as mentioned above, you might be running into a docker networking issue. Docker actually does the IP allocation here.
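
A quick way to check for that kind of infra-container leak is to compare how many pod infra ("pause") containers docker knows about against how many are actually running. A rough sketch, assuming the default gcr.io/google_containers/pause image (adjust the grep pattern if your cluster uses a different one):

$ docker ps -a | grep google_containers/pause | wc -l   # all pod infra containers, including exited ones
$ docker ps | grep google_containers/pause | wc -l      # pod infra containers currently running

If the first number keeps growing while the second stays roughly equal to the number of pods on the node, the kubelet is leaving infra containers behind.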

@2opremio
Author

Can you run docker info and copy & paste the result here?

Sure:

$ sudo docker info
Containers: 42
Images: 312
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 396
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-33-generic
Operating System: Ubuntu 15.04
CPUs: 2
Total Memory: 3.858 GiB
Name: ip-172-20-0-219
ID: KAL6:AILG:HXMN:BBRG:CNWK:4ZCY:3RN3:5L3P:Y4DQ:UAN5:5NRW:KCCE
WARNING: No swap limit support

BTW, I am running k8s in AWS using a setup very similar to the one created with https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/aws.md

Can you also run docker ps -a and check how many containers are running and how many have exited?

As I mentioned above, I cleaned the Exited containers with sudo docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs sudo docker rm.

Before doing this, there were a lot of them in Exited state due to the kubelet retries.

One possible kubelet issue that could cause this leakage is that kubelet failed to recycle the POD infra container that holds the network namespace for each pod.

I don't think that's the case here. There are no dangling /pause containers running; I only see the expected ones.

Docker actually does the IP allocation here.

I may be missing something, but then I don't see why Docker doesn't deallocate the IP of containers once they Exit. Maybe it's a bug triggered by the bind: address already in use error.

In any case, knowing this is Docker's behaviour, kubelet should probably garbage collect the Exited containers immediately after retrying. Or, at the very least, set a limit on the number of Exited containers to keep.
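
As an illustration of the kind of mitigation I mean (just a crude node-level sketch, not a proposal for how kubelet should implement it), even a periodic cleanup of exited containers would cap how much can pile up:

# crude mitigation sketch: periodically remove exited containers so
# docker gets a chance to reuse whatever it still has allocated for them
while true; do
  docker ps -a | awk '/Exited/ {print $1}' | xargs -r docker rm
  sleep 60
done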

@bprashanth
Contributor

This is most likely a docker issue fixed in 1.9; I can repro it on 1.8 quite easily with:

# docker run -d -p 80:80 gcr.io/google_containers/nginx
# while true; do docker run -d -p 80:80 gcr.io/google_containers/nginx; done
# while true; do docker run -it busybox sh -c 'ip addr | grep eth0'; done
14765: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue 
    inet 10.245.1.104/24 scope global eth0
14771: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue 
    inet 10.245.1.225/24 scope global eth0
Error response from daemon: Cannot start container 3415f31b1ac17487c304599a2926af2cabcd7f2544738c7e4d77acf5cebb1850: no available ip addresses on network

If I had to guess, I'd say it's something to do with hitting .255 and wrapping around to 10.245.1.0, when the bridge is offset by one. Since 1.8 they seem to have changed their IP allocation policy to reuse IPs as they're released.

I may be missing something, but then I don't see why Docker doesn't deallocate the IP of containers once they Exit. Maybe it's a bug triggered by the bind: address already in use error.

Yeah, it's related to the hostport.

In any case, knowing this is Docker's behaviour, kubelet should probably garbage collect the Exited containers immediately after retrying. Or, at the very least, set a limit on the number of Exited containers to keep.

In any case, when we move to using CNI's bridge plugin we will be handling IPAM ourselves.
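
For reference, with the CNI bridge plugin the node does its own allocation via host-local IPAM instead of relying on docker's. A minimal illustrative config (the file path and subnet are just examples matching this thread, not a recommended setup):

# illustrative only: drop a bridge + host-local CNI config into the
# standard CNI config directory; host-local tracks allocations on disk,
# so the plugin owns the IP lifecycle rather than docker
cat > /etc/cni/net.d/10-cbr0.conf <<'EOF'
{
  "name": "cbr0",
  "type": "bridge",
  "bridge": "cbr0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.245.1.0/24",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
EOF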

@2opremio
Author

Related: moby/moby#14788 flannel-io/flannel#315

@2opremio
Author

Closing this, since I am now convinced it's a docker bug (most probably moby/moby#14788).

BTW, removing the exited containers doesn't seem to be the solution (it's just a mitigation which frees a few IPs). Docker really is leaking IPs from containers which no longer exist (it reaches a point at which removing Exited containers doesn't help and the number of running containers is way below the number of IPs the bridge provides).
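
The mismatch is easy to see on the node. A rough check, assuming the default docker bridge networking (so each running container reports its IP via NetworkSettings.IPAddress):

$ docker ps -q | wc -l                                  # running containers
$ docker ps -q | xargs -r docker inspect --format '{{ .NetworkSettings.IPAddress }}' | sort -u | wc -l   # IPs actually in use

Both numbers stay far below the ~254 addresses a /24 provides, yet docker still reports no available ip addresses on network.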

The solution seems to be upgrading to docker >= 1.9, which we don't want to do because of its performance problems: moby/moby#17720

@bprashanth
Contributor

@kubernetes/goog-node @kubernetes/goog-cluster this is an easy DoS attack. If we go to 1.2 with docker 1.8, we should detect and mitigate it.

@bprashanth bprashanth reopened this Jan 14, 2016
@bprashanth bprashanth changed the title Kubelet leaks pod IPs on container creation failures Mitigate impact of docker 1.8 ErrNoAvailableIPs bug Jan 14, 2016
@dchen1107
Member

Kubernetes has dropped support for docker 1.8.X; closing the issue.
