
Mitigate impact of docker 1.8 ErrNoAvailableIPs bug #19477

Closed
2opremio opened this issue Jan 11, 2016 · 7 comments
Labels
area/docker sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@2opremio

I am using a Kubernetes 1.1 cluster.

It seems that errors creating containers cause kubelet to leak IPs.

See the following excerpt from kubectl describe pod <podname>, taken after creating a replication controller whose pod exposed a host port already in use on the host.

27m       27m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id d818e55df2df
27m       27m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id d818e55df2df with error: API error (500): Cannot start container d818e55df2dfa752465c29d6f17dba04ff7e861ca9fdb3561708ca2e68f00310: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

27m       27m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container d818e55df2dfa752465c29d6f17dba04ff7e861ca9fdb3561708ca2e68f00310: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id cf9868377897
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id cf9868377897 with error: API error (500): Cannot start container cf986837789706360b24a37d0a83da051a37600a1a3551561a8afe0ad6291fb5: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container cf986837789706360b24a37d0a83da051a37600a1a3551561a8afe0ad6291fb5: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id a0c2b96923f1
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id a0c2b96923f1 with error: API error (500): Cannot start container a0c2b96923f14a6986c272529ba5736d898f29db4f8a31ef797760e7e6bbf485: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container a0c2b96923f14a6986c272529ba5736d898f29db4f8a31ef797760e7e6bbf485: Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id b173f175fc1f with error: API error (500): Cannot start container b173f175fc1f8c63c26e14e8bd6198af5ed85f01fbec8b0b472b078e7e02f8ed: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created      {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id b173f175fc1f
26m       26m       1         weave-scope-app-ae4nl   Pod                                           FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container b173f175fc1f8c63c26e14e8bd6198af5ed85f01fbec8b0b472b078e7e02f8ed: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id 8ac5fab798d8
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id 8ac5fab798d8 with error: API error (500): Cannot start container 8ac5fab798d862a8e35b400aaa7e453c23e289b2c3c444741e218756e774c747: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container 8ac5fab798d862a8e35b400aaa7e453c23e289b2c3c444741e218756e774c747: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Created   {kubelet ip-172-20-0-221.ec2.internal}   Created with docker id 747245aa5c0a
26m       26m       1         weave-scope-app-ae4nl   Pod       implicitly required container POD   Failed    {kubelet ip-172-20-0-221.ec2.internal}   Failed to start with docker id 747245aa5c0a with error: API error (500): Cannot start container 747245aa5c0a0322deec8ff882e4f93ce45529d98b40177dc9bbf201ac393aed: no available ip addresses on network

26m       26m       1         weave-scope-app-ae4nl   Pod                 FailedSync   {kubelet ip-172-20-0-221.ec2.internal}   Error syncing pod, skipping: API error (500): Cannot start container 747245aa5c0a0322deec8ff882e4f93ce45529d98b40177dc9bbf201ac393aed: no available ip addresses on network

The first error is legitimate (Error starting userland proxy: listen tcp 0.0.0.0:4040: bind: address already in use), because the host was in fact using port 4040.

However, after some time of retrying (during which I was investigating why that address was in use and forgot to delete the replication controller), the error changed to no available ip addresses on network.

This is wrong, since no other pod was being started in the meantime and there were plenty of IPs available (the host was only running 10 pods and I am using a /24 CIDR for cbr0, which should allow for roughly 254 pods).

I waited a few minutes to see if kubelet somehow garbage collected the IPs of the failed containers, but the situation was only resolved after I manually removed all the Exited docker containers with sudo docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs sudo docker rm.

My guess is that kubelet reserves IPs for the failing containers but never deallocates them (at least not in a reasonable amount of time).

@2opremio 2opremio changed the title Kubelet leaks POD IPs on container creation failures Kubelet leaks pod IPs on container creation failures Jan 11, 2016
@bgrant0607-nocc bgrant0607-nocc added sig/node and team/cluster labels Jan 11, 2016
@dchen1107
Member

cc/ @ArtfulCoder This looks like the issue I ran into a while back, but never could reproduce it. cc/ @thockin

@2opremio Can you run docker info and copy & paste the result here? Can you also run docker ps -a and check how many containers are running and how many have exited? One possible kubelet issue that could cause this leakage is that kubelet failed to recycle the POD infra container that holds the network namespace for each pod.

If there is no POD infra container leakage as mentioned above, you might be running into a docker networking issue. Docker actually does the IP allocation here.
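
A quick way to check for that kind of infra-container leak is to compare how many pod infra ("pause") containers docker knows about against how many are actually running. A rough sketch, assuming the default gcr.io/google_containers/pause image (adjust the grep pattern if your cluster uses a different one):

$ docker ps -a | grep google_containers/pause | wc -l   # all pod infra containers, including exited ones
$ docker ps | grep google_containers/pause | wc -l      # pod infra containers currently running

If the first number keeps growing while the second stays roughly equal to the number of pods on the node, the kubelet is leaving infra containers behind.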

@2opremio
Author

Can you run docker info and copy & paste the result here?

Sure:

$ sudo docker info
Containers: 42
Images: 312
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 396
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-33-generic
Operating System: Ubuntu 15.04
CPUs: 2
Total Memory: 3.858 GiB
Name: ip-172-20-0-219
ID: KAL6:AILG:HXMN:BBRG:CNWK:4ZCY:3RN3:5L3P:Y4DQ:UAN5:5NRW:KCCE
WARNING: No swap limit support

BTW, I am running k8s in AWS using a setup very similar to the one created with https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/aws.md

Can you also run docker ps -a and check how many containers are running and how many have exited?

As I mentioned above, I cleaned the Exited containers with sudo docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs sudo docker rm.

Before doing this, there were a lot of them in Exited state due to the kubelet retries.

One possible kubelet issue that could cause this leakage is that kubelet failed to recycle the POD infra container that holds the network namespace for each pod.

I don't think that's the case here. There are no dangling /pause containers running; I only see the expected ones.

Docker actually does the IP allocation here.

I may be missing something, but then I don't see why Docker doesn't deallocate the IP of containers once they Exit. Maybe it's a bug triggered by the bind: address already in use error.

In any case, knowing this is Docker's behaviour, kubelet should probably garbage collect the Exited containers immediately after retrying. Or, at the very least, set a limit on the number of Exited containers to keep.
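
As an illustration of the kind of mitigation I mean (just a crude node-level sketch, not a proposal for how kubelet should implement it), even a periodic cleanup of exited containers would cap how much can pile up:

# crude mitigation sketch: periodically remove exited containers so
# docker gets a chance to reuse whatever it still has allocated for them
while true; do
  docker ps -a | awk '/Exited/ {print $1}' | xargs -r docker rm
  sleep 60
done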

@bprashanth
Contributor

This is most likely a docker issue fixed in 1.9; I can repro it on 1.8 quite easily with:

# docker run -d -p 80:80 gcr.io/google_containers/nginx
# while true; do docker run -d -p 80:80 gcr.io/google_containers/nginx; done
# while true; do docker run -it busybox sh -c 'ip addr | grep eth0'; done
14765: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue 
    inet 10.245.1.104/24 scope global eth0
14771: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue 
    inet 10.245.1.225/24 scope global eth0
Error response from daemon: Cannot start container 3415f31b1ac17487c304599a2926af2cabcd7f2544738c7e4d77acf5cebb1850: no available ip addresses on network

If I had to guess, I'd say it's something to do with hitting .255 and wrapping around to 10.245.1.0, when the bridge is offset by one. Since 1.8 they seem to have changed their IP allocation policy to reuse IPs as they're released.

I may be missing something, but then I don't see why Docker doesn't deallocate the IP of containers once they Exit. Maybe it's a bug triggered by the bind: address already in use error.

Yeah, it's related to the hostport.

In any case, knowing this is Docker's behaviour, kubelet should probably garbage collect the Exited containers immediately after retrying. Or, at the very least, set a limit on the number of Exited containers to keep.

In any case, when we move to using CNI's bridge plugin we will be handling IPAM ourselves.
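
For reference, with the CNI bridge plugin the node does its own allocation via host-local IPAM instead of relying on docker's. A minimal illustrative config (the file path and subnet are just examples matching this thread, not a recommended setup):

# illustrative only: drop a bridge + host-local CNI config into the
# standard CNI config directory; host-local tracks allocations on disk,
# so the plugin owns the IP lifecycle rather than docker
cat > /etc/cni/net.d/10-cbr0.conf <<'EOF'
{
  "name": "cbr0",
  "type": "bridge",
  "bridge": "cbr0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.245.1.0/24",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
EOF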

@2opremio
Author

Related: moby/moby#14788 flannel-io/flannel#315

@2opremio
Author

Closing this, since I am now convinced it's a docker bug (most probably moby/moby#14788).

BTW, removing the exited containers doesn't seem to be the solution (it's just a mitigation which frees a few IPs). Docker really is leaking IPs from containers which no longer exist (it reaches a point at which removing Exited containers doesn't help and the number of running containers is way below the number of IPs the bridge provides).
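
The mismatch is easy to see on the node. A rough check, assuming the default docker bridge networking (so each running container reports its IP via NetworkSettings.IPAddress):

$ docker ps -q | wc -l                                  # running containers
$ docker ps -q | xargs -r docker inspect --format '{{ .NetworkSettings.IPAddress }}' | sort -u | wc -l   # IPs actually in use

Both numbers stay far below the ~254 addresses a /24 provides, yet docker still reports no available ip addresses on network.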

The solution seems to be upgrading to docker >= 1.9, which we don't want to do because of its performance problems: moby/moby#17720

@bprashanth
Contributor

@kubernetes/goog-node @kubernetes/goog-cluster this is an easy DoS attack. If we go to 1.2 with docker 1.8, we should detect and mitigate it.

@bprashanth bprashanth reopened this Jan 14, 2016
@bprashanth bprashanth changed the title Kubelet leaks pod IPs on container creation failures Mitigate impact of docker 1.8 ErrNoAvailableIPs bug Jan 14, 2016
@dchen1107
Member

Kubernetes has dropped support for docker 1.8.X; closing the issue.
