docker 1.9.1: No available IPv4 addresses on this network's address pools: bridge #21523
I think the meta problem is docker made bridge networks persistent across restarts in 1.9 and fixed several bugs between then and 1.10. If we just want the old behavior, we might be able to get away with nuking local-kv.db on each restart. Some questions:
There are some intense solutions like writing our own IPAM plugin or cloning libnetwork IPAM from 1.10 as a "plugin" (assuming it has the fix), but it's too high risk for 1.2.0 IMO without pushing the release date. @kubernetes/goog-cluster
I'm all for nuking the DB - there's no real value to us in having the networks persist. Is it fixed in 1.10?
According to bug reports, most of the persisting-networks-across-restarts problems are fixed in 1.10, but 1.10 also brings content-addressable storage. It sounds like quite a few fixes made it into "1.9.2", which was later just converted into a 1.10 RC.
I believe the only real reason they persist the bridge network across restarts is so cross-host networks work consistently. This is a feature we don't care about. IMO nuking a file across O(100) nodes is too hard to do manually, but we can consider just documenting this if it's rare enough with a lower number of pods per node.
No, only ~92 IPs were used around that time.
I have no idea how often this hits with 30 pods per node.
I did all three: GC'd all containers, deleted /var/lib/docker/network, and restarted docker. It didn't work. According to moby/moby#18535, two containers can end up with the same IP address. I didn't check whether that happened in my cluster, but it would have been harder to notice.
We need to figure out a reliable repro, with whatever max-pods we're going to ship with, to understand the impact. @kubernetes/goog-node what's the chance of just going back to 1.8?
What are the chances of validating 1.10? :)
1.8 has the same issue, we use
@onorua 1.8 had its problems (e.g. #19477), but those are just bugs that we need to live with; most of them have workarounds and only occur with a high number of pods per node or in obscure cases like HostPort. 1.9 made the leap to using libnetwork with persistent storage for bridge networks, and that brought a host of other networking issues. Whether we use libnetwork or not in the long run, we need to ensure that this release (which will use the default docker bridge plugin) is stable. Can you please file a different bug with your exact repro so we can consider the impact?
Isn't 1.10 going to be a major leap for hosted offerings like GKE because of the overhead involved in migrating to content-addressable storage?
So to catch up a bit: the bug exists in docker 1.9, probably not in docker 1.10. We can and should add logic to Kubelet to detect multiple pods with the same IP. The rest is pretty scary, if I understand correctly. Someone tell me I am wrong.
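For illustration, a minimal sketch of what such a duplicate-IP check could look like. The `podInfo` type and its fields are hypothetical stand-ins for whatever the kubelet actually tracks about running pods, not Kubernetes API types:

```go
package main

import (
	"fmt"
	"time"
)

// podInfo is a stand-in for whatever the kubelet knows about a running pod.
type podInfo struct {
	Name      string
	IP        string
	CreatedAt time.Time
}

// findIPConflicts groups running pods by IP and returns, for every IP that is
// claimed by more than one pod, the pods involved.
func findIPConflicts(pods []podInfo) map[string][]podInfo {
	byIP := make(map[string][]podInfo)
	for _, p := range pods {
		if p.IP == "" {
			continue // pod has no IP yet
		}
		byIP[p.IP] = append(byIP[p.IP], p)
	}
	conflicts := make(map[string][]podInfo)
	for ip, ps := range byIP {
		if len(ps) > 1 {
			conflicts[ip] = ps
		}
	}
	return conflicts
}

func main() {
	now := time.Now()
	pods := []podInfo{
		{Name: "pod-a", IP: "10.244.1.3", CreatedAt: now.Add(-10 * time.Minute)},
		{Name: "pod-b", IP: "10.244.1.3", CreatedAt: now},
		{Name: "pod-c", IP: "10.244.1.4", CreatedAt: now},
	}
	for ip, ps := range findIPConflicts(pods) {
		fmt.Printf("IP %s is claimed by %d pods: %v\n", ip, len(ps), ps)
	}
}
```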
We haven't invested the effort in getting a reliable repro. From what I can tell 1.9 is just bad for networking. @dchen1107 mentioned that we might still go back to 1.8.
Sounds practical. On detecting an IP conflict we should probably delete all participating pods, or sort by creation timestamp and keep only the first. I'm not sure I'd want to put this in each pod worker; @kubernetes/goog-node will have better suggestions. If we're going to code a workaround, we can also detect when docker run starts returning 500s in a goroutine, along the lines of the sketch below:
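A rough sketch of that kind of watcher, assuming container-creation errors are fed in from wherever the kubelet calls docker. The type and function names are hypothetical, and the matched substring is taken from the error in this issue's title rather than from the HTTP status code:

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
	"time"
)

// ipv4ExhaustedMsg is the substring of the error docker returns when the
// bridge network's address pool is exhausted (lowercased for matching).
const ipv4ExhaustedMsg = "no available ipv4 addresses"

// dockerWatcher collects container-creation errors in a goroutine and
// remembers whether the IPv4-exhaustion error has been seen.
type dockerWatcher struct {
	errs      chan error
	unhealthy atomic.Bool
}

func newDockerWatcher() *dockerWatcher {
	w := &dockerWatcher{errs: make(chan error, 16)}
	go func() {
		for err := range w.errs {
			if err != nil && strings.Contains(strings.ToLower(err.Error()), ipv4ExhaustedMsg) {
				w.unhealthy.Store(true)
			}
		}
	}()
	return w
}

// Observe is called with the error (possibly nil) from every container-create attempt.
func (w *dockerWatcher) Observe(err error) { w.errs <- err }

// Unhealthy reports whether the exhaustion error has been observed.
func (w *dockerWatcher) Unhealthy() bool { return w.unhealthy.Load() }

func main() {
	w := newDockerWatcher()
	// Simulate a failed container creation returning docker's 500 error text.
	w.Observe(fmt.Errorf("API error (500): No available IPv4 addresses on this network's address pools: bridge"))
	// Crude wait so the watcher goroutine can process the error in this toy example.
	time.Sleep(100 * time.Millisecond)
	// In a real integration this flag would feed the node's docker health check.
	fmt.Println("docker unhealthy:", w.Unhealthy())
}
```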
I could reproduce the issue only during the docker reboot test so far. With a little extra logging, one can see the docker daemon receive SIGTERM, start persisting its current state, and refuse to start other containers by throwing this error message. Meanwhile, the kubelet picks up the error during the next syncLoop, which is misleading here:
The second experiment I ran was, after docker was back to a normal state, to check whether the docker daemon would cap the number of containers due to the mistaken "No available IPv4 addresses" issue:
That's not to say we never run into this issue when docker is not restarting, but it is rare. I didn't see this issue reported through e2e tests at all.
Sent too quickly... Since it is rare, and it only fails at container creation time rather than affecting already-running containers, I'm inclined not to revert back to the docker 1.8.3 release. Instead, we should document this as a known issue.
Overall I agree with 1.9+workaround+documentation. I'm just trying to figure out the workarounds.
I'm not sure how conclusive this is, since we don't pay much attention to soak tests and regular e2es run < 30 pods per node.
That is also bound to help with #20916 (comment), but I think ordering will be important:
How do we detect step 2/3?
In the initial report, docker never came back to a normal state. In fact it didn't even come back after removing the checkpoint file. So do we need to totally avoid any docker request while it's persisting state, or risk that it gets wedged? If so, how do we transmit this to the kubelet?
I guess just stop docker, poll till the docker pid is gone, rm -rf /var/lib/docker/network/files/*, then start; that should work (see the sketch below).
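For illustration, a sketch of that sequence as a small Go helper. The `service` commands, the pid-file location, and the checkpoint path under /var/lib/docker/network/files are assumptions about the node setup; in practice a few lines in the existing docker health-check shell script would do the same job:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

// restartDockerClean stops docker, waits for the daemon to exit, removes the
// libnetwork checkpoint files, and starts docker again.
func restartDockerClean() error {
	// Assumes a sysvinit/upstart-style "service" wrapper; adjust for systemd.
	if err := exec.Command("service", "docker", "stop").Run(); err != nil {
		return fmt.Errorf("stopping docker: %v", err)
	}
	// Poll until the docker pid file is gone (or a timeout expires).
	deadline := time.Now().Add(30 * time.Second)
	for time.Now().Before(deadline) {
		if _, err := os.Stat("/var/run/docker.pid"); os.IsNotExist(err) {
			break
		}
		time.Sleep(time.Second)
	}
	// Remove the network checkpoint (local-kv.db lives under this directory).
	files, err := filepath.Glob("/var/lib/docker/network/files/*")
	if err != nil {
		return err
	}
	for _, f := range files {
		if err := os.RemoveAll(f); err != nil {
			return fmt.Errorf("removing %s: %v", f, err)
		}
	}
	return exec.Command("service", "docker", "start").Run()
}

func main() {
	if err := restartDockerClean(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```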
As mentioned a couple of times earlier in this issue, can we switch to v1.8.3 instead and focus our energy on validating docker v1.10 and fixing possible issues with that release?
Users could have configured their own docker network checkpoint directory. Are we going to detect that and complicate our simple babysitter script? And why, if it is a rare issue?
+1. Docker itself gave up on v1.9.2 and just moved on to v1.10. I think we've spent too much time investigating and working around the known issues of docker v1.9.
I'm not comfortable shipping 1.9 without a workaround if there is potential that the node is hosed (i.e. docker is wedged and no amount of restarting brings it back). Running into this on O(10) nodes in a 100-1000 node cluster will be a bad experience.
In my case, this is how I reproduced it:
The test usually fails the second time if it's going to fail. Note that I didn't explicitly kill docker, and supervisord showed no sign of docker being killed unexpectedly.
I am totally open to any opinion on this, but @vishh, @bprashanth, and @yujuhong: how much confidence do you have in the docker 1.10 release here? We have been validating the docker 1.9.X release since Oct 2015 (#16110). We performed functional tests, integration tests, and performance validation tests. Our entire Jenkins project moved to docker 1.9.1 more than two weeks ago for continuous and soak tests. We initially found tons of issues caused by upgrading through salt and restarting docker, especially corrupted docker network / storage checkpoint files (#20995). After we baked the docker binary into our image, we didn't run into that problem. On the other hand, we have had docker issues with every previous release. For example, with the kubelet 1.1 release, we have several documented docker issues against docker 1.8.3:
I ran into it on one of the nodes of a 3-node cluster after running the MaxPods test a couple of times (which seems consistent with the repro scenario that @yujuhong suggested). I nuked the cluster, but I have all the logs if someone is interested.
What's worse is that the Kubelet keeps claiming it's healthy when this happens, so more pods are scheduled on the node.
Dawn or Yu-Ju - anything I can do to help? This is a tough one...
@gmarek Can you send the logs my way? Thanks!
We merged #22293 and expect this to be very rare. If that's not the case (because honestly we haven't 100% understood the core problem and haven't invested time in a reliable repro), we need to discuss.
#21703 and #22293 were merged to remove potentially corrupted docker checkpoint files on every docker daemon restart. I can reproduce the issue very easily at node boot time without those two PRs. But as I mentioned above, we might still run into this issue with heavy churn of docker container creation and deletion. It should be very rare, and when we do run into it:
In that case, I am going to close this issue and document it as a known docker issue for the 1.2 release.
I want to see a doc that covers Docker versions and known issues.
+1 @thockin
ref #21000
This causes failures in the SchedulerPredicates MaxPods test (#19681) - all recent failures are caused by the 'No available IPv4 addresses on this network's address pools: bridge' error.
It's causing test flakiness, so adding the flake label.
@gmarek can you post logs? I looked through #19681 (comment) but nothing jumped out. The IPv4 error can happen on startup, and might show up in events, but it shouldn't persist. It shouldn't cause the node to flip to not ready, which is the failure mode I'd expect to cause the scheduler to fail to find a fit for 110 x 3 pods.
It's e.g. in http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-e2e-gke-serial/883/consoleText
From past experience, the IPv4 error will persist once it happens. The node will stay ready, but none of the pods can become running beyond that point.
I think Dawn concluded that it was a startup-time thing in > 90% of the cases, and we can solve this by nuking the checkpoint file when we restart docker for cbr0.
If it's not a startup-time thing, I think we should be smarter about our docker health check and keep restarting and nuking the checkpoint when we detect it, until we roll the docker version forward.
Yes, that was Dawn's conclusion. It could be that the docker checker script wasn't doing its job correctly, or that this is actually a startup problem.
We should improve our docker health check in general. On the other hand, I think upgrading the docker version may even come before that.
I am closing this issue for now. We are going to switch to docker 1.10, which docker claims contains the fix for this bug. Also, the cluster team is working on switching to a CNI solution for IP allocation, so we will soon remove the dependency here. cc @freehan
I ran into this issue twice today by running pod creation/deletion tests (100 pods per node).
After reading @aboch's comments in the issues (e.g., moby/moby#18535 (comment)), all three issues below have the same root cause.
After screening the fixes in libnetwork during that time, it seems like moby/libnetwork#771 might be the fix (though I could be totally wrong given my limited knowledge of networking).
Since we may go with docker 1.9.1 for v1.2, any thoughts on how to proceed with this problem?
(BTW, I couldn't find any existing issue on this. Feel free to close this one if there is already one)
/cc @dchen1107