Docker 1.9 doesn't restart cleanly #20995
Comments
I don't think we test "restarts" specifically, so my guess is no. Based on moby/moby#17083 (comment), docker v1.10 has the same issue.
cc/ @vishh on docker 1.9.X validation
Both @freehan and @ArtfulCoder ran into this problem on their desktops with docker 1.9.1. I helped them fix the issue by 1) removing their docker storage rootdir, 2) removing .../docker/linkgraph.db, and 3) restarting the docker daemon.
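For reference, the workaround boils down to stopping docker, deleting its checkpoint state, and restarting the daemon. A minimal sketch, assuming the default /var/lib/docker root, an aufs storage rootdir, and a sysvinit-managed daemon (adjust for your storage driver and init system):

```bash
# WARNING: this deletes local images/containers along with docker's checkpoint state.
sudo service docker stop                 # or: sudo systemctl stop docker
sudo rm -rf /var/lib/docker/aufs         # 1) storage rootdir (aufs assumed; may be overlay, devicemapper, ...)
sudo rm -f  /var/lib/docker/linkgraph.db # 2) links checkpoint (default /var/lib/docker root assumed)
sudo service docker start                # 3) restart the daemon
```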
This was a fresh install though, just an e2e cluster up.
I did test docker "restart" when I was validating the docker release. We ran into this issue with docker 1.7-rc before. Here is the issue I filed against docker: moby/moby#13850
I don't think this is about the storage driver. I think it's related to networking state stored in /var/lib/docker/network/files/local-kv.db; it went away when I nuked their local-kv.db (and didn't touch anything else).
@bprashanth Looks like upon restart, both storage driver state and network state can be corrupted with the docker 1.9 release, and nuking the checkpoint files is the only workaround. cc/ @andyzheng0831 You already build the image with docker 1.9.1. Have any of your customers reported this problem?
FWIW this showed up on one node in a 300-node cluster.
Did you mean whether any user of k8s/GKE has reported this problem to us? No.
@bprashanth I know this is a freshly installed node, but it still involves a docker restart here:
I suspect this is only triggered by a docker upgrade, since it involves checkpoint schema changes. I will double-check. Meanwhile, @vishh is going to test docker restart with 1.9.1 without an upgrade involved.
I've only seen this issue on an initial provisioning; subsequent restarts were all fine. /cc @ncdc
Talked to @andyzheng0831 offline. They have docker 1.9.1 baked into their image, and there are roughly a couple of thousand instances running with that image today. They have never received any report of this docker issue. If we can rule out pure docker restart, we can still go ahead and release 1.2 with docker 1.9.1, since it only happens at the initial upgrade stage. We can document it as a known issue.
I should clarify: I've seen the docker behavior mentioned on an upgrade, not a baked setup.
@timothysc Thanks for the quick update on this issue. In your case, does the initial provisioning involve a docker version upgrade?
FYI, if we need to test docker restart in an e2e, doing exactly what we do for the kubelet in daemon-restart should work: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/daemon_restart.go#L297
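For illustration, a rough sketch of what such an e2e could do, borrowing the daemon-restart pattern of bouncing the service over SSH and then re-checking node and pod health (the node name and the restart command below are placeholders, not what daemon_restart.go actually runs):

```bash
# Hypothetical: restart docker on one node the way daemon_restart.go bounces the kubelet,
# then confirm the node returns to Ready and its pods are still healthy.
NODE="e2e-test-minion-abcd"                                  # placeholder node name
gcloud compute ssh "$NODE" --command "sudo service docker restart"
kubectl get node "$NODE"                                     # expect Ready once the restart settles
kubectl get pods --all-namespaces -o wide | grep "$NODE"     # pods on the node should stay Running
```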
@dchen1107 - In the environment where this happened, yes. A docker upgrade was part of the process.
The two cases I mentioned at #20995 (comment) also involve a docker version upgrade. @bprashanth Testing docker restart is a valid test case for the container runtime validation test suite. We should include the upgrade case too.
FYI, we build 1.9.1 into the image and have thousands of instances running. Instances may be rebooted, but we never allow upgrading or downgrading the docker version. So far we have not received any reports of this kind of corruption issue.
A small summary:
We are going to decide which version to go with for the 1.2 release once we receive the final signal from Vishnu's test.
FYI: @Amey-D this is the last remaining issue for validating docker 1.9.1 for the 1.2 release.
Our overnight stress test of docker restart indicates that a clean restart of the docker daemon without an upgrade works fine. I am closing this one and using #21086 to track all required changes for our ContainerVM 1.2 release.
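For anyone wanting to reproduce that stress test, a minimal sketch; the iteration count, sleep interval, and health check here are assumptions, not the exact harness that was used:

```bash
# Repeatedly restart the docker daemon (same version, no upgrade) and verify it comes back healthy.
for i in $(seq 1 500); do
  sudo service docker restart
  sleep 10
  if ! sudo docker ps > /dev/null 2>&1; then
    echo "docker failed to come back after restart #$i" >&2
    exit 1
  fi
done
echo "all restarts came back clean"
```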
I have a node stuck in NotReady with the symptoms described in: moby/moby#17083. Essentially supervisord keeps restarting docker.
docker logs: https://storage.googleapis.com/devnul/active_endpoints_docker.log
kubelet: https://storage.googleapis.com/devnul/active_endpoints_kubelet.log
kern (though this doesn't look like a kernel issue): https://storage.googleapis.com/devnul/active_endpoints_kern.log
A couple of weird things:
First, it gets shut down for some reason:
Then it complains about cbr0, which is probably OK (though still weird):
Then it goes looking in the store:
Then it's wedged:
But a rm -rf /var/lib/docker/network/files/ fixes it, though I suspect it was just some corrupt state in /var/lib/docker/network/files/local-kv.db (see the sketch below).
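A sketch of that network-state reset; unlike the full workaround earlier in the thread, this only discards docker's libnetwork checkpoint (local-kv.db), not images or containers (paths assume the default /var/lib/docker root):

```bash
# Stop the daemon (or stop supervisord from retrying first), wipe the libnetwork checkpoint, restart.
sudo service docker stop
sudo rm -rf /var/lib/docker/network/files/
sudo service docker start
sudo docker ps   # confirm the daemon stays up instead of crash-looping
```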
@kubernetes/goog-node have we tested restarts with 1.9?