-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure all the running containers are killed on daemon shutdown #12400
Ensure all the running containers are killed on daemon shutdown #12400
Conversation
can we have a daemon restart integration-cli test for this |
also thanks for hunting this down, I'm sure it was not the easiest thing to debug |
6ee7159
to
6d3c837
Compare
ping @jfrazelle @cpuguy83 add a test, can we run the test with |
Ah I see well we do this on all master builds for all the graph drivers so But if we have a test then the overlay master build will fail on Jenkins if On Wednesday, April 15, 2015, Lei Jitang notifications@github.com wrote:
|
Ping @crosbymichael @LK4D4 this is what I was thinking was related to your On Wednesday, April 15, 2015, Jessica Frazelle jess@docker.com wrote:
|
Can you PTAL at #12423 and see if it solves this for you On Wednesday, April 15, 2015, Jessica Frazelle jess@docker.com wrote:
|
@jfrazelle I have tested, #12423 can't solve this, it solve a different problem |
This one is to ensure the container is cleanuped (unmount the rootfs of the container) on docker daemon shutdown. |
ping @jfrazelle @cpuguy83 |
In some other refactoring here @estesp and I felt like we should bump this engine wait to something significantly higher. So if we set the timeout really high it shouldn't really hurt anything. |
My fix is to use
But just set the timeout really high may not work for some container in |
ping @jfrazelle @cpuguy83 @LK4D4 WDYT? |
I think this is a dup of #11988, which was closed yesterday because we don't want to SIGKILL containers on daemon exit. |
Hmm, but it will wait for 10 seconds. I dunno, you will kill them anyway with killing their parent - docker. |
Design seems legit for me, it is basically the same, but more understandable and secure. |
@LK4D4 what about comments of @unclejack and others from the other PR (for example: #11988 (comment)) that in some cases it is potentially very bad to SIGKILL, even after waiting and finding the container unresponsive? However, it seems like the problem of "un-cleaned up" container rootFS's needs to be fixed--this PR seems to fix it cleanly, but still leaves open the window of opportunity for a container with a long-shutdown cycle (closing a ton of connections, performing a significantly large store-to-disk operation, etc.) to be unceremoniously killed in the middle of what would be a reasonable shutdown if the timeout didn't exist. As @cpuguy83 we thought about making the timeout much larger to accommodate, but even that is hacky in the case that it is unpredictable how much time is required. |
by the way..with engine gone, this PR needs a rebase to not use engine. |
@estesp I'm not sure what will be with that processes without parent. I think they probably will just die. I'll try tomorrow. |
@rohansingh I think separate will be easier. |
LGTM |
28f6007
to
958f980
Compare
update as @LK4D4 's suggestion |
Also I believe this closes #7086 |
Also needs another rebase. :( |
@rohansingh @LK4D4 I opened a PR here for increasing upstart kill timeout: #13359 |
958f980
to
555f91b
Compare
@cpuguy83 rebased |
|
2ecbe0b
to
7a616d2
Compare
@coolljt0725 I added this to the 1.7 milestone, could you rebase the PR (again .. )? |
7a616d2
to
bdb7707
Compare
Signed-off-by: Lei Jitang <leijitang@huawei.com>
ping @LK4D4 @jfrazelle @thaJeztah rebased |
ping @cpuguy83 @LK4D4 @jfrazelle @thaJeztah any new about this? |
Still LGTM |
LGTM. Merging 🤘 |
…mon_shutdown Ensure all the running containers are killed on daemon shutdown
cherry-picked |
Signed-off-by: Lei Jitang leijitang@huawei.com
On daemon shutdown, the daemon will be killed in 10s whether the running
containers are killed or not( the running containers are supposed to be killed in 10s).
This will cause some problems for the containers which are not killed in 10s. One
problem is that it will cause an error when remove these containers on daemon next
start.The reason is that the container rootfs is still mounted. The containers' rootfs supposed
to be unmounted when the container is killed.
Steps to reproduce(
storage driver: overlay
):Step 1. Run a container
docker run -ti --name test busybox
Step 2. Shutdown the docker daemon
Step 3. restart the docker daemon
Step 4. Remove the container
test
which is the container run on step 1I have test with storage
overlay
anddevicemapper
, have the same problem.devicemapper
will show
device busy
What cause this is that container
test
isn't killed in 10s, but the docker daemon timeout andkilled, so the cleanup of the container is not done and leave the container with its rootfs still mounted.
My fixed is use
container.stop(10)
instead ofc.KillSig(15)
,container.stop(10)
will sendSIGTERM
to container and wait 10s,if the container failed to exit within 10s of SIGTERM, then send
SIGKILL
.And change the daemon shutdown timeout from
10s
to15s
WDYT? ping @cpuguy83 @estesp @LK4D4 @jfrazelle