
Processes may be leaked when docker is killed repeatedly in a short time frame #41450

Closed
yujuhong opened this issue Feb 15, 2017 · 10 comments
Labels
area/docker, area/reliability, lifecycle/rotten, sig/node

Comments

@yujuhong
Contributor

yujuhong commented Feb 15, 2017

Forked from #37580

docker version: 1.11.2
OS: gci

If docker gets killed repeatedly in a short time frame (while kubelet is running and trying to create containers), some container processes may get reparented to PID 1 and continue running, but they are no longer visible to the docker daemon.

This can be reproduced by running the "Network should recover from ip leaks" node e2e test:

  • The test creates 100 pods with the pause image.
  • The test restarts docker (systemctl restart docker) 6 times, with a 20s interval in between.
  • The test completes successfully.
  • Run ps -C "pause" -f and see multiple processes running the pause command still alive.
  • Run docker ps and see no running containers.

Running the test a few times (< 3) should reproduce the issue; a rough manual equivalent is sketched below.
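For convenience, here is a minimal shell sketch of the manual equivalent, assuming you are already on a node with the pause pods running and have root access (an illustration of the steps above, not the e2e test itself):

```sh
# Restart docker 6 times with a 20s interval, mirroring the e2e test.
for i in $(seq 1 6); do
  sudo systemctl restart docker
  sleep 20
done

# Leaked pause processes are still alive; in the full-format listing the
# PPID column shows 1, i.e. they have been reparented to init.
ps -C pause -f

# Meanwhile the docker daemon reports no running containers.
sudo docker ps
```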

EDIT: This issue has been present since the test was first introduced in Nov 2016. We will need to check whether it is fixed in newer docker versions.

/cc @kubernetes/sig-node-bugs

@vishh
Contributor

vishh commented Feb 15, 2017 via email

@yujuhong
Contributor Author

> Are these processes still tracked by their respective cgroups?

Didn't check when I still had the node. Should be easy to reproduce and verify, though.
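(As a hedged sketch, not from the thread itself: one way to check is to read /proc/<pid>/cgroup for each leaked pause process; if they are still tracked, the paths will still name their original container cgroups.)

```sh
# Print the cgroup membership of every leaked pause process.
for pid in $(pgrep -x pause); do
  echo "== PID $pid =="
  cat /proc/$pid/cgroup
done
```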

@yujuhong
Contributor Author

/cc @kubernetes/rh-cluster-infra to see if anyone has encountered this issue.

@yujuhong changed the title from "Processes may be leaked when docker are killed repeatedly in a short time frame" to "Processes may be leaked when docker is killed repeatedly in a short time frame" Feb 15, 2017
@feiskyer
Member

Does docker 1.12 have the same problem?

@yujuhong
Contributor Author

yujuhong commented Mar 1, 2017

> Does docker 1.12 have the same problem?

Don't know. No one has verified yet.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 21, 2017
@yujuhong removed the lifecycle/stale label Dec 21, 2017
@yujuhong
Contributor Author

Need to verify whether newer docker + COS image has the same issue
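(For anyone picking this up, a small sketch of how one might record what is actually on the node before re-running the repro; the exact commands are an assumption, not from the thread:)

```sh
# Record the docker server version and the OS image on the node under test.
docker version --format '{{.Server.Version}}'
cat /etc/os-release   # on COS this includes ID=cos and a BUILD_ID
```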

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 21, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 20, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
