
Processes may be leaked when docker is killed repeatedly in a short time frame #41450

Closed
yujuhong opened this issue Feb 15, 2017 · 10 comments
Labels
area/docker, area/reliability, lifecycle/rotten, sig/node

Comments

@yujuhong
Contributor

yujuhong commented Feb 15, 2017

Forked from #37580

docker version: 1.11.2
OS: gci

If docker gets killed repeatedly in a short time frame (while kubelet is running and trying to create containers), some container processes may get reparented to PID 1 and continue running, but they are no longer visible to the docker daemon.

This can be reproduced by running the "Network should recover from ip leaks" node e2e test:

  • The test creates 100 pods with the pause image.
  • The test restarts docker (systemctl restart docker) 6 times, with a 20s interval in between.
  • The test completes successfully.
  • Run ps -C "pause" -f and see multiple processes running the pause command still alive.
  • Run docker ps and see no running containers.

Running the test a few times (< 3) should reproduce the issue; a rough manual equivalent is sketched below.
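For convenience, here is a minimal shell sketch of the manual equivalent, assuming you are already on a node with the pause pods running and have root access (an illustration of the steps above, not the e2e test itself):

```sh
# Restart docker 6 times with a 20s interval, mirroring the e2e test.
for i in $(seq 1 6); do
  sudo systemctl restart docker
  sleep 20
done

# Leaked pause processes are still alive; in the full-format listing the
# PPID column shows 1, i.e. they have been reparented to init.
ps -C pause -f

# Meanwhile the docker daemon reports no running containers.
sudo docker ps
```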

EDIT: This issue has been present since the test was first introduced in Nov 2016. We will need to check whether it is fixed in newer docker versions.

/cc @kubernetes/sig-node-bugs

@vishh
Contributor

vishh commented Feb 15, 2017 via email

@yujuhong
Contributor Author

> Are these processes still tracked by their respective cgroups?

Didn't check when I still had the node. Should be easy to reproduce and verify, though.
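(As a hedged sketch, not from the thread itself: one way to check is to read /proc/<pid>/cgroup for each leaked pause process; if they are still tracked, the paths will still name their original container cgroups.)

```sh
# Print the cgroup membership of every leaked pause process.
for pid in $(pgrep -x pause); do
  echo "== PID $pid =="
  cat /proc/$pid/cgroup
done
```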

@yujuhong
Contributor Author

/cc @kubernetes/rh-cluster-infra to see if anyone has encountered this issue.

@yujuhong changed the title from "Processes may be leaked when docker are killed repeatedly in a short time frame" to "Processes may be leaked when docker is killed repeatedly in a short time frame" Feb 15, 2017
@feiskyer
Member

Does docker 1.12 have the same problem?

@yujuhong
Contributor Author

yujuhong commented Mar 1, 2017

> Does docker 1.12 have the same problem?

Don't know. No one has verified yet.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 21, 2017
@yujuhong removed the lifecycle/stale label Dec 21, 2017
@yujuhong
Contributor Author

Need to verify whether newer docker + COS image has the same issue
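(For anyone picking this up, a small sketch of how one might record what is actually on the node before re-running the repro; the exact commands are an assumption, not from the thread:)

```sh
# Record the docker server version and the OS image on the node under test.
docker version --format '{{.Server.Version}}'
cat /etc/os-release   # on COS this includes ID=cos and a BUILD_ID
```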

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 21, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 20, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
