Cap docker resource cgroup's limit #9881
Labels
priority/important-soon
sig/node
Forked from #9788 (comment):
The only unrecovered node is caused by a Docker memory leak. It has been a known issue since Docker 1.3.0, and possibly earlier versions (moby/moby#9139). Docker 1.7.0-rcX (the one I am currently validating) should have a fix for it. Once you restart Docker, the problem should go away and the node should recover. I have seen a similar problem before. On each node, monit periodically health-checks the Docker daemon process; in most cases, monit will restart a Docker daemon that is in such a bad state.
Until we have that fix from Docker 1.7, we can set Docker's hard memory limit to 70% of node capacity, since today we already put Docker into a cgroup, just without any limit. This could be a temporary workaround that helps a node recover from the bad state.
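As a rough illustration of the proposed cap, here is a minimal sketch that derives a 70% hard limit from the node's total memory and writes it to a cgroup v1 memory controller. It assumes a Linux node with cgroup v1; the helper names and the cgroup directory in the usage note are hypothetical and not taken from the Kubernetes code.

```python
import re
from pathlib import Path


def parse_mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, converted to bytes."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    # /proc/meminfo reports this value in units of 1024 bytes.
    return int(match.group(1)) * 1024


def hard_limit_bytes(capacity_bytes: int, percent: int = 70) -> int:
    """Return the given percentage of node capacity, rounded down."""
    return capacity_bytes * percent // 100


def apply_limit(cgroup_dir: Path, limit: int) -> None:
    """Write the limit to the cgroup v1 memory controller file.

    cgroup_dir is an assumption: it must point at the memory cgroup
    that the Docker daemon has been placed in on this node.
    """
    (cgroup_dir / "memory.limit_in_bytes").write_text(str(limit))


if __name__ == "__main__":
    # Only computes and prints the limit; it does not modify any cgroup.
    capacity = parse_mem_total_bytes(Path("/proc/meminfo").read_text())
    print(hard_limit_bytes(capacity))
```

To actually apply the cap, one would call `apply_limit` as root with the directory of the Docker daemon's memory cgroup, for example a path under `/sys/fs/cgroup/memory/` (the exact cgroup name depends on how the node is set up).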
Related docker issue: moby/moby#9139
cc/ @lavalamp @bprashanth