Cluster addons make large clusters unstable (user container's oom_score_adj is misconfigured) #9788
- Set up a 100-node cluster with 0 extra pods
- Periodically run
kubectl get pods | sort -k 5 -u
until you notice a cluster addon pod that is younger than its peers, meaning it was recently recreated (a polling sketch follows below), e.g.:
monitoring-heapster-v2-jnxgm 1/1 Running 0 10m
monitoring-influx-grafana-v1-7kqyf 2/2 Running 0 17m
fluentd-elasticsearch-e2e-test-beeps-minion-1khh 1/1 Running 0 1h
elasticsearch-logging-v1-abm8p 1/1 Running 0 2h
fluentd-elasticsearch-e2e-test-beeps-minion-m251 1/1 Running 0 31m
fluentd-elasticsearch-e2e-test-beeps-minion-i8ke 1/1 Running 1 7m
NAME READY REASON RESTARTS AGE
At this point you'll probably have some dead nodes, or at least a kubelet restart (related: #7815).
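A minimal polling sketch for the step above (the 60-second interval and the log file path are my own choices, not part of the original setup):

```sh
#!/bin/bash
# Poll the cluster once a minute and keep a timestamped record of the
# per-age view of the pods, so a freshly recreated addon (small AGE,
# possibly a RESTARTS bump) stands out in the history.
while true; do
  {
    date
    kubectl get pods | sort -k 5 -u
    echo
  } >> /tmp/addon-age-poll.log
  sleep 60
done
```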
Additional information:
- All my dead nodes had at least 1 addon scheduled on them (a sketch that cross-checks this for every NotReady node follows the scheduler log), e.g.:
e2e-test-beeps-minion-0so1 kubernetes.io/hostname=e2e-test-beeps-minion-0so1 NotReady
e2e-test-beeps-minion-0v0x kubernetes.io/hostname=e2e-test-beeps-minion-21dm NotReady
cat ~/logs/kube-scheduler.log | grep "Attempting to bind"
I0614 03:23:58.132034 12 factory.go:312] Attempting to bind monitoring-heapster-v2-mtr9x to e2e-test-beeps-minion-0so1
I0614 04:01:28.134103 12 factory.go:312] Attempting to bind elasticsearch-logging-v1-0fjuw to e2e-test-beeps-minion-21dm
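To repeat that cross-check for every unhealthy node, something like the following works (the kubectl column layout and the log path match my setup; adjust as needed):

```sh
# For every NotReady node, list the pods the scheduler bound to it.
kubectl get nodes | awk '$NF == "NotReady" {print $1}' | while read node; do
  echo "== $node =="
  grep "Attempting to bind" ~/logs/kube-scheduler.log | grep "$node"
done
```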
- A spike in CPU/memory usage often occurs just before the restart, and it spikes fast, within ~30 minutes (a sketch that walks the whole atop history follows these samples):
beeps@e2e-test-beeps-minion-21dm:~$ sudo atop -r /var/log/atop/atop_20150614 -b 3:20 -C | grep "PID" -A 3 | head -3
PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPU CMD
11230 20.09s 2m29s 0K 0K 4K 1.6G -- - S 29% java
3414 7.08s 9.21s 0K 248K 820K 1032K -- - S 3% kubelet
beeps@e2e-test-beeps-minion-21dm:~$ sudo atop -r /var/log/atop/atop_20150614 -b 4:00 -C | grep "PID" -A 3 | head -3
PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPU CMD
11230 34.74s 5m01s 0K 272.4M 799.8M 647.7M -- - S 93% java
3414 19.81s 20.57s 16392K -4684K 1.0G 2324K -- - S 11% kubelet
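The two samples above were picked by hand; to walk the whole atop history and see when the spike starts, something like this works (the 10-minute step is arbitrary):

```sh
# Print the top CPU consumers from the atop history at 10-minute steps,
# to pinpoint when the java (elasticsearch) and kubelet usage takes off.
for t in 3:00 3:10 3:20 3:30 3:40 3:50 4:00; do
  echo "== $t =="
  sudo atop -r /var/log/atop/atop_20150614 -b "$t" -C | grep "PID" -A 3 | head -4
done
```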
- This also inflates the standard deviation of the syncPod response times (an awk sketch to compute it follows the excerpt); obviously, syncPod doesn't run at all during the outage:
I0614 03:52:47.433688 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 24.630828ms
I0614 03:52:57.639778 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 192.807391ms
I0614 03:53:07.621054 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 167.872516ms
I0614 03:53:17.644948 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 185.237351ms
I0614 03:53:27.494427 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 29.206469ms
I0614 03:53:37.651265 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 175.375625ms
I0614 03:53:47.655602 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 32.494134ms
I0614 03:53:57.655484 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 18.381837ms
I0614 03:54:07.664289 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 24.411221ms
I0614 03:54:17.687201 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 38.721561ms
I0614 03:54:27.692326 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 33.002448ms
I0614 03:54:37.888149 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 221.385ms
I0614 03:54:47.702193 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 29.895014ms
I0614 03:54:57.707271 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 28.48375ms
I0614 03:55:08.134910 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 449.835487ms
I0614 03:55:18.859870 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 920.379739ms
I0614 03:55:28.427633 3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 302.093733ms
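For reference, the spread can be computed straight from kubelet.log; a rough sketch (it assumes every matching line reports the latency in ms, which holds for the excerpt above):

```sh
# Mean and (population) standard deviation of the syncPod latencies, in ms,
# for one pod; assumes the last field of every matching line ends in "ms".
grep "syncPod = net time for pod elasticsearch-logging-v1-0fjuw" /var/log/kubelet.log |
  awk '{ gsub(/ms$/, "", $NF); n++; sum += $NF; sumsq += $NF * $NF }
       END { mean = sum / n;
             printf "n=%d mean=%.1fms sd=%.1fms\n", n, mean, sqrt(sumsq/n - mean*mean) }'
```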
- Sometimes there are OOMs; these don't affect the kubelet, but the node eventually dies (a sketch for inspecting the containers' oom_score_adj follows these log excerpts).
beeps@e2e-test-beeps-minion-r1ms:~$ cat /var/log/kern.log | grep "Out of"
Jun 14 03:06:48 e2e-test-beeps-minion-r1ms kernel: [31683.055354] Out of memory: Kill process 16730 (manage_accounts) score 4 or sacrifice child
Jun 14 03:06:48 e2e-test-beeps-minion-r1ms kernel: [31683.251853] Out of memory: Kill process 2343 (manage_addresse) score 4 or sacrifice child
Jun 14 03:06:55 e2e-test-beeps-minion-r1ms kernel: [31684.094101] Out of memory: Kill process 2354 (manage_clock_sy) score 4 or sacrifice child
beeps@e2e-test-beeps-minion-r1ms:~$ cat /var/log/kubelet.log | grep fork
I0614 03:06:07.173569 3392 container.go:363] Failed to update stats for container "/e84a731222ad0fcef8bdc713ca4aaa6a2e8c7df1dcdc38fe342deb05506dcfd6": du command failed on /var/lib/docker/aufs/diff/e84a731222ad0fcef8bdc713ca4aaa6a2e8c7df1dcdc38fe342deb05506dcfd6 with output - fork/exec /usr/bin/du: cannot allocate memory, continuing to push stats
I0614 03:06:07.025625 3392 container.go:363] Failed to update stats for container "/0f8c0eaf2805abf74565fac3cd0cf781e332160f27d2aaa7cd4db45aa5609d27": du command failed on /var/lib/docker/aufs/diff/0f8c0eaf2805abf74565fac3cd0cf781e332160f27d2aaa7cd4db45aa5609d27 with output - fork/exec /usr/bin/du: cannot allocate memory, continuing to push stats
... but the kubelet is still healthy ...
I0614 03:19:20.606510 17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.928699ms
I0614 03:19:20.708280 17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.572725ms
I0614 03:19:20.810046 17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.559437ms
I0614 03:19:20.912091 17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.864913ms
I0614 03:19:21.013862 17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.596476ms
... logs are quiet until I reset the node at 18:18 ...
I0614 18:18:49.292671 2208 server.go:269] Using root directory: /var/lib/kubelet
I0614 18:18:49.292740 2208 manager.go:126] cAdvisor running in container: "/"
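A quick way to eyeball the oom_score_adj values on a node (rough, untested sketch; assumes docker and a single kubelet process):

```sh
# Compare the OOM-killer tuning of the kubelet, the docker daemon and the
# main process of every running container, to see whether the memory-hungry
# container is better protected than the processes the kernel actually killed.
echo "kubelet: $(cat /proc/$(pgrep -x kubelet | head -1)/oom_score_adj)"
echo "docker:  $(cat /proc/$(pgrep -x docker  | head -1)/oom_score_adj)"
sudo docker ps -q | while read id; do
  pid=$(sudo docker inspect -f '{{.State.Pid}}' "$id")
  name=$(sudo docker inspect -f '{{.Name}}' "$id")
  echo "container $name (pid $pid): $(cat /proc/$pid/oom_score_adj)"
done
```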
- Sometimes the kubelet restarts without any OOMs; grep the node controller (kube-controller-manager) logs for eviction evidence (a sketch for confirming the restarts on the node itself follows these log lines):
(test_clean_master)$ cat ~/logs/kcm | grep PodEvi
I0614 23:42:45.843681 5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-gjds
I0614 23:50:50.927087 5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-c5ok
I0614 23:57:14.960444 5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-g6wk
I0615 00:21:55.186406 5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-hawh
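On the node itself, a kubelet restart shows up as a change in the PID column of the glog lines (3414 vs 17152 above), which can be correlated against kernel OOM kills; a rough sketch:

```sh
# Each change in the kubelet PID column is a restart; count log lines per PID
# and compare with the number of kernel OOM kills (0 on these nodes).
awk '{ print $3 }' /var/log/kubelet.log | uniq -c
grep -c "Out of memory" /var/log/kern.log
```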
Related (although you don't need any extra pods to reproduce this issue): #7815