
Cluster addons make large clusters unstable (user container's oom_score_adj is misconfigured) #9788

Closed
@bprashanth

Description

  1. Set up a 100-node cluster with 0 extra pods
  2. Periodically run kubectl get pods | sort -k 5 -u until you notice a younger cluster addon, e.g.:
monitoring-heapster-v2-jnxgm                       1/1       Running   0          10m
monitoring-influx-grafana-v1-7kqyf                 2/2       Running   0          17m
fluentd-elasticsearch-e2e-test-beeps-minion-1khh   1/1       Running   0          1h
elasticsearch-logging-v1-abm8p                     1/1       Running   0          2h
fluentd-elasticsearch-e2e-test-beeps-minion-m251   1/1       Running   0          31m
fluentd-elasticsearch-e2e-test-beeps-minion-i8ke   1/1       Running   1          7m
NAME                                               READY     REASON    RESTARTS   AGE

At this point you'll probably have some dead nodes, or at least a kubelet restart (related: #7815).
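For step 2, a loop along these lines does the job (a minimal sketch; the 5-minute interval is arbitrary and column 5 is assumed to be AGE in this kubectl version):

# Poll pod ages and keep one line per distinct AGE value, so a freshly
# restarted addon stands out from the hour-old ones.
while true; do
  date
  kubectl get pods | sort -k 5 -u
  sleep 300
done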

Additional information:

  • All my dead nodes were running at least one addon, e.g.:
e2e-test-beeps-minion-0so1   kubernetes.io/hostname=e2e-test-beeps-minion-0so1   NotReady
e2e-test-beeps-minion-0v0x   kubernetes.io/hostname=e2e-test-beeps-minion-21dm   NotReady


cat ~/logs/kube-scheduler.log | grep "Attempting to bind"
I0614 03:23:58.132034      12 factory.go:312] Attempting to bind monitoring-heapster-v2-mtr9x to e2e-test-beeps-minion-0so1
I0614 04:01:28.134103      12 factory.go:312] Attempting to bind elasticsearch-logging-v1-0fjuw to e2e-test-beeps-minion-21dm
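To check this correlation across all dead nodes at once, something like the following works (a sketch; the scheduler log path and NotReady being the last column of kubectl get nodes are assumptions):

# For each NotReady node, list the addon pods the scheduler bound to it.
kubectl get nodes | awk '$NF == "NotReady" {print $1}' | while read node; do
  echo "== $node =="
  grep "Attempting to bind" ~/logs/kube-scheduler.log | grep "$node"
done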
  • A spike in CPU/memory usage often occurs just before the restart (and it ramps up fast, within ~30 minutes):
beeps@e2e-test-beeps-minion-21dm:~$ sudo atop -r /var/log/atop/atop_20150614 -b 3:20 -C | grep "PID" -A 3 | head -3
  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK ST EXC S  CPU CMD            
11230  20.09s   2m29s     0K     0K     4K   1.6G --   - S  29% java           
 3414   7.08s   9.21s     0K   248K   820K  1032K --   - S   3% kubelet        

beeps@e2e-test-beeps-minion-21dm:~$ sudo atop -r /var/log/atop/atop_20150614 -b 4:00 -C | grep "PID" -A 3 | head -3
  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK ST EXC S  CPU CMD            
11230  34.74s   5m01s     0K 272.4M 799.8M 647.7M --   - S  93% java           
 3414  19.81s  20.57s 16392K -4684K   1.0G  2324K --   - S  11% kubelet   
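To catch the spike live rather than reconstructing it from atop afterwards, a small sampling loop is enough (a sketch; the process names and the one-minute interval are assumptions, and java here is elasticsearch's JVM):

# Sample RSS and CPU of the heaviest container process and the kubelet.
while true; do
  date
  ps -o pid,rss,pcpu,comm -C java,kubelet
  sleep 60
done >> /tmp/usage.log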
  • This affects the standard deviation of syncPod response times (and, obviously, syncPod doesn't run at all during the outage):
I0614 03:52:47.433688    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 24.630828ms
I0614 03:52:57.639778    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 192.807391ms
I0614 03:53:07.621054    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 167.872516ms
I0614 03:53:17.644948    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 185.237351ms
I0614 03:53:27.494427    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 29.206469ms
I0614 03:53:37.651265    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 175.375625ms
I0614 03:53:47.655602    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 32.494134ms
I0614 03:53:57.655484    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 18.381837ms
I0614 03:54:07.664289    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 24.411221ms
I0614 03:54:17.687201    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 38.721561ms
I0614 03:54:27.692326    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 33.002448ms
I0614 03:54:37.888149    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 221.385ms
I0614 03:54:47.702193    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 29.895014ms
I0614 03:54:57.707271    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 28.48375ms
I0614 03:55:08.134910    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 449.835487ms
I0614 03:55:18.859870    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 920.379739ms
I0614 03:55:28.427633    3414 kubelet.go:1121] (kubelet) syncPod = net time for pod elasticsearch-logging-v1-0fjuw 302.093733ms
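The mean and standard deviation over a window like this can be pulled straight out of the kubelet log with awk (a sketch; it assumes the duration is the last field and is always reported in ms):

grep "syncPod = net time" /var/log/kubelet.log | awk '
  { gsub(/ms$/, "", $NF); n++; sum += $NF; sumsq += $NF * $NF }
  END { mean = sum / n
        printf "n=%d mean=%.1fms sd=%.1fms\n", n, mean, sqrt(sumsq / n - mean * mean) }'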
  • Sometimes there are OOMs; these don't affect the kubelet itself, but the node eventually dies.
beeps@e2e-test-beeps-minion-r1ms:~$ cat /var/log/kern.log | grep "Out of"
Jun 14 03:06:48 e2e-test-beeps-minion-r1ms kernel: [31683.055354] Out of memory: Kill process 16730 (manage_accounts) score 4 or sacrifice child
Jun 14 03:06:48 e2e-test-beeps-minion-r1ms kernel: [31683.251853] Out of memory: Kill process 2343 (manage_addresse) score 4 or sacrifice child
Jun 14 03:06:55 e2e-test-beeps-minion-r1ms kernel: [31684.094101] Out of memory: Kill process 2354 (manage_clock_sy) score 4 or sacrifice child

beeps@e2e-test-beeps-minion-r1ms:~$ cat /var/log/kubelet.log | grep fork
I0614 03:06:07.173569    3392 container.go:363] Failed to update stats for container "/e84a731222ad0fcef8bdc713ca4aaa6a2e8c7df1dcdc38fe342deb05506dcfd6": du command failed on /var/lib/docker/aufs/diff/e84a731222ad0fcef8bdc713ca4aaa6a2e8c7df1dcdc38fe342deb05506dcfd6 with output  - fork/exec /usr/bin/du: cannot allocate memory, continuing to push stats
I0614 03:06:07.025625    3392 container.go:363] Failed to update stats for container "/0f8c0eaf2805abf74565fac3cd0cf781e332160f27d2aaa7cd4db45aa5609d27": du command failed on /var/lib/docker/aufs/diff/0f8c0eaf2805abf74565fac3cd0cf781e332160f27d2aaa7cd4db45aa5609d27 with output  - fork/exec /usr/bin/du: cannot allocate memory, continuing to push stats

... but the kubelet is still healthy ...

I0614 03:19:20.606510   17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.928699ms
I0614 03:19:20.708280   17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.572725ms
I0614 03:19:20.810046   17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.559437ms
I0614 03:19:20.912091   17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.864913ms
I0614 03:19:21.013862   17152 manager.go:750] (kubelet) dockermanager dockertools GetPods took 1.596476ms

... logs are quiet until I reset the node at 18:18 ...

I0614 18:18:49.292671    2208 server.go:269] Using root directory: /var/lib/kubelet
I0614 18:18:49.292740    2208 manager.go:126] cAdvisor running in container: "/"
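The kernel killing tiny system daemons (score 4) rather than the heavy addon containers is consistent with the oom_score_adj misconfiguration in the title. Dumping the adjustment for every process shows which ones the OOM killer will avoid (a sketch; run as root on the node):

# Print oom_score_adj, PID and command for every process,
# most protected (lowest adjustment) first.
for p in /proc/[0-9]*; do
  printf '%s\t%s\t%s\n' "$(cat "$p/oom_score_adj" 2>/dev/null)" \
    "${p#/proc/}" "$(cat "$p/comm" 2>/dev/null)"
done | sort -n | head -30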
  • Sometimes the kubelet restarts without any OOMs; grep the nodecontroller logs for evidence:
(test_clean_master)$ cat ~/logs/kcm | grep PodEvi
I0614 23:42:45.843681       5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-gjds
I0614 23:50:50.927087       5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-c5ok
I0614 23:57:14.960444       5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-g6wk
I0615 00:21:55.186406       5 podevictor.go:110] PodEvictor is evicting Pods on Node: e2e-test-beeps-minion-hawh
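On the evicted nodes, the kubelet restart can be confirmed independently of any OOM messages by looking for the startup banner (a sketch; the log path is an assumption):

# Each "Using root directory" line marks a fresh kubelet start; its timestamps
# should line up with the PodEvictor entries above.
grep "Using root directory" /var/log/kubelet.log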

Related (although you don't need any extra pods to reproduce this issue): #7815

Labels: priority/critical-urgent, sig/node