This repository has been archived by the owner on Sep 30, 2020. It is now read-only.
This repository has been archived by the owner on Sep 30, 2020. It is now read-only.
apiserver evicted under imagefs pressure and controller node never recovers #1300
Closed
Description
Using kube-aws 5780b59 and k8s 1.9.7 I have experienced that imagefs pressure on a controller node can cause the apiserver to be evicted. This by itself might not be a problem, but the node never recovers and it is not marked as unhealthy so is never replaced. Some logs from kubelet below.
The eviction might need to be reported upstream. I found kubernetes/kubernetes#44951 but that was fixed, perhaps it's a similar error.
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: W0514 02:08:58.641934 1390 eviction_manager.go:332] eviction manager: attempting to reclaim imagefs
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.641991 1390 helpers.go:1074] eviction manager: attempting to delete unused containers
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.655265 1390 helpers.go:1084] eviction manager: attempting to delete unused images
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.664897 1390 image_gc_manager.go:350] [imageGCManager]: Removing image "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" to free 746888 bytes
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.666761 1390 remote_image.go:130] RemoveImage "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" from image service failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 408a5feb4834
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.666797 1390 kuberuntime_image.go:126] Remove image "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 408a5feb4834
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: W0514 02:08:58.666827 1390 eviction_manager.go:435] eviction manager: unexpected error when attempting to reduce imagefs pressure: wanted to free 9223372036854775807 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 408a5feb4834
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.666844 1390 eviction_manager.go:346] eviction manager: must evict pod(s) to reclaim imagefs
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.667015 1390 eviction_manager.go:364] eviction manager: pods ranked for eviction: kube-apiserver-ip-10-0-30-133.eu-west-1.compute.internal_kube-system(d4b4db693950c671a53943d7a2336502), external-dns-master-external-dns-657fb949b-ntbgm_kube-system(8545f9a0-55c1-11e8-817e-0a3fef7a67be), fluentd-master-4th4n_kube-system(5952e75d-541d-11e8-817e-0a3fef7a67be), dnsmasq-node-9cxp4_kube-system(a7cda576-506e-11e8-b2d6-0693f00dd91a), kube-proxy-xrpw8_kube-system(a7d74c6e-506e-11e8-b2d6-0693f00dd91a), kube-scheduler-ip-10-0-30-133.eu-west-1.compute.internal_kube-system(4a88231d31b76d9f1e50373121d9bec5), kiam-server-lv9ft_kube-system(a7e8259f-506e-11e8-b2d6-0693f00dd91a)
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.722082 1390 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to watch *v1.Node: Get http://localhost:8080/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-30-133.eu-west-1.compute.internal&resourceVersion=86126847&timeoutSeconds=412&watch=true: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.722294 1390 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to watch *v1.Service: Get http://localhost:8080/api/v1/services?resourceVersion=85873413&timeoutSeconds=553&watch=true: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.722363 1390 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to watch *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-30-133.eu-west-1.compute.internal&resourceVersion=86126855&timeoutSeconds=336&watch=true: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.829361 1390 event.go:209] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/kube-system/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.936822 1390 eviction_manager.go:156] eviction manager: pods kube-apiserver-ip-10-0-30-133.eu-west-1.compute.internal_kube-system(d4b4db693950c671a53943d7a2336502) evicted, waiting for pod to be cleaned up
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: W0514 02:08:59.122310 1390 pod_container_deletor.go:77] Container "db11d52f4db9fa2f566855c880965e5632d8053102d490a18cc7edee1246dbb0" not found in pod's containers
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.722796 1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to list *v1.Node: Get http://localhost:8080/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.724020 1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to list *v1.Service: Get http://localhost:8080/api/v1/services?limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.725555 1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.838837 1390 event.go:209] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/kube-system/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:09:00.723603 1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to list *v1.Node: Get http://localhost:8080/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:09:00.725226 1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to list *v1.Service: Get http://localhost:8080/api/v1/services?limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:09:00.726953 1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867350 1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "ssl-certs-kubernetes" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-kubernetes") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867449 1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "ssl-certs-host" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-host") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867520 1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "apiserver" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-apiserver") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867583 1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "var-log" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-var-log") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867658 1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-var-log" (OuterVolumeSpecName: "var-log") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "var-log". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867706 1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-kubernetes" (OuterVolumeSpecName: "ssl-certs-kubernetes") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "ssl-certs-kubernetes". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867746 1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-host" (OuterVolumeSpecName: "ssl-certs-host") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "ssl-certs-host". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867785 1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-apiserver" (OuterVolumeSpecName: "apiserver") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "apiserver". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967807 1390 reconciler.go:297] Volume detached for volume "ssl-certs-host" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-host") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967841 1390 reconciler.go:297] Volume detached for volume "apiserver" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-apiserver") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967859 1390 reconciler.go:297] Volume detached for volume "var-log" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-var-log") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967877 1390 reconciler.go:297] Volume detached for volume "ssl-certs-kubernetes" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-kubernetes") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
- kube-aws 5780b59
- k8s 1.9.7
- Linux ip-10-0-30-133.eu-west-1.compute.internal 4.14.32-coreos Enable Travis CI builds #1 SMP Tue Apr 3 05:21:26 UTC 2018 x86_64 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz GenuineIntel GNU/Linux
- "Container Linux by CoreOS 1688.5.3 (Rhyolite)"