This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

apiserver evicted under imagefs pressure and controller node never recovers #1300

Closed
cknowles opened this issue May 14, 2018 · 15 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@cknowles
Contributor

cknowles commented May 14, 2018

Using kube-aws 5780b59 and k8s 1.9.7, I have seen imagefs pressure on a controller node cause the apiserver to be evicted. That by itself might not be a problem, but the node never recovers, and since it is never marked as unhealthy it is never replaced. Some logs from kubelet are below.

The eviction might need to be reported upstream. I found kubernetes/kubernetes#44951, but that was fixed; perhaps this is a similar error.

May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: W0514 02:08:58.641934    1390 eviction_manager.go:332] eviction manager: attempting to reclaim imagefs
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.641991    1390 helpers.go:1074] eviction manager: attempting to delete unused containers
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.655265    1390 helpers.go:1084] eviction manager: attempting to delete unused images
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.664897    1390 image_gc_manager.go:350] [imageGCManager]: Removing image "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" to free 746888 bytes
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.666761    1390 remote_image.go:130] RemoveImage "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" from image service failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 408a5feb4834
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.666797    1390 kuberuntime_image.go:126] Remove image "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 408a5feb4834
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: W0514 02:08:58.666827    1390 eviction_manager.go:435] eviction manager: unexpected error when attempting to reduce imagefs pressure: wanted to free 9223372036854775807 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 408a5feb4834
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.666844    1390 eviction_manager.go:346] eviction manager: must evict pod(s) to reclaim imagefs
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.667015    1390 eviction_manager.go:364] eviction manager: pods ranked for eviction: kube-apiserver-ip-10-0-30-133.eu-west-1.compute.internal_kube-system(d4b4db693950c671a53943d7a2336502), external-dns-master-external-dns-657fb949b-ntbgm_kube-system(8545f9a0-55c1-11e8-817e-0a3fef7a67be), fluentd-master-4th4n_kube-system(5952e75d-541d-11e8-817e-0a3fef7a67be), dnsmasq-node-9cxp4_kube-system(a7cda576-506e-11e8-b2d6-0693f00dd91a), kube-proxy-xrpw8_kube-system(a7d74c6e-506e-11e8-b2d6-0693f00dd91a), kube-scheduler-ip-10-0-30-133.eu-west-1.compute.internal_kube-system(4a88231d31b76d9f1e50373121d9bec5), kiam-server-lv9ft_kube-system(a7e8259f-506e-11e8-b2d6-0693f00dd91a)
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.722082    1390 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to watch *v1.Node: Get http://localhost:8080/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-30-133.eu-west-1.compute.internal&resourceVersion=86126847&timeoutSeconds=412&watch=true: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.722294    1390 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to watch *v1.Service: Get http://localhost:8080/api/v1/services?resourceVersion=85873413&timeoutSeconds=553&watch=true: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.722363    1390 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to watch *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-30-133.eu-west-1.compute.internal&resourceVersion=86126855&timeoutSeconds=336&watch=true: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:58.829361    1390 event.go:209] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/kube-system/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
May 14 02:08:58 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:08:58.936822    1390 eviction_manager.go:156] eviction manager: pods kube-apiserver-ip-10-0-30-133.eu-west-1.compute.internal_kube-system(d4b4db693950c671a53943d7a2336502) evicted, waiting for pod to be cleaned up
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: W0514 02:08:59.122310    1390 pod_container_deletor.go:77] Container "db11d52f4db9fa2f566855c880965e5632d8053102d490a18cc7edee1246dbb0" not found in pod's containers
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.722796    1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to list *v1.Node: Get http://localhost:8080/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.724020    1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to list *v1.Service: Get http://localhost:8080/api/v1/services?limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.725555    1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:08:59 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:08:59.838837    1390 event.go:209] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/kube-system/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:09:00.723603    1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to list *v1.Node: Get http://localhost:8080/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:09:00.725226    1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to list *v1.Service: Get http://localhost:8080/api/v1/services?limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: E0514 02:09:00.726953    1390 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-30-133.eu-west-1.compute.internal&limit=500&resourceVersion=0: dial tcp [::1]:8080: getsockopt: connection refused
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867350    1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "ssl-certs-kubernetes" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-kubernetes") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867449    1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "ssl-certs-host" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-host") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867520    1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "apiserver" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-apiserver") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867583    1390 reconciler.go:191] operationExecutor.UnmountVolume started for volume "var-log" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-var-log") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502")
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867658    1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-var-log" (OuterVolumeSpecName: "var-log") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "var-log". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867706    1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-kubernetes" (OuterVolumeSpecName: "ssl-certs-kubernetes") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "ssl-certs-kubernetes". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867746    1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-host" (OuterVolumeSpecName: "ssl-certs-host") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "ssl-certs-host". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.867785    1390 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-apiserver" (OuterVolumeSpecName: "apiserver") pod "d4b4db693950c671a53943d7a2336502" (UID: "d4b4db693950c671a53943d7a2336502"). InnerVolumeSpecName "apiserver". PluginName "kubernetes.io/host-path", VolumeGidValue ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967807    1390 reconciler.go:297] Volume detached for volume "ssl-certs-host" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-host") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967841    1390 reconciler.go:297] Volume detached for volume "apiserver" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-apiserver") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967859    1390 reconciler.go:297] Volume detached for volume "var-log" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-var-log") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
May 14 02:09:00 ip-10-0-30-133.eu-west-1.compute.internal kubelet-wrapper[1390]: I0514 02:09:00.967877    1390 reconciler.go:297] Volume detached for volume "ssl-certs-kubernetes" (UniqueName: "kubernetes.io/host-path/d4b4db693950c671a53943d7a2336502-ssl-certs-kubernetes") on node "ip-10-0-30-133.eu-west-1.compute.internal" DevicePath ""
  • kube-aws 5780b59
  • k8s 1.9.7
  • Linux ip-10-0-30-133.eu-west-1.compute.internal 4.14.32-coreos #1 SMP Tue Apr 3 05:21:26 UTC 2018 x86_64 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz GenuineIntel GNU/Linux
  • "Container Linux by CoreOS 1688.5.3 (Rhyolite)"
@cknowles
Contributor Author

@mumoshu FYI, this appears to be the cause of what we chatted about - my dev cluster sporadically becoming unhealthy and never recovering.

@cknowles
Contributor Author

cknowles commented May 14, 2018

Also seems relevant - kubernetes/kubernetes#57020, which is in k8s 1.10. Plus this comment - kubernetes/kubernetes#47526 (comment).

@gscalise

gscalise commented Jun 6, 2018

We're being hit by this issue too. We suffered it on v1.9.2 and are now seeing it on v1.10.2.

Jun  2 21:11:24 uk-test-k8s1 kubelet: I0602 21:11:24.601633    1681 container_gc.go:85] attempting to delete unused containers
Jun  2 21:11:24 uk-test-k8s1 kubelet: I0602 21:11:24.630170    1681 image_gc_manager.go:317] attempting to delete unused images
Jun  2 21:11:24 uk-test-k8s1 kubelet: I0602 21:11:24.664829    1681 image_gc_manager.go:371] [imageGCManager]: Removing image "sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772" to free 37289552 bytes
Jun  2 21:11:24 uk-test-k8s1 dockerd: time="2018-06-02T21:11:24.668901374Z" level=error msg="Handler for DELETE /v1.27/images/sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772 returned error: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 5cb7d5e102ea"
Jun  2 21:11:24 uk-test-k8s1 kubelet: E0602 21:11:24.669541    1681 remote_image.go:130] RemoveImage "sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772" from image service failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 5cb7d5e102ea
Jun  2 21:11:24 uk-test-k8s1 kubelet: E0602 21:11:24.669609    1681 kuberuntime_image.go:122] Remove image "sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772" failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 5cb7d5e102ea
Jun  2 21:11:24 uk-test-k8s1 kubelet: W0602 21:11:24.669660    1681 eviction_manager.go:445] eviction manager: unexpected error when attempting to reduce imagefs pressure: wanted to free 9223372036854775807 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 5cb7d5e102ea
Jun  2 21:11:24 uk-test-k8s1 kubelet: I0602 21:11:24.693532    1681 eviction_manager.go:357] eviction manager: must evict pod(s) to reclaim imagefs

Container 5cb7d5e102ea is etcd... why is the eviction manager trying to kill etcd??

@gscalise

gscalise commented Jun 7, 2018

After looking into the issue a bit more, it seems that the eviction_manager is trying to kill a running container (etcd) that is not managed by the control plane.

In our case, what triggered the issue was hitting the low threshold for imagefs utilisation. The thing is, the eviction_manager should never try to remove containers that were never managed by K8s; it seems to assume that every container belongs (or has belonged) to K8s.
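For reference, the thresholds involved here are kubelet settings along these lines (the values shown are roughly the upstream defaults, not necessarily what this cluster runs):

# Illustrative kubelet flags governing imagefs eviction and image GC
# (values are roughly the upstream defaults, shown only for context).
kubelet \
  --eviction-hard='imagefs.available<15%,nodefs.available<10%,memory.available<100Mi' \
  --image-gc-high-threshold=85 \
  --image-gc-low-threshold=80

Once imagefs.available crosses the hard threshold, the eviction manager first tries to reclaim space by deleting unused containers and images, and only when that frees nothing does it rank pods for eviction - which is the sequence visible in the logs above.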

@mumoshu
Contributor

mumoshu commented Jun 8, 2018

@gscalise Hi! I'm a bit confused. kube-aws doesn't place etcd docker containers on controller nodes; it places etcd rkt containers on dedicated etcd nodes. Are you sure that container is etcd?

@mumoshu
Contributor

mumoshu commented Jun 8, 2018

@c-knowles Thanks for the info. Just read all the relevant issues and it seems like you finally found the source of the problem!

Can you try K8S v1.10.x to see if that fixes the problem?

I'm also unsure how we could intentionally reproduce this on K8S 1.9.x. Can we just write a bunch of random files to make the disk almost full, so that the eviction manager tries to remove the image of a running container?

@cknowles
Contributor Author

cknowles commented Jun 8, 2018

@mumoshu Yeah, write anything that it won't evict, for example a tonne of data to /var/log/fluentd.log.
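Concretely, something like the following should trigger it on a controller node - a sketch only, assuming /var/log sits on the same filesystem that the kubelet reports as imagefs, which appears to be the case here given the logs above:

# Grow a file the eviction manager cannot reclaim until the filesystem
# crosses the imagefs threshold; 20GB is illustrative, aim just under the
# node's free space (dd stops on its own if the disk fills completely).
sudo dd if=/dev/zero of=/var/log/fluentd.log bs=1M count=20000

# Then watch the kubelet attempt image/container GC and, when that frees
# nothing, start evicting pods:
journalctl -f | grep -i eviction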

If we want more resilience for this specific case of host logs, GKE and kops appear to have logrotate enabled by default on most of /var/log. Maybe we want to enable that by default as well, although it might interfere with audit logs.
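Roughly what that could look like as a drop-in - the file name and values here are illustrative, not what GKE or kops actually ship:

# Sketch of a logrotate rule for plain host logs under /var/log; an audit
# log path could be excluded explicitly if rotating it is unwanted.
sudo tee /etc/logrotate.d/allvarlogs <<'EOF' >/dev/null
/var/log/*.log {
    rotate 5
    copytruncate
    missingok
    notifempty
    compress
    maxsize 100M
    daily
}
EOF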

@mumoshu
Contributor

mumoshu commented Jun 8, 2018

GKE and kops appear to have logrotate enabled by default on most of /var/log

Ah, sounds like a good idea. I had blindly assumed that what we have today - docker container log rotation enabled by default - was enough to make controller nodes resilient to log spam.

Out of curiosity, what logs actually filled up your disk so much?

@cknowles
Contributor Author

cknowles commented Jun 8, 2018

It was the logs of fluentd itself, since a typical config means it doesn't monitor its own logs (see the upstream addon config for example). In my case, an unrelated problem meant fluentd was logging more than normal.

@mumoshu
Contributor

mumoshu commented Jun 8, 2018

@c-knowles Thanks for the response. Interesting. I was thinking that the fluentd docker container managed by a daemonset would write its own log to stdout, and dockerd should rotate that log every 50MB as configured by kube-aws. But it seems that's not always the case, given your explanation.
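For context, the rotation referred to here is the json-file log driver's settings, something along these lines (the 50MB figure is kube-aws's; max-file and the exact way kube-aws passes the options are assumptions):

# dockerd-level rotation only covers container stdout/stderr captured by the
# json-file driver; it does not touch files a pod writes directly to the
# host, such as /var/log/fluentd.log in this case.
dockerd --log-driver=json-file --log-opt max-size=50m --log-opt max-file=3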

@cknowles
Contributor Author

cknowles commented Jun 8, 2018

@mumoshu I agree in general, it's just that it was not always the upstream config - ref the base upstream config plus a daemon set change from quite a while ago. It's less of an issue given the upstream changes, so this mainly shows it could happen - and any other directory written to from a pod could also bloat.

How about we focus this task on node health? Regardless of upstream updates, the apiserver could stop running under other conditions, so how do we ensure the node is marked unhealthy in AWS or k8s?
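One possible shape for that - purely a sketch, assuming the controllers sit in an Auto Scaling group with health checks enabled and an instance profile allowed to call autoscaling:SetInstanceHealth - is a small watchdog on each controller that probes the local apiserver and flags the instance unhealthy so the ASG replaces it:

#!/bin/bash
# Sketch: mark this controller Unhealthy in its ASG after the local
# apiserver has failed several consecutive health checks.
set -euo pipefail

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
# Derive the region by trimming the trailing zone letter from the AZ name.
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/.$//')

failures=0
while true; do
  if curl -sf --max-time 5 http://localhost:8080/healthz >/dev/null; then
    failures=0
  else
    failures=$((failures + 1))
  fi
  if [ "$failures" -ge 5 ]; then
    aws autoscaling set-instance-health \
      --region "$REGION" \
      --instance-id "$INSTANCE_ID" \
      --health-status Unhealthy
    exit 0
  fi
  sleep 30
done

The port and thresholds are assumptions; the same effect could also come from an ELB health check against the apiserver with the ASG configured to use ELB health checks.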

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
