
k8s-1.10: One of the kube-proxy pod failed to get up after restart #63064

Closed
saurabh-chordiya opened this issue Apr 24, 2018 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node Categorizes an issue or PR as relevant to SIG Node.
sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

saurabh-chordiya commented Apr 24, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
Deployed k8s 1.10 using kubeadm (60-node setup) and everything was up and running.
Restarted one of the kube-proxy pods and it failed to come up.
The pod was up and running before the restart, but afterwards it went into an error state.

$ kubectl get pod --namespace=kube-system -o wide |grep kube-proxy-pj4xw
kube-proxy-pj4xw 0/1 CrashLoopBackOff 8 17m 1.0.0.76 minion-30-5-0-5

Below is the error:
$ kubectl logs --namespace=kube-system kube-proxy-pj4xw
I0424 06:24:47.961665 1 feature_gate.go:226] feature gates: &{{} map[]}
error: unable to read certificate-authority /var/run/secrets/kubernetes.io/serviceaccount/ca.crt for default due to open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory

What you expected to happen:
The pod should be up and running after the restart.

How to reproduce it (as minimally and precisely as possible):
Delete one of the kube-proxy pods; see the sketch below.
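A rough sketch of the reproduction, assuming the namespace and pod naming shown above; <new-kube-proxy-pod> is a placeholder for whatever name the DaemonSet assigns to the recreated pod:

$ kubectl get pod --namespace=kube-system -o wide | grep kube-proxy    # pick one kube-proxy pod
$ kubectl delete pod --namespace=kube-system kube-proxy-pj4xw          # the DaemonSet recreates it
$ kubectl get pod --namespace=kube-system -o wide | grep kube-proxy    # recreated pod enters CrashLoopBackOff
$ kubectl logs --namespace=kube-system <new-kube-proxy-pod>            # shows the ca.crt "no such file or directory" error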

Anything else we need to know?:
This issue was not seen with k8s 1.9.1.

Disabled the MountPropagation feature through featureGates; it was not working with MountPropagation enabled either, so disabling it was tried as well.

Also added "MountFlags=shared" to /etc/systemd/system/multi-user.target.wants/docker.service, as without this all test-pod deployments were failing.

After running "mount --make-rshared /" and restarting the docker service it started to work, but the next time the pod was deleted it again failed to come up. A sketch of these workaround steps follows.
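A minimal sketch of the workaround described above, assuming docker is managed by systemd on the node (the unit path is the one quoted in this report):

# 1. Add "MountFlags=shared" to the [Service] section of
#    /etc/systemd/system/multi-user.target.wants/docker.service, then reload systemd.
$ sudo systemctl daemon-reload
# 2. Make the root mount shared and restart docker.
$ sudo mount --make-rshared /
$ sudo systemctl restart docker
# 3. Delete the failing kube-proxy pod so the DaemonSet recreates it.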

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:55:54Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Deployed using kubeadm (60 nodes setup)

  • OS (e.g. from /etc/os-release):

NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
Linux minion-30-5-0-9 4.6.0-040600-generic #201606100558 SMP Fri Jun 10 10:01:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

##########################

docker version

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.6.2
 Git commit:   092cba3
 Built:        Thu Nov  2 20:40:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.6.2
 Git commit:   092cba3
 Built:        Thu Nov  2 20:40:23 2017
 OS/Arch:      linux/amd64
 Experimental: false

##########################

kubectl describe pod --namespace=kube-system kube-proxy-pj4xw

Name:           kube-proxy-pj4xw
Namespace:      kube-system
Node:           minion-30-5-0-5/1.0.0.76
Start Time:     Tue, 24 Apr 2018 06:23:59 +0000
Labels:         controller-revision-hash=1193416634
                k8s-app=kube-proxy
                pod-template-generation=1
Annotations:    <none>
Status:         Running
IP:             1.0.0.76
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  docker://a7cce480ec211e0fdf53a5216e1d4f0601feb25df0b35370e3194203ec2f5165
    Image:         k8s.gcr.io/kube-proxy-amd64:v1.10.0
    Image ID:      docker-pullable://k8s.gcr.io/kube-proxy-amd64@sha256:fc944b06c14cb442916045a630d5e374dfb9c453dfc56d3cb59ac21ea4268875
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 24 Apr 2018 06:34:44 +0000
      Finished:     Tue, 24 Apr 2018 06:34:45 +0000
    Ready:          False
    Restart Count:  7
    Environment:    <none>
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-b526l (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  kube-proxy-token-b526l:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-proxy-token-b526l
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason                 Age                From                      Message
  ----     ------                 ----               ----                      -------
  Normal   SuccessfulMountVolume  14m                kubelet, minion-30-5-0-5  MountVolume.SetUp succeeded for volume "lib-modules"
  Normal   SuccessfulMountVolume  14m                kubelet, minion-30-5-0-5  MountVolume.SetUp succeeded for volume "xtables-lock"
  Normal   SuccessfulMountVolume  14m                kubelet, minion-30-5-0-5  MountVolume.SetUp succeeded for volume "kube-proxy"
  Normal   SuccessfulMountVolume  14m                kubelet, minion-30-5-0-5  MountVolume.SetUp succeeded for volume "kube-proxy-token-b526l"
  Normal   Started                13m (x4 over 14m)  kubelet, minion-30-5-0-5  Started container
  Normal   Pulled                 13m (x5 over 14m)  kubelet, minion-30-5-0-5  Container image "k8s.gcr.io/kube-proxy-amd64:v1.10.0" already present on machine
  Normal   Created                13m (x5 over 14m)  kubelet, minion-30-5-0-5  Created container
  Warning  BackOff                4m (x47 over 14m)  kubelet, minion-30-5-0-5  Back-off restarting failed container
k8s-ci-robot added the needs-sig and kind/bug labels on Apr 24, 2018
saurabh-chordiya (Author) commented Apr 24, 2018

/sig storage
/sig node

k8s-ci-robot added the sig/storage label and removed the needs-sig label on Apr 24, 2018
saurabh-chordiya changed the title from "k8s-1.10: One of the kube-proxy node failed to get up after restart" to "k8s-1.10: One of the kube-proxy pod failed to get up after restart" on Apr 24, 2018
saurabh-chordiya (Author) commented Apr 26, 2018

@kubernetes/sig-storage-bugs
@jsafrane

k8s-ci-robot added the sig/node label on Apr 26, 2018
jsafrane (Member) commented:
I am not sure it's related to mount propagation at all. It would produce different messages.

error: unable to read certificate-authority /var/run/secrets/kubernetes.io/serviceaccount/ca.crt for default due to open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory

This seems to be some issue with Secret volumes. Please check whether /var/run/secrets path's backing directory /var/lib/kubelet/pods/<your pod uid>/volumes/kubernetes.io~secret/<token name>/ contains ca.crt and looks "healthy".
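A concrete form of that check, keeping the placeholders from the path above (substitute the UID of the failing pod and its token volume name):

$ ls /var/lib/kubelet/pods/<your pod uid>/volumes/kubernetes.io~secret/<token name>/
# expected contents: ca.crt  namespace  token
$ findmnt -o TARGET,PROPAGATION /var/lib/kubelet/pods/<your pod uid>/volumes/kubernetes.io~secret/<token name>/
# shows the mount propagation of the tmpfs backing the Secret volume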

saurabh-chordiya (Author) commented Apr 30, 2018

Yes, this doesn't seem to be related to mount propagation, but something new in 1.10 compared to 1.9 is breaking it.

Checked the path below and it has all the required data:

root@minion-30-1-0-5:~# ls /var/lib/kubelet/pods/9e4ef60c-4a12-11e8-982e-222201000023/volumes/kubernetes.io~secret/default-token-xrbpk/
ca.crt  namespace  token

Mount propagation is "private,slave"

root@minion-30-1-0-5:~# findmnt -o TARGET,PROPAGATION /var/lib/kubelet/pods/9e4ef60c-4a12-11e8-982e-222201000023/volumes/kubernetes.io~secret/default-token-xrbpk/
TARGET                                                                                                      PROPAGATION
/var/lib/kubelet/pods/9e4ef60c-4a12-11e8-982e-222201000023/volumes/kubernetes.io~secret/default-token-xrbpk private,slave

After changing the mount propagation to shared and restarting docker, it started working:

mount --make-rshared /var/lib/kubelet/pods/9e4ef60c-4a12-11e8-982e-222201000023/volumes/kubernetes.io~secret/default-token-xrbpk/
root@minion-30-1-0-15:~# kubectl get pod -o wide --namespace=kube-system |grep 1.0.0.66
kube-proxy-jwdn4                           1/1       Running            12         37m       1.0.0.66   minion-30-1-0-5

Something is really wrong in 1.10, as the same scenario works fine with k8s 1.9.1.

jsafrane (Member) commented May 4, 2018

You should retry with #62633 (upcoming 1.10.3?) where we change the default back to private and use slave/shared only when explicitly requested.
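A hypothetical re-test once the node is on a release containing that change (<kube-proxy-pod> is a placeholder for any kube-proxy pod on an affected node):

$ kubectl version                                                      # confirm the patched release
$ kubectl delete pod --namespace=kube-system <kube-proxy-pod>
$ kubectl get pod --namespace=kube-system -o wide | grep kube-proxy    # the recreated pod should reach Running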

saurabh-chordiya (Author) commented May 4, 2018

Sure, will retry with 1.10.3 (once available) and update.

fejta-bot commented:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Aug 2, 2018
fejta-bot commented:
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 1, 2018
fejta-bot commented:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot (Contributor) commented:
@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
