Lot of restarts on Kube-proxy pod #61901
Comments
/sig cluster-ops
The issue is still open; can anyone take a look at this?
@kubernetes/sig-node-bugs It seems like two instances of kube-proxy were running, which caused one of them to fail to bind any ports?
I had the same issue. It happened a couple of days after a 1.7.x -> 1.9.x cluster upgrade. Initially everything was fine, but then, while I was deploying more services, one node became NotReady and the rest shortly followed. We're using kops on AWS. The problem was solved by a complete cluster restart (workers scaled to 0, each master restarted, then workers scaled back to the initial count).
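For anyone else trying that workaround on kops, a rough sketch of the scale-down/restart/scale-up cycle described above might look like this (the instance group name "nodes" and the sizes are assumptions; adjust them for your cluster):

```sh
# Scale the worker instance group to 0 (group name "nodes" is an assumption)
kops edit ig nodes            # set minSize and maxSize to 0, then save
kops update cluster --yes

# Restart the masters, e.g. via a rolling update
kops rolling-update cluster --yes

# Scale the workers back to their original size
kops edit ig nodes            # restore the original minSize/maxSize
kops update cluster --yes
```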
I am facing the same issue here. I have a cluster of 10 nodes, 1 master, and 1 etcd. Services on 6 of the 10 nodes cannot reach, or be reached from, other containers. However, the other 4 nodes work perfectly: if I place pods on them (using nodeSelector), they can reach, and be reached from, other containers placed on those 4 nodes. I checked the node description, the kube-proxy logs, and the calico logs for the 6 affected nodes: describe node:
kube-proxy:
calico-node:
I also checked the nodes I am having issues on, and I realized that two different processes (different PIDs) named kube-proxy are active, but only one of them is listening on ports 10249 and 10256:
However, on healthy nodes a single PID does the job:
Does anybody know what causes this issue and how I should resolve it?
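For others hitting this, a quick way to check for the duplicate-process symptom on a node is something along these lines (exact output will vary by distribution; 10249 is the kube-proxy metrics port and 10256 the healthz port):

```sh
# List all kube-proxy processes running on the node
ps aux | grep '[k]ube-proxy'

# Show which PID actually holds the kube-proxy ports
# (10249 = metrics, 10256 = healthz)
sudo ss -lntp | grep -E ':10249|:10256'
```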
FYI, deleting the kube-proxy and calico pods on the node with the issue resolved the problem for me.
Did you check whether there was more than one kube-proxy process? I had the issue on one node in GKE where two kube-proxy processes were live. I fixed it by killing the processes on the node and deleting the corresponding pod.
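A hedged sketch of that cleanup, assuming kube-proxy and calico run in the kube-system namespace (the node and pod names below are placeholders):

```sh
# Find the kube-proxy (and calico) pods scheduled on the affected node
kubectl -n kube-system get pods -o wide | grep <node-name>

# Delete them so they get recreated
kubectl -n kube-system delete pod <kube-proxy-pod-name>
kubectl -n kube-system delete pod <calico-node-pod-name>

# If a stray kube-proxy process is still holding the ports on the node,
# kill it there as well
sudo kill <stale-kube-proxy-pid>
```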
We also have multiple instances of kube-proxy on one of our nodes, but we have no idea how that occurred. Is this a known issue?
We have noticed a similar problem with exactly the same symptoms as described in this issue. As @arminmor mentioned in his description, we too have noticed exactly the same behaviour. This makes me wonder if there is some kind of cause->effect correlation here. A comment in a different issue mentions running out of disk space as a potential cause of similar problems. We too have noticed some "correlation" between resource pressure on the node and the kube-proxy restarts.
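If resource pressure is suspected, the node conditions and disk usage can be checked with something like the following (the node name is a placeholder):

```sh
# Look for DiskPressure / MemoryPressure / PIDPressure in the node conditions
kubectl describe node <node-name> | grep -A 10 'Conditions:'

# Check free disk space on the node itself
df -h
```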
The problem was solved with these versions: Kubernetes 1.10.3.
Do you know the root cause, @aabed? Which of those components was at fault, and how?
I really don't know the root cause. After upgrading to 1.10.3 it disappeared. In both cases the Calico versions were the same, so I'd say it was solved by upgrading Kubernetes itself.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
What happened:
I have recently been facing issues with Kubernetes version 1.8.9: I am seeing a lot of restarts of the kube-proxy pod on a few nodes, due to which all service pods scheduled on those nodes are crashing again and again.
What you expected to happen:
Stable setup
How to reproduce it (as minimally and precisely as possible):
It happens randomly on a few of the nodes, so I am not able to reproduce it.
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): 1.8.9
Kernel (uname -a): 4.9.0-6-amd64
@kubernetes/sig-node
Could someone please help me with this, as it is causing a lot of problems in our setup.
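For reference, the restart counts and the logs of the previously crashed kube-proxy container can be gathered with commands along these lines (the pod name is a placeholder):

```sh
# Show kube-proxy pods, their restart counts, and the nodes they run on
kubectl -n kube-system get pods -o wide | grep kube-proxy

# Inspect the logs of the previous (crashed) container instance
kubectl -n kube-system logs <kube-proxy-pod-name> --previous

# Check recent events for the pod
kubectl -n kube-system describe pod <kube-proxy-pod-name>
```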