Port collision between kube-proxy and node-problem-detector (standalone mode) #49263
Comments
/cc
Automatic merge from submit-queue

Use custom port for node-problem-detector

It fixes kubernetes#49263

```release-note
Use port 20256 for node-problem-detector in standalone mode.
```
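Operationally, the fix just moves node-problem-detector's standalone server off kube-proxy's port. A minimal sketch of what that looks like on a node, assuming the `--address`/`--port` flags from the options.go file linked in the issue body (verify the exact flag names against your node-problem-detector version):

```sh
# Pin node-problem-detector's standalone server to the new default port
# (20256) so it stays clear of kube-proxy's healthz port (10256).
# Flag names assumed from cmd/options/options.go; check your NPD version.
node-problem-detector --address=127.0.0.1 --port=20256
```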
Automatic merge from submit-queue

Bump up gce minNodesHealthCheckVersion due to known issues

**What this PR does / why we need it**: There are some known issues in previous 1.7 versions causing kube-proxy to not correctly respond to healthz traffic.

**Which issue this PR fixes**: From #49263.

**Special notes for your reviewer**:
/assign @nicksardo @freehan
cc @bowei @thockin

**Release note**:

```release-note
GCE Cloud Provider: Newly created LoadBalancer-type Services will have health checks for nodes by default if all nodes have version >= v1.7.2.
```
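Since the new health-check behavior only kicks in once every node is at v1.7.2 or newer, a quick way to audit node versions (a standard kubectl query, not something from the PR):

```sh
# List each node with its kubelet version; all must be >= v1.7.2 for the
# GCE LoadBalancer node health checks described above to be enabled.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
```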
Automatic merge from submit-queue (batch tested with PRs 49409, 49352, 49266, 48418)

[e2e] Also verify content returned by kube-proxy healthz url

**What this PR does / why we need it**: Enhances the kube-proxy healthz URL test. This helps to detect the port collision case --- node-problem-detector also serves /healthz and returns 200 OK. Verifying the content confirms that /healthz is actually served by kube-proxy.

**Which issue this PR fixes**: From #49263.

**Special notes for your reviewer**:
/assign @freehan @nicksardo

**Release note**:

```release-note
NONE
```
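The same check can be done by hand from a node. This is a sketch, assuming the 1.7-era kube-proxy healthz response (a small JSON payload), whereas node-problem-detector's handler returns a bare 200 OK, so the body reveals which process owns the port:

```sh
# If kube-proxy owns 10256, expect a JSON body (lastUpdated/currentTime
# fields in 1.7); an empty 200 suggests node-problem-detector answered.
curl -sS http://127.0.0.1:10256/healthz
```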
Automatic merge from submit-queue (batch tested with PRs 49992, 48861, 49267, 49356, 49886)

Emit event and retry when fail to start healthz server on kube-proxy

**What this PR does / why we need it**: Enhances kube-proxy's logic when it fails to start the healthz server.

**Which issue this PR fixes**: From #49263.

**Special notes for your reviewer**:
/assign @thockin @nicksardo @bowei

**Release note**:

```release-note
NONE
```
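With this change, the failure surfaces as a node event rather than only a log line. One way to spot it, grepping on the message since the exact event reason string may vary by version:

```sh
# Look for the "Failed to start node healthz" events emitted by kube-proxy.
kubectl get events --all-namespaces | grep -i "node healthz"
```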
I am facing this issue on Kubernetes 1.8.9 (after upgrading via Kops from 1.7.11). Is the fix available for 1.8.9?
@kumudt This should no longer be an issue for 1.7.2+ clusters. Are you observing the "bind: address already in use" error in the kube-proxy logs? Could you confirm what component is occupying the same port?
@MrHohn Yes. In the `kubectl describe node` output, I can see an event with the message `Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use`. K8s version: v1.8.9
@kumudt Could you confirm what component is occupying port 10256? Have you tried …
@MrHohn Didn't find much in `netstat -l`.
@kumudt Sorry, I should have included the program flag; try …
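For anyone landing here from search: the thread elides the exact command, but netstat's program flag is `-p`, so a reconstruction of the suggestion (not a quote from the thread) would be:

```sh
# Show listening sockets with owning PID/program, filtered to the
# contested port; needs root to resolve other users' processes.
sudo netstat -tulpn | grep 10256
# Equivalent on distros that ship ss instead of netstat:
sudo ss -ltnp | grep 10256
```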
This shows that port 10256 is being listened on by kube-proxy.

But the node events and the kube-proxy logs still show the same error.

There are a lot of retries on the kube-proxy pod, though; maybe it is colliding with its own process. This is happening on only one of the nodes in our cluster. I tried restarting/deleting the node, but it didn't help.
Just another observation.
@kumudt Sounds like you are hitting a different issue than just the port collision. It would be better to file a new bug (with sig-node?).
I am facing the same issue here. I have a cluster of 10 nodes, 1 master, and 1 etcd. Services on 6 of the 10 nodes cannot reach, or be reached from, other containers. The other 4 nodes work perfectly: if I place pods on them (using nodeSelector), those pods can reach and be reached from containers on the other 4 nodes. I checked `kubectl describe node`, the kube-proxy logs, and the calico logs for the 6 affected nodes:

describe node:

kube-proxy:

calico-node:

Does anybody know what causes this issue and how I should resolve it?
I am suffering from the same problem.
Same problem here; I am using EKS (Kubernetes 1.10.3). 👎
**Is this a BUG REPORT or FEATURE REQUEST?**:
/kind bug
**What happened**:
Since k8s v1.7, kube-proxy starts listening on 0.0.0.0:10256 to serve default /healthz traffic from L4 external/internal load balancers (#44968).
When node-problem-detector runs in `standalone` mode, it is a real daemon on the node. By default, it binds 127.0.0.1:10256 for the node problem detector server (https://github.com/kubernetes/node-problem-detector/blob/v0.4/cmd/options/options.go#L66-L67). This breaks kube-proxy's default healthz server:
**What you expected to happen**:
kube-proxy and node-problem-detector should use different ports.
**How to reproduce it (as minimally and precisely as possible)**:
Create a 1.7 cluster with node-problem-detector enabled in standalone mode.
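Outside a cluster, the underlying bind conflict can be illustrated with any two listeners on the port (a sketch using OpenBSD netcat syntax; the error text varies by netcat implementation):

```sh
# First listener mimics node-problem-detector's default bind.
nc -l 127.0.0.1 10256 &
# Second listener mimics kube-proxy's healthz bind and fails:
nc -l 0.0.0.0 10256
# nc: Address already in use
```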
**Anything else we need to know?**:
/assign
/sig network
cc @ajitak @Random-Liu @nicksardo
**Environment**:
- Kubernetes version (use `kubectl version`):
- Kernel (e.g. `uname -a`):