Port collision between kube-proxy and node-problem-detector (standalone mode) #49263

MrHohn · 2017-07-20T06:07:58Z

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Since k8s v1.7, kube-proxy listens on 0.0.0.0:10256 by default to serve /healthz traffic from L4 external/internal load balancers (#44968).

When node-problem-detector runs in standalone mode, it runs as a real daemon on the node. By default, it binds 127.0.0.1:10256 for the node-problem-detector server (https://github.com/kubernetes/node-problem-detector/blob/v0.4/cmd/options/options.go#L66-L67).

This breaks kube-proxy's default healthz server:

E0719 22:13:54.263011       5 healthcheck.go:302] Failed to start healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use
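
The collision can be reproduced without either component. The following is a minimal sketch, assuming a Linux-like TCP stack and no special socket options: one socket listens on a loopback address, and a second socket then tries to bind the wildcard address on the same port. The function name `wildcard_bind_collides` is hypothetical, and an OS-assigned port stands in for 10256 so the sketch does not disturb a real node.

```python
import errno
import socket


def wildcard_bind_collides(holder_host: str = "127.0.0.1") -> bool:
    """Return True if binding 0.0.0.0:<port> fails with EADDRINUSE while
    another socket is already listening on <holder_host>:<port>."""
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port (a stand-in for 10256).
        holder.bind((holder_host, 0))
        holder.listen(1)
        port = holder.getsockname()[1]

        # Second socket plays the role of the server that starts later
        # and wants the same port on every interface.
        late = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            late.bind(("0.0.0.0", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        finally:
            late.close()
        return False
    finally:
        holder.close()


if __name__ == "__main__":
    print("wildcard bind collides:", wildcard_bind_collides())
```

The point of the sketch is that a loopback-only listener is enough to block a later wildcard bind on the same port, which matches the error above.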

What you expected to happen:
kube-proxy and node-problem-detector should use different ports.

How to reproduce it (as minimally and precisely as possible):
Create a 1.7 cluster with node-problem-detector enabled in standalone mode.

Anything else we need to know?:
/assign
/sig network
cc @ajitak @Random-Liu @nicksardo

Environment:

Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

dixudx · 2017-07-20T07:33:08Z

/cc

Automatic merge from submit-queue

Use custom port for node-problem-detector

It fixes kubernetes#49263

```release-note
Use port 20256 for node-problem-detector in standalone mode.
```

@nicksardo

…sion

Automatic merge from submit-queue

Bump up gce minNodesHealthCheckVersion due to known issues

**What this PR does / why we need it**: There are some known issues in previous 1.7 versions that cause kube-proxy to respond incorrectly to healthz traffic.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: From #49263.

**Special notes for your reviewer**:
/assign @nicksardo @freehan
cc @bowei @thockin

**Release note**:
```release-note
GCE Cloud Provider: Newly created LoadBalancer type Services will have health checks for nodes by default if all nodes have version >= v1.7.2.
```

@freehan

Automatic merge from submit-queue (batch tested with PRs 49409, 49352, 49266, 48418)

[e2e] Also verify content returned by kube-proxy healthz url

**What this PR does / why we need it**: Enhance the kube-proxy url test. This helps detect the port collision case: node-problem-detector also serves /healthz and returns 200 OK. Verify the content to confirm that /healthz is served by kube-proxy.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: From #49263

**Special notes for your reviewer**:
/assign @freehan @nicksardo

**Release note**:
```release-note
NONE
```
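
The idea behind that test change can be illustrated with a small sketch: a 200 status alone cannot tell two /healthz servers apart, so the check also inspects the body. The thread does not show what either component returns, so the JSON body shape and the `lastUpdated` key below are assumptions used only for illustration, and `served_by_expected_component` is a hypothetical helper, not the actual e2e code.

```python
import json
from typing import Iterable


def served_by_expected_component(status: int, body: str,
                                 required_keys: Iterable[str]) -> bool:
    """Accept a health response only if the status is 200 AND the body is a
    JSON object containing every key the expected server is assumed to emit."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # A plain-text body such as "ok" is not valid JSON.
        return False
    return isinstance(payload, dict) and all(k in payload for k in required_keys)


if __name__ == "__main__":
    keys = ("lastUpdated",)  # assumed marker field, not confirmed by the thread
    print(served_by_expected_component(200, "ok", keys))
    print(served_by_expected_component(200, '{"lastUpdated": "t0"}', keys))
```

A status-only check would accept both responses; the content check rejects the one that lacks the expected fields.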

@thockin

…ails

Automatic merge from submit-queue (batch tested with PRs 49992, 48861, 49267, 49356, 49886)

Emit event and retry when fail to start healthz server on kube-proxy

**What this PR does / why we need it**: Improve kube-proxy's handling when it fails to start the healthz server.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: From #49263.

**Special notes for your reviewer**:
/assign @thockin @nicksardo @bowei

**Release note**:
```release-note
NONE
```
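
The "emit event and retry" behavior described in that PR can be pictured roughly as follows. This is a sketch under stated assumptions, not kube-proxy's implementation: `listen_with_retry` and its parameters are hypothetical, and the `report` callback merely stands in for emitting a node event.

```python
import socket
import time
from typing import Callable, Optional


def listen_with_retry(host: str, port: int, attempts: int, delay_seconds: float,
                      report: Callable[[str], None]) -> Optional[socket.socket]:
    """Try to listen on host:port up to `attempts` times. Every failure is
    passed to `report`; returns the listening socket, or None if all fail."""
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(16)
            return sock
        except OSError as exc:
            sock.close()
            report(f"attempt {attempt}: failed to listen on {host}:{port}: {exc}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    return None


if __name__ == "__main__":
    server = listen_with_retry("127.0.0.1", 0, attempts=3, delay_seconds=1.0,
                               report=print)
    if server is not None:
        print("listening on", server.getsockname())
        server.close()
```

Reporting each failure instead of failing silently is what makes the collision visible in the node events quoted later in this thread.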

kumudt · 2018-03-19T08:37:24Z

I am facing this issue on Kubernetes 1.8.9 (after upgrading via Kops from 1.7.11). Is the fix available for 1.8.9?

MrHohn · 2018-03-19T17:10:15Z

@kumudt This should no longer be an issue for 1.7.2+ clusters. Are you observing the "bind: address already in use" error in the kube-proxy logs? Could you confirm which component is occupying the same port?

kumudt · 2018-03-26T13:44:12Z

@MrHohn Yes. In the describe nodes output, I can see an event with the message: Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use.

K8s Version: v1.8.9
Installed with Kops 1.8.1 on AWS
OS Image: Debian GNU/Linux 9 (stretch)
Kernel Version: 4.9.0-6-amd64
Docker Run Time: docker://1.13.1

MrHohn · 2018-03-26T17:09:49Z

@kumudt Could you confirm what component is occupying port 10256? Have you tried netstat -l on the node?

kumudt · 2018-03-26T19:19:49Z

@MrHohn didn't find much in netstat -l
For port 10256, I could only see this.
tcp6 0 0 [::]:10256 [::]:* LISTEN

MrHohn · 2018-03-26T20:28:27Z

@kumudt Sorry, I should have included the program flag. Could you try sudo netstat -tulpn | grep LISTEN?

kumudt · 2018-03-27T08:40:04Z

This shows that kube-proxy is listening on 10256:

tcp6       0      0 :::10256                :::*                    LISTEN      30662/kube-proxy

But the node events and the kube-proxy logs still show the same error.
This is happening for the metrics server as well.

server.go:480] starting metrics server failed: listen tcp 127.0.0.1:10249: bind: address already in use

tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      30662/kube-proxy

There are a lot of retries in the kube-proxy pod, though. Maybe it is colliding with its own process. This is happening on only one of the nodes in our cluster. I tried restarting and deleting the node, but it did not help.
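
To answer "which program owns this port" from output like the lines above, the listing can be parsed mechanically. This is a minimal sketch that assumes the TCP column layout shown in this thread (proto, recv-q, send-q, local address, foreign address, state, PID/program); `Listener`, `parse_listeners`, and `owners_of` are hypothetical names.

```python
from typing import List, NamedTuple


class Listener(NamedTuple):
    proto: str
    local_address: str
    port: int
    pid: int
    program: str


def parse_listeners(netstat_output: str) -> List[Listener]:
    """Extract listening TCP sockets that carry a PID/program column."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and rows without the PID/program column
        # (for example, output produced without the -p flag).
        if len(fields) < 7 or fields[5] != "LISTEN" or "/" not in fields[6]:
            continue
        address, _, port = fields[3].rpartition(":")
        pid, _, program = fields[6].partition("/")
        if not (port.isdigit() and pid.isdigit()):
            continue
        listeners.append(Listener(fields[0], address, int(port), int(pid), program))
    return listeners


def owners_of(port: int, listeners: List[Listener]) -> List[str]:
    """Return the sorted, de-duplicated program names listening on `port`."""
    return sorted({entry.program for entry in listeners if entry.port == port})


if __name__ == "__main__":
    sample = (
        "tcp6       0      0 :::10256                :::*                    LISTEN      30662/kube-proxy\n"
        "tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      30662/kube-proxy\n"
    )
    print(owners_of(10256, parse_listeners(sample)))
```

If the only owner of the port is the same program that reports the bind failure, as in the output above, the problem is not a collision with another component.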

kumudt · 2018-03-27T12:23:58Z

Just another observation:
This is happening not only for the metrics server and healthz, but also for all the node ports that are exposed.

MrHohn · 2018-03-27T17:10:52Z

@kumudt It sounds like you are hitting a different issue, not just a port collision. It would be better to file a new bug (with sig-node?).

disha94 · 2018-04-09T12:24:33Z

@MrHohn I am facing a similar issue:
#61901

arminmor · 2018-04-26T18:59:46Z

I am facing the same issue here.

I have a cluster of 10 nodes, 1 master, 1 etcd.

Services on 6 of the 10 nodes cannot reach, or be reached from, other containers. The other 4 nodes work perfectly: if I place pods on them (using nodeSelector), they can reach, and be reached from, other containers placed on those 4 nodes.

I checked the describe node, kube-proxy and calico logs for the 6 nodes:

describe node:

Events:
  Type     Reason                        Age                   From                  Message
  ----     ------                        ----                  ----                  -------
  Warning  FailedToStartNodeHealthcheck  3m (x73391 over 50d)  kube-proxy, sael0688  Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use

kube-proxy:

1 server.go:483] starting metrics server failed: listen tcp 127.0.0.1:10249: bind: address already in use
1 proxier.go:1379] can't open "nodePort for ingress-nginx/ingress-nginx:http" (:30868/tcp), skipping this nodePort: listen tcp :30868: bind: address already in use
1 proxier.go:1379] can't open "nodePort for ingress-nginx/ingress-nginx:https" (:30344/tcp), skipping this nodePort: listen tcp :30344: bind: address already in use
1 healthcheck.go:317] Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use

calico-node:
[ERROR][101] health.go 193: Health endpoint failed, trying to restart it... error=listen tcp :9099: bind: address already in use

Does anybody know what causes this issue and how I should resolve this?
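
When several components log the same failure, it can help to collect every affected port in one pass. The sketch below assumes only the message shape visible in the logs above ("listen tcp <host>:<port>: bind: address already in use"); `BIND_FAILURE` and `ports_in_use` are hypothetical names.

```python
import re
from typing import List

# Matches e.g. "listen tcp 127.0.0.1:10249: bind: address already in use"
# and the host-less form "listen tcp :30868: bind: address already in use".
BIND_FAILURE = re.compile(
    r"listen tcp (?P<host>\S*):(?P<port>\d+): bind: address already in use"
)


def ports_in_use(log_text: str) -> List[int]:
    """Return the sorted, de-duplicated ports named in bind-failure messages."""
    return sorted({int(m.group("port")) for m in BIND_FAILURE.finditer(log_text)})


if __name__ == "__main__":
    logs = """\
1 server.go:483] starting metrics server failed: listen tcp 127.0.0.1:10249: bind: address already in use
1 proxier.go:1379] can't open nodePort (:30868/tcp), skipping this nodePort: listen tcp :30868: bind: address already in use
1 healthcheck.go:317] Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use
[ERROR][101] health.go 193: Health endpoint failed, trying to restart it... error=listen tcp :9099: bind: address already in use
"""
    print(ports_in_use(logs))
```

A list that spans the healthz port, the metrics port, node ports, and another component's health port, as here, points away from a single misconfigured default and toward a node-wide problem.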

MrHohn · 2018-04-26T21:25:27Z

> I am facing the same issue here.

@arminmor That is not the same issue. The issue reported here was specifically for the port collision between kube-proxy and node-problem-detector on port 10256, which was a configuration mistake.

What you encountered seems more like #61901 (opened by @disha94).

arminmor · 2018-04-27T13:29:26Z

> @arminmor That is not the same issue. The issue reported here was specifically for the port collision between kube-proxy and node-problem-detector on port 10256, which was a configuration mistake.
>
> What you encountered seems more like #61901 (opened by @disha94).

@MrHohn thanks! I sent my comments (updated) in #61901.

aabed · 2018-05-31T21:55:28Z

I am suffering from the same problem
Kubernetes 1.10.3 on AWS using kops

pablolibo · 2018-06-28T22:56:29Z

Same problem here. I am using EKS (Kubernetes 1.10.3) 👎

k8s-ci-robot assigned MrHohn Jul 20, 2017

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Jul 20, 2017

This was referenced Jul 20, 2017

[e2e] Also verify content returned by kube-proxy healthz url #49266

Merged

Emit event and retry when fail to start healthz server on kube-proxy #49267

Merged

This was referenced Jul 20, 2017

change NPD port as there is a port collision with kube-proxy kubernetes/node-problem-detector#128

Merged

Use custom port for node-problem-detector #49316

Merged

nicksardo mentioned this issue Jul 20, 2017

FEATURE REQUEST: Support GCE Internal Load Balancer #33483

Closed

MrHohn mentioned this issue Jul 20, 2017

Bump up gce minNodesHealthCheckVersion due to known issues #49330

Merged

k8s-github-robot closed this as completed in #49316 Jul 20, 2017

MrHohn mentioned this issue Jul 20, 2017

Automated cherry pick of #49316 #49330 #49339

Merged

MrHohn mentioned this issue Aug 4, 2017

Add livenessProbe to kube-proxy templates #50118

Closed

cknowles mentioned this issue Jul 7, 2018

Constant calico node health endpoint failure logs kubernetes-retired/kube-aws#1381

Closed

martinstraesser mentioned this issue May 28, 2021

Proxy healthz start failure after node CRI maintenance #102392

Closed
