Port collision between kube-proxy and node-problem-detector (standalone mode) #49263

Closed
MrHohn opened this issue Jul 20, 2017 · 16 comments · Fixed by #49316
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@MrHohn
Member

MrHohn commented Jul 20, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Since k8s v1.7, kube-proxy listens on 0.0.0.0:10256 to serve default /healthz traffic from L4 external/internal load balancers (#44968).

When node-problem-detector runs in standalone mode, it runs as a regular daemon on the node. By default, it binds 127.0.0.1:10256 for the node-problem-detector server (https://github.com/kubernetes/node-problem-detector/blob/v0.4/cmd/options/options.go#L66-L67).

This breaks kube-proxy's default healthz server:

E0719 22:13:54.263011       5 healthcheck.go:302] Failed to start healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use
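
The collision can be reproduced outside Kubernetes. The following is a minimal sketch, assuming Linux bind semantics and plain sockets without SO_REUSEPORT. The helper name is hypothetical, and an ephemeral port stands in for 10256 so the example does not depend on that port being free.

```python
import errno
import socket


def wildcard_bind_collides(holder_host="127.0.0.1"):
    """Return True if binding 0.0.0.0:<port> fails while <holder_host>:<port> is in use."""
    # First listener: stands in for node-problem-detector on 127.0.0.1.
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((holder_host, 0))  # port 0 lets the kernel pick a free port
    first.listen(1)
    port = first.getsockname()[1]

    # Second listener: stands in for kube-proxy's healthz server on 0.0.0.0.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("0.0.0.0", port))
        return False
    except OSError as err:
        return err.errno == errno.EADDRINUSE
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print("collision:", wildcard_bind_collides())
```

The point of the sketch is that a listener on the loopback address is enough to block a wildcard listener on the same port, even though the two addresses differ.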

What you expected to happen:
kube-proxy and node-problem-detector should use different ports.

How to reproduce it (as minimally and precisely as possible):
Create a 1.7 cluster with node-problem-detector enabled in standalone mode.

Anything else we need to know?:
/assign
/sig network
cc @ajitak @Random-Liu @nicksardo

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Jul 20, 2017
@dixudx
Member

dixudx commented Jul 20, 2017

/cc

hh pushed a commit to ii/kubernetes that referenced this issue Jul 20, 2017
Automatic merge from submit-queue

Use custom port for node-problem-detector

It fixes kubernetes#49263

```release-note
Use port 20256 for node-problem-detector in standalone mode.
```
k8s-github-robot pushed a commit that referenced this issue Jul 21, 2017
…sion

Automatic merge from submit-queue

Bump up gce minNodesHealthCheckVersion due to known issues

**What this PR does / why we need it**: There are known issues in earlier 1.7 releases that prevent kube-proxy from responding correctly to healthz traffic.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: From #49263.

**Special notes for your reviewer**:
/assign @nicksardo @freehan 
cc @bowei @thockin 

**Release note**:

```release-note
GCE Cloud Provider: New created LoadBalancer type Service will have health checks for nodes by default if all nodes have version >= v1.7.2.
```
k8s-github-robot pushed a commit that referenced this issue Jul 22, 2017
Automatic merge from submit-queue (batch tested with PRs 49409, 49352, 49266, 48418)

[e2e] Also verify content returned by kube-proxy healthz url

**What this PR does / why we need it**: Enhance the kube-proxy URL test. This helps detect the port collision case: node-problem-detector also serves /healthz and returns 200 OK, so the test now verifies the response content to confirm that /healthz is served by kube-proxy.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: From #49263

**Special notes for your reviewer**:
/assign @freehan @nicksardo 

**Release note**:

```release-note
NONE
```
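
To illustrate the idea behind this test change: a status-code check alone cannot tell the two servers apart, so the body has to be inspected. This is a rough sketch, assuming kube-proxy's healthz body is a JSON object containing a `lastUpdated` field (an assumption about the response format, not confirmed in this thread). The function name is hypothetical.

```python
import json


def served_by_kube_proxy(status_code, body):
    """Heuristically decide whether a /healthz response came from kube-proxy.

    Assumption: kube-proxy answers with a JSON object that has a
    "lastUpdated" key, while another server on the port replies with
    something else, such as a plain-text "ok".
    """
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON, e.g. a plain-text "ok" from another component
    return isinstance(payload, dict) and "lastUpdated" in payload
```

A check like this fails when another component answers on port 10256, even if that component returns 200.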
k8s-github-robot pushed a commit that referenced this issue Aug 2, 2017
…ails

Automatic merge from submit-queue (batch tested with PRs 49992, 48861, 49267, 49356, 49886)

Emit event and retry when fail to start healthz server on kube-proxy

**What this PR does / why we need it**: Improve kube-proxy's handling of a failure to start the healthz server.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: From #49263.

**Special notes for your reviewer**:
/assign @thockin @nicksardo @bowei 

**Release note**:

```release-note
NONE
```
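
The "emit an event and retry" behavior described in this PR can be pictured roughly as follows. This is a sketch of the general pattern only, not kube-proxy's code: the function name, parameters, and callback are hypothetical, and the callback stands in for emitting a node event.

```python
import socket
import time


def listen_with_retry(host, port, attempts=3, delay=1.0, on_failure=print):
    """Try to listen on host:port, reporting each failure and retrying."""
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(16)
            return sock  # caller owns the socket and must close it
        except OSError as err:
            sock.close()
            # Stand-in for emitting an event such as FailedToStartNodeHealthcheck.
            on_failure(f"attempt {attempt}: failed to listen on {host}:{port}: {err}")
            if attempt < attempts:
                time.sleep(delay)
    return None
```

Reporting every failed attempt is what makes a persistent collision visible in the node events rather than only in the component's log.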
@kumudt

kumudt commented Mar 19, 2018

I am facing this issue on Kubernetes 1.8.9 (After upgrading via Kops from 1.7.11). Is the fix available for 1.8.9?

@MrHohn
Member Author

MrHohn commented Mar 19, 2018

@kumudt This should no longer be an issue for 1.7.2+ clusters. Are you seeing the "bind: address already in use" error in the kube-proxy logs? Could you confirm which component is occupying the same port?

@kumudt

kumudt commented Mar 26, 2018

@MrHohn Yes. In the describe nodes output, I can see an event with the message: Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use.

K8s Version: v1.8.9
Installed with Kops 1.8.1 on AWS
OS Image: Debian GNU/Linux 9 (stretch)
Kernel Version: 4.9.0-6-amd64
Docker Run Time: docker://1.13.1

@MrHohn
Member Author

MrHohn commented Mar 26, 2018

@kumudt Could you confirm what component is occupying port 10256? Have you tried netstat -l on the node?

@kumudt

kumudt commented Mar 26, 2018

@MrHohn didn't find much in netstat -l
For port 10256, I could only see this.
tcp6 0 0 [::]:10256 [::]:* LISTEN

@MrHohn
Member Author

MrHohn commented Mar 26, 2018

@kumudt Sorry, I should have included the program flag. Could you try sudo netstat -tulpn | grep LISTEN?

@kumudt

kumudt commented Mar 27, 2018

This shows that kube-proxy is listening on 10256:

tcp6       0      0 :::10256                :::*                    LISTEN      30662/kube-proxy  

But the node events and kube-proxy logs still show the same error.
This is happening for the metrics server (port 10249) as well.

server.go:480] starting metrics server failed: listen tcp 127.0.0.1:10249: bind: address already in use
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      30662/kube-proxy 

There are a lot of retries in the kube-proxy pod, though. Maybe it is colliding with its own process. This is happening on only one of the nodes in our cluster. I tried restarting and deleting the node, but to no avail.
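
When several ports are involved, as here, it can help to tabulate which process owns each one. The following is a small sketch that parses output of the kind shown above, assuming the usual Linux `netstat -tulpn` column layout. The function names are hypothetical.

```python
def parse_listener(line):
    """Split one `netstat -tulpn` TCP LISTEN line into address, port, pid, program."""
    fields = line.split()
    # Expected layout: proto recv-q send-q local foreign state pid/program
    if len(fields) < 7 or fields[5] != "LISTEN":
        return None
    # rpartition splits on the last colon, so it copes with ":::10256".
    address, _, port = fields[3].rpartition(":")
    pid, _, program = fields[6].partition("/")
    if not (port.isdigit() and pid.isdigit()):
        return None  # e.g. "-" when netstat ran without root privileges
    return {"address": address, "port": int(port), "pid": int(pid), "program": program}


def owners_of(port, netstat_output):
    """Return the set of (pid, program) pairs listening on the given port."""
    owners = set()
    for line in netstat_output.splitlines():
        entry = parse_listener(line)
        if entry and entry["port"] == port:
            owners.add((entry["pid"], entry["program"]))
    return owners
```

If the only owner of a port is the same process that reports the bind error, the conflict is not between two components, which is a different situation from the one this issue was opened for.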

@kumudt

kumudt commented Mar 27, 2018

Just another observation.
This is happening not just for the metrics server / healthz, it's happening for all the node ports that are exposed.

@MrHohn
Member Author

MrHohn commented Mar 27, 2018

@kumudt It sounds like you are hitting a different issue, not just a port collision. It would be better to file a new bug (with sig-node?).

@disha94

disha94 commented Apr 9, 2018

@MrHohn I am facing a similar issue:
#61901

@arminmor

arminmor commented Apr 26, 2018

I am facing the same issue here.

I have a cluster of 10 nodes, 1 master, 1 etcd.

Services on 6 of the 10 nodes cannot reach, or be reached from, other containers. The other 4 nodes work perfectly: if I place pods on them (using nodeSelector), those pods can reach, and be reached from, containers placed on the rest of those 4 nodes.

I checked the describe node output and the kube-proxy and calico logs for the 6 nodes:

describe node:

Events:
  Type     Reason                        Age                   From                  Message
  ----     ------                        ----                  ----                  -------
  Warning  FailedToStartNodeHealthcheck  3m (x73391 over 50d)  kube-proxy, sael0688  Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use

kube-proxy:

1 server.go:483] starting metrics server failed: listen tcp 127.0.0.1:10249: bind: address already in use
1 proxier.go:1379] can't open "nodePort for ingress-nginx/ingress-nginx:http" (:30868/tcp), skipping this nodePort: listen tcp :30868: bind: address already in use
1 proxier.go:1379] can't open "nodePort for ingress-nginx/ingress-nginx:https" (:30344/tcp), skipping this nodePort: listen tcp :30344: bind: address already in use
1 healthcheck.go:317] Failed to start node healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use

calico-node:
[ERROR][101] health.go 193: Health endpoint failed, trying to restart it... error=listen tcp :9099: bind: address already in use

Does anybody know what causes this issue and how I should resolve this?

@MrHohn
Member Author

MrHohn commented Apr 26, 2018

> I am facing the same issue here.

@arminmor That is not the same issue. The issue reported here was specifically for the port collision between kube-proxy and node-problem-detector on port 10256, which was a configuration mistake.

What you encountered seems more like #61901 (opened by @disha94).

@arminmor

> @arminmor That is not the same issue. The issue reported here was specifically for the port collision between kube-proxy and node-problem-detector on port 10256, which was a configuration mistake.
>
> What you encountered seems more like #61901 (opened by @disha94).

@MrHohn thanks! I sent my comments (updated) in #61901.

@aabed

aabed commented May 31, 2018

I am suffering from the same problem
Kubernetes 1.10.3 on AWS using kops

@pablolibo

Same problem here. I am using EKS (Kubernetes 1.10.3) 👎


8 participants