Ztunnel fails with 'failed to bind to address [::1]:15053: Cannot assign requested address' #52858

Closed
2 tasks done
kovaxur opened this issue Aug 26, 2024 · 5 comments · Fixed by istio/ztunnel#1284
Labels
area/ambient Issues related to ambient mesh area/networking

kovaxur commented Aug 26, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

Hi,
We had a strange issue where the Istio gateway intermittently reported "upstream connect error or disconnect/reset before headers. reset reason: connection termination" for certain endpoints. I went down the HTTP/1.1, HTTP/2 and timeout rabbit hole before realizing that the pods returning this error are not reachable from the gateway at all: when I curl such a pod manually, I get a connection refused error on both 15008 and 8080 (the app port).

Then I noticed that the ztunnel pod on one node of the cluster is in the "not ready" state and logs the following error:

2024-08-26T16:58:16.711297Z    info    dns::server    starting local DNS server    address=localhost:15053 component="dns"                                                                
2024-08-26T16:58:16.711448Z    info    inpod::statemanager    retrying workload failed: failed to bind to address [::1]:15053: Cannot assign requested address (os error 99)    uid="c9df3bc2-8943-4f2d-8441-9beb07aa0a04"
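
For context, `os error 99` on Linux is `EADDRNOTAVAIL`: the kernel rejects the bind because the requested address is not available in the current network namespace. That is different from `EADDRINUSE` ("Address in use"), which means the port is already taken. Below is a minimal Python sketch, not ztunnel's code, that attempts the same kind of bind and reports the errno name. `try_bind` is a hypothetical helper introduced only for illustration.

```python
import errno
import socket


def try_bind(host: str, port: int, kind: int = socket.SOCK_DGRAM):
    """Try to bind a socket; return None on success or the errno name on failure."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, kind) as sock:
            sock.bind((host, port))
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    # Run this inside the network namespace you want to inspect.
    for host in ("127.0.0.1", "::1"):
        print(host, try_bind(host, 15053) or "ok")
```

A result of `EADDRNOTAVAIL` for `::1` would match the log line above, while `EADDRINUSE` would point to a port conflict instead.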

container name: istio-proxy
container image: gcr.io/istio-testing/ztunnel:1.24-alpha.d334295f1866d584af78164ad99e86bedd44a6ac-distroless

The container is stuck in ready=false state, has 0 restarts.

ztunnel-x4kkj:/root$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 ztunnel-x4kkj:41944     istiod.istio-system.svc.cluster.local:15012 ESTABLISHED 
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:41562 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15020     grafana-agent-metrics-0.grafana-XXXXX.svc.cluster.local:52036 ESTABLISHED 
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:47706 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:34532 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:43054 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:44438 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:40054 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:49342 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:34520 TIME_WAIT   
tcp        0      0 ztunnel-x4kkj:15021     10-201-0-26.kubelet.default.svc.cluster.local:43056 TIME_WAIT   
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node Path
unix  3      [ ]         SEQPACKET  CONNECTED      32937 /var/run/ztunnel/ztunnel.sock
unix  3      [ ]         SEQPACKET  CONNECTED      32936 
unix  3      [ ]         STREAM     CONNECTED      32909 
unix  3      [ ]         STREAM     CONNECTED      32908 
ztunnel-x4kkj:/root$ netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:15000         0.0.0.0:*               LISTEN      -
tcp        0      0 ::1:15000               :::*                    LISTEN      -
tcp        0      0 :::15020                :::*                    LISTEN      -
tcp        0      0 :::15021                :::*                    LISTEN      -
ztunnel-x4kkj:/root$ netstat -unlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
ztunnel-x4kkj:/root$ lsof -i :15053
1	/bin/bash	0	/dev/pts/0
1	/bin/bash	1	/dev/pts/0
1	/bin/bash	2	/dev/pts/0
1	/bin/bash	255	/dev/pts/0

Binding to port 15020 fails, which is expected because ztunnel already listens on it; binding to UDP port 15053 works without issue:

ztunnel-x4kkj:/root$ nc -l 15020
nc: Address in use
ztunnel-x4kkj:/root$ nc -u -l 15053



^C

I'm using the alpha version due to #52260. Can this be related?

Version

Istio:
client version: 1.22.1
control plane version: 1.24-alpha.d334295f1866d584af78164ad99e86bedd44a6ac
data plane version: 1.23.0 (3 proxies), 1.24-alpha.d334295f1866d584af78164ad99e86bedd44a6ac (52 proxies)

Kubernetes:
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6-eks-db838b0

Additional Information

No response

@howardjohn
Member

Is there any chance you can run those same commands in the pod network namespace, and additionally ip addr?

In the meantime, setting defaults.meshConfig.defaultConfig.proxyMetadata.ISTIO_META_DNS_CAPTURE to false is a known workaround for this problem. We are investigating in istio/ztunnel#1272. Many users have reported it, but so far we have been unable to get our hands on the debugging info we need to solve it (or a reproduction).

joke commented Aug 27, 2024

I'm facing the same problem.

Unfortunately the workaround (ISTIO_META_DNS_CAPTURE=false) causes DNS resolution failures:

sts.eu-central-1.amazonaws.com on 172.20.0.10:53: read udp 100.64.109.34:57420->172.20.0.10:53: read: connection refused

@howardjohn
Member

We really need this information (netstat -ntlp; netstat -nlup; ip addr; ip link) from inside the failing pod's network namespace (not ztunnel's) to make progress here. Many users have reported this, but no one has provided this information, so we cannot really do anything to solve it.
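
The four commands requested above can be bundled into a single report. The following is a rough Python sketch rather than an official Istio tool. It assumes it is started inside the failing pod's network namespace (for example through `nsenter -t <pid> -n`, where `<pid>` is a process belonging to that pod) and that `netstat` and `ip` are installed there. `COMMANDS` and `collect` are hypothetical names.

```python
import subprocess

# The four commands requested in this thread.
COMMANDS = [
    ["netstat", "-ntlp"],
    ["netstat", "-nlup"],
    ["ip", "addr"],
    ["ip", "link"],
]


def collect(commands=COMMANDS) -> str:
    """Run each command and return one text report with all outputs."""
    sections = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            output = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Tool missing or hung: record that instead of aborting the report.
            output = f"failed to run: {exc}\n"
        sections.append(f"$ {' '.join(cmd)}\n{output}")
    return "\n".join(sections)


if __name__ == "__main__":
    print(collect())
```

Recording failures inline keeps the report useful even when one of the tools is missing from the environment.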

I would be happy to even jump on a video call to walk someone through it. Please feel free to ping me on slack if you have this issue and are willing to troubleshoot.


> Unfortunately the workaround (ISTIO_META_DNS_CAPTURE=false) causes DNS resolution failures:

Make sure you don't have values.cni.ambient.dnsCapture=true set (which causes redirection to the DNS server you just disabled).
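
The interaction between the two settings can be spelled out as a small check. This is only an illustrative Python sketch: the dictionaries stand in for the `proxyMetadata` and `cni` sections of your own install values, `dns_capture_conflict` is a hypothetical helper, and unset keys are treated as "not set" regardless of the real defaults in your Istio version.

```python
def dns_capture_conflict(proxy_metadata: dict, cni_values: dict) -> bool:
    """True when the DNS proxy is disabled but the CNI still redirects DNS to it."""
    # Workaround from this thread: ISTIO_META_DNS_CAPTURE=false.
    proxy_disabled = (
        str(proxy_metadata.get("ISTIO_META_DNS_CAPTURE", "")).lower() == "false"
    )
    # cni.ambient.dnsCapture=true keeps redirecting DNS traffic to ztunnel.
    cni_redirects = bool(cni_values.get("ambient", {}).get("dnsCapture", False))
    return proxy_disabled and cni_redirects


if __name__ == "__main__":
    conflict = dns_capture_conflict(
        {"ISTIO_META_DNS_CAPTURE": "false"},
        {"ambient": {"dnsCapture": True}},
    )
    print("conflicting DNS capture settings:", conflict)
```

A conflict means DNS queries are still redirected to a server that is no longer running, which is consistent with the connection refused error reported above.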

joke commented Aug 28, 2024

@howardjohn contacted you via slack

howardjohn added a commit to howardjohn/ztunnel that referenced this issue Aug 28, 2024
istio-testing pushed a commit to istio-testing/ztunnel that referenced this issue Aug 28, 2024
@howardjohn
Member

Thank you @joke and @bleggett for your help on Slack. We have a fix ready in istio/ztunnel#1284. I was able to reproduce the issue both in a unit test and in a live cluster, and the fix resolves it in both.

One word of warning: the fix makes the retry after the failure succeed. You may still see the error message, but it should now resolve itself (the bug was that it never resolved).
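
To make the described behavior concrete: a failed attempt is still logged, but a later retry is allowed to succeed. The sketch below is a generic Python illustration of that pattern under assumed names (`retry`, `operation`); it is not the actual change in istio/ztunnel#1284.

```python
import time


def retry(operation, attempts: int = 5, delay: float = 0.1, log=print):
    """Call operation until it succeeds, logging each failure along the way."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # The error is still reported, matching "you may still see the message".
            log(f"attempt {attempt} failed: {exc}")
            if attempt == attempts:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_bind():
        # Hypothetical stand-in for a bind that fails until the netns is usable.
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError(99, "Cannot assign requested address")
        return "bound"

    print(retry(flaky_bind, delay=0))
```

With the original bug, the equivalent of `flaky_bind` kept failing on every retry, so the workload never recovered.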

I've slotted the fix to be cherry-picked to 1.23, so it should land in the upcoming 1.23.1 release.

As others have noted, one workaround for this problem in the meantime is to restart ztunnel or the istio-cni pod.

howardjohn changed the title from "Ztunnel pod: failed to bind to address 15053" to "Ztunnel fails with 'failed to bind to address [::1]:15053: Cannot assign requested address'" on Aug 28, 2024
istio-testing added a commit to istio/ztunnel that referenced this issue Aug 28, 2024
* zds: fix retrying a bad netns

Fixes istio/istio#52858

* Fix 1.23 changes

---------

Co-authored-by: John Howard <john.howard@solo.io>
antonioberben pushed a commit to antonioberben/ztunnel that referenced this issue Oct 1, 2024