SSH Dialer can hang forever. #23835
Looks like the SSH timeout was only added to crypto/ssh 10 days ago, so it isn't in any released version of Go yet...
And CJ's stack trace makes it clear that the hang isn't actually in net.Dial but in ssh.NewClientConn, which only runs after net.Dial has already returned successfully. I'll open a new issue against the crypto/ssh package, but it looks like we may have to hack our own timeout logic on top of ssh.Dial.
Given how hard this looks to trigger, it feels more like a P1 than a P0, although we've seen it twice in the last couple of days alone...
Opened golang/go#15113
CJ will add some sort of timeout in our code as a stop-gap until the underlying issue is fixed in crypto/ssh. It will leak the stuck goroutines, but that is better than apiservers getting into an unrecoverable state with no working tunnels.
Automatic merge from submit-queue

Add a timeout to the sshDialer to prevent indefinite hangs

Prevents the SSH Dialer from hanging forever. Fixes a problem where SSH tunnels get stuck trying to open. Addresses #23835.
Fixed by #23843. Released in version 1.2.2.
Automatic merge from submit-queue

Add a customized ssh dialer that will timeout

Fix #23835. @a-robinson @cjcullen @lavalamp
The ssh default timeout is 0 (no timeout), which appears to cause problems for SSH tunnels: a tunnel open attempt can hang indefinitely, and we don't try to kill in-progress open attempts. Most of the time a failed open attempt times out after the standard 127-second TCP timeout, but it appears that sometimes it hangs after the TCP connection is established, in the SSH handshake.