SSH Dialer can hang forever. #23835

Closed
cjcullen opened this issue Apr 4, 2016 · 6 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@cjcullen
Member

cjcullen commented Apr 4, 2016

The SSH default timeout is "0" (no timeout), which appears to cause problems for SSH tunnels. A tunnel open attempt can hang indefinitely, and we don't try to kill in-progress open attempts. Most of the time, a failed open attempt will time out after the standard 127-second TCP timeout, but it appears that sometimes it hangs after the TCP connection is established, during the SSH handshake.

@a-robinson a-robinson added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Apr 4, 2016
@a-robinson
Contributor

Looks like the SSH timeout was only added to crypto/ssh 10 days ago, so it isn't in any released version of Go yet...
golang/crypto@c7e3b0e

@a-robinson
Contributor

And CJ's stack trace makes it clear that the hang isn't in net.Dial at all, but in ssh.NewClientConn, which only runs once net.Dial has returned successfully.

I'll open a new issue against the crypto/ssh package, but it looks like we may have to hack our own timeout logic in on top of ssh.Dial.

goroutine 85851 [IO wait, 178 minutes]:
net.(*pollDesc).Wait(0xc20ae2e840, 0x72, 0x0, 0x0)
    /usr/src/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc20ae2e840, 0x0, 0x0)
    /usr/src/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc20ae2e7e0, 0xc20c38b2d0, 0x1, 0x1, 0x0, 0x7f20600e52a0, 0xc20c38b2d8)
    /usr/src/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20b7add18, 0xc20c38b2d0, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/net.go:121 +0xdc
io.ReadAtLeast(0x7f20600eb598, 0xc20b7add18, 0xc20c38b2d0, 0x1, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/src/go/src/io/io.go:298 +0xf1
io.ReadFull(0x7f20600eb598, 0xc20b7add18, 0xc20c38b2d0, 0x1, 0x1, 0x40, 0x0, 0x0)
    /usr/src/go/src/io/io.go:316 +0x6d
golang.org/x/crypto/ssh.readVersion(0x7f20600eb598, 0xc20b7add18, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/Godeps/_workspace/src/golang.org/x/crypto/ssh/transport.go:303 +0x167
golang.org/x/crypto/ssh.exchangeVersions(0x7f205ff485c0, 0xc20b7add18, 0xc20c38b2c0, 0xa, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/Godeps/_workspace/src/golang.org/x/crypto/ssh/transport.go:287 +0x2f1
golang.org/x/crypto/ssh.(*connection).clientHandshake(0xc20cdb9100, 0xc20bdf6380, 0x11, 0xc20b56e1e0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/Godeps/_workspace/src/golang.org/x/crypto/ssh/client.go:91 +0x132
golang.org/x/crypto/ssh.NewClientConn(0x7f20600ea500, 0xc20b7add18, 0xc20bdf6380, 0x11, 0xc20b56e140, 0x0, 0x0, 0x0, 0xe, 0x0, ...)
    /go/src/k8s.io/kubernetes/Godeps/_workspace/src/golang.org/x/crypto/ssh/client.go:74 +0x140
golang.org/x/crypto/ssh.Dial(0x1d48080, 0x3, 0xc20bdf6380, 0x11, 0xc20b56e140, 0x11, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/Godeps/_workspace/src/golang.org/x/crypto/ssh/client.go:176 +0xf9
k8s.io/kubernetes/pkg/ssh.(*SSHTunnel).Open(0xc20d0490e0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/ssh/ssh.go:114 +0xbc
k8s.io/kubernetes/pkg/ssh.(*SSHTunnelList).createAndAddTunnel(0xc208228360, 0xc2085c6720, 0xe)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/ssh/ssh.go:433 +0x331
created by k8s.io/kubernetes/pkg/ssh.(*SSHTunnelList).removeAndReAdd
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/ssh/ssh.go:345 +0x3ae
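
The trace above is blocked in a plain read on an already-connected socket (ssh.readVersion waiting for the first byte from the peer), so a dial timeout alone cannot help. As a rough illustration only, and not the fix that was applied here, the sketch below uses just the standard library to show how a deadline set on the connection bounds that kind of read. The names readFirstByte, startSilentServer, and isTimeout are hypothetical helpers made up for this example.

```go
package main

import (
	"errors"
	"net"
	"time"
)

// readFirstByte dials addr and then tries to read a single byte, much like
// the first step of an SSH version exchange. The deadline bounds the read;
// without it, the read blocks for as long as the peer stays silent.
func readFirstByte(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	defer conn.Close()

	// The deadline applies to every later Read and Write on conn.
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	buf := make([]byte, 1)
	_, err = conn.Read(buf)
	return err
}

// startSilentServer is a hypothetical helper: it accepts TCP connections and
// never writes to them, imitating a peer that stalls before the handshake.
func startSilentServer() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	done := make(chan struct{})
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			go func() {
				<-done // hold the connection open without sending anything
				c.Close()
			}()
		}
	}()
	return ln.Addr().String(), func() { close(done); ln.Close() }, nil
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}
```

With a real SSH client, the same idea would presumably mean setting the deadline on the net.Conn before handing it to ssh.NewClientConn and clearing it afterwards with conn.SetDeadline(time.Time{}), so that the established session is not cut off later.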

@a-robinson
Contributor

Given how hard this looks like it should be to trigger, it feels more like a P1 than a P0. Although we've seen it twice in the last couple days alone...

@a-robinson a-robinson added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Apr 4, 2016
@a-robinson
Contributor

Opened golang/go#15113

@a-robinson a-robinson assigned cjcullen and unassigned a-robinson Apr 4, 2016
@a-robinson
Contributor

CJ will add some sort of timeout in our code as a stop-gap until the underlying issue can be fixed in crypto/ssh. It will leak the stuck goroutines, but that is better than apiservers getting into an unrecoverable state with no working tunnels.
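
For readers unfamiliar with the pattern, a stop-gap of this kind might look roughly like the sketch below. It is a minimal, standard-library-only illustration and not the code that was merged; callWithTimeout and errTimeout are hypothetical names, and the call argument stands in for a closure around ssh.Dial.

```go
package main

import (
	"errors"
	"time"
)

// errTimeout is a hypothetical sentinel for this sketch; it is not a
// Kubernetes or crypto/ssh identifier.
var errTimeout = errors.New("timed out waiting for dial")

// callWithTimeout runs call in its own goroutine and waits at most timeout
// for its result. If call never returns, that goroutine is leaked: nothing
// here can cancel it. The channel is buffered so that a late result does not
// additionally block the goroutine on send.
func callWithTimeout(call func() (interface{}, error), timeout time.Duration) (interface{}, error) {
	type result struct {
		val interface{}
		err error
	}
	ch := make(chan result, 1)
	go func() {
		val, err := call()
		ch <- result{val, err}
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case r := <-ch:
		return r.val, r.err
	case <-timer.C:
		return nil, errTimeout
	}
}
```

The caller gets control back after the timeout and can retry or drop the tunnel, but each hung attempt still holds its goroutine and socket until the process exits, which is the leak mentioned above.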

k8s-github-robot pushed a commit that referenced this issue Apr 5, 2016
Automatic merge from submit-queue

Add a timeout to the sshDialer to prevent indefinite hangs.

Prevents the SSH Dialer from hanging forever. Fixes a problem where SSH Tunnels get stuck trying to open.

Addresses #23835.
@cjcullen
Member Author

Fixed by #23843. Released in version 1.2.2.

k8s-github-robot pushed a commit that referenced this issue Jul 14, 2016
Automatic merge from submit-queue

Add a customized ssh dialer that will timeout

Fix #23835.

@a-robinson @cjcullen @lavalamp
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Sep 20, 2019
BUG 1753012: UPSTREAM: 82830: Do not query the cloud if PV has all the labels

Origin-commit: c3e5798054c3c1dbb006f46205106c981c008c26