-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't try to read/write if connection is closed #46
Conversation
5f8c436
to
7d6b2c7
Compare
Hopefully this will fix kubernetes/kubernetes#4882. If you comment out https://github.com/docker/spdystream/pull/46/files#diff-8f8a79f3f89c69a28dd878de98384cf9R92 and https://github.com/docker/spdystream/pull/46/files#diff-8f8a79f3f89c69a28dd878de98384cf9R109, it's pretty easy to make the test fail using this loop:
|
7d6b2c7
to
3f68bce
Compare
@dmcgowan I'm not sure if this is the best way to fix the issue on OS X, but it does seem to resolve the problem. |
I think the proper solution would involve a lock around the call to keep a close from occurring when a read/write is attempted. There is always a mismatch between reading/writing from channels and reading/writing from streams, a select cannot handle both. This solution is not the correct one since at the time WriteFrame is called there is still no protection from close. |
What about the case where the remote end closes the connection? There's no way to keep a close from occurring in that instance. Also, we discussed last week the option of creating a drain goroutine that runs to drain resetChan. That's not pretty, but it would also fix this particular issue. |
Or are you just talking about putting a lock around conn.closeChan? |
The write lock would need to be around closeChan and read lock in WriteFrame and ReadFrame. |
Even with a lock, couldn't you still end up in the situation where the monitor goroutine exits and attempts to read from/write to the connection via the framer succeed even though the connection is closed (what we're seeing on OS X)? |
What if instead, we have a bool in the idleAwareFramer that indicates if the monitor goroutine is alive or not? We can put a lock around that, and have ReadFrame and WriteFrame check that bool's status before continuing. |
There appears to be an issue on OS X where the following sequence of events occurs, leading to a deadlock: 1. Remote side closes its connection 2. Local side receives io.EOF trying to read the next frame 3. Local idleAwareFramer monitor goroutine exits, so nothing is receiving from resetChan any more 4. Code using spdystream isn't aware of the EOF and tries to Reset() all streams and Close() the connection 5. The attempt to write the stream reset frame does not return an error when it logically should 6. The idleAwareFramer tries to send to the resetChan 7. In some instances, resetChan's buffer is full (because the monitor isn't running any more) 8. We block here To address this, add checks at the beginning of ReadFrame and WriteFrame in idleAwareFramer to see if the connection's closeChan is closed. If so, immediately return an error instead of trying to read/write using the underlying framer. Also remove some timeouts in the test code - these should be handled by passing -test.timeout to the test executable instead. Signed-off-by: Andy Goldstein <agoldste@redhat.com>
Reset logic currently allows for writing to the reset chan after monitor has exited. Once monitor has detected a close, no more frames should be written. Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
3f68bce
to
63b695f
Compare
LGTM |
Don't try to read/write if connection is closed
There appears to be an issue on OS X where the following sequence of
events occurs, leading to a deadlock:
receiving from resetChan any more
streams and Close() the connection
when it logically should
isn't running any more)
To address this, add checks at the beginning of ReadFrame and WriteFrame
in idleAwareFramer to see if the connection's closeChan is closed. If
so, immediately return an error instead of trying to read/write using
the underlying framer.
Also remove some timeouts in the test code - these should be handled by
passing -test.timeout to the test executable instead.