Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't try to read/write if connection is closed #46

Merged
merged 2 commits into from
Mar 9, 2015

Conversation

ncdc
Copy link
Contributor

@ncdc ncdc commented Feb 27, 2015

There appears to be an issue on OS X where the following sequence of
events occurs, leading to a deadlock:

  1. Remote side closes its connection
  2. Local side receives io.EOF trying to read the next frame
  3. Local idleAwareFramer monitor goroutine exits, so nothing is
    receiving from resetChan any more
  4. Code using spdystream isn't aware of the EOF and tries to Reset() all
    streams and Close() the connection
  5. The attempt to write the stream reset frame does not return an error
    when it logically should
  6. The idleAwareFramer tries to send to the resetChan
  7. In some instances, resetChan's buffer is full (because the monitor
    isn't running any more)
  8. We block here

To address this, add checks at the beginning of ReadFrame and WriteFrame
in idleAwareFramer to see if the connection's closeChan is closed. If
so, immediately return an error instead of trying to read/write using
the underlying framer.

Also remove some timeouts in the test code - these should be handled by
passing -test.timeout to the test executable instead.

@ncdc ncdc force-pushed the conn-close-deadlock branch from 5f8c436 to 7d6b2c7 Compare February 27, 2015 21:08
@ncdc
Copy link
Contributor Author

ncdc commented Feb 27, 2015

@dmcgowan @smarterclayton

Hopefully this will fix kubernetes/kubernetes#4882. If you comment out https://github.com/docker/spdystream/pull/46/files#diff-8f8a79f3f89c69a28dd878de98384cf9R92 and https://github.com/docker/spdystream/pull/46/files#diff-8f8a79f3f89c69a28dd878de98384cf9R109, it's pretty easy to make the test fail using this loop:

while true; do go test -run TestFramingAfterRemoteConnectionClosed -timeout 1s; if [ $? != 0 ]; then break; fi; done

@ncdc ncdc mentioned this pull request Mar 2, 2015
@ncdc ncdc force-pushed the conn-close-deadlock branch from 7d6b2c7 to 3f68bce Compare March 2, 2015 16:43
@ncdc
Copy link
Contributor Author

ncdc commented Mar 2, 2015

@dmcgowan I'm not sure if this is the best way to fix the issue on OS X, but it does seem to resolve the problem.

@dmcgowan
Copy link
Member

dmcgowan commented Mar 2, 2015

I think the proper solution would involve a lock around the call to keep a close from occurring when a read/write is attempted. There is always a mismatch between reading/writing from channels and reading/writing from streams, a select cannot handle both. This solution is not the correct one since at the time WriteFrame is called there is still no protection from close.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 2, 2015

What about the case where the remote end closes the connection? There's no way to keep a close from occurring in that instance.

Also, we discussed last week the option of creating a drain goroutine that runs to drain resetChan. That's not pretty, but it would also fix this particular issue.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 2, 2015

Or are you just talking about putting a lock around conn.closeChan?

@dmcgowan
Copy link
Member

dmcgowan commented Mar 2, 2015

The write lock would need to be around closeChan and read lock in WriteFrame and ReadFrame.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 2, 2015

Even with a lock, couldn't you still end up in the situation where the monitor goroutine exits and attempts to read from/write to the connection via the framer succeed even though the connection is closed (what we're seeing on OS X)?

@ncdc
Copy link
Contributor Author

ncdc commented Mar 2, 2015

What if instead, we have a bool in the idleAwareFramer that indicates if the monitor goroutine is alive or not? We can put a lock around that, and have ReadFrame and WriteFrame check that bool's status before continuing.

Andy Goldstein and others added 2 commits March 9, 2015 15:45
There appears to be an issue on OS X where the following sequence of
events occurs, leading to a deadlock:

1. Remote side closes its connection
2. Local side receives io.EOF trying to read the next frame
3. Local idleAwareFramer monitor goroutine exits, so nothing is
receiving from resetChan any more
4. Code using spdystream isn't aware of the EOF and tries to Reset() all
streams and Close() the connection
5. The attempt to write the stream reset frame does not return an error
when it logically should
6. The idleAwareFramer tries to send to the resetChan
7. In some instances, resetChan's buffer is full (because the monitor
isn't running any more)
8. We block here

To address this, add checks at the beginning of ReadFrame and WriteFrame
in idleAwareFramer to see if the connection's closeChan is closed. If
so, immediately return an error instead of trying to read/write using
the underlying framer.

Also remove some timeouts in the test code - these should be handled by
passing -test.timeout to the test executable instead.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
Reset logic currently allows for writing to the reset chan after monitor has exited. Once monitor has detected a close, no more frames should be written.

Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
@ncdc ncdc force-pushed the conn-close-deadlock branch from 3f68bce to 63b695f Compare March 9, 2015 19:46
@dmcgowan
Copy link
Member

dmcgowan commented Mar 9, 2015

LGTM

dmcgowan added a commit that referenced this pull request Mar 9, 2015
Don't try to read/write if connection is closed
@dmcgowan dmcgowan merged commit e731c8f into moby:master Mar 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants