-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for max_concurrent_streams issue - Call mark_stream_closed before… #12463
Conversation
READY FOR REVIEW |
07b7235
to
a6e36e9
Compare
|
|
|
|
|
c1ffbc8
to
e45b3ec
Compare
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started reviewing then realized I had the same question each time.
mark_stream_closed calls fail_pending_writes. Is the race caused because fail_pending_writes needs to be called earlier? In that case could we just change mark_stream_closed?
It seems a little funny that we have to duplicate some checks, call fail_pending_writes, which is then called again from inside mark_stream_closed
@@ -1650,6 +1650,7 @@ static void force_client_rst_stream(grpc_exec_ctx *exec_ctx, void *sp, | |||
&t->qbuf, grpc_chttp2_rst_stream_create(s->id, GRPC_HTTP2_NO_ERROR, | |||
&s->stats.outgoing)); | |||
grpc_chttp2_initiate_write(exec_ctx, t, "force_rst_stream"); | |||
grpc_chttp2_fail_pending_writes(exec_ctx, t, s, GRPC_ERROR_NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this not be invoked from mark_stream_closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue is not with fail_pending_writes, but with a race while sending trailing metadata, and actually deleting the stream from the stream_map. The stream_map size is currently used to check the number of streams in use currently. Earlier, the server would send the trailing metadata and only remove the stream after that, which gave rise to the possibility of the client sending another stream before the original was removed, creating the flake that we see.
The solution was to call mark_stream_closed before sending the data. To do this, I had to remove fail_pending_writes (from mark_stream_closed) which was causing a few issues.
(I actually got confused when I saw your comment. I thought maybe that I had left removing the line from mark_stream_closed, but then I realized that your link points to the current grpc master, and not the pull request :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[accidentally wrote and delete comment. Re writing]
Oops, I was half reviewing your code, and half reviewing an old chrome tab open at master. The dangers of too many tabs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this just reminded me something. I changed the fix a bit and this was after noticing the fail_pending_writes. The fail_pending_writes change may no longer be required. The last I talked to Craig, as long as something is put on the qbuf, it should go out. Let me try adding the fail_pending_writes back in.
@@ -103,6 +103,9 @@ grpc_error *grpc_chttp2_rst_stream_parser_parse(grpc_exec_ctx *exec_ctx, | |||
GRPC_ERROR_INT_HTTP2_ERROR, (intptr_t)reason); | |||
gpr_free(message); | |||
} | |||
if (!s->write_closed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question. Seems like this will occur from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work tracking down a difficult, racy bug. LGTM
@@ -1650,6 +1650,7 @@ static void force_client_rst_stream(grpc_exec_ctx *exec_ctx, void *sp, | |||
&t->qbuf, grpc_chttp2_rst_stream_create(s->id, GRPC_HTTP2_NO_ERROR, | |||
&s->stats.outgoing)); | |||
grpc_chttp2_initiate_write(exec_ctx, t, "force_rst_stream"); | |||
grpc_chttp2_fail_pending_writes(exec_ctx, t, s, GRPC_ERROR_NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[accidentally wrote and delete comment. Re writing]
Oops, I was half reviewing your code, and half reviewing an old chrome tab open at master. The dangers of too many tabs...
The change is much cleaner and smaller now. Earlier, I had added the mark_stream_closed much earlier than necessary which was causing the fail_pending_writes to result in issues. At the current placement of mark_stream_closed in writing.c, there is no need for changing mark_stream_closed or bring out fail_pending_writes :) |
Much cleaner, nice. re-LGTM |
Jenkins: retest this please |
|
cbc92c7
to
a6ede3b
Compare
|
|
Fix for max_concurrent_streams issue - Call mark_stream_closed before…