
Fix for max_concurrent_streams issue - Call mark_stream_closed before… #12463

Merged
merged 3 commits on Sep 13, 2017

Conversation

@yashykt (Member) commented Sep 8, 2017

Fix for max_concurrent_streams issue - Call mark_stream_closed before…

@yashykt (Member Author) commented Sep 8, 2017

READY FOR REVIEW

@yashykt force-pushed the max_concurrent_streams_fix branch from 07b7235 to a6e36e9 on September 9, 2017 00:25
@grpc-kokoro

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_256k_256k_4M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_16M_16M_4M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_256k_256k_64M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_256k_256k_256k.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_256k_64M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_256k_4M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_16M_4M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_16M_16M_256k.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_16M_256k.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_256k_16k.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_256k_256k_16k.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_16M_64M.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_1_256k_256k.opt.new: 4
    bm_fullstack_trickle.BM_PumpUnbalancedUnary_Trickle_16M_16M_64M.opt.new: 4


[trickle] No significant performance differences

@grpc-kokoro

[trickle] No significant performance differences

@grpc-kokoro

[microbenchmarks] Performance differences noted:
Benchmark                                                    cpu_time    real_time
-----------------------------------------------------------  ----------  -----------
BM_HpackParserParseHeader<NonIndexedBinaryElem<100, false>>  +4%         +5%

@grpc-kokoro

[trickle] No significant performance differences

@grpc-kokoro

[microbenchmarks] No significant performance differences

@yashykt (Member Author) commented Sep 11, 2017

Known Issues
Basic Tests C & C++ Linux [opt] : #11737
Basic Tests Linux (opt) : #11737
Cloud To Prod Interop : #10763
Msan C (Internal CI) : #12307
UBsan C (Internal CI) : #11528

@yashykt force-pushed the max_concurrent_streams_fix branch from c1ffbc8 to e45b3ec on September 11, 2017 17:40
@yashykt requested a review from ncteisen September 11, 2017 18:01
@grpc-kokoro

[trickle] No significant performance differences

@grpc-kokoro

[microbenchmarks] No significant performance differences

@ncteisen self-assigned this Sep 11, 2017
@ncteisen (Contributor) left a comment

I started reviewing then realized I had the same question each time.

mark_stream_closed calls fail_pending_writes. Is the race caused because fail_pending_writes needs to be called earlier? In that case, could we just change mark_stream_closed?

It seems a little odd that we have to duplicate some checks and call fail_pending_writes, which is then called again from inside mark_stream_closed.
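
The pattern in question, roughly (a hand-wavy sketch with simplified, hypothetical names, not the actual chttp2 code):

    /* Hypothetical sketch of the duplication being questioned: the
     * call site re-checks and fails pending writes itself... */
    if (!s->write_closed) {
      fail_pending_writes(t, s);  /* explicit call at the call site */
    }
    mark_stream_closed(t, s);     /* ...which calls fail_pending_writes
                                     again internally */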

@@ -1650,6 +1650,7 @@ static void force_client_rst_stream(grpc_exec_ctx *exec_ctx, void *sp,
                         &t->qbuf, grpc_chttp2_rst_stream_create(s->id, GRPC_HTTP2_NO_ERROR,
                                                                 &s->stats.outgoing));
      grpc_chttp2_initiate_write(exec_ctx, t, "force_rst_stream");
+     grpc_chttp2_fail_pending_writes(exec_ctx, t, s, GRPC_ERROR_NONE);
Contributor

Will this not be invoked from mark_stream_closed?

Member Author

The main issue is not with fail_pending_writes but with a race between sending trailing metadata and actually deleting the stream from the stream_map. The stream_map size is currently used to check the number of streams in use. Previously, the server would send the trailing metadata and only remove the stream after that, which opened a window for the client to start another stream before the original was removed, creating the flake that we see.

The solution was to call mark_stream_closed before sending the data. To do this, I had to remove fail_pending_writes (from mark_stream_closed), which was causing a few issues.
(I actually got confused when I saw your comment. I thought maybe I had forgotten to remove the line from mark_stream_closed, but then I realized that your link points to the current grpc master, not the pull request :))
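
In other words (a minimal sketch of the ordering, using simplified stand-in names rather than the real chttp2 functions):

    /* Before the fix: the stream stayed in t->stream_map while the
     * trailing metadata went out, so it still counted against
     * MAX_CONCURRENT_STREAMS after the client saw the RPC finish. */
    send_trailing_metadata(t, s);
    remove_from_stream_map(t, s);  /* too late: the client may already
                                      have started a new stream */

    /* After the fix: close the stream (removing it from stream_map)
     * before sending, so the concurrency slot is freed first. */
    mark_stream_closed(t, s);      /* removes s from t->stream_map */
    send_trailing_metadata(t, s);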

Contributor

[accidentally wrote and deleted comment; rewriting]

Oops, I was half reviewing your code, and half reviewing an old Chrome tab open at master. The dangers of too many tabs...

Member Author

Actually, this just reminded me of something. I changed the fix a bit after noticing the fail_pending_writes issue, so the fail_pending_writes change may no longer be required. The last time I talked to Craig, he said that as long as something is put on the qbuf, it should go out. Let me try adding fail_pending_writes back in.
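
For reference, the qbuf pattern being relied on is the one visible in the force_client_rst_stream hunk above; the enclosing grpc_slice_buffer_add call is my reading of the surrounding file, not part of the quoted diff:

    /* Queue an RST_STREAM frame on the transport's output buffer and
     * kick off a write; anything placed on t->qbuf goes out with the
     * next write. */
    grpc_slice_buffer_add(
        &t->qbuf, grpc_chttp2_rst_stream_create(s->id, GRPC_HTTP2_NO_ERROR,
                                                &s->stats.outgoing));
    grpc_chttp2_initiate_write(exec_ctx, t, "force_rst_stream");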

@@ -103,6 +103,9 @@ grpc_error *grpc_chttp2_rst_stream_parser_parse(grpc_exec_ctx *exec_ctx,
                           GRPC_ERROR_INT_HTTP2_ERROR, (intptr_t)reason);
        gpr_free(message);
      }
+     if (!s->write_closed) {
Contributor

Same question. It seems like this will occur from here.

@ncteisen (Contributor) left a comment

Nice work tracking down a difficult, racy bug. LGTM


@yashykt (Member Author) commented Sep 11, 2017

The change is much cleaner and smaller now. Earlier, I had called mark_stream_closed much sooner than necessary, which was causing fail_pending_writes to run into issues. With the current placement of mark_stream_closed in writing.c, there is no need to change mark_stream_closed or pull out fail_pending_writes :)
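
Roughly, the placement being described (a sketch of the shape only; the condition and flag values are guesses, not taken from the patch):

    /* In writing.c, once the stream's trailing metadata has been queued
     * for this write, close the stream right away so it leaves
     * t->stream_map (freeing a MAX_CONCURRENT_STREAMS slot) before the
     * client can react to the end of the RPC. */
    if (s->sent_trailing_metadata) {
      grpc_chttp2_mark_stream_closed(exec_ctx, t, s, !t->is_client,
                                     1 /* close_writes */, GRPC_ERROR_NONE);
    }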

@ncteisen (Contributor)

Much cleaner, nice. re-LGTM

@yashykt (Member Author) commented Sep 12, 2017

Jenkins: retest this please

@grpc-kokoro

[trickle] No significant performance differences

@yashykt force-pushed the max_concurrent_streams_fix branch from cbc92c7 to a6ede3b on September 12, 2017 22:20
@grpc-kokoro

[trickle] No significant performance differences

@grpc-kokoro

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_InProcessCHTTP2_NoOpMutator_NoOpMutator__2M_2_0.opt.old: 1
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_InProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.counters.old: 1


[microbenchmarks] No significant performance differences

@yashykt (Member Author) commented Sep 13, 2017

Known issues: #12527, #11055, #11737, #12307

@yashykt deleted the max_concurrent_streams_fix branch March 14, 2018 22:12
@lock locked as resolved and limited conversation to collaborators Jan 19, 2019