rpc: reuse gRPC streams across unary BatchRequest RPCs #136648
Conversation
This looks great, thank you for putting that together so quickly.
I cherry-picked the stream reuse PR cockroachdb#136648 so that both are more likely to evolve in unison, and because I anticipated "piggybacking" on top of it. This did not materialize, but the important integration point is now side by side and should lend itself well to a future change that adds stream reuse for drpc as well.

This commit roughly encompasses the following:

1. We change `*rpc.Connection` to also maintain a `drpcpool.Conn`.

It took me a while to grok how the `drpcpool.Pool` is architected. Essentially, its `Get()` method gives you a handle to a `drpcpool.Conn` that reflects a specific use case of the pool. When a client is created on it, it gets assigned an actual `drpc.Conn` from the pool (dialing if necessary), and when the client is closed, this conn is returned to the pool. So in a sense, `drpcpool.Conn` is an actual pool for some keyed use case; `drpcpool.Pool` is simply the bucket in which the actual connections get pooled, but they're never pulled from or returned to it directly. We don't currently use the key since we create a pool per remote node, and if we're not sharing TCP conns they all look the same to us anyway (i.e. there's no point in DefaultClass vs SystemClass).

If we squint, a `drpcpool.Conn` parallels `*grpc.ClientConn` in the sense that you can "just" make multiple clients on top of it. Of course internally a `*grpc.ClientConn` multiplexes multiple clients over one HTTP2 connection, whereas a `drpcpool.Conn` represents multiple independent TCP connections.

The lifecycle of these pools is currently completely broken, and I was honestly surprised TestTestServerDRPC passes! Future commits will need to clean this up to the point where we can at least run a representative benchmark.
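To make the checkout/return lifecycle described above concrete, here is a minimal sketch in Go. The names (`connPool`, `pooledConn`, `client`) are hypothetical stand-ins for `drpcpool.Pool`, `drpcpool.Conn`, and a drpc client; this is not the actual `drpcpool` API, only the pattern the commit message describes.

```go
// Sketch only: hypothetical names standing in for the drpcpool types
// discussed above; this is not the actual drpcpool API.
package drpcpoolsketch

import (
	"net"
	"sync"
)

// connPool plays the role of drpcpool.Pool as described above: the shared
// bucket in which raw connections are parked, never used by callers directly.
type connPool struct {
	mu   sync.Mutex
	idle []net.Conn
	dial func() (net.Conn, error)
}

func (p *connPool) get() (net.Conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return c, nil
	}
	return p.dial() // dial lazily, as described for drpcpool
}

func (p *connPool) put(c net.Conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, c)
}

// pooledConn plays the role of drpcpool.Conn: a handle for one use case
// (e.g. one remote node). Creating a client checks a raw connection out of
// the bucket; closing the client returns it.
type pooledConn struct{ pool *connPool }

// client stands in for a drpc client bound to one checked-out connection.
type client struct {
	conn net.Conn
	pool *connPool
}

func (pc *pooledConn) newClient() (*client, error) {
	c, err := pc.pool.get()
	if err != nil {
		return nil, err
	}
	return &client{conn: c, pool: pc.pool}, nil
}

// Close returns the underlying connection to the shared bucket.
func (cl *client) Close() {
	cl.pool.put(cl.conn)
}
```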
@tbg this should be good for a review, at your convenience. You should also be able to take this and extend it to work with dRPC, though it will need some changes if dRPC does not support multiple streams sharing a single TCP connection.
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
Looks great. A few small comments, but one question: are there any roachtests we think might find new unexpected behavior? For example, anything that has to do with DistSQL circuit breaking or failover (`failover/*`?). Looking at the code I don't see what that something might be, so I'm not sure it's worth running them manually, especially since KV owns the `failover` tests anyway.
Reviewed 1 of 1 files at r4, 22 of 22 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @arjunmahishi, @cthumuluru-crdb, @kyle-a-wong, and @nvanbenschoten)
pkg/rpc/nodedialer/nodedialer.go
line 342 at r5 (raw file):
```
		return client
	}
	return &BatchStreamPoolClient{InternalClient: client, pool: pool}
```
Curious if you think this deserves to be pooled. It's not a large allocation, but it is an allocation; at what point is the hassle of pooling it worth it? I think it should be easy to pool this one.
pkg/rpc/stream_pool.go
line 220 at r5 (raw file):
```
// Bind sets the gRPC connection and context for the streamPool. This must be
// called before streamPool.Send.
```

// called once before streamPool.Send.

Code quote:

```
// called before streamPool.Send.
```
pkg/rpc/stream_pool_test.go
line 290 at r5 (raw file):
```
require.Nil(t, res.resp)
require.Error(t, res.err)
require.ErrorIs(t, res.err, context.Canceled)
```
And the stream should not be back in the pool, right?
One more thing: In the rel note, do you want to mention the cluster setting to turn off this feature?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @arjunmahishi, @cthumuluru-crdb, @kyle-a-wong, and @nvanbenschoten)
Closes cockroachdb#136572.

This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke).

The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand.

The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is.

Microbenchmarks:

```
name                                            old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_point_select-10   45.9µs ± 1%    28.8µs ± 2%    -37.31%  (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10      958µs ± 6%     709µs ± 1%     -26.00%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     3.65ms ± 6%    2.81ms ± 8%    -23.06%  (p=0.000 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10     1.77ms ± 5%    1.38ms ± 1%    -22.09%  (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_write_only-10     688µs ± 4%     557µs ± 1%     -19.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-10  181µs ± 8%     159µs ± 2%     -12.10%  (p=0.000 n=8+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    2.16ms ± 4%    1.92ms ± 3%    -11.08%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_write-10    5.89ms ± 2%    5.36ms ± 1%    -8.89%   (p=0.000 n=9+9)

name                                            old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10   16.3kB ± 0%    6.4kB ± 0%     -60.70%  (p=0.000 n=8+10)
Sysbench/KV/1node_remote/oltp_write_only-10     359kB ± 1%     256kB ± 1%     -28.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-10    748kB ± 0%     548kB ± 1%     -26.78%  (p=0.000 n=8+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  40.9kB ± 0%    30.8kB ± 0%    -24.74%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_write-10     1.11MB ± 1%    0.88MB ± 1%    -21.17%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    2.00MB ± 0%    1.65MB ± 0%    -17.60%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10      790kB ± 0%     655kB ± 0%     -17.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     1.33MB ± 0%    1.19MB ± 0%    -10.97%  (p=0.000 n=10+9)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10   210 ± 0%       61 ± 0%        -70.95%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10      3.98k ± 0%     1.88k ± 0%     -52.68%  (p=0.019 n=6+8)
Sysbench/KV/1node_remote/oltp_read_write-10     7.10k ± 0%     3.47k ± 0%     -51.07%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_write_only-10     3.10k ± 0%     1.58k ± 0%     -48.89%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    6.73k ± 0%     3.82k ± 0%     -43.30%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    14.4k ± 0%     9.2k ± 0%      -36.29%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  429 ± 0%       277 ± 0%       -35.46%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_only-10     7.52k ± 0%     5.37k ± 0%     -28.60%  (p=0.000 n=10+10)
```

Roachtests:

```
name                                            old queries/s  new queries/s  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  17.6k ± 7%     19.2k ± 2%     +9.22%  (p=0.008 n=5+5)

name                                            old avg_ms/op  new avg_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  72.9 ± 7%      66.6 ± 2%      -8.57%  (p=0.008 n=5+5)

name                                            old p95_ms/op  new p95_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  116 ± 8%       106 ± 3%       -9.02%  (p=0.016 n=5+5)
```

Manual tests:

Running in a similar configuration to sysbench/oltp_read_write/nodes=3/cpu=8/conc=64, but with benchmarking-related cluster settings (before and after) to reduce variance.

```
-- Before
Mean: 19771.03
Median: 19714.22
Standard Deviation: 282.96
Coefficient of variance: .0143

-- After
Mean: 21908.23
Median: 21923.03
Standard Deviation: 200.88
Coefficient of variance: .0091
```

Release note (performance improvement): gRPC streams are now pooled across unary intra-cluster RPCs, allowing for reuse of gRPC resources to reduce the cost of remote key-value layer access. This pooling can be disabled using the `rpc.batch_stream_pool.enabled` cluster setting.
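For intuition, a minimal sketch of the client-side send path this describes (take a pooled stream, send one BatchRequest, receive one BatchResponse, return the stream) might look as follows. All names are illustrative; the actual implementation lives in pkg/rpc and differs in its error handling, sizing, and shutdown logic.

```go
// Hedged sketch of a pooled "unary over stream" send path; illustrative
// names only, not the actual pkg/rpc implementation.
package streampoolsketch

import (
	"context"
	"sync"
)

// BatchRequest and BatchResponse stand in for the real protobuf messages.
type BatchRequest struct{}
type BatchResponse struct{}

// batchStream stands in for an Internal/BatchStream client stream that can
// carry many BatchRequest/BatchResponse pairs over its lifetime.
type batchStream interface {
	Send(req *BatchRequest) error
	Recv() (*BatchResponse, error)
}

// streamPool keeps idle streams alongside a single gRPC connection.
type streamPool struct {
	mu        sync.Mutex
	idle      []batchStream
	newStream func(ctx context.Context) (batchStream, error)
}

// Send mimics a unary Batch RPC on top of a pooled, long-lived stream.
func (p *streamPool) Send(ctx context.Context, req *BatchRequest) (*BatchResponse, error) {
	s, err := p.get(ctx)
	if err != nil {
		return nil, err
	}
	if err := s.Send(req); err != nil {
		// Don't return a stream in an unknown state to the pool.
		return nil, err
	}
	resp, err := s.Recv()
	if err != nil {
		return nil, err
	}
	p.put(s) // healthy stream goes back into the pool for reuse
	return resp, nil
}

func (p *streamPool) get(ctx context.Context) (batchStream, error) {
	p.mu.Lock()
	if n := len(p.idle); n > 0 {
		s := p.idle[n-1]
		p.idle = p.idle[:n-1]
		p.mu.Unlock()
		return s, nil
	}
	p.mu.Unlock()
	return p.newStream(ctx) // grow on demand
}

func (p *streamPool) put(s batchStream) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, s)
}
```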
This commit rearranges the InternalClient implementations returned by Dialer.DialInternalClient to avoid multiple heap allocations.

With gRPC stream pooling:

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_point_select-10  30.7µs ± 6%    30.3µs ± 4%    ~        (p=0.436 n=9+9)
Sysbench/KV/1node_remote/oltp_read_only-10     743µs ± 3%     736µs ± 1%     ~        (p=0.721 n=8+8)
Sysbench/KV/1node_remote/oltp_read_write-10    1.50ms ± 3%    1.51ms ± 6%    ~        (p=0.730 n=9+9)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10  6.61kB ± 0%    6.58kB ± 0%    -0.49%   (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10     656kB ± 0%     656kB ± 0%     ~        (p=0.743 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10    883kB ± 0%     882kB ± 1%     ~        (p=0.143 n=10+10)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10  63.0 ± 0%      61.0 ± 0%      -3.17%   (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10     1.90k ± 0%     1.87k ± 0%     -1.49%   (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_read_write-10    3.51k ± 0%     3.46k ± 0%     -1.35%   (p=0.000 n=9+8)
```

Without gRPC stream pooling:

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_read_write-10    1.88ms ± 3%    1.83ms ± 2%    -2.16%   (p=0.029 n=10+10)
Sysbench/KV/1node_remote/oltp_point_select-10  46.9µs ± 1%    46.7µs ± 1%    -0.49%   (p=0.022 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10     976µs ± 2%     973µs ± 4%     ~        (p=0.497 n=9+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10  16.3kB ± 0%    16.2kB ± 0%    -0.06%   (p=0.000 n=8+7)
Sysbench/KV/1node_remote/oltp_read_only-10     788kB ± 0%     788kB ± 0%     ~        (p=0.353 n=10+10)
Sysbench/KV/1node_remote/oltp_read_write-10    1.11MB ± 0%    1.11MB ± 0%    ~        (p=0.436 n=10+10)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10  209 ± 0%       208 ± 0%       -0.48%   (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_write-10    7.07k ± 0%     7.04k ± 0%     -0.36%   (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10     3.95k ± 0%     3.94k ± 0%     -0.35%   (p=0.008 n=6+7)
```

Epic: None

Release note: None
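The kind of rearrangement this refers to can be illustrated with a hedged sketch: instead of returning nested wrapper clients that each cost a heap allocation, the wrappers can be combined into a single struct that is allocated once. The types below are hypothetical; the real code in pkg/rpc/nodedialer is structured differently.

```go
// Illustrative only: one common way to shave an allocation when returning a
// layered client wrapper. Not the actual Dialer.DialInternalClient code.
package allocsketch

type BatchRequest struct{}
type BatchResponse struct{}

type InternalClient interface {
	Batch(req *BatchRequest) (*BatchResponse, error)
}

// Before: two wrappers, two heap allocations per dialed client.
type tracingClient struct{ InternalClient }
type poolClient struct{ InternalClient }

func wrapTwice(base InternalClient) InternalClient {
	return &poolClient{InternalClient: &tracingClient{InternalClient: base}} // 2 allocs
}

// After: a single combined struct, allocated once.
type combinedClient struct {
	tracing tracingClient
	pool    poolClient
}

func wrapOnce(base InternalClient) InternalClient {
	c := &combinedClient{} // 1 alloc
	c.tracing.InternalClient = base
	c.pool.InternalClient = &c.tracing
	return &c.pool
}
```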
This will help prototype drpc stream pooling.
TFTR!
> Are there any roachtests we think might find new unexpected behavior? For example, anything that has to do with DistSQL circuit breaking or failover (`failover/*`?). Looking at the code I don't see what that something might be, so I'm not sure it's worth running them manually, especially since KV owns the `failover` tests anyway.

This is a good question. I'm not aware of any new failure modes with this, because we were already just laying unary RPCs over long-lived gRPC connections, which are the actual resource that can fail. I'll keep an eye on the `failover` tests over the next few weeks (which I've been doing anyway for leader leases).
> One more thing: In the rel note, do you want to mention the cluster setting to turn off this feature?
Done.
The "rpc: make batch stream pool general over Conn" commit from here might be something you could cherry-pick into your grpc stream pooling PR, btw. Especially if you're going to make any code changes in that PR I'd appreciate that.
(from Slack)
Done.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @arjunmahishi, @cthumuluru-crdb, @kyle-a-wong, and @tbg)
pkg/rpc/stream_pool.go
line 220 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
// called once before streamPool.Send.
Done.
pkg/rpc/nodedialer/nodedialer.go
line 342 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Curious if you think this deserves to be pooled. It's not a large allocation, but it is an allocation; at what point is the hassle of pooling it worth it? I think it should be easy to pool this one.
Done in a new commit using a similar approach to what @RaduBerinde suggested in #136941.
I'll let you give that commit a review before merging.
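For reference, pooling a wrapper this small usually boils down to a `sync.Pool` along the following lines. This is a generic sketch under the assumption of a simple get/put lifecycle; it is not a claim about what the new commit or the cockroachdb#136941 suggestion actually does.

```go
// Generic sync.Pool sketch for reusing a small client wrapper; hypothetical,
// not the actual nodedialer code.
package wrappersketch

import "sync"

// batchStreamPoolClient is a stand-in for the small wrapper quoted above.
type batchStreamPoolClient struct {
	internal any // the wrapped InternalClient
	pool     any // the stream pool
}

var clientPool = sync.Pool{
	New: func() any { return &batchStreamPoolClient{} },
}

// getClient hands out a reused (or freshly allocated) wrapper.
func getClient(internal, pool any) *batchStreamPoolClient {
	c := clientPool.Get().(*batchStreamPoolClient)
	c.internal, c.pool = internal, pool
	return c
}

// putClient clears the wrapper and returns it for reuse.
func putClient(c *batchStreamPoolClient) {
	*c = batchStreamPoolClient{}
	clientPool.Put(c)
}
```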
pkg/rpc/stream_pool_test.go
line 290 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
And the stream should not be back in the pool, right?
Done.
Reviewed 6 of 6 files at r6, 2 of 2 files at r7, 3 of 3 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @arjunmahishi, @cthumuluru-crdb, @kyle-a-wong, and @nvanbenschoten)
-- commits
line 118 at r7:
nice that this shows up!
TFTR! bors r=tbg
136648: rpc: reuse gRPC streams across unary BatchRequest RPCs r=tbg a=nvanbenschoten

137059: catalog/lease: deflake TestDescriptorRefreshOnRetry r=rafiss a=rafiss

The test was flaky since the background thread to refresh leases could run and cause the acquisition counts to be off.

fixes #137033

Release note: None

137067: roachtest: update mt-upgrade test owner to db-server r=rimadeodhar a=rimadeodhar

This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage.

Epic: none

Release note: none

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: rimadeodhar <rima@cockroachlabs.com>
Build failed (retrying...)