rpc: reuse gRPC streams across unary BatchRequest invocations (~ 11.2% cpu) #136572
Labels
branch-master
Failures and bugs on the master branch.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
o-perf-efficiency
Related to performance efficiency
P-1
Issues/test failures with a fix SLA of 1 month
Comments
tbg added the C-performance, branch-master, P-1, and o-perf-efficiency labels Dec 3, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 4, 2024

Informs cockroachdb#136572.

This is broken in at least all of the following ways:
* context cancellation is not respected
* pooled streams are never closed when idle
* pooled streams don't interact properly with circuit breakers

Still, it works well enough to demonstrate the potential performance benefits of reusing streams instead of creating a new stream for each BatchRequest RPC, which is what gRPC does under the hood of a unary RPC.

```
name                                            old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_point_select-10   45.9µs ± 1%    28.8µs ± 2%    -37.31%  (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10      958µs ± 6%     709µs ± 1%     -26.00%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     3.65ms ± 6%    2.81ms ± 8%    -23.06%  (p=0.000 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10     1.77ms ± 5%    1.38ms ± 1%    -22.09%  (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_write_only-10     688µs ± 4%     557µs ± 1%     -19.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-10  181µs ± 8%     159µs ± 2%     -12.10%  (p=0.000 n=8+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    2.16ms ± 4%    1.92ms ± 3%    -11.08%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_write-10    5.89ms ± 2%    5.36ms ± 1%    -8.89%   (p=0.000 n=9+9)

name                                            old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10   16.3kB ± 0%    6.4kB ± 0%     -60.70%  (p=0.000 n=8+10)
Sysbench/KV/1node_remote/oltp_write_only-10     359kB ± 1%     256kB ± 1%     -28.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-10    748kB ± 0%     548kB ± 1%     -26.78%  (p=0.000 n=8+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  40.9kB ± 0%    30.8kB ± 0%    -24.74%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_write-10     1.11MB ± 1%    0.88MB ± 1%    -21.17%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    2.00MB ± 0%    1.65MB ± 0%    -17.60%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10      790kB ± 0%     655kB ± 0%     -17.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     1.33MB ± 0%    1.19MB ± 0%    -10.97%  (p=0.000 n=10+9)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10   210 ± 0%       61 ± 0%        -70.95%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10      3.98k ± 0%     1.88k ± 0%     -52.68%  (p=0.019 n=6+8)
Sysbench/KV/1node_remote/oltp_read_write-10     7.10k ± 0%     3.47k ± 0%     -51.07%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_write_only-10     3.10k ± 0%     1.58k ± 0%     -48.89%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    6.73k ± 0%     3.82k ± 0%     -43.30%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    14.4k ± 0%     9.2k ± 0%      -36.29%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  429 ± 0%       277 ± 0%       -35.46%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_only-10     7.52k ± 0%     5.37k ± 0%     -28.60%  (p=0.000 n=10+10)
```
ajstorm changed the title from "rpc: reuse gRPC streams across unary BatchRequest invocations" to "rpc: reuse gRPC streams across unary BatchRequest invocations (~ 11.2% cpu)" Dec 4, 2024
tbg pushed a commit to tbg/cockroach that referenced this issue Dec 6, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 7, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 7, 2024

Closes cockroachdb#136572.

This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke).

The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand.

The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is.

Microbenchmarks:
```
name                                            old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_point_select-10   45.9µs ± 1%    28.8µs ± 2%    -37.31%  (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10      958µs ± 6%     709µs ± 1%     -26.00%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     3.65ms ± 6%    2.81ms ± 8%    -23.06%  (p=0.000 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10     1.77ms ± 5%    1.38ms ± 1%    -22.09%  (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_write_only-10     688µs ± 4%     557µs ± 1%     -19.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-10  181µs ± 8%     159µs ± 2%     -12.10%  (p=0.000 n=8+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    2.16ms ± 4%    1.92ms ± 3%    -11.08%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_write-10    5.89ms ± 2%    5.36ms ± 1%    -8.89%   (p=0.000 n=9+9)

name                                            old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10   16.3kB ± 0%    6.4kB ± 0%     -60.70%  (p=0.000 n=8+10)
Sysbench/KV/1node_remote/oltp_write_only-10     359kB ± 1%     256kB ± 1%     -28.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-10    748kB ± 0%     548kB ± 1%     -26.78%  (p=0.000 n=8+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  40.9kB ± 0%    30.8kB ± 0%    -24.74%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_write-10     1.11MB ± 1%    0.88MB ± 1%    -21.17%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    2.00MB ± 0%    1.65MB ± 0%    -17.60%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10      790kB ± 0%     655kB ± 0%     -17.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     1.33MB ± 0%    1.19MB ± 0%    -10.97%  (p=0.000 n=10+9)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10   210 ± 0%       61 ± 0%        -70.95%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10      3.98k ± 0%     1.88k ± 0%     -52.68%  (p=0.019 n=6+8)
Sysbench/KV/1node_remote/oltp_read_write-10     7.10k ± 0%     3.47k ± 0%     -51.07%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_write_only-10     3.10k ± 0%     1.58k ± 0%     -48.89%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    6.73k ± 0%     3.82k ± 0%     -43.30%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    14.4k ± 0%     9.2k ± 0%      -36.29%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  429 ± 0%       277 ± 0%       -35.46%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_only-10     7.52k ± 0%     5.37k ± 0%     -28.60%  (p=0.000 n=10+10)
```

Roachtests:
```
name                                            old queries/s  new queries/s  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  17.6k ± 7%     19.2k ± 2%     +9.22%   (p=0.008 n=5+5)

name                                            old avg_ms/op  new avg_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  72.9 ± 7%      66.6 ± 2%      -8.57%   (p=0.008 n=5+5)

name                                            old p95_ms/op  new p95_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  116 ± 8%       106 ± 3%       -9.02%   (p=0.016 n=5+5)
```

Manual tests:
```
Running in a similar configuration to sysbench/oltp_read_write/nodes=3/cpu=8/conc=64, but with a benchmarking related cluster settings (before and after) to reduce variance.

-- Before
Mean: 19771.03
Median: 19714.22
Standard Deviation: 282.96
Coefficient of variance: .0143

-- After
Mean: 21908.23
Median: 21923.03
Standard Deviation: 200.88
Coefficient of variance: .0091
```

Release note (performance improvement): gRPC streams are now pooled across unary intra-cluster RPCs, allowing for reuse of gRPC resources to reduce the cost of remote key-value layer access.
craig bot
pushed a commit
that referenced
this issue
Dec 10, 2024
136648: rpc: reuse gRPC streams across unary BatchRequest RPCs r=tbg a=nvanbenschoten

Closes #136572.

This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke).

The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand.

The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is.
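To make the pooling idea concrete, here is a minimal sketch of a pool that hands out idle streams, opens a new one only when none is idle, and returns each stream after one request/response exchange. It is not the CockroachDB implementation: `stream`, `streamPool`, `roundTrip`, and `maxIdle` are hypothetical names, the stream is a stand-in with no gRPC dependency, and context cancellation, circuit breakers, and idle timeouts are all omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// stream stands in for a long-lived bidirectional gRPC stream (hypothetical).
// A real one would wrap Send/Recv on a gRPC client stream.
type stream struct {
	id   int
	uses int
}

// roundTrip mimics a unary call: one request in, one response out.
func (s *stream) roundTrip(req string) string {
	s.uses++
	return fmt.Sprintf("stream %d handled %q", s.id, req)
}

// streamPool reuses idle streams and opens new ones on demand.
type streamPool struct {
	mu      sync.Mutex
	idle    []*stream
	opened  int // total streams ever opened
	maxIdle int // cap on retained idle streams (assumed shrink policy)
}

// get returns an idle stream if one exists, otherwise opens a new one.
func (p *streamPool) get() *stream {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		s := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return s
	}
	p.opened++
	return &stream{id: p.opened}
}

// put returns a stream to the pool, or drops it when the pool is full.
// A real implementation would close a dropped stream.
func (p *streamPool) put(s *stream) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.idle) < p.maxIdle {
		p.idle = append(p.idle, s)
	}
}

// send performs one request/response exchange over a pooled stream.
func (p *streamPool) send(req string) string {
	s := p.get()
	defer p.put(s)
	return s.roundTrip(req)
}

func main() {
	p := &streamPool{maxIdle: 2}
	for i := 0; i < 3; i++ {
		fmt.Println(p.send(fmt.Sprintf("batch-%d", i)))
	}
	fmt.Println("streams opened:", p.opened)
}
```

With sequential callers, every request in this sketch reuses the same stream, which is the saving the description attributes to pooling. Concurrent callers would each take their own stream, so the pool grows with demand. The `maxIdle` cap is only one assumed way to shrink it.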
### Microbenchmarks:
```
name                                            old time/op   new time/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10   45.9µs ± 1%   28.8µs ± 2%   -37.31%  (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10      958µs ± 6%    709µs ± 1%    -26.00%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     3.65ms ± 6%   2.81ms ± 8%   -23.06%  (p=0.000 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10     1.77ms ± 5%   1.38ms ± 1%   -22.09%  (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_write_only-10     688µs ± 4%    557µs ± 1%    -19.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-10  181µs ± 8%    159µs ± 2%    -12.10%  (p=0.000 n=8+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    2.16ms ± 4%   1.92ms ± 3%   -11.08%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_write-10    5.89ms ± 2%   5.36ms ± 1%   -8.89%   (p=0.000 n=9+9)

name                                            old alloc/op  new alloc/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10   16.3kB ± 0%   6.4kB ± 0%    -60.70%  (p=0.000 n=8+10)
Sysbench/KV/1node_remote/oltp_write_only-10     359kB ± 1%    256kB ± 1%    -28.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-10    748kB ± 0%    548kB ± 1%    -26.78%  (p=0.000 n=8+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  40.9kB ± 0%   30.8kB ± 0%   -24.74%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_write-10     1.11MB ± 1%   0.88MB ± 1%   -21.17%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    2.00MB ± 0%   1.65MB ± 0%   -17.60%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10      790kB ± 0%    655kB ± 0%    -17.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     1.33MB ± 0%   1.19MB ± 0%   -10.97%  (p=0.000 n=10+9)

name                                            old allocs/op new allocs/op delta
Sysbench/KV/1node_remote/oltp_point_select-10   210 ± 0%      61 ± 0%       -70.95%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10      3.98k ± 0%    1.88k ± 0%    -52.68%  (p=0.019 n=6+8)
Sysbench/KV/1node_remote/oltp_read_write-10     7.10k ± 0%    3.47k ± 0%    -51.07%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_write_only-10     3.10k ± 0%    1.58k ± 0%    -48.89%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    6.73k ± 0%    3.82k ± 0%    -43.30%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    14.4k ± 0%    9.2k ± 0%     -36.29%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  429 ± 0%      277 ± 0%      -35.46%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_only-10     7.52k ± 0%    5.37k ± 0%    -28.60%  (p=0.000 n=10+10)
```

### Roachtests:
```
name                                            old queries/s new queries/s delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  17.6k ± 7%    19.2k ± 2%    +9.22%   (p=0.008 n=5+5)

name                                            old avg_ms/op new avg_ms/op delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  72.9 ± 7%     66.6 ± 2%     -8.57%   (p=0.008 n=5+5)

name                                            old p95_ms/op new p95_ms/op delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64  116 ± 8%      106 ± 3%      -9.02%   (p=0.016 n=5+5)
```

### Manual tests:

Running in a similar configuration to `sysbench/oltp_read_write/nodes=3/cpu=8/conc=64`, but with benchmarking-related cluster settings (before and after) to reduce variance.

```
-- Before
Mean: 19771.03
Median: 19714.22
Standard Deviation: 282.96
Coefficient of variance: .0143

-- After
Mean: 21908.23
Median: 21923.03
Standard Deviation: 200.88
Coefficient of variance: .0091
```

----

Release note (performance improvement): gRPC streams are now pooled across unary intra-cluster RPCs, allowing for reuse of gRPC resources to reduce the cost of remote key-value layer access. This pooling can be disabled using the `rpc.batch_stream_pool.enabled` cluster setting.

137059: catalog/lease: deflake TestDescriptorRefreshOnRetry r=rafiss a=rafiss

The test was flaky because the background thread that refreshes leases could run and throw off the acquisition counts.

fixes #137033

Release note: None

137067: roachtest: update mt-upgrade test owner to db-server r=rimadeodhar a=rimadeodhar

This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage.

Epic: none

Release note: none

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: rimadeodhar <rima@cockroachlabs.com>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Dec 10, 2024
Closes cockroachdb#136572. This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke). The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand. The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is. 
craig bot
pushed a commit
that referenced
this issue
Dec 10, 2024
136258: kvserver: add TestFlowControlSendQueueRangeSplitMerge test r=sumeerbhola a=kvoli

Add a new rac2 flow control integration test, `TestFlowControlSendQueueRangeSplitMerge`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, split it, write a 1 MiB put to the
-- LHS range, merge the ranges, and write a 1 MiB put to the merged range. We
-- expect that at each stage where a send queue develops n1->s3, the send queue
-- will be flushed by the range merge and range split range operations.
```

Note that the RHS is not written to post-split, pre-merge. See the relevant comments; this will be resolved via #136649, or some variation, which enforces timely replication on subsume requests.

Part of: #132614

Release note: None

136648: rpc: reuse gRPC streams across unary BatchRequest RPCs r=tbg a=nvanbenschoten

Closes #136572.

This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke).

The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand.
The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is.

137019: roachtest: increase the token return time with disk bandwidth limit r=kvoli a=andrewbaptist

Previously the test would wait 10m for tokens to be returned.
Without the disk bandwidth limit set, they typically return almost immediately, but with a limit they can take ~30m to return in some cases, even after the workload is stopped and the system is idle. This change fixes some of the perturbation/metamorphic/* tests that are hitting this slow token return.

Epic: none

Fixes: #136982
Fixes: #136553
Informs: #137017

Release note: None

137044: kvserver: deflake TestConsistencyQueueRecomputeStats r=miraradeva a=miraradeva

The test manually adds voters and expects a leaseholder to be established before forcing a stats re-computation (which runs on the leaseholder). With leader leases, it might take an extra election timeout for the leader lease to be established after adding the new voters, so the test flaked if the re-computation ran (and failed) before the leaseholder was ready.

This commit makes the test retry the re-computation until a leaseholder is established.

Fixes: #136596

Release note: None

137059: catalog/lease: deflake TestDescriptorRefreshOnRetry r=rafiss a=rafiss

The test was flaky because the background thread that refreshes leases could run and throw off the acquisition counts.

fixes #137033

Release note: None

137099: kvcoord: deflake TestDistSenderReplicaStall r=miraradeva a=miraradeva

The test runs with expiration leases, but when fortification is enabled the lease doesn't move off of the stalled replica because the deadlocked leader doesn't step down while it's receiving store liveness support.

This commit ensures fortification is off when expiration leases are used for the test.

Fixes: #136564

Release note: None

137118: crosscluster/logical: update udf test to expect at-least-once r=dt a=dt

We don't provide exactly-once delivery, so we don't want to test for it.

Release note: none.

Epic: none.

Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
Co-authored-by: Mira Radeva <mira@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
There could be a sizeable win here in light of the results in #136558.
We should prototype this approach, measure the improvement, and decide whether this is something we can do in production.
Epic: CRDB-42584
Jira issue: CRDB-45138