rpc: consider adopting drpc (12% improvement on sysbench oltp_read_write) #137252
Labels

- branch-master: Failures and bugs on the master branch.
- C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
- o-perf-efficiency: Related to performance efficiency.
- P-2: Issues/test failures with a fix SLA of 3 months.
In the wake of disappointing gRPC performance (#136278 (comment)), we've started looking at alternatives.
drpc performed much better than grpc across the board in microbenchmarks[1].
We then prototyped drpc for our critical BatchRequest-BatchResponse path in #136794. This prototype does not port the gRPC interceptors and custom codec to drpc (so we don't authorize drpc requests or use snappy compression), so the drpc numbers should be slightly discounted. In light of the definite win in the microbenchmarks though, it is likely that the comparison is generally valid.
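For orientation, the basic drpc wiring looks roughly like the sketch below, following the pattern in drpc's public README. `DRPCRegisterBatchService`, `DRPCBatchServiceServer`, `NewDRPCBatchServiceClient`, `BatchRequest`, and `BatchResponse` are hypothetical stand-ins for what `protoc-gen-go-drpc` would generate; they are not the prototype's actual identifiers.

```go
// Minimal sketch of serving and calling a drpc service. The DRPC*/Batch*
// identifiers are hypothetical generated names, not code from the prototype.
package rpcsketch

import (
	"context"
	"net"

	"storj.io/drpc/drpcconn"
	"storj.io/drpc/drpcmux"
	"storj.io/drpc/drpcserver"
)

func serve(ctx context.Context, lis net.Listener, impl DRPCBatchServiceServer) error {
	m := drpcmux.New()
	// Hook the generated service implementation into the mux.
	if err := DRPCRegisterBatchService(m, impl); err != nil {
		return err
	}
	// drpcserver speaks drpc's framed wire protocol over each accepted net.Conn.
	return drpcserver.New(m).Serve(ctx, lis)
}

func call(ctx context.Context, addr string, req *BatchRequest) (*BatchResponse, error) {
	raw, err := (&net.Dialer{}).DialContext(ctx, "tcp", addr)
	if err != nil {
		return nil, err
	}
	conn := drpcconn.New(raw) // one drpc conn wraps one TCP conn
	defer conn.Close()
	return NewDRPCBatchServiceClient(conn).Batch(ctx, req)
}
```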
The screenshot below compares the performance of grpc vs drpc on CockroachDB running ten-minute 3x8vcpu sysbench oltp_read_write workloads (100m rows total across 10 tables, with a number of cluster settings tuned to reduce variance).
`grpc-pool` is a recent, significant gRPC performance improvement achieved by re-using streams (#136648) and is now CockroachDB's default. `drpc` uses drpc instead of grpc but without stream pooling; `drpc-pool` then adopts the optimization in #136648 as well. We can see that adopting drpc (including the streaming optimization) can be expected to boost qps by up to 12% on this workload.

This is a very significant improvement, so we should seriously consider moving off gRPC to a more performant option, drpc being the current lead candidate. Despite drpc showing promise, we should also evaluate other options: frpc seemed "too alpha" and not as good a fit, connect tries hard to be a drop-in replacement for grpc but doesn't seem focused on performance either, and forking grpc would likely be more work than extending drpc as needed.
DRPC
drpc is a project by storj, but in practice it appears to be driven primarily by a single author, meaning there's a much less robust community behind it than behind grpc. It's <4000 lines of code in an approachable, high-quality codebase with only infrequent changes. There isn't much downside risk to adopting this codebase, unless we find that we need features that add significant complexity. One notable gap is the absence of TCP connection sharing in drpc (i.e., it can't multiplex multiple clients over a single TCP conn). I'm not sure this is actually a downside or an immediate deal-breaker, but it is certainly something we have to evaluate, as it would lead to many more TCP connections being held open by any individual node.
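To make the connection-count concern concrete, a node talking to many peers would need something like a small per-peer pool of drpc connections. Below is a rough sketch of my own for illustration; it is not code from the prototype, and `Pool` and its methods are made-up names.

```go
// Sketch: because drpc does not multiplex concurrent RPCs over a single TCP
// connection the way gRPC's HTTP/2 transport does, each node would keep a
// small pool of drpc connections per peer and dial when the pool runs dry.
package rpcsketch

import (
	"context"
	"net"
	"sync"

	"storj.io/drpc/drpcconn"
)

// Pool hands out one drpc connection per concurrent caller and reuses idle ones.
type Pool struct {
	addr string
	mu   sync.Mutex
	idle []*drpcconn.Conn
}

func NewPool(addr string) *Pool { return &Pool{addr: addr} }

// Get returns an idle connection or dials a new TCP connection.
func (p *Pool) Get(ctx context.Context) (*drpcconn.Conn, error) {
	p.mu.Lock()
	if n := len(p.idle); n > 0 {
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		p.mu.Unlock()
		return c, nil
	}
	p.mu.Unlock()

	raw, err := (&net.Dialer{}).DialContext(ctx, "tcp", p.addr)
	if err != nil {
		return nil, err
	}
	return drpcconn.New(raw), nil
}

// Put returns a healthy connection to the pool; broken ones should be Closed instead.
func (p *Pool) Put(c *drpcconn.Conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, c)
}
```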
The prototype hosts drpc and grpc side by side, so there is a clear migration path.
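drpc's drpcmigrate package is its documented way to serve drpc next to another protocol on one listener: connections that start with a known header are routed to drpc, everything else falls through to gRPC. The sketch below assumes that API and hypothetical generated registration functions; I haven't verified that this is how the prototype wires things up, so treat the drpcmigrate calls as to-be-confirmed against the vendored version.

```go
// Sketch: host gRPC and drpc side by side on one listener via drpcmigrate.
// RegisterInternalServer / DRPCRegisterInternal stand in for generated code.
package rpcsketch

import (
	"context"
	"net"

	"google.golang.org/grpc"
	"storj.io/drpc/drpcmigrate"
	"storj.io/drpc/drpcmux"
	"storj.io/drpc/drpcserver"
)

func serveBoth(ctx context.Context, lis net.Listener) error {
	// Peek at the first bytes of each conn: the DRPC header routes to drpc,
	// everything else (i.e. existing gRPC clients) goes to the default route.
	lisMux := drpcmigrate.NewListenMux(lis, len(drpcmigrate.DRPCHeader))

	grpcSrv := grpc.NewServer()
	// RegisterInternalServer(grpcSrv, impl) // existing gRPC registration
	go grpcSrv.Serve(lisMux.Default())

	dmux := drpcmux.New()
	// DRPCRegisterInternal(dmux, impl) // drpc registration for the same impl
	go drpcserver.New(dmux).Serve(ctx, lisMux.Route(drpcmigrate.DRPCHeader))

	return lisMux.Run(ctx)
}
```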
We'd need to find out what to do about grpc-gateway - likely we would want to remove it; drpc has a simple http-serving facility as well, see https://github.com/storj/drpc/blob/main/examples/drpc_and_http/server/main.go.
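The facility referenced above is, as far as I can tell, the drpchttp package, which wraps a drpc handler in a net/http handler that accepts JSON request bodies. A minimal sketch, assuming that API and a hypothetical generated registration function (`DRPCRegisterAdmin`):

```go
// Sketch: expose a drpc mux over plain HTTP as a possible grpc-gateway
// replacement. Whether drpchttp covers everything grpc-gateway does for us
// is exactly the open question.
package main

import (
	"log"
	"net/http"

	"storj.io/drpc/drpchttp"
	"storj.io/drpc/drpcmux"
)

func main() {
	m := drpcmux.New()
	// _ = DRPCRegisterAdmin(m, &adminServer{}) // hypothetical generated registration

	// drpchttp.New turns the drpc handler into an http.Handler.
	log.Fatal(http.ListenAndServe(":8080", drpchttp.New(m)))
}
```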
List of known limitations in the prototype/items that would be required to remove grpc:
- `api_drpc_hacky.go` in the prototype PR
- `MarshalAppend`, as done in https://github.com/cockroachlabs/perf-efficiency-team/pull/36#issuecomment-2547916166 (see the sketch further below)
- `tlsConfig.InsecureSkipVerify`; check that this is intentional (see `TestTestClusterDRPC` and `(*peer).dial`)

Epic: CRDB-42584
Jira issue: CRDB-45491
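Regarding the `MarshalAppend` item in the list above: the gist is to marshal each request into a reused scratch buffer instead of allocating a fresh slice per message. The snippet below only illustrates that pattern with google.golang.org/protobuf; how it plugs into drpc's encoding hooks is what the linked comment covers and is not shown here.

```go
// Sketch of the MarshalAppend pattern: reuse a scratch buffer across marshal
// calls rather than allocating a new []byte for every message.
package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

func main() {
	var scratch []byte // reused across calls (one per goroutine/stream in practice)
	opts := proto.MarshalOptions{}

	for i := 0; i < 3; i++ {
		msg := wrapperspb.String(fmt.Sprintf("payload-%d", i))
		// MarshalAppend writes into scratch's backing array when it has capacity.
		buf, err := opts.MarshalAppend(scratch[:0], msg)
		if err != nil {
			panic(err)
		}
		scratch = buf // keep the (possibly grown) buffer for the next call
		fmt.Println(len(buf), "bytes marshaled")
	}
}
```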
Footnotes
1. https://github.com/cockroachlabs/perf-efficiency-team/pull/19#issue-2716024819
2. See https://github.com/cockroachdb/cockroach/blob/fb2b661c8a15215c330b079a0a8e421184be29da/pkg/rpc/context.go#L94
3. https://github.com/storj/drpc/blob/0075ac8716613e0869c41714a92cdaacf761416d/drpcmetadata/README.md
4. https://github.com/cockroachlabs/perf-efficiency-team/blob/100ca12e298d6b92d062005bba8cdeecb212d65a/rpctoy/drpc_test.go#L102-L118; we can wrap the conn into something that closes the underlying TCP conn when RPCs return errors, so this could be fixed without a fork.
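For footnote 4, a rough sketch of the wrapper it describes, assuming the `drpc.Conn` interface as seen by generated clients; the exact `Invoke` signature should be checked against the vendored drpc version, and `errorClosingConn` is a made-up name.

```go
// Sketch: wrap a drpc.Conn so that any RPC error tears down the underlying
// TCP connection, forcing the caller to re-dial rather than reuse a
// possibly-broken transport. This is the "fix without a fork" idea from
// footnote 4; streaming RPCs would need a similar wrapper around NewStream.
package rpcsketch

import (
	"context"

	"storj.io/drpc"
)

type errorClosingConn struct {
	drpc.Conn // embeds Close, Closed, NewStream, etc.
}

func (c *errorClosingConn) Invoke(
	ctx context.Context, rpc string, enc drpc.Encoding, in, out drpc.Message,
) error {
	err := c.Conn.Invoke(ctx, rpc, enc, in, out)
	if err != nil {
		// Don't trust the transport after an error; close it so a fresh conn is dialed.
		_ = c.Conn.Close()
	}
	return err
}
```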