Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: consider adopting drpc (12% improvement on sysbench oltp_read_write) #137252

Open
17 tasks
tbg opened this issue Dec 11, 2024 · 0 comments
Open
17 tasks

rpc: consider adopting drpc (12% improvement on sysbench oltp_read_write) #137252

tbg opened this issue Dec 11, 2024 · 0 comments
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months

Comments

@tbg
Copy link
Member

tbg commented Dec 11, 2024

In the wake of disappointing gRPC performance (#136278 (comment)), we've started looking at alternatives.

drpc performed much better than grpc across the board in microbenchmarks1.

We then prototyped drpc for our critical BatchRequest-BatchResponse path in #136794. This prototype does not port the gRPC interceptors and custom codec to drpc (so we don't authorize drpc requests or use snappy compression), so the drpc numbers should be slightly discounted. In light of the definite win in the microbenchmarks though, it is likely that the comparison is generally valid.

The screenshot below compares the performance of grpc vs drpc on CockroachDB running ten-minute 3x8vcpu sysbench oltp_read_write workloads (100m rows total across 10 tables, with a number of cluster settings tuned to reduce variance).

grpc-pool is a recent significant improvement gRPC performance improvement achieved by re-using streams (#136648) and is now CockroachDB's default. drpc uses drpc instead of grpc but no stream pooling; drpc-pool then adopts the optimization in #136648 as well. We can see that adopting drpc (including the streaming optimization) can be expected to boost qps by up to 12% on this workload.

image

This is a very significant improvement and so we should seriously consider moving off gRPC to a more performant option, drpc being the current lead candidate. Despite drpc showing promise, we should also evaluate other options (frpc seemed "too alpha" and not as great a fit, connect tries hard to be a drop-in replacement for grpc but it doesn't seem focused on performance either, forking grpc would likely be more work than extending drpc as needed).

DRPC

drpc is a project by storj but in practice appears to be primarily driven by a single author, meaning there's a much less robust community behind it (vs grpc). It's <4000 lines of code in an approachable, high-quality codebase, with only infrequent changes. There isn't much downside risk to adopting this codebase, except if we find that we need features that add significant complexity. One notable feature is the absence of TCP connection sharing in drpc (i.e. can't multiplex multiple clients over a single TCP conn). I'm not sure this is actually a downside or an immediate deal-breaker, but it is certainly something we have to evaluate as it would lead to many more TCP connections being held active by any individual node.

The prototype hosts drpc and grpc side by side, so there is a clear migration path.

We'd need to find out what to do about grpc-gateway - likely we would want to remove it; drpc has a simple http-serving facility as well, see https://github.com/storj/drpc/blob/main/examples/drpc_and_http/server/main.go.

List of known limitations in the prototype/items that would be required to remove grpc:

  • codegen is manual; need to properly implement drpc codegen into our bazel setup (current generated code was copy-pasted and edited from rpctoy; see api_drpc_hacky.go in the prototype PR)
  • drpc only registers BatchRequest server (should be straightforward to fix after codegen)
  • extend drpc's gogoproto codec with MarshalAppend as done in https://github.com/cockroachlabs/perf-efficiency-team/pull/36#issuecomment-2547916166.
  • drpc conn pool needs to scale down more smartly.
  • drpc not fully integrated in rpc heartbeats; these currently use grpc always. (Need to make it dependent on cluster setting, i.e. achieve parity between the two).
  • need to set tlsConfig.InsecureSkipVerify; check that this is intentional (see TestTestClusterDRPC and (*peer).dial)
  • grpc server interceptors not implemented for drpc server2
    • need to switch to drpc's metadata mechanism3
  • grpc client interceptors not implemented (need to wrap the created clients directly)
  • port the existing metrics (including some cross-region metrics; maybe not all can be ported verbatim).
  • migration path to remove grpc-gateway
  • drpc connections are not torn down on client write failures4; we may need to improve this for failure detection
  • need to evaluate whether drpc's one-tcp-conn-per-stream model is acceptable or whether conn sharing needs to be implemented
  • there's likely a bit of annoying work around grpc errors and error codes, i.e. what are the drpc corresponding versions of this and maybe this will change too; need to look in more detail but it's expected that something will break around rpc error detection since a lot of this was customized for what we saw with grpc.
  • the failover roachtest suite should not regress.
  • add a cluster setting that disables the grpc server for testing
  • the analysis has focused on microbenchmarks (across various payload sizes) and sysbench-on-CRDB (which has fairly small payloads). A wider round of benchmarking is advisable.

Epic: CRDB-42584

Jira issue: CRDB-45491

Footnotes

  1. https://github.com/cockroachlabs/perf-efficiency-team/pull/19#issue-2716024819

  2. see https://github.com/cockroachdb/cockroach/blob/fb2b661c8a15215c330b079a0a8e421184be29da/pkg/rpc/context.go#L94

  3. https://github.com/storj/drpc/blob/0075ac8716613e0869c41714a92cdaacf761416d/drpcmetadata/README.md

  4. https://github.com/cockroachlabs/perf-efficiency-team/blob/100ca12e298d6b92d062005bba8cdeecb212d65a/rpctoy/drpc_test.go#L102-L118; we can wrap the conn into something that closes the underlying tcp conn when RPCs return in errors, so this could be fixed without a fork.

@tbg tbg added C-performance Perf of queries or internals. Solution not expected to change functional behavior. branch-master Failures and bugs on the master branch. P-2 Issues/test failures with a fix SLA of 3 months o-perf-efficiency Related to performance efficiency labels Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months
Projects
None yet
Development

No branches or pull requests

1 participant