Keep LB policy alive during high freq of resolver updates #10645
Conversation
Comments are all minor.
GPR_ASSERT(wc_arg->wrapped_closure != NULL);
GPR_ASSERT(wc_arg->lb_policy != NULL);
GPR_ASSERT(wc_arg->free_when_done != NULL);
grpc_closure_sched(exec_ctx, wc_arg->wrapped_closure, GRPC_ERROR_REF(error));
I think we can use grpc_closure_run() here.
Done.
grpc_lb_policy *lb_policy;
/* heap memory to be freed upon closure execution. Usually this arg. */
void *free_when_done;
Given that we're always setting this to the wrapped_on_pick_closure_arg struct, we probably don't need it. Instead, we can just have the callback unconditionally free the arg.
Done.
/* mask= */ GRPC_INITIAL_METADATA_WAIT_FOR_READY,
/* check= */ 0, GRPC_ERROR_REF(error));
if (chand->lb_policy != NULL) {
  if (state == GRPC_CHANNEL_TRANSIENT_FAILURE) {
Let's put a TODO here about improving how we handle failure cases. In particular:
- At the moment, I suspect it's not possible for any LB policy to actually return TRANSIENT_FAILURE. We should probably fix that, which I think will require changing round_robin to be able to proactively unref and recreate subchannels for failed connections.
- If we are replacing the LB policy, we should ideally re-submit any pending picks to the new LB policy instead of failing them directly.
Done.
Trigger:
Resolvers that provide very frequent updates, O(ms), cause LB policies to be recreated at that same rate.
Current state:
An older policy may have received pick requests. With no time to make progress on them, the application-provided callbacks sit in the policy's "pending picks" list. When the new policy arrives shortly thereafter, the old policy's single reference is unref'd, triggering the policy's shutdown. As part of that shutdown, the pending picks' application callbacks are invoked with an error.
Solution:
A pending application pick should take a reference on the policy it is requesting the pick from; the corresponding unref happens right before the application callback is scheduled. In other words, an application's pending pick must keep its associated LB policy alive.