
Keep LB policy alive during high freq of resolver updates #10645

Merged

Conversation

dgquintas (Contributor)

Trigger:
Resolvers that provide very frequent updates, on the order of milliseconds, cause LB policies to be recreated at that same rate.

Current state:
An older policy may have received pick requests. With no time to make progress on them, the application-provided callbacks sit in the policy's "pending picks" list. When the new policy arrives shortly thereafter, the old policy's single reference is unref'd, triggering policy shutdown. As part of that shutdown, each pending pick's application callback is invoked with an error.

Solution:
A pending application pick should take a reference on the LB policy it is requesting the pick from, and the corresponding unref should happen right before the application callback is scheduled. In other words, an application's pending pick must keep its associated LB policy alive.
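A minimal sketch of that lifetime rule is below. The struct name mirrors the wrapped_on_pick_closure_arg discussed in the review; the helper function names and the "pending_pick" ref reason are illustrative, not the literal patch.

```c
/* Sketch only; assumes lb_policy.h / closure.h from grpc core circa 2017. */
typedef struct {
  grpc_closure *wrapped_closure; /* the application's on-pick callback */
  grpc_lb_policy *lb_policy;     /* kept alive while the pick is pending */
} wrapped_on_pick_closure_arg;

/* Queuing side (hypothetical helper): take a ref so a resolver update that
 * drops the channel's ref cannot destroy the policy while picks are still
 * pending. */
static void queue_pending_pick(wrapped_on_pick_closure_arg *wc_arg,
                               grpc_lb_policy *lb_policy) {
  wc_arg->lb_policy = lb_policy;
  GRPC_LB_POLICY_REF(lb_policy, "pending_pick");
}

/* Completion side (hypothetical helper): unref right before handing control
 * back to the application, per the rule described above. */
static void complete_pending_pick(grpc_exec_ctx *exec_ctx,
                                  wrapped_on_pick_closure_arg *wc_arg,
                                  grpc_error *error) {
  GRPC_LB_POLICY_UNREF(exec_ctx, wc_arg->lb_policy, "pending_pick");
  grpc_closure_run(exec_ctx, wc_arg->wrapped_closure, GRPC_ERROR_REF(error));
}
```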

@grpc-kokoro

Performance differences noted:

```
Benchmark                                                                cpu_time    real_time
---------------------------------------------------------------------  ----------  -----------
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<3, false>>/0/16k       +9.00        +9.00
```

@markdroth (Member) left a comment:

Comments are all minor.

```c
GPR_ASSERT(wc_arg->wrapped_closure != NULL);
GPR_ASSERT(wc_arg->lb_policy != NULL);
GPR_ASSERT(wc_arg->free_when_done != NULL);
grpc_closure_sched(exec_ctx, wc_arg->wrapped_closure, GRPC_ERROR_REF(error));
```
markdroth (Member)

I think we can use grpc_closure_run() here.

dgquintas (Contributor, Author)

Done.
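For reference, the suggested change runs the wrapped closure directly rather than enqueuing it for a later exec_ctx flush (a sketch of the resulting line):

```c
/* Per the review: run the wrapped closure directly instead of scheduling. */
grpc_closure_run(exec_ctx, wc_arg->wrapped_closure, GRPC_ERROR_REF(error));
```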

```c
grpc_lb_policy *lb_policy;

/* heap memory to be freed upon closure execution. Usually this arg. */
void *free_when_done;
```
markdroth (Member)

Given that we're always setting this to the wrapped_on_pick_closure_arg struct, we probably don't need it. Instead, we can just have the callback unconditionally free the arg.

dgquintas (Contributor, Author)

Done.
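Putting the two suggestions together, the wrapped callback might look like this (a sketch under the assumptions above; the callback name is inferred from the struct name, not taken from the patch):

```c
static void wrapped_on_pick_closure_cb(grpc_exec_ctx *exec_ctx, void *arg,
                                       grpc_error *error) {
  wrapped_on_pick_closure_arg *wc_arg = arg;
  GPR_ASSERT(wc_arg != NULL);
  GPR_ASSERT(wc_arg->wrapped_closure != NULL);
  GPR_ASSERT(wc_arg->lb_policy != NULL);
  /* Drop the pending pick's ref right before invoking the application
   * callback, per the PR description. */
  GRPC_LB_POLICY_UNREF(exec_ctx, wc_arg->lb_policy, "pending_pick");
  grpc_closure_run(exec_ctx, wc_arg->wrapped_closure, GRPC_ERROR_REF(error));
  /* The arg is always heap-allocated by the caller, so free it
   * unconditionally; no free_when_done pointer needed. */
  gpr_free(wc_arg);
}
```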

```c
/* mask= */ GRPC_INITIAL_METADATA_WAIT_FOR_READY,
/* check= */ 0, GRPC_ERROR_REF(error));
if (chand->lb_policy != NULL) {
  if (state == GRPC_CHANNEL_TRANSIENT_FAILURE) {
```
markdroth (Member)

Let's put a TODO here about improving how we handle failure cases. In particular:

  1. At the moment, I suspect it's not possible for any LB policy to actually return TRANSIENT_FAILURE. We should probably fix that, which I think will require changing round_robin to be able to proactively unref and recreate subchannels for failed connections.

  2. If we are replacing the LB policy, we should ideally re-submit any pending picks to the new LB policy instead of failing them directly.
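A sketch of what such a TODO could look like at that point in the channel code (wording assumed, not the comment actually committed):

```c
/* TODO: improve handling of failure cases here:
 *  1) It may currently be impossible for any LB policy to actually return
 *     TRANSIENT_FAILURE; fixing that likely requires changing round_robin
 *     to proactively unref and recreate subchannels for failed connections.
 *  2) When replacing the LB policy, re-submit pending picks to the new
 *     policy instead of failing them directly. */
```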

dgquintas (Contributor, Author)

Done.

@grpc-kokoro

Performance differences noted:

```
Benchmark                                                    cpu_time    real_time
---------------------------------------------------------  ----------  -----------
BM_HpackParserParseHeader<NonIndexedBinaryElem<10, true>>      -33.00       -33.00
BM_IsolatedFilter<DummyFilter, SendEmptyMetadata>               -6.50        -6.50
```

@dgquintas (Contributor, Author)

Issues: #10614 #9542

dgquintas merged commit a8b8aea into grpc:master on Apr 15, 2017.
The lock bot locked this conversation as resolved and limited it to collaborators on Jan 23, 2019.