balancer/endpointsharding: Call ExitIdle() on child if child reports IDLE #7782
Conversation
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7782   +/-  ##
========================================
  Coverage        ?   81.81%
========================================
  Files           ?      374
  Lines           ?    38166
  Branches        ?        0
========================================
  Hits            ?    31224
  Misses          ?     5675
  Partials        ?     1267
@@ -123,6 +124,7 @@ func (es *endpointSharding) UpdateClientConnState(state balancer.ClientConnState
 			bal.Balancer = gracefulswitch.NewBalancer(bal, es.bOpts)
 		}
 		newChildren.Set(endpoint, bal)
+		es.childMu.Lock()
I think we can safely take this once at the top and defer the unlock. The only things that happen concurrently are this and Close (guaranteed to be non-concurrent), and the ExitIdle goroutine, which can be concurrent, but it should be safe for that to block all of UpdateCCS.
Oh, the same thing can apply to Close, too, I'm pretty sure.
ResolverError can also race with ExitIdle (all three downward balancer.Balancer operations can race). I grabbed the mutex for the whole operation and added a close guard.
And I do defer an updateState() call, which will now update upward with the mutex held, but since you mentioned we can assume it can't call back inline, I'll go ahead and add it.
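Putting this thread together, here is a minimal sketch of the pattern under discussion: lock childMu once at the top, defer the unlock, and defer updateState() so it runs with the mutex held. The struct and signatures are simplified stand-ins for illustration, not the actual grpc-go implementation.

package main

import (
	"fmt"
	"sync"
)

// Simplified stand-in for the endpointsharding balancer. Only the pieces
// needed to show the locking pattern are included.
type endpointSharding struct {
	childMu sync.Mutex // held for every downward call into child balancers
}

// updateState aggregates child states and reports upward. Because it is
// deferred below, it runs while childMu is still held, which is safe as
// long as the parent never calls back into this balancer inline.
func (es *endpointSharding) updateState() {
	fmt.Println("reporting aggregated state upward")
}

// UpdateClientConnState sketch: take childMu once at the top and defer the
// unlock. Defers run last-in-first-out, so the deferred updateState() runs
// before the deferred Unlock(), i.e. with the mutex held.
func (es *endpointSharding) UpdateClientConnState(cfg string) error {
	es.childMu.Lock()
	defer es.childMu.Unlock() // registered first, runs last
	defer es.updateState()    // registered second, runs first

	// ... create/update children here. Close and ResolverError take the
	// same lock for their whole operation, so the three downward
	// balancer.Balancer calls never race with each other or with the
	// ExitIdle goroutine.
	fmt.Println("updating children with:", cfg)
	return nil
}

func main() {
	es := &endpointSharding{}
	_ = es.UpdateClientConnState("example config")
}

If the parent could call back inline during updateState(), running it with childMu held would risk a self-deadlock, which is why the "can't call back inline" assumption mentioned above matters.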
	if ei, ok := bw.Balancer.(balancer.ExitIdler); state.ConnectivityState == connectivity.Idle && ok {
		go func() {
			bw.es.childMu.Lock()
			ei.ExitIdle()
Don't we need to avoid this racing with Close? Not worried about a deadlock or concurrency issue, but about this sequence:
- Child reports IDLE
- This goroutine gets scheduled
- Parent calls Close()
- This goroutine runs and tells the child to ExitIdle() after Close()
Added close guard.
Good catch. I didn't even think of this constraint of the balancer.Balancer API that nothing is called after Close(). So I guess ExitIdle, even though it's a separate interface, is part of the same API.
I did a grpcsync.Event, but let me know if you would rather have a bool.
A bool here makes more sense since you already have a lock. No need for 2 layers of synchronization.
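For concreteness, a minimal sketch of the bool-based close guard under the existing childMu. The types and fields here are illustrative stand-ins, not the actual grpc-go implementation.

package main

import (
	"fmt"
	"sync"
)

// exitIdler stands in for balancer.ExitIdler.
type exitIdler interface{ ExitIdle() }

type endpointSharding struct {
	childMu sync.Mutex
	closed  bool // set under childMu by Close()
}

// Close marks the balancer closed under childMu, so an ExitIdle goroutine
// scheduled before Close but running after it observes closed == true.
func (es *endpointSharding) Close() {
	es.childMu.Lock()
	defer es.childMu.Unlock()
	es.closed = true
	// ... close children ...
}

// exitIdle is the body of the goroutine spawned when a child reports IDLE.
func (es *endpointSharding) exitIdle(ei exitIdler) {
	es.childMu.Lock()
	defer es.childMu.Unlock()
	if es.closed {
		return // balancer.Balancer contract: nothing is called after Close()
	}
	ei.ExitIdle()
}

type fakeChild struct{}

func (fakeChild) ExitIdle() { fmt.Println("child exiting idle") }

func main() {
	es := &endpointSharding{}
	es.exitIdle(fakeChild{}) // before Close: ExitIdle reaches the child
	es.Close()
	es.exitIdle(fakeChild{}) // after Close: the guard returns early
}

Because the goroutine takes the same mutex as Close, the bool alone is enough; a separate grpcsync.Event would add a second layer of synchronization for no benefit.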
This PR adds the functionality of calling ExitIdle on any child that reports IDLE, to automatically retrigger the child to start connecting.
In the future we could make this configurable through a config (which we would also need to add), but for all current use cases of this endpoint sharding policy the desired behavior is to trigger ExitIdle on the child, so we can leave it as the default for now.
This is because petiole policies used to be able to call Connect() on SubConns, but now that pick_first is the universal leaf, it is the only balancer that contains SubConn management logic, so petioles have no other way of triggering this reconnection, which was previously present in the system.
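A hedged, self-contained sketch of the overall flow described above: the per-child wrapper intercepts an IDLE report and kicks the child out of idle. Names such as balancerWrapper, childMu, and closed mirror the snippets in this conversation, but the types are simplified stand-ins (the real code uses balancer.ExitIdler, connectivity.State, and the gracefulswitch child), not the actual implementation.

package main

import (
	"fmt"
	"sync"
)

type connState int

const idle connState = iota

type exitIdler interface{ ExitIdle() }

type endpointSharding struct {
	childMu sync.Mutex
	closed  bool
}

// balancerWrapper mirrors the per-child wrapper from the diff above: it
// sees the child's state updates on their way up to the parent.
type balancerWrapper struct {
	child any // the wrapped child balancer
	es    *endpointSharding
}

// UpdateState sketch: when the child reports IDLE, spawn a goroutine that
// calls ExitIdle on it so it starts connecting again, holding childMu and
// respecting the close guard discussed in the review thread above.
func (bw *balancerWrapper) UpdateState(state connState) {
	if ei, ok := bw.child.(exitIdler); state == idle && ok {
		go func() {
			bw.es.childMu.Lock()
			defer bw.es.childMu.Unlock()
			if bw.es.closed {
				return
			}
			ei.ExitIdle()
		}()
	}
	// ... forward the (possibly modified) state to the parent ...
}

// fakeChild lets the example observe the ExitIdle call.
type fakeChild struct{ exitedIdle chan struct{} }

func (f fakeChild) ExitIdle() { close(f.exitedIdle) }

func main() {
	done := make(chan struct{})
	bw := &balancerWrapper{
		child: fakeChild{exitedIdle: done},
		es:    &endpointSharding{},
	}
	bw.UpdateState(idle) // child reported IDLE
	<-done               // the spawned goroutine kicked it out of IDLE
	fmt.Println("child was told to exit idle")
}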
RELEASE NOTES: