Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

balancer/endpointsharding: Call ExitIdle() on child if child reports IDLE #7782

Merged
merged 3 commits into from
Oct 29, 2024

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Oct 25, 2024

This PR adds the functionality of calling ExitIdle on any child that reports IDLE, to automatically retrigger the child to try and start connecting.

In the future, we could make this configurable through a config (which we will also need to add), but for all the use cases of this endpoint sharding policy the desired behavior is to trigger ExitIdle on the child so we can leave default for now.

This is because petiole policies used to be able to call Connect() on SubConns, but now that pick_first is the universal leaf, that will be the only balancer that will contain the logic for SubConn management, so petioles have no way of triggering this reconnection, as previously present in the system.

RELEASE NOTES:

  • balancer/endpointsharding: Call ExitIdle() on child if child reports IDLE

@zasweq zasweq requested a review from dfawley October 25, 2024 20:05
@zasweq zasweq added this to the 1.69 Release milestone Oct 25, 2024
@zasweq zasweq added the Type: Behavior Change Behavior changes not categorized as bugs label Oct 25, 2024
@zasweq zasweq changed the title balancer/endpointsharding: Call ExitIdle on child if child reports IDLE balancer/endpointsharding: Call ExitIdle() on child if child reports IDLE Oct 25, 2024
Copy link

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 9 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@192ee33). Learn more about missing BASE report.
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
balancer/endpointsharding/endpointsharding.go 50.00% 8 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #7782   +/-   ##
=========================================
  Coverage          ?   81.81%           
=========================================
  Files             ?      374           
  Lines             ?    38166           
  Branches          ?        0           
=========================================
  Hits              ?    31224           
  Misses            ?     5675           
  Partials          ?     1267           
Files with missing lines Coverage Δ
balancer/endpointsharding/endpointsharding.go 63.03% <50.00%> (ø)

@@ -123,6 +124,7 @@ func (es *endpointSharding) UpdateClientConnState(state balancer.ClientConnState
bal.Balancer = gracefulswitch.NewBalancer(bal, es.bOpts)
}
newChildren.Set(endpoint, bal)
es.childMu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can safely take this once at the stop and defer the unlock. The only things that happen concurrently are:

This & Close (guaranteed to be non-concurrent) & the ExitIdle goroutine, which can be concurrent but it should be safe for that to block all of UpdateCCS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, same thing can apply to Close, too, I'm pretty sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolverError can also race with ExitIdle (all three balancer.Balancer operations downward can race). Grabbed mutex for the whole operation and added close guard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I do defer an updateState() call which will now update upward with mutex held but since you mentioned can assume can't call back inline I'll go ahead and add it.

if ei, ok := bw.Balancer.(balancer.ExitIdler); state.ConnectivityState == connectivity.Idle && ok {
go func() {
bw.es.childMu.Lock()
ei.ExitIdle()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to avoid this racing with Close? Not worried about a deadlock or concurrency, but this sequence:

  • Child reports IDLE
  • This goroutine gets scheduled
  • Parent calls Close()
  • This goroutine runs and tells the child to ExitIdle() after Close()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added close guard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I didn't even think of this constraint of the balancer.Balancer API of nothing called after Close(). So I guess ExitIdle even though a separate interface is the part of same API.

I did an grpcsync.Event, but let me know if you would rather have a bool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bool here makes more sense since you already have a lock. No need for 2 layers of synchronization.

@dfawley dfawley assigned zasweq and unassigned dfawley Oct 28, 2024
@zasweq zasweq assigned dfawley and unassigned zasweq Oct 28, 2024
@zasweq zasweq force-pushed the trigger-exit-idle-in-endpoint-sharding branch from 5b89213 to 75b2737 Compare October 29, 2024 00:01
@dfawley dfawley assigned zasweq and unassigned dfawley Oct 29, 2024
@zasweq zasweq merged commit d66fc3a into grpc:master Oct 29, 2024
15 checks passed
misvivek pushed a commit to misvivek/grpc-go that referenced this pull request Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants