balancer/pickfirst: Add pick first metrics #7839

zasweq · 2024-11-13T22:45:20Z

This PR adds pick first metrics according to A78, and tests as well.

RELEASE NOTES:

balancer/pickfirst: Emit Metrics from pick_first load balancing policy

codecov · 2024-11-14T02:56:21Z

Codecov Report

Attention: Patch coverage is 70.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 81.93%. Comparing base (7d53957) to head (f2d97e7).
Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
balancer/pickfirst/pickfirstleaf/pickfirstleaf.go	66.66%	5 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7839   +/-   ##
=======================================
  Coverage   81.92%   81.93%           
=======================================
  Files         375      375           
  Lines       37979    38007   +28     
=======================================
+ Hits        31114    31140   +26     
- Misses       5572     5575    +3     
+ Partials     1293     1292    -1

Files with missing lines	Coverage Δ
internal/testutils/stats/test_metrics_recorder.go	`76.42% <100.00%> (+1.06%)`	⬆️
balancer/pickfirst/pickfirstleaf/pickfirstleaf.go	`87.88% <66.66%> (-0.96%)`	⬇️

... and 22 files with indirect coverage changes

dfawley · 2024-11-19T22:21:08Z

Would you mind reviewing this @arjan-bal?

zasweq · 2024-11-20T02:48:16Z

Thanks for the pass, Easwar. I did this work with Doug, who knows a lot more about the state transitions, so I'd like to hear your input on Easwar's concerns, @dfawley.

dfawley · 2024-11-20T16:15:58Z

I think it would be better for you to reply to them first @zasweq.

easwars · 2024-11-20T19:42:01Z

balancer/pickfirst/pickfirstleaf/metrics_test.go

+	}
+
+	ss.Stop()
+	if err = pollForDisconnectedMetrics(ctx, tmr); err != nil {


There is a single caller for this. Prefer getting rid of the function and inlining it here.

The reason I had this in a separate function is that inlining it would require a local bool.

Switched to that. Let me know what you think.

You don't need a boolean. If you change the body of the for to return when you see grpc.lb.pick_first.disconnections (instead of breaking), then the only case where you will execute code below the for would be when the context expires. And therefore you can check for ctx.Err() != nil instead of checking for the boolean.

If I return early, I don't even need the ctx.Err check, right? The only way to reach that point would be for the ctx to expire without having found the disconnection metric and returned, right?

So my failure mode is an unconditional t.Fatalf("timeout waiting for grpc.lb.pick_first.disconnections metric").

Actually, with Arjan's point about await state, I can use that to poll/sync, and then the metrics test can be a deterministic one-time check. Thanks, Arjan, for the suggestion.

easwars · 2024-11-20T19:47:08Z

balancer/pickfirst/pickfirstleaf/metrics_test.go

I think we should have tests for happy eyeballs cases. Did you consider enhancing existing tests to check for metric values at the end of the test? Going down that path would cover a lot of scenarios instead of very simple ones being tested here.

I never considered that. Would you rather I write my own happy eyeballs tests or scale up the existing ones? I know there was a lot of previous contention about adding too many assertions to tests.

That's a good idea. I would suggest adding assertions to the same test because asserting both behaviours in the same test would give a better indication that the features work together. If this severely hampers readability, we can have separate tests.

Agree with @arjan-bal.

I'm OK with it being a separate PR as well.

Scaled up two basic happy eyeballs tests (in a separate commit).

Note that there is a limitation in our test metrics recorder: it persists only the most recently emitted value for a given metric name. The reason is the heavy nondeterminism in other unit tests that use this component, where the most recent value was deterministic but the total was not. I think it's too far down the rabbit hole to change now, but let me know if y'all feel strongly about adding a summation assertion, if you want me to add metrics assertions to more happy eyeballs tests, or what you think about the additional assertions in general.

I'll be happy to add more, remove the commit, leave as is, etc.

I added a metrics assertion to the other test that exercises a ClientConn, so I think that should be good enough apart from the summation issue, which I don't think should be addressed in this PR, if at all, unless the assertions are found to be inadequate.

I saw other tests exercise interleaving addresses by sending TF, but figured this coverage should be good enough. Thanks for the suggestion.
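
To illustrate the recorder limitation mentioned in this thread, here is a small sketch contrasting a recorder that keeps only the last emitted value with one that sums increments. Both types and their methods are hypothetical and are not the API of the test metrics recorder in this repository; the sketch only shows why a last-value recorder cannot support a summation assertion.

```go
package main

import "fmt"

// lastValueRecorder mimics the limitation described above: each new
// recording overwrites the previous one for the same metric name.
type lastValueRecorder struct {
	last map[string]int64
}

func newLastValueRecorder() *lastValueRecorder {
	return &lastValueRecorder{last: make(map[string]int64)}
}

func (r *lastValueRecorder) recordCount(name string, incr int64) {
	r.last[name] = incr
}

// summingRecorder is a hypothetical alternative that accumulates increments,
// which is what a summation assertion would need.
type summingRecorder struct {
	total map[string]int64
}

func newSummingRecorder() *summingRecorder {
	return &summingRecorder{total: make(map[string]int64)}
}

func (r *summingRecorder) recordCount(name string, incr int64) {
	r.total[name] += incr
}

func main() {
	const name = "grpc.lb.pick_first.disconnections"
	last, sum := newLastValueRecorder(), newSummingRecorder()
	// Three separate increments of 1 on the same counter.
	for i := 0; i < 3; i++ {
		last.recordCount(name, 1)
		sum.recordCount(name, 1)
	}
	fmt.Printf("last=%d total=%d\n", last.last[name], sum.total[name]) // prints "last=1 total=3"
}
```

With the last-value recorder, a test can only assert that the most recent increment was 1, not that three events occurred.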

zasweq · 2024-11-20T23:40:33Z

Thanks for the comments y'all. Got to all comments (split out implementation and testing changes in separate commits).

dfawley · 2024-11-20T23:55:18Z

balancer/pickfirst/pickfirstleaf/pickfirstleaf.go

@@ -57,7 +58,28 @@ var (
 	// Name is the name of the pick_first_leaf balancer.
 	// It is changed to "pick_first" in init() if this balancer is to be
 	// registered as the default pickfirst.
-	Name = "pick_first_leaf"
+	Name                 = "pick_first_leaf"
+	disconnectionsMetric = estats.RegisterInt64Count(estats.MetricDescriptor{


This one is suffixed with Metric and the others are not. Let's be consistent.

Good point. Done. The RLS/WRR metrics are also suffixed with Metric, just checked.

dfawley · 2024-11-20T23:55:58Z

balancer/pickfirst/pickfirstleaf/pickfirstleaf.go

-		cc:                    cc,
+		cc:              cc,
+		target:          bo.Target.String(),
+		metricsRecorder: bo.MetricsRecorder, // ClientConn will always create a Metrics Recorder so guaranteed to be non nil.


This comment seems like it's unnecessary or in the wrong place? It's not being dereferenced here.

This was Easwar's suggestion after our back and forth: #7839 (comment). What do you think about this? I'll drop "so guaranteed to be non nil" from this one because it's documented on the field.

dfawley · 2024-11-21T00:02:02Z

balancer/pickfirst/pickfirstleaf/pickfirstleaf.go

@@ -575,6 +607,12 @@ func (b *pickfirstBalancer) updateSubConnState(sd *scData, newState balancer.Sub
 		// the first address when the picker is used.
 		b.shutdownRemainingLocked(sd)
 		b.state = connectivity.Idle
+		// READY SubConn interspliced in between CONNECTING and IDLE, need to
+		// account for that.
+		if oldState == connectivity.Connecting && newState.ConnectivityState == connectivity.Idle {


+1 to something like:

if oldState == Connecting { // A known issue causes a race that prevents the READY state change notification. This works around it.

Also it would be great if we could create an issue for that problem and link it here so we know to come back and delete this once it's resolved.

Removed the second conditional. Created a GitHub issue and linked to it.
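
A rough sketch of the workaround discussed in this thread follows. It is not the code from pickfirstleaf.go: the `state` type, the `counters` struct, and `onTransitionToIdle` are hypothetical. The diff excerpt does not show the body of the branch, so the sketch assumes that the workaround counts the successful connection attempt that the skipped READY notification would otherwise have recorded, and then counts the disconnection as usual.

```go
package main

import "fmt"

// state is a simplified, hypothetical stand-in for connectivity.State.
type state int

const (
	idle state = iota
	connecting
	ready
)

// counters is a hypothetical holder for two pick_first counters.
type counters struct {
	attemptsSucceeded int64
	disconnections    int64
}

// onTransitionToIdle models the selected SubConn moving to IDLE. Normally
// the previous state is READY. If the previous state is CONNECTING, the
// READY notification was skipped, so (by assumption) the successful attempt
// has not been counted yet and is counted here.
func (c *counters) onTransitionToIdle(oldState state) {
	if oldState == connecting {
		// Workaround for the race that hides the READY notification.
		c.attemptsSucceeded++
	}
	c.disconnections++
}

func main() {
	c := &counters{}
	c.onTransitionToIdle(ready)      // usual path: READY -> IDLE
	c.onTransitionToIdle(connecting) // raced path: CONNECTING -> IDLE
	fmt.Printf("succeeded=%d disconnections=%d\n", c.attemptsSucceeded, c.disconnections)
}
```

Checking only the old state, as suggested in the review, keeps the branch to a single condition, since this code path is already specific to the transition to IDLE.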

arjan-bal · 2024-11-21T15:02:06Z

balancer/pickfirst/pickfirstleaf/metrics_test.go

That's a good idea. I would suggest adding assertions to the same test because asserting both behaviours in the same test would give a better indication that the features work together. If this severely hampers readability, we can have separate tests.

zasweq · 2024-11-22T23:10:36Z

Thanks for the passes. Got to all comments and scaled up two very basic happy eyeballs tests. Let me know what you think.

zasweq · 2024-11-25T16:57:27Z

Thanks for the comments. Great suggestions.

arjan-bal

LGTM

zasweq requested a review from dfawley November 13, 2024 22:45

zasweq assigned dfawley Nov 13, 2024

zasweq added this to the 1.69 Release milestone Nov 13, 2024

zasweq added the Type: Feature New features or improvements in behavior label Nov 13, 2024

zasweq force-pushed the pf-metrics branch 2 times, most recently from 1cca48a to e214b5f Compare November 14, 2024 02:51

zasweq force-pushed the pf-metrics branch from e214b5f to 4e672c2 Compare November 14, 2024 03:01

dfawley assigned arjan-bal and unassigned dfawley Nov 19, 2024

dfawley requested a review from arjan-bal November 19, 2024 22:20

easwars reviewed Nov 19, 2024

easwars assigned zasweq Nov 19, 2024

zasweq assigned dfawley and unassigned zasweq Nov 20, 2024

arjan-bal reviewed Nov 20, 2024

arjan-bal reviewed Nov 20, 2024

arjan-bal assigned zasweq and unassigned arjan-bal Nov 20, 2024

easwars reviewed Nov 20, 2024

zasweq added 3 commits November 20, 2024 15:37

Add pick first metrics

2456fc2

Responded to implementation comments

5cf2c51

Responded to testing comments

d302b4a

zasweq force-pushed the pf-metrics branch from 5c9ca2e to 6090656 Compare November 20, 2024 23:39

zasweq assigned easwars and arjan-bal and unassigned dfawley Nov 20, 2024

zasweq removed their assignment Nov 20, 2024

Fix GA

12011f0

zasweq force-pushed the pf-metrics branch from 6090656 to 12011f0 Compare November 20, 2024 23:57

dfawley reviewed Nov 21, 2024

zasweq added 2 commits November 20, 2024 16:14

Responded to Doug's comments

dbd4905

Deflake waiting for IDLE

d1046ef

arjan-bal reviewed Nov 21, 2024

arjan-bal assigned zasweq and unassigned arjan-bal Nov 21, 2024

easwars approved these changes Nov 21, 2024

easwars removed their assignment Nov 21, 2024

zasweq added 2 commits November 22, 2024 14:34

Responded to comments

f399170

Scale up some happy eyeballs tests

325979f

zasweq assigned easwars and arjan-bal and unassigned zasweq Nov 22, 2024

Add another happy eyeballs test

f2d97e7

arjan-bal reviewed Nov 25, 2024

arjan-bal assigned zasweq and unassigned arjan-bal Nov 25, 2024

Responded to Arjan's comments

7a063fa

zasweq assigned arjan-bal and unassigned zasweq Nov 25, 2024

arjan-bal approved these changes Nov 26, 2024

arjan-bal removed their assignment Nov 26, 2024

zasweq merged commit 967ba46 into grpc:master Nov 26, 2024
13 checks passed
