Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ringhash: fix normalizeWeights #7156

Merged
merged 7 commits into from
May 29, 2024
Merged

ringhash: fix normalizeWeights #7156

merged 7 commits into from
May 29, 2024

Conversation

arvindbr8
Copy link
Member

RELEASE NOTES:

  • ringhash: fix normalizeWeights in the event of just a weight update

Copy link
Collaborator

@atollena atollena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are failing because some subConn are instantiated with a default weight of 0 at

testSubConnMap.Set(testAddrs[0], &subConn{addr: "a"})
testSubConnMap.Set(testAddrs[1], &subConn{addr: "b"})
testSubConnMap.Set(testAddrs[2], &subConn{addr: "c"})
.

xds/internal/balancer/ringhash/ring.go Outdated Show resolved Hide resolved
xds/internal/balancer/ringhash/ring.go Outdated Show resolved Hide resolved
xds/internal/balancer/ringhash/ringhash_test.go Outdated Show resolved Hide resolved
@arvindbr8
Copy link
Member Author

Thanks for the review @atollena and @aranjans.

I still think this diff is not ready to be merged yet. There is still the open question why getWeightAttribute(a) doesnt work in this case, even though (*subConn).weight is set using getWeightAttribute(a). I wanted to check with @zasweq if he knows what's going on.

@zasweq
Copy link
Contributor

zasweq commented Apr 22, 2024

There's two other maintainers actively looking at this so I'll remove myself.

@zasweq zasweq assigned aranjans and unassigned zasweq Apr 22, 2024
@zasweq zasweq removed their request for review April 22, 2024 17:24
@atollena
Copy link
Collaborator

Thanks for the review @atollena and @aranjans.

I still think this diff is not ready to be merged yet. There is still the open question why getWeightAttribute(a) doesnt work in this case, even though (*subConn).weight is set using getWeightAttribute(a). I wanted to check with @zasweq if he knows what's going on.

Ha. I thought that you had that figured out. I took a look at the code and, with my limited understanding, it looks like the "bug" is in resolver.AddressMap.Set: https://github.com/grpc/grpc-go/blob/master/resolver/map.go#L81-L90

At L86, when we found that we already had the address in the map, the code only updates the value, not the addr. So if the BalancerAttribute changed, the addr isn't updated. As a result, if the weight changes, which is a BalancerAttribute, then the resolver map does not update the entry with the new address.

From the documentation of resolver.AddressMap, it sounds like resolver.BalancerAttribute should always be ignored, and two addresses with all fields equal except for resolver.BalancerAttribute and Metadata have the same key. But then, it's not super clear whether calling Set twice with two addresses having varying resolver.BalancerAttribute should update the key.

I see two options to fix this:

  1. Make resolver.AddressMap update Metadata and BalancerAttribute when setting the value associated with an address.
  2. Make resolver.AddressMap always clear Metadata and BalancerAttribute before storing the address entry. I think it would be a bit less confusing, and make it clear that Metadata and BalancerAttribute should be stored separately when using an address map, e.g. in the value. This is a breaking change though, since resolver.AddressMap is public (it might not hurt to mark it experimental -- FWIW we have 1 usage of it internally at datadog, for a custom balancer which is a variation of weighted round robin).

I'd sollicit Doug and Zach's input on this because I don't have the context of when address map is useful (looks like it's mostly to find existing subconn associated with an address in O(1)?), but option 1 seems more realistic.

@aranjans aranjans assigned arvindbr8 and unassigned aranjans Apr 23, 2024
@atollena atollena added the Type: Feature New features or improvements in behavior label Apr 26, 2024
@atollena atollena added this to the 1.64 Release milestone Apr 26, 2024
@arjan-bal
Copy link
Contributor

arjan-bal commented Apr 29, 2024

If we are open to making a breaking change, another option to consider, in addition to those listed in #7156 (comment) :

  1. Introduce a new struct type to serve as the key for AddressMap.m, call it MapKey. MapKey will contain only the Addr and ServerName strings as members. Change the AddressMap.Keys() method to return objects of MapKey. This way the compiler ensures that callers access only those fields of the key that are set.

We could also achieve the same by first deprecating the existing Keys method and introducing a new method with the behaviour described above.

@arvindbr8 arvindbr8 linked an issue Apr 30, 2024 that may be closed by this pull request
@arvindbr8 arvindbr8 modified the milestones: 1.64 Release, 1.65 Release May 9, 2024
@arvindbr8
Copy link
Member Author

From the documentation of resolver.AddressMap, it sounds like resolver.BalancerAttribute should always be ignored, and two addresses with all fields equal except for resolver.BalancerAttribute and Metadata have the same key. But then, it's not super clear whether calling Set twice with two addresses having varying resolver.BalancerAttribute should update the key.

what the docstring on resolver.AddressMap really talks about is that we should not be relying on the balancerAttributes in the key, but use value instead. And in ringHash, the value is of type ringhash.subConn which actually contains the weight information.

The actual bug is with using the rignhash.getWeightAttribute helper. This is helpful when getting the balancerattribute weight when updating addresses. But is incorrect when used to get weights from the AddressMap. Instead, we should type assert to ringhash.subConn and read weights from that instead.

cc'ing: @easwars; since he has most context on ringhash.

@easwars easwars self-assigned this May 14, 2024
min := 1.0
for _, a := range scVals {
scInfo := a.(*subConn)
// (*subConn).weight returns 1 if the weight attribute is not found on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it doesn't "return"

Suggested change
// (*subConn).weight returns 1 if the weight attribute is not found on
// (*subConn).weight is set to 1 if the weight attribute is not found on the address.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Done.

xds/internal/balancer/ringhash/ring.go Show resolved Hide resolved
xds/internal/balancer/ringhash/ring.go Show resolved Hide resolved
min := 1.0
for _, a := range scVals {
scInfo := a.(*subConn)
// (*subConn).weight returns 1 if the weight attribute is not found on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@easwars easwars removed their assignment May 28, 2024
Copy link
Member Author

@arvindbr8 arvindbr8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated per comments. PTAL

xds/internal/balancer/ringhash/ring.go Show resolved Hide resolved
xds/internal/balancer/ringhash/ring.go Show resolved Hide resolved
min := 1.0
for _, a := range scVals {
scInfo := a.(*subConn)
// (*subConn).weight returns 1 if the weight attribute is not found on
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Done.

@arvindbr8 arvindbr8 removed their assignment May 28, 2024
@arvindbr8 arvindbr8 requested a review from easwars May 28, 2024 23:18
// each struct contains a subConn and its corresponding weight. The function
// also returns the minimum weight among all subConns.
//
// The weight of each subConn is calculated by dividing its weight attribute by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/The weight of each subConn/The normalized weight of each subConn/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

@easwars easwars assigned arvindbr8 and unassigned easwars May 29, 2024
@arvindbr8
Copy link
Member Author

Testing / tests (tests, 1.22, 386) (pull_request)

Something is wrong with this presubmit check. Will need to investigate.

Merging since this change is not expected to break 386 variant of go1.22 test

@arvindbr8 arvindbr8 merged commit 33faea8 into grpc:master May 29, 2024
11 checks passed
@arvindbr8 arvindbr8 deleted the dbug_6927 branch May 29, 2024 23:50
@arvindbr8 arvindbr8 added Type: Bug and removed Type: Feature New features or improvements in behavior labels May 29, 2024
@arvindbr8 arvindbr8 restored the dbug_6927 branch June 25, 2024 22:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XDS Ring Hash inconsistent after client and server restarts
6 participants