-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds: implement ADS stream flow control mechanism #7458
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7458 +/- ##
==========================================
+ Coverage 81.57% 81.61% +0.04%
==========================================
Files 354 357 +3
Lines 27076 27267 +191
==========================================
+ Hits 22088 22255 +167
- Misses 3798 3804 +6
- Partials 1190 1208 +18
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor implementation comments; will look at new tests next pass.
select { | ||
case <-watcher1.updateCh: | ||
case <-ctx.Done(): | ||
t.Fatalf("Timed out waiting for update to reach watcher 1") | ||
} | ||
select { | ||
case <-watcher2.updateCh: | ||
case <-ctx.Done(): | ||
t.Fatalf("Timed out waiting for update to reach watcher 2") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't these receives flaky? The sends have a default and there's no guarantee this testing goroutine gets to these lines before the OnUpdate callbacks send on the channel right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I guess since it's verifying something has updated it (but this would pass if numerous updates since next updates would get dropped). Do we care about this/is it important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't these receives flaky? The sends have a default and there's no guarantee this testing goroutine gets to these lines before the OnUpdate callbacks send on the channel right?
No, they aren't flaky because the channels have a buffer of size 1
. So, even if testing goroutine is not reading on the channels when the update fires, the writes on the channels will go through.
Oh I guess since it's verifying something has updated it (but this would pass if numerous updates since next updates would get dropped). Do we care about this/is it important?
Our concern is not the number of times an update/error callback is invoked. Instead, it is the about when the next read happens. And if there is a bug in the xDS client which invokes the callbacks multiple times for a single update, that should be caught by other tests.
Here, we ensure that multiple reads on the ADS stream is not happening because we do have a check for that, a few lines below.
Hope this is satisfactory for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright sounds good.
// management server and therefore the watcher's OnResourceDoesNotExist() | ||
// callback is expected to be invoked. Verifies that no further reads are | ||
// attempted until the callback is completely handled by the watcher. | ||
func (s) TestADSFlowControl_ResourceDoesNotExist(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems similar to the test above albeit a few different operations. Can we refactor the two into a t-test? There seems to be a lot of shared code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is about 30 lines of common code at the start of these two tests, but writing a table driven test to cater to these two scenarios would make the logic inside the table complicated. As you can see for the error case, we simply configure a resource on the management server which is expected to NACKed by the xDS client. But for the resource-not-found case, we initially need to have a resource which is expected to be ACKed by the xDS client and then we remove it from the management server triggering the resource-not-found error.
My preference is to keep tests simpler to read and maintain, over tests that are shorter and share more code, but end up being complicated.
Let me know your thoughts. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright reading the test bodies I guess the operations are different enough. I guess refactoring into setup would be hard across tests since variable listeners and routes.
Reviewed tests and responded/resolved the implementation comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR adds support for ADS stream flow control. See grpc/grpc#34099 for complete context.
At a high level, we want watch callbacks from the xDS client to include a function to be invoked by the watcher when it has completely processed the update. And the ADS stream will block the next read until the previous update has been completely consumed by all watchers.
Summary of changes:
DoneNotifier
ADSFlowControl
to implement the logic of flow control. This has methods to increment the number of pending watchers, decrement the number of pending watchers and wait till the number of pending watchers is0
.ADSFlowControl
instance for the stream is passed to theauthority
along with the update.OnDone
callback will be invoked eventually.DoneNotifier
when they are done with an update.Schedule
have been changed with calls toScheduleOr
. This includescds
andxds_resolver
clusterresolver
.OnDone
. Seexds_server
.#ads-stream-flow-control
Fixes #7382
RELEASE NOTES: