Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orca: fix race at producer startup #6245

Merged
merged 2 commits into from
May 3, 2023
Merged

orca: fix race at producer startup #6245

merged 2 commits into from
May 3, 2023

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented May 2, 2023

With #6236, an existing race was discovered (exposed in some new situations):

  1. orca.RegisterOOBListener is called for a SubChannel.
  2. The ORCA producer is created, an its run goroutine is started.
  3. run() could start the OOB stream with the server at any moment now, despite no registered listeners.
  4. Back in orca.RegisterOOBListener, the listener is registered.

Between 3 and 4, we could start the stream with a request interval of 0. The server's lower bound would apply to protect it from aggressive clients, but this causes at least some tests to fail. (Note that a similar race can theoretically happen between unregistering the final listener in a producer & registering a new one, but that would require the two operations to happen at the same time, which is unusual.)

The solution is to add a channel that is created whenever there are no listeners and closed (and nilled) if it exists when a listener is registered.

RELEASE NOTES:

  • orca: fix a race at startup of out-of-band metric subscriptions that would cause the report interval to request 0

@dfawley dfawley added this to the 1.56 Release milestone May 2, 2023
@dfawley dfawley requested a review from easwars May 2, 2023 21:07
orca/producer.go Outdated
mu sync.Mutex
intervals map[time.Duration]int // map from interval time to count of listeners requesting that time
listeners map[OOBListener]struct{} // set of registered listeners
hasIntervals chan struct{} // created when intervals is empty; closed and nilled when non-empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a sync.Cond instead? It seems like an appropriate use case for a condition variable. I wanted to try it out before asking for it. And it does seem simpler to use a condition variable instead of closing and recreating the channel.

@easwars easwars assigned dfawley and unassigned easwars May 2, 2023
Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

var min time.Duration
// recomputeMinInterval sets p.minInterval to the minimum key's value in
// p.intervals.
func (p *producer) recomputeMinInterval() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/recomputeMinInterval/recomputeMinIntervalLocked/ ?

@dfawley dfawley merged commit 47b3c55 into grpc:master May 3, 2023
1 check passed
@dfawley dfawley deleted the orcarace branch May 3, 2023 20:47
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants