alts: Queue ALTS handshakes once limit is reached rather than dropping. #6884

Merged


matthewstevenson88 (Contributor)

See b/312467484.

RELEASE NOTES: none

matthewstevenson88 added the Type: Feature label (New features or improvements in behavior) Dec 19, 2023
matthewstevenson88 added this to the 1.61 Release milestone Dec 19, 2023
matthewstevenson88 self-assigned this Dec 19, 2023

codecov bot commented Dec 19, 2023

Codecov Report

Merging #6884 (0fbd287) into master (c109241) will decrease coverage by 0.15%.
The diff coverage is 0.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6884      +/-   ##
==========================================
- Coverage   83.75%   83.60%   -0.15%     
==========================================
  Files         286      286              
  Lines       30792    30792              
==========================================
- Hits        25789    25745      -44     
- Misses       3948     3981      +33     
- Partials     1055     1066      +11     
File                                                  Coverage Δ
credentials/alts/internal/handshaker/handshaker.go    69.58% <0.00%> (-6.19%) ⬇️

... and 25 files with indirect coverage changes
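As a quick consistency check on the report (Codecov counts partial lines as not covered), the coverage figures follow from the hit and line counts, and the changes in hits, misses, and partials cancel out because the number of tracked lines is unchanged:

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{lines}}, \qquad \text{lines} = \text{hits} + \text{misses} + \text{partials} \\
\text{master:}\quad & \frac{25789}{30792} \approx 83.752\% \\
\text{\#6884:}\quad & \frac{25745}{30792} \approx 83.609\% \\
\Delta\text{hits} + \Delta\text{misses} + \Delta\text{partials} &= -44 + 33 + 11 = 0
\end{aligned}
```

The exact new value, about 83.609%, is shown as 83.60%, which suggests the report truncates rather than rounds to two decimals; the headline -0.15% is the difference of the displayed values, 83.75 - 83.60.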

matthewstevenson88 marked this pull request as ready for review December 19, 2023 16:08
-	if !clientHandshakes.TryAcquire(1) {
-		return nil, nil, errDropped
+	if err := clientHandshakes.Acquire(ctx, 1); err != nil {
+		return nil, nil, err
cesarghali (Contributor) commented Dec 19, 2023

Can you please clarify how returning the actual error helps with queuing instead of dropping? Is there another PR after this?

matthewstevenson88 (Contributor, Author)

In the current code, we call TryAcquire on the semaphore. If the semaphore is already at its limit, TryAcquire returns false right away, so we immediately return errDropped to the user.

When we replace the TryAcquire call with Acquire, we instead block until either the semaphore can be acquired or ctx is done (for example, when it times out), in which case Acquire returns the context's error. Internally, the semaphore keeps a queue of pending acquire attempts, so calling Acquire effectively adds the current goroutine to this queue.

As explained in b/312467484, we want to do this so that the behavior is uniform across the three languages (C++, Java, and Go).

Does that make sense / help clarify?

Contributor

Thanks for the clarification. This makes sense. Can you please add a comment that Acquire blocks until it can acquire? Thank you!

matthewstevenson88 (Contributor, Author)

@easwars If you have time, would you also be able to take a quick pass?

easwars (Contributor) left a comment

Is there a test for this queueing behavior? Something like an e2e style test would be nice.

matthewstevenson88 (Contributor, Author) commented Dec 19, 2023

Thanks for taking a look!

> Is there a test for this queueing behavior? Something like an e2e style test would be nice.

There is an e2e test that ensures that this PR works correctly:

func (s) TestConcurrentHandshakes(t *testing.T) {

Before this PR, that test gets several errors, logs them, and has to retry. With this PR, instead of returning an error that gets logged, each new handshake waits until one of the ongoing handshakes completes and then proceeds. From the user's perspective there is very little difference, but we're making this change to align with the C++ and Java behavior.

matthewstevenson88 merged commit adc7685 into grpc:master Dec 19, 2023
14 checks passed
matthewstevenson88 added a commit that referenced this pull request Dec 28, 2023
matthewstevenson88 added a commit that referenced this pull request Dec 28, 2023
matthewstevenson88 added a commit that referenced this pull request Jan 2, 2024
github-actions bot locked as resolved and limited conversation to collaborators Jun 17, 2024

3 participants