-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCE: Fix operation polling and error handling #64630
Conversation
/cc |
9210eaa
to
72f68e2
Compare
/cc @rramkumar1 |
f69e15c
to
20b3bcc
Compare
@@ -27,6 +27,10 @@ import ( | |||
ga "google.golang.org/api/compute/v1" | |||
) | |||
|
|||
const ( | |||
maxOperationGetErrorStreak = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this maxConsecutiveOperationGetErrors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
// If an error occurs retrieving the operation, the loop will continue for `maxOpGetRetries` then | ||
// finally error. This is to prevent a transient error from bubbling up to controller-level logic. | ||
func (s *Service) pollOperation(ctx context.Context, op operation) error { | ||
var errorStreak int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consecutiveErrors
156bff2
to
891080a
Compare
3632b39
to
20e662b
Compare
/retest |
@nicksardo: You must be a member of the kubernetes-milestone-maintainers github team to set the milestone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
PTAL @bowei |
|
||
// Error returns a string representation including the HTTP Status code, GCE's error code | ||
// and a human readable message. | ||
func (e GCEOperationError) Error() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(e *GCEOperationError)
} | ||
|
||
// Error returns a string representation including the last poll error encountered. | ||
func (e OperationPollingError) Error() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pointer
// acceptor is an object which blocks within Accept until a call is allowed to run. | ||
// Accept is a behavior of the flowcontrol.RateLimiter interface. | ||
type acceptor interface { | ||
Accept() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golint
|
||
// AcceptRateLimiter wraps an Acceptor with RateLimiter parameters. | ||
type AcceptRateLimiter struct { | ||
Acceptor acceptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golint
// MinimumRateLimiter wraps a RateLimiter and will only call its Accept until the minimum | ||
// duration has been met or the context is cancelled. | ||
type MinimumRateLimiter struct { | ||
RateLimiter RateLimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golint
// returning ctx.Err(). | ||
select { | ||
case <-ctx.Done(): | ||
glog.V(5).Infof("op.pollOperation(%v, %v) not completed, poll count = %v, ctx.Err = %v", ctx, op, pollCount, ctx.Err()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use %d for ints
glog.V(5).Infof("op.isDone(%v) complete; op = %v", ctx, op) | ||
return nil | ||
|
||
glog.V(5).Infof("op.isDone(%v) complete; op = %v, poll count = %v, op.err = %v", ctx, op, pollCount, op.error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use %d for ints
) | ||
|
||
func TestPollOperation(t *testing.T) { | ||
var attempts int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const totalAttempts = 11
func TestPollOperation(t *testing.T) { | ||
var attempts int | ||
fo := &fakeOperation{isDoneFunc: func(ctx context.Context) (bool, error) { | ||
if attempts <= 10 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
< totalAttemps
t.Errorf("pollOperation() = %v, want nil", err) | ||
} | ||
if attempts != 11 { | ||
t.Errorf("`attempts` = %v, want 11", attempts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want %d", ..., totalAttempts
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bowei, nicksardo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/milestone v1.11 |
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process Pull Request Labels
|
/test pull-kubernetes-bazel-build |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue (batch tested with PRs 64272, 64630). If you want to cherry-pick this change to another branch, please follow the instructions here. |
@nicksardo: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
e := op.Error.Errors[0] | ||
o.err = &GCEOperationError{HTTPStatusCode: op.HTTPStatusCode, Code: e.Code, Message: e.Message} | ||
} | ||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of these code are very similar. actually you can embed a struct with an error
method. When I wrote this, I realized that it also applies to String()
and isDone()
. feel free to ignore, totally optional. maybe i will submit a PR later.
Cloud functions using the generated API are bursting operation GET calls because we don't wait a minimum amount of time.
Fixes #64712
Fixes #64858
Changes
operationPollInterval
is now 1 second instead of 3 seconds.operationPollRateLimiter
is now configured with 5 QPS / 5 burst instead of 10 QPS / 10 burst.gceRateLimiter
is now configured with aMinimumRateLimiter
to wait the aboveoperationPollInterval
duration before waiting on the token rate limiter.DONE
or context times out (even if operations.get fails continuously).DONE
.ingress-gce
will need to update its vendor and utilize theMinimumRateLimiter
as well. Since ingress creates rate limiters based off flags, we'll need to check the resource type and operation while parsing the flags and wrap the appropriate one.Special notes for your reviewer:
/assign bowei
/cc bowei
Fix Example
Creating an external load balancer
without fix: https://pastebin.com/raw/NNkeNWS3
with fix: https://pastebin.com/raw/x2iMLW5S (a difference of about 200 GET calls)
Release note: