-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timeout and Max-in-flight don't report non-resource URLs correctly. #49678
Conversation
694d535
to
67ac843
Compare
85a0ecf
to
d88f091
Compare
func NewTooManyRequests(message string, retryAfterSeconds int) *StatusError { | ||
return &StatusError{metav1.Status{ | ||
Status: metav1.StatusFailure, | ||
Code: http.StatusServiceUnavailable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't keep them all in cache, but isn't this a 503, not a 429?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, yes, I don't know how this didn't get copied. I must have autocompleted the wrong field. Terrifying.
double check the status code I pointed out. otherwise lgtm. |
d88f091
to
d89508e
Compare
All of these errors are now part of the standard HTTP method. Formalize those into our error types and remove duplication and unclear separation.
This potentially has high cardinality, however we can rate limit based on queries to these endpoints as well.
Our rules are that code of the error must match code of the response. We were also not setting the correct reason. This updates the timeout filter to be consistent with other clients, without changing the error code (504 is correct). The new message properly indicates the request may still be running, which the old message did not do.
Also uses the standard error constructor for TooManyRequests and clarifies *why* a disruption is rejected.
SuggestClientDelay is returning whether the server has requested that the client delay their next action. It is *not* about whether the client should retry the action. Webhook was using it incorrectly, and the method is now up to date.
124d536
to
8c18db0
Compare
8c18db0
to
ddbc2ad
Compare
@deads2k see the SuggestClientDelay bit |
@@ -90,6 +90,11 @@ func WithExponentialBackoff(initialBackoff time.Duration, webhookFn func() error | |||
var err error | |||
wait.ExponentialBackoff(backoff, func() (bool, error) { | |||
err = webhookFn() | |||
// these errors indicate a need to retry an authentication check | |||
if apierrors.IsServerTimeout(err) || apierrors.IsTimeout(err) || apierrors.IsTooManyRequests(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these not trigger the lower check on a specified client delay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. We don't actually honor the delay right now, but we have two classes of things:
- errors that mean retry
- an error or response (that doesn't automatically mean retry) that still asks you to retry for some other reason
lgtm |
/approve no-issue |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton Associated issue requirement bypassed by: smarterclayton The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Unify error reporting for 429 and 504 to be correct for timeout and max in flight and eviction. Add better messages to eviction (removing a todo). Return the correct body content for timeouts (reason and code should be correct).
This potentially increases cardinality of 429, but because non-api urls may be under the max-inflight budget we need to report them somewhere (if something breaks and starts fetching API versions endlessly).