Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout and Max-in-flight don't report non-resource URLs correctly. #49678

Merged
merged 7 commits into from
Aug 5, 2017

Conversation

smarterclayton
Copy link
Contributor

@smarterclayton smarterclayton commented Jul 27, 2017

Unify error reporting for 429 and 504 to be correct for timeout and max in flight and eviction. Add better messages to eviction (removing a todo). Return the correct body content for timeouts (reason and code should be correct).

This potentially increases cardinality of 429, but because non-api urls may be under the max-inflight budget we need to report them somewhere (if something breaks and starts fetching API versions endlessly).

The 504 timeout error was returning a JSON error body that indicated it was a 500.  The body contents now correctly report a 500 error.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 27, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Jul 27, 2017
@smarterclayton smarterclayton added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jul 27, 2017
@smarterclayton smarterclayton force-pushed the 429_metric branch 2 times, most recently from 85a0ecf to d88f091 Compare July 27, 2017 02:53
func NewTooManyRequests(message string, retryAfterSeconds int) *StatusError {
return &StatusError{metav1.Status{
Status: metav1.StatusFailure,
Code: http.StatusServiceUnavailable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't keep them all in cache, but isn't this a 503, not a 429?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, yes, I don't know how this didn't get copied. I must have autocompleted the wrong field. Terrifying.

@deads2k
Copy link
Contributor

deads2k commented Jul 27, 2017

double check the status code I pointed out. otherwise lgtm.

All of these errors are now part of the standard HTTP method. Formalize
those into our error types and remove duplication and unclear
separation.
This potentially has high cardinality, however we can rate limit based
on queries to these endpoints as well.
Our rules are that code of the error must match code of the response. We
were also not setting the correct reason. This updates the timeout
filter to be consistent with other clients, without changing the error
code (504 is correct). The new message properly indicates the request
may still be running, which the old message did not do.
Also uses the standard error constructor for TooManyRequests and
clarifies *why* a disruption is rejected.
SuggestClientDelay is returning whether the server has requested that
the client delay their next action. It is *not* about whether the client
should retry the action. Webhook was using it incorrectly, and the
method is now up to date.
@smarterclayton smarterclayton force-pushed the 429_metric branch 3 times, most recently from 124d536 to 8c18db0 Compare July 31, 2017 21:28
@smarterclayton
Copy link
Contributor Author

@deads2k see the SuggestClientDelay bit

@@ -90,6 +90,11 @@ func WithExponentialBackoff(initialBackoff time.Duration, webhookFn func() error
var err error
wait.ExponentialBackoff(backoff, func() (bool, error) {
err = webhookFn()
// these errors indicate a need to retry an authentication check
if apierrors.IsServerTimeout(err) || apierrors.IsTimeout(err) || apierrors.IsTooManyRequests(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these not trigger the lower check on a specified client delay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. We don't actually honor the delay right now, but we have two classes of things:

  1. errors that mean retry
  2. an error or response (that doesn't automatically mean retry) that still asks you to retry for some other reason

@deads2k
Copy link
Contributor

deads2k commented Aug 3, 2017

lgtm

@smarterclayton
Copy link
Contributor Author

/approve no-issue

@smarterclayton smarterclayton added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

Associated issue requirement bypassed by: smarterclayton

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2017
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 8d6bbaa into kubernetes:master Aug 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants