-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorporating feedback on 119341 #120087
Incorporating feedback on 119341 #120087
Conversation
Hi @divyasri537. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
a415565
to
e250232
Compare
/assign @liggitt |
@@ -169,8 +169,7 @@ func (a *mutatingDispatcher) Dispatch(ctx context.Context, attr admission.Attrib | |||
switch err := err.(type) { | |||
case *webhookutil.ErrCallingWebhook: | |||
if ctx.Err() == context.Canceled { | |||
klog.Warningf("Context Canceled when calling webhook %v", hook.Name) | |||
return err | |||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to break here either, we probably want to flow through to the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added break statement here, so the metric counter in next steps wont pick the canceled context.
@@ -169,8 +169,7 @@ func (a *mutatingDispatcher) Dispatch(ctx context.Context, attr admission.Attrib | |||
switch err := err.(type) { | |||
case *webhookutil.ErrCallingWebhook: | |||
if ctx.Err() == context.Canceled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you look into checking err
instead of ctx.Err?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I checked and added the comments in the old PR #119341
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err that is from callAttrMutatingHook method [ref] based on the provided ctx . The err.Reason returns customized error response that picks up the ctx error so I don’t think anything wrong by checking this way ctx.Err() == context.Canceled as it is similar to checking the err obj.
Also The err.Reason returns
%!(EXTRA string=failed to call webhook: Post "https://127.0.0.1:21375/validate?timeout=2s": context canceled) which is a customized error and may break if someone modifies the error message so I think its better to go with checking ctx.Err().
@@ -174,8 +174,7 @@ func (d *validatingDispatcher) Dispatch(ctx context.Context, attr admission.Attr | |||
switch err := err.(type) { | |||
case *webhookutil.ErrCallingWebhook: | |||
if ctx.Err() == context.Canceled { | |||
klog.Warningf("Context Canceled when calling webhook %v", hook.Name) | |||
return | |||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing with break here
/ok-to-test |
/kind bug |
staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go
Show resolved
Hide resolved
staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go
Outdated
Show resolved
Hide resolved
if !ignoreClientCallFailures { | ||
rejected = true | ||
admissionmetrics.Metrics.ObserveWebhookRejection(ctx, hook.Name, "validating", string(versionedAttr.Attributes.GetOperation()), admissionmetrics.WebhookRejectionCallingWebhookError, int(err.Status.ErrStatus.Code)) | ||
if !errors.Is(err.Reason, context.Canceled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here add comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment, it doesn't only apply to failopen metrics
klog.Warningf("Context Canceled when calling webhook %v", hook.Name) | ||
errCh <- apierrors.NewInternalError(err) | ||
return | ||
} | ||
if ignoreClientCallFailures { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so here just doing
errCh <- apierrors.NewInternalError(err)
return
is not necessarily the same behavior. The if ignoreClientCallFailures
block uses utilruntime.HandlerError while the other block does the errCh <- apierrors.NewInternalError(err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets see what liggitt thinks here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it probably makes sense to always do errCh <- apierrors.NewInternalError(err)
for context cancelled but will defer to @liggitt
@@ -202,6 +201,10 @@ func (a *mutatingDispatcher) Dispatch(ctx context.Context, attr admission.Attrib | |||
} | |||
|
|||
if callErr, ok := err.(*webhookutil.ErrCallingWebhook); ok { | |||
if errors.Is(callErr.Reason, context.Canceled) { | |||
klog.Warningf("Context Canceled when calling webhook %v", hook.Name) | |||
return apierrors.NewInternalError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, not necessarily the same behavior. The blocks for ignoreClientCallFailures
and outside that block handle the error differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utilruntime.HandleError(callErr) returns "failed calling webhook "eks-aws-auth-configmap-validation-webhook.amazonaws.com": failed to call webhook: Post "https://127.0.0.1:21375/mutate?timeout=2s": context canceled" as we wanted to put customized log msg for failopen context canceled so keeping this change and return apierrors.NewInternalError(err).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So @liggitt I think if context is cancelled it makes sense to just return this error. Previous behavior was it would continue
if ignoreClientCallFailures {
but we can probably just return and bail out
/triage accepted |
@@ -169,6 +169,7 @@ func (a *mutatingDispatcher) Dispatch(ctx context.Context, attr admission.Attrib | |||
if err != nil { | |||
switch err := err.(type) { | |||
case *webhookutil.ErrCallingWebhook: | |||
// Ignore context canceled from failopen metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't only apply to failopen metric
rejected = true | ||
admissionmetrics.Metrics.ObserveWebhookRejection(ctx, hook.Name, "admit", string(versionedAttr.Attributes.GetOperation()), admissionmetrics.WebhookRejectionCallingWebhookError, int(err.Status.ErrStatus.Code)) | ||
} | ||
admissionmetrics.Metrics.ObserveWebhook(ctx, hook.Name, time.Since(t), rejected, versionedAttr.Attributes, "admit", int(err.Status.ErrStatus.Code)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skipping the webhook rejection metric for cancelled requests seems ok, but do we really want to lose visibility to the fact that the webhook was called at all and the latency we hit for canceled requests?
I was hoping we could revert back to the state at 5e5b302 and narrowly skip the ObserveWebhookRejection
and ObserveWebhookFailOpen
/addFailedOpenAnnotation
bits when the reason was a context cancel
something like this:
$ git diff -w 5e5b3029f3bbfc93c3569f07ad300a5c6057fc58 HEAD -- staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go
diff --git a/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go b/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go
index c1d1ca6ff6b..61b17932df1 100644
--- a/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go
+++ b/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go
@@ -20,6 +20,7 @@ package mutating
import (
"context"
+ "errors"
"fmt"
"time"
@@ -164,14 +165,18 @@ func (a *mutatingDispatcher) Dispatch(ctx context.Context, attr admission.Attrib
annotator := newWebhookAnnotator(versionedAttr, round, i, hook.Name, invocation.Webhook.GetConfigurationName())
changed, err := a.callAttrMutatingHook(ctx, hook, invocation, versionedAttr, annotator, o, round, i)
ignoreClientCallFailures := hook.FailurePolicy != nil && *hook.FailurePolicy == admissionregistrationv1.Ignore
+
rejected := false
if err != nil {
switch err := err.(type) {
case *webhookutil.ErrCallingWebhook:
if !ignoreClientCallFailures {
rejected = true
+ // Ignore context cancelled from webhook metrics
+ if !errors.Is(err.Reason, context.Canceled) {
admissionmetrics.Metrics.ObserveWebhookRejection(ctx, hook.Name, "admit", string(versionedAttr.Attributes.GetOperation()), admissionmetrics.WebhookRejectionCallingWebhookError, int(err.Status.ErrStatus.Code))
}
+ }
admissionmetrics.Metrics.ObserveWebhook(ctx, hook.Name, time.Since(t), rejected, versionedAttr.Attributes, "admit", int(err.Status.ErrStatus.Code))
case *webhookutil.ErrWebhookRejection:
rejected = true
@@ -199,9 +204,14 @@ func (a *mutatingDispatcher) Dispatch(ctx context.Context, attr admission.Attrib
if callErr, ok := err.(*webhookutil.ErrCallingWebhook); ok {
if ignoreClientCallFailures {
+ // Ignore context cancelled from webhook metrics
+ if errors.Is(callErr.Reason, context.Canceled) {
+ klog.Warningf("Context canceled when calling webhook %v", hook.Name)
+ } else {
klog.Warningf("Failed calling webhook, failing open %v: %v", hook.Name, callErr)
admissionmetrics.Metrics.ObserveWebhookFailOpen(ctx, hook.Name, "admit")
annotator.addFailedOpenAnnotation()
+ }
utilruntime.HandleError(callErr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same comments apply to the validating dispatcher)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was some feedback though that we wanted the math to add up in the metrics correctly. #119341 (comment) so that total = success + failure. So we were looking at skipping the metrics or adding a new metric that counts cancelled. #119341 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to be backporting this, we should err on the side of fewer changes, imo. narrowly skipping the misleading rejection/failopen bits for cancelled requests and changing ~nothing else w.r.t. return values makes sense for a backport
if we want to add a new metric or label or label value to distinguish cancellation as a follow up in 1.29+ only, that also seems ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense
27fa66e
to
e9459e3
Compare
/lgtm |
LGTM label has been added. Git tree hash: 3cf1a9f2fd98141cbdeb8027e961d9700c3666ac
|
/lgtm |
/hold please squash down to a single commit before merge |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: divyasri537, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e9459e3
to
67d35ce
Compare
@liggitt Squashed to one commit. |
67d35ce
to
24877f9
Compare
/lgtm |
/retest |
…#120087-upstream-release-1.28 Automated cherry pick of #120087: Incorporating feedback on 119341
What type of PR is this?
/kind bug
What this PR does / why we need it:
Incorporating feedback on PR #119341
Which issue(s) this PR fixes:
Fixes #119341
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: