Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DelegatingAuthenticationOptions: TokenReview request timeout #100959

Merged

Conversation

p0lyn0mial
Copy link
Contributor

@p0lyn0mial p0lyn0mial commented Apr 9, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

It turns out that setting a timeout on HTTP client affects watch requests made by the delegated authentication component.
With a 10 second timeout watch requests are being re-established exactly after 10 seconds even though the default request timeout for them is ~5 minutes.

This is because if multiple timeouts were set, the stdlib picks the smaller timeout to be applied, leaving others useless.
For more details see https://github.com/golang/go/blob/a937729c2c2f6950a32bc5cd0f5b88700882f078/src/net/http/client.go#L364

Instead of setting a timeout on the HTTP client, we should use context for cancellation.

This has the potential of being scattered across the codebase, perhaps we should seek a broader solution.

Here is a reproducer for re-establish watch requests when http.Client.Timeout is set with standard client-go library
p0lyn0mial/simple-watch#2

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

fixes regression in 1.20 in DelegatingAuthenticationOptions to set the timeout only for the token review client. Previously the timeout was also applied to watches making them reconnecting every 10 seconds.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 9, 2021
@p0lyn0mial
Copy link
Contributor Author

p0lyn0mial commented Apr 9, 2021

/assing @deads2k @sttts

@k8s-ci-robot k8s-ci-robot added area/apiserver area/cloudprovider sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 9, 2021
@k8s-ci-robot k8s-ci-robot requested review from deads2k and enj April 9, 2021 11:35
// if the context expires sooner than our hard requestTimeout use it,
// otherwise set the request timeout to whatever is defined in requestTimeout
if w.requestTimeout > 0 {
if oldCtx := ctx; timeBeforeContextDeadline(time.Now().Add(w.requestTimeout), oldCtx) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, we could only set a deadline iff it wasn't set in the context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do the following, since contexts are hierarchical:

if w.requestTimeout > 0 {
   child, cancel := context.WithTimoeut(ctx,  w.requestTimeout)
   defer cancel()
    
   ctx = child
}

if the child has a shorter deadline then it will expire first, otherwise if the parent has a shorter deadline then the parent will expire it will cause the child to expire as well even though the child has a larger deadline.

I think that's how it should work, I would also recommend adding a unit test to assert this behavior. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would work too :)

@p0lyn0mial p0lyn0mial force-pushed the upstream-delegated-authn-timeout branch from 63a2068 to 65166f9 Compare April 9, 2021 12:02
@deads2k
Copy link
Contributor

deads2k commented Apr 9, 2021

I would like to see an integration test that proves (somehow) that we don't have short timeout watches. The load created by these is fairly significant in some clusters.

@p0lyn0mial
Copy link
Contributor Author

I would like to see an integration test that proves (somehow) that we don't have short timeout watches. The load created by these is fairly significant in some clusters.

I added an integration test to #101022

@p0lyn0mial
Copy link
Contributor Author

/retest

// if the context expires sooner than our hard requestTimeout use it,
// otherwise set the request timeout to whatever is defined in requestTimeout
if w.requestTimeout > 0 {
if oldCtx := ctx; timeBeforeContextDeadline(time.Now().Add(w.requestTimeout), oldCtx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @tkashem's suggestion is easier to read.

@@ -214,7 +214,7 @@ func NewDelegatingAuthenticationOptions() *DelegatingAuthenticationOptions {
ExtraHeaderPrefixes: []string{"x-remote-extra-"},
},
WebhookRetryBackoff: DefaultAuthWebhookRetryBackoff(),
ClientTimeout: 10 * time.Second,
RequestTimeout: 10 * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is narrowly now a TokenRequestTimeout, right?

@@ -60,6 +60,9 @@ type DelegatingAuthenticatorConfig struct {
APIAudiences authenticator.Audiences

RequestHeaderConfig *RequestHeaderConfig

// RequestTimeout specifies a time limit for requests made by the authorization webhook client.
RequestTimeout time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TokenAccessReviewTimeout and move up to beneath the client

@deads2k
Copy link
Contributor

deads2k commented Apr 13, 2021

I'd like to tweak the names to indicate what they are for. But this approach is ok.

@p0lyn0mial p0lyn0mial force-pushed the upstream-delegated-authn-timeout branch from 65166f9 to 49291e0 Compare April 13, 2021 14:32
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 13, 2021
@deads2k
Copy link
Contributor

deads2k commented Apr 13, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 13, 2021
@p0lyn0mial
Copy link
Contributor Author

/assign @cheftako

for cloud-provider approval

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 14, 2021
@wojtek-t
Copy link
Member

/approve

[The changes in cloud-provider are mechanical.]

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2021
@dims
Copy link
Member

dims commented Apr 14, 2021

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, dims, p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@p0lyn0mial
Copy link
Contributor Author

pull-kubernetes-unit

=== FAIL: pkg/registry/apps/replicaset/storage TestWatch (7.94s)

/retest

@p0lyn0mial
Copy link
Contributor Author

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit dc2020e into kubernetes:master Apr 15, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 15, 2021
k8s-ci-robot added a commit that referenced this pull request Apr 28, 2021
…100959-upstream-release-1.20

Automated cherry pick of #100959: DelegatingAuthenticationOptions TokenReview request timeout
k8s-ci-robot added a commit that referenced this pull request Apr 28, 2021
…100959-upstream-release-1.21

Automated cherry pick of #100959: DelegatingAuthenticationOptions TokenReview request timeout
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

10 participants