client-go: changes config.Timeout field semantic #101022

Closed

Conversation

p0lyn0mial
Contributor

@p0lyn0mial p0lyn0mial commented Apr 12, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR uses context for cancellation instead of setting a timeout on the HTTP client.

It turns out that setting a timeout on the HTTP client also affects watch requests.
For example, with a 10-second timeout, watch requests are re-established exactly every 10 seconds, even though their default request timeout is ~5 minutes (informers).

This is because when multiple timeouts are set, the stdlib applies the smallest one, which renders the others useless.
For more details see https://github.com/golang/go/blob/a937729c2c2f6950a32bc5cd0f5b88700882f078/src/net/http/client.go#L364

This PR preserves the previous behavior for all requests except watch requests. With this PR, a timeout set via ListOptions on a watch is respected; informers rely on this heavily. In particular, when a global timeout is set via config.Timeout and no other timeout is specified, watch requests are still terminated after config.Timeout, just like before.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

@p0lyn0mial: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 12, 2021
@p0lyn0mial p0lyn0mial force-pushed the client-go-request-timeout branch from 42c4516 to db930d6 Compare April 12, 2021 11:36
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 12, 2021
@p0lyn0mial
Contributor Author

/assing @sttts @deads2k

@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Apr 12, 2021
}

// NewRESTClient creates a new RESTClient. This client performs generic REST functions
// such as Get, Put, Post, and Delete on specified paths.
-func NewRESTClient(baseURL *url.URL, versionedAPIPath string, config ClientContentConfig, rateLimiter flowcontrol.RateLimiter, client *http.Client) (*RESTClient, error) {
+func NewRESTClient(baseURL *url.URL, versionedAPIPath string, config ClientContentConfig, rateLimiter flowcontrol.RateLimiter, client *http.Client, requestTimeout time.Duration) (*RESTClient, error) {
Contributor

Is it worth avoiding a breaking change by adding NewRESTClientWithTimeout instead?

Contributor Author

How many clients of this method do we have?

I didn't have to fix many references, just two.

Member

Since client-go is a library, this change affects non k/k code as well.

@@ -100,11 +100,14 @@ type RESTClient struct {

// Set specific behavior of the client. If not set http.DefaultClient will be used.
Client *http.Client

// RequestTimeout specifies a time limit for requests made by the HTTP Client
RequestTimeout time.Duration
Contributor

does this have to be public?

Contributor Author

no, I don't think so. Will change to private.


// validate
if len(res) > failedProbesTreshold {
t.Fatalf("%d%% of probes failed. That means some (all) watch requests ended sooner than we expected. The current treshold was set to %d%%"+
Contributor Author

this message might look like this:

100% of probes failed. That means some (all) watch requests ended sooner than we expected. The current treshold was set to 33% We did 6 probes in total. The upper-bound time limit for a single watch reqeust was set to 11s. 6 probes ended before that time [10.000410361s 10.001010468s 10.000472694s 10.001261837s 10.002812845s 10.001133277s]

@p0lyn0mial
Contributor Author

If this PR turns out to be viable then we don't need #100959

@p0lyn0mial
Contributor Author

/retest

@@ -324,9 +324,12 @@ func RESTClientFor(config *Config) (*RESTClient, error) {
var httpClient *http.Client
if transport != http.DefaultTransport {
httpClient = &http.Client{Transport: transport}
if config.Timeout > 0 {
httpClient.Timeout = config.Timeout
Contributor

With this removed, how do I create a rest.Config that has a client-side, context based timeout on every request? I was fairly sure this was exactly the right spot to set up such a timeout.

Contributor Author

> how do I create a rest.Config that has a client-side, context based timeout on every request?

In exactly the same way as before. This behavior hasn't changed.

@deads2k
Contributor

deads2k commented Apr 12, 2021

A global timeout, per client, enforced client-side, seems very reasonable to me and matches what the existing code does. The current problem we face appears to be using a client with an explicitly configured, client-enforced timeout for a request we don't want to have the timeout enforced on. This appears to be an "us" problem, not a client problem.

Why wouldn't the delegating authenticator simply create the client with the configuration it wants? Clearly it does not want watches terminating every 10 seconds.

@p0lyn0mial
Contributor Author

> A global timeout, per client, enforced client-side, seems very reasonable to me and matches what the existing code does

This PR sets a global, per-client timeout too. That behaviour hasn't changed.
What has changed is that it respects a timeout set by watch requests.

> The current problem we face appears to be using a client with an explicitly configured, client-enforced timeout for a request we don't want to have the timeout enforced on.

It seems that the current problem we face is that the timeout for watch requests is not respected at all.

> Why wouldn't the delegating authenticator simply create the client with the configuration it wants? Clearly it does not want watches terminating every 10 seconds.

I think it would be good if the delegating authenticator could set a short timeout for SAR requests and a longer timeout for watch requests. This PR allows that.

httpClient.Timeout = config.Timeout
}
// do not set a timeout on the http client, instead use context for cancellation
// if multiple timeouts were set, the request will pick the smaller timeout to be applied, leaving other useless.
Member

> if multiple timeouts were set, the request will pick the smaller timeout to be applied, leaving other useless

if multiple timeouts were set, I would expect the smallest to rule

Contributor Author

If this is what we want for client-go, then sure. As it is today, sharing the same client for normal requests and informers and setting config.Timeout to 10 seconds will terminate watches created by informers every 10 seconds.

Contributor

You should not have to create two clients to use an informer. client-go should let you specifically set a timeout for long running requests distinct from normal requests, and it should be clear what it does.

@p0lyn0mial p0lyn0mial force-pushed the client-go-request-timeout branch from dd98be3 to b010fa5 Compare April 13, 2021 07:27
@enj
Member

enj commented May 7, 2021

/remove-sig auth

@k8s-ci-robot k8s-ci-robot removed the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label May 7, 2021
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 1, 2021
@k8s-ci-robot
Contributor

@p0lyn0mial: PR needs rebase.


@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 6, 2021
@p0lyn0mial
Contributor Author

/remove-lifecycle stal

@aojea
Member

aojea commented Sep 26, 2021

/cc

@k8s-ci-robot k8s-ci-robot requested a review from aojea September 26, 2021 22:03
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 26, 2021
@p0lyn0mial
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 29, 2021
@dims
Member

dims commented Jan 10, 2022

Is this PR still needed? If so, please rebase (or we can close it).

@caesarxuchao
Member

/unassign

@krousey krousey removed their request for review February 23, 2022 22:00
@liggitt
Member

liggitt commented Mar 26, 2022

closing due to inactivity

Labels
area/code-generation area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.