client-go: changes config.Timeout field semantic #101022

p0lyn0mial · 2021-04-12T09:14:09Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR uses context for cancellation instead of setting a timeout on the HTTP client.

It turns out that setting a timeout on the HTTP client also affects watch requests.
For example, with a 10-second timeout, watch requests are re-established after exactly 10 seconds, even though the timeout informers request for them defaults to ~5 minutes.

This is because when multiple timeouts are set, the standard library effectively applies the smaller one, leaving the others useless.
For more details see https://github.com/golang/go/blob/a937729c2c2f6950a32bc5cd0f5b88700882f078/src/net/http/client.go#L364

This PR preserves the previous behavior for all requests except watch requests. With this PR, a timeout set via ListOptions on a watch (used heavily by informers) is respected. In particular, when a global timeout is set via config.Timeout and no other timeout is specified, watch requests are still terminated after config.Timeout, just like before.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2021-04-12T09:14:10Z

@p0lyn0mial: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

p0lyn0mial · 2021-04-12T11:37:24Z

/assing @sttts @deads2k

sttts · 2021-04-12T11:43:56Z

staging/src/k8s.io/client-go/rest/client.go

 }

 // NewRESTClient creates a new RESTClient. This client performs generic REST functions
 // such as Get, Put, Post, and Delete on specified paths.
-func NewRESTClient(baseURL *url.URL, versionedAPIPath string, config ClientContentConfig, rateLimiter flowcontrol.RateLimiter, client *http.Client) (*RESTClient, error) {
+func NewRESTClient(baseURL *url.URL, versionedAPIPath string, config ClientContentConfig, rateLimiter flowcontrol.RateLimiter, client *http.Client, requestTimeout time.Duration) (*RESTClient, error) {


Is it worth avoiding a breaking change by adding NewRESTClientWithTimeout instead?

How many clients of this method do we have?

I didn't have to fix many references, just two.

Since client-go is a library, this change affects non-k/k code as well.

sttts · 2021-04-12T11:44:50Z

staging/src/k8s.io/client-go/rest/client.go

@@ -100,11 +100,14 @@ type RESTClient struct {

 	// Set specific behavior of the client.  If not set http.DefaultClient will be used.
 	Client *http.Client
+
+	// RequestTimeout specifies a time limit for requests made by the HTTP Client
+	RequestTimeout time.Duration


does this have to be public?

no, I don't think so. Will change to private.

p0lyn0mial · 2021-04-12T11:50:16Z

test/integration/apimachinery/watch_restart_test.go

+
+	// validate
+	if len(res) > failedProbesTreshold {
+		t.Fatalf("%d%% of probes failed. That means some (all) watch requests ended sooner than we expected. The current treshold was set to %d%%"+


this message might look like this:

100% of probes failed. That means some (all) watch requests ended sooner than we expected. The current treshold was set to 33% We did 6 probes in total. The upper-bound time limit for a single watch reqeust was set to 11s. 6 probes ended before that time [10.000410361s 10.001010468s 10.000472694s 10.001261837s 10.002812845s 10.001133277s]

p0lyn0mial · 2021-04-12T11:51:33Z

If this PR turns out to be viable then we don't need #100959

p0lyn0mial · 2021-04-12T13:09:11Z

/retest

deads2k · 2021-04-12T13:52:36Z

staging/src/k8s.io/client-go/rest/config.go

@@ -324,9 +324,12 @@ func RESTClientFor(config *Config) (*RESTClient, error) {
 	var httpClient *http.Client
 	if transport != http.DefaultTransport {
 		httpClient = &http.Client{Transport: transport}
-		if config.Timeout > 0 {
-			httpClient.Timeout = config.Timeout


With this removed, how do I create a rest.Config that has a client-side, context based timeout on every request? I was fairly sure this was exactly the right spot to set up such a timeout.

how do I create a rest.Config that has a client-side, context based timeout on every request?

In exactly the same way as before. This behavior hasn't changed.

deads2k · 2021-04-12T13:55:40Z

A global timeout, per client, enforced client-side, seems very reasonable to me and matches what the existing code does. The current problem we face appears to be using a client with an explicitly configured, client-enforced timeout for a request we don't want to have the timeout enforced on. This appears to be an "us" problem, not a client problem.

Why wouldn't the delegating authenticator simply create the client with the configuration it wants? Clearly it does not want watches terminating every 10 seconds.

p0lyn0mial · 2021-04-12T16:07:24Z

A global timeout, per client, enforced client-side, seems very reasonable to me and matches what the existing code does

This PR sets a global, per-client timeout too. That behaviour hasn't changed.
What has changed is that it respects a timeout set by watch requests.

The current problem we face appears to be using a client with an explicitly configured, client-enforced timeout for a request we don't want to have the timeout enforced on.

It seems that the current problem we face is that the timeout for watch requests is not respected at all.

Why wouldn't the delegating authenticator simply create the client with the configuration it wants? Clearly it does not want watches terminating every 10 seconds.

I think it would be good if the delegating authenticator could set a short timeout for SAR requests and a longer timeout for watch requests. This PR allows that.

liggitt · 2021-04-12T18:01:11Z

staging/src/k8s.io/client-go/rest/config.go

-			httpClient.Timeout = config.Timeout
-		}
+		// do not set a timeout on the http client, instead use context for cancellation
+		// if multiple timeouts were set, the request will pick the smaller timeout to be applied, leaving other useless.


if multiple timeouts were set, the request will pick the smaller timeout to be applied, leaving other useless

if multiple timeouts were set, I would expect the smallest to rule

If this is what we want for client-go, then sure. As it is today, sharing the same client for normal requests and informers and setting config.Timeout to 10 seconds will terminate the watches created by informers every 10 seconds.

You should not have to create two clients to use an informer. client-go should let you specifically set a timeout for long running requests distinct from normal requests, and it should be clear what it does.

enj · 2021-05-07T16:58:57Z

/remove-sig auth

k8s-ci-robot · 2021-06-08T10:27:21Z

@p0lyn0mial: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-triage-robot · 2021-09-06T11:03:57Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

p0lyn0mial · 2021-09-22T09:35:18Z

/remove-lifecycle stal

aojea · 2021-09-26T22:03:42Z

/cc

k8s-triage-robot · 2021-10-26T22:18:33Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

p0lyn0mial · 2021-10-29T09:32:13Z

/remove-lifecycle rotten

dims · 2022-01-10T17:04:35Z

Is this PR still needed? If so, please rebase (or should we close it?).

caesarxuchao · 2022-01-25T23:03:24Z

/unassign

liggitt · 2022-03-26T17:23:41Z

closing due to inactivity

k8s-ci-robot requested review from juanvallejo and krousey April 12, 2021 09:15

p0lyn0mial force-pushed the client-go-request-timeout branch from 42c4516 to db930d6 April 12, 2021 11:36

k8s-ci-robot added the size/L label (a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (a PR that changes 30-99 lines, ignoring generated files) Apr 12, 2021

k8s-ci-robot added the area/test and sig/testing (categorizes an issue or PR as relevant to SIG Testing) labels Apr 12, 2021

sttts reviewed Apr 12, 2021

p0lyn0mial commented Apr 12, 2021

p0lyn0mial mentioned this pull request Apr 12, 2021

DelegatingAuthenticationOptions: TokenReview request timeout #100959

Merged

p0lyn0mial force-pushed the client-go-request-timeout branch from db930d6 to fd1bd1e April 12, 2021 12:35

deads2k reviewed Apr 12, 2021

liggitt reviewed Apr 12, 2021

p0lyn0mial force-pushed the client-go-request-timeout branch from dd98be3 to b010fa5 April 13, 2021 07:27

This was referenced May 1, 2021

Bug 1948311: bump to kube 1.21.0 and pick up the delegated AuthN fix openshift/openshift-apiserver#202

Merged

Bug 1948311: DelegatingAuthenticationOptions TokenReview request timeout openshift/cluster-kube-storage-version-migrator-operator#56

Merged

openshift-ci bot mentioned this pull request May 7, 2021

Bug 1948311: DelegatingAuthenticationOptions TokenReview request timeout openshift/cluster-storage-operator#165

Merged

k8s-ci-robot removed the sig/auth label (categorizes an issue or PR as relevant to SIG Auth) May 7, 2021

This was referenced May 14, 2021

Bug 1948311: DelegatingAuthenticationOptions TokenReview request timeout openshift/cluster-openshift-controller-manager-operator#212

Merged

Bug 1948311: DelegatingAuthenticationOptions TokenReview request timeout openshift/aws-ebs-csi-driver-operator#126

Merged

k8s-ci-robot added and then removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 1, 2021

p0lyn0mial mentioned this pull request Jun 28, 2021

client-go: warn if no timeout was specified #99408

Closed

k8s-ci-robot added the lifecycle/stale label (the PR has remained open with no activity and has become stale) Sep 6, 2021

k8s-ci-robot requested a review from aojea September 26, 2021 22:03

k8s-ci-robot added the lifecycle/rotten label (the PR has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label Oct 26, 2021

k8s-ci-robot removed the lifecycle/rotten label (the PR has aged beyond stale and will be auto-closed) Oct 29, 2021

k8s-ci-robot unassigned caesarxuchao Jan 25, 2022

krousey removed their request for review February 23, 2022 22:00

liggitt closed this Mar 26, 2022

sttts mentioned this pull request Sep 12, 2024

🐛 Remove client timeout and use tcp timeout and http keepalive to avoid watches to close after 30s kcp-dev/kcp#3162

Merged
