-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
oidc client auth provider: cache OpenID Connect clients to prevent reinitialization #38167
oidc client auth provider: cache OpenID Connect clients to prevent reinitialization #38167
Conversation
Manually tested and I can happily confirm it reduces the number of GET requests to the openid-configuration URL from 60+ to 1:
|
@philips. I have seen some requests taking 15 seconds with an OIDC provider, what did you observe an API call through kubectl taking with this change? |
@jsloyer
|
This is amazing, this is a huge speed up. I built and tested a kubectl client with this change |
var ( | ||
backoff = wait.Backoff{ | ||
Duration: 1 * time.Second, | ||
Factor: 2, | ||
Jitter: .1, | ||
Steps: 5, | ||
} | ||
cache = clientCache{cache: make(map[cacheKey]*oidc.Client)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache = &clientCache{cache: make(map[cacheKey]*oidc.Client)}
certAuthData, err = base64.StdEncoding.DecodeString(cfg[cfgCertificateAuthorityData]) | ||
if err != nil { | ||
return nil, err | ||
client, ok := cache.getClient(issuer, clientID, clientSecret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return early and unnest?
if ok {
return client, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's just enough logic after the if statement to make that annoying. Let me see if I can split it out to a helper or something.
if err != nil { | ||
return nil, fmt.Errorf("error creating OIDC Client: %v", err) | ||
} | ||
cache.setClient(issuer, clientID, clientSecret, client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still have the potential to construct this multiple times if newOIDCAuthProvider is called from multiple threads... are we ok with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't think we are. restclient.AuthProviderConfigPersister
is kinda a harry abstraction for multiple calls to the underlying transport writing to kubeconfig. I think multiple clients running around might make it worse. We might even consider syncing around using that interface, e.g. put a mutex on the client itself. gah
a few nits, LGTM otherwise. tests would be good. |
Quick update on this PR. I've been diving a bit more into the implementation and it seems like there are some bigger issues here. For example the client doesn't sync around requesting refresh tokens and updating the kubeconfig. If it's used concurrently there's a very basic race there. Also the returned transport seems to be doing retries[0], which makes some bold assumptions about the server not consuming POST body data, and means we're not closing response bodies. Trying to clean this stuff up along with this change. cc @liggitt [0]
|
32bada3
to
380c5e8
Compare
@liggitt updated though this became a bit bigger of a PR. Highlights:
Since we already checked the id token's expiry these changes actually don't enforce anything we haven't already. It's completely backward compatible but uses more sane methods for determining token validity and persisting the config. For what it's worth I manually e2e tested this using dex. |
@k8s-bot bazel test this |
Jenkins GCI GKE smoke e2e failed for commit 380c5e8ed062707fa97896e4bdada124a126b24b. Full PR test history. The magic incantation to run this job again is |
b69e9d2
to
17be41d
Compare
@k8s-bot verify test this |
17be41d
to
c3edd4a
Compare
cc @yifan-gu |
Jenkins verification failed for commit c3edd4a. Full PR test history. The magic incantation to run this job again is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericchiang Just some nits, LGTM
} | ||
r2.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericchiang why we need to deepcopy header here?
https://play.golang.org/p/rxM2ogR1GS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To comply with the RoundTripper interface
RoundTrip should not modify the request...
https://golang.org/pkg/net/http/#RoundTripper
If we don't copy deepcopy the header then we're adding a bearer toke to the original header, modifying the original request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same logic as the oauth2 package https://github.com/golang/oauth2/blob/314dd2c0bf3ebd592ec0d20847d27e79d0dbe8dd/transport.go#L94-L106
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericchiang Gotcha.
defer c.mu.Unlock() | ||
key := cacheKey{issuer, clientID, clientSecret} | ||
|
||
if oldClient, ok := c.cache[key]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we want to return the old client?
if we want to use the old client when it's there we can always do getClient() first.
In fact, the newOIDCAuthProvider() function below, the setClient() is always returning the latest client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both routine A and B try to initialize at the same time, and A success and puts it in the cache, we want B to use A's client rather than the one it managed to create. We have the cache to always return the same client for each provider so it can guard the kubeconfig with the same mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericchiang I see, it's a testAndSet operation, maybe rename the func, or leave it as is with some comments, ok with either. Thanks 👍
cc @kubernetes/sig-auth-misc anyone else interested in reviewing this? Without this |
// Stubbed out for testing. | ||
now func() time.Time | ||
|
||
// Mutex guards presisting to the kubeconfig file and allows synchronized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
persisting
if err != nil { | ||
return jose.JWT{}, fmt.Errorf("could not perist new tokens: %v", err) | ||
idToken := jwt.Encode() | ||
newCfg[cfgIDToken] = idToken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this just be newCfg[cfgIDToken] = tokens.IDToken
?
just a couple small things, LGTM |
* Cache OpenID Connect clients to prevent reinitialization * Don't retry requests in the http.RoundTripper. * Don't rely on the server not reading POST bodies. * Don't leak response body FDs. * Formerly ignored any throttling requests by the server. * Determine if the id token's expired by inspecting it. * Similar to logic in golang.org/x/oauth2 * Synchronize around refreshing tokens and persisting the new config.
7e44f2d
to
13e6318
Compare
Jenkins GCE etcd3 e2e failed for commit 13e6318. Full PR test history. The magic incantation to run this job again is |
@k8s-bot bazel test this |
1 similar comment
@k8s-bot bazel test this |
Jenkins Bazel Build failed for commit 13e6318. Full PR test history. The magic incantation to run this job again is |
@k8s-bot bazel test this Filed an issue about the empty result kubernetes/test-infra#1465 |
@k8s-bot etcd3 test this |
OK, tests finally went green. LGTM at this point but I am not an owner. |
LGTM |
Anyone know how to retrigger the kops build? |
@k8s-bot kops test this |
Automatic merge from submit-queue (batch tested with PRs 39648, 38167, 39591, 39415, 39612) |
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Prior to this change, the OIDC client auth provider synced the OIDC provider metadata with every transport initialization. When used with kubectl, the provider can be reinitialize dozens of times, each triggering a round trip to the OIDC auth server. The overhead makes this feature largely unusable, adding several seconds to every kubectl invocation, and we observe many users simply abandoning it for other workarounds.
This fix prevents the excessive round trips by keeping a global cache of clients similarly to the TLS transport logic in client-go at the result of a significant speedup. The caching also fixes data races when a token is refreshed.
closes #37876
cc @kubernetes/sig-auth @liggitt @jsloyer @mlbiam @philips