-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable caching successful token authentication #50258
Conversation
cc @kubernetes/sig-auth-pr-reviews |
} | ||
if len(config.WebhookTokenAuthnConfigFile) > 0 { | ||
webhookTokenAuth, err := newWebhookTokenAuthenticator(config.WebhookTokenAuthnConfigFile, config.WebhookTokenAuthnCacheTTL) | ||
if err != nil { | ||
return nil, nil, err | ||
} | ||
authenticators = append(authenticators, bearertoken.New(webhookTokenAuth), websocket.NewProtocolAuthenticator(webhookTokenAuth)) | ||
hasTokenAuth = true | ||
tokenAuthenticators = append(tokenAuthenticators, webhookTokenAuth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This authenticator already does caching. Can we not double up?
kubernetes/staging/src/k8s.io/apiserver/plugin/pkg/authenticator/token/webhook/webhook.go
Line 76 in 90a45b2
r.Status = entry.(authentication.TokenReviewStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to rework that cache, I'd rather do it in a follow up. I wouldn't special-case the webhook token authenticator in the chain here. I also think the config knob for the TTL on the external authenticator makes more sense to expose.
fixed bazel build file, unit flaked on #50262 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments, but this looks good to me.
The only other interaction I'm worried about is if a user has set --authentication-token-webhook-cache-ttl 0
(or any TTL < 10s), this would potentially break their expectations about how often their authentication webhook is called. This probably isn't a great configuration to start with, but maybe there's something we can do to detect this and make sure we don't cache longer if the user has set a short TTL?
import "sync" | ||
|
||
// simple key/value cache with read/write locking | ||
type simpleCache struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that important but we could add a TODO to switch this to the new builtin sync.Map
in Go 1.9.
Edit: once we are on 1.9, I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with existing lru cache
value, ok := c.data[key] | ||
return value, ok | ||
} | ||
func (c *simpleCache) set(key string, value *cacheRecord) *cacheRecord { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If failureTTL
is non-zero, this will be caching failed attempts. This would allow DoS by filling all available memory with the underlying map
here. Maybe it should have a maximum size? Maybe even separate bounds for successful and unsuccessful records?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valid points, but I'd probably defer those features until we need them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with existing lru cache with bounded maxsize
|
||
// New returns a token authenticator that validates credentials using a chain of authenticator.Token objects. | ||
// The entire chain is tried until one succeeds. If all fail, an aggregate error is returned. | ||
func New(authTokenHandlers ...authenticator.Token) authenticator.Token { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be an error if len(authTokenHandlers) == 0
(just in the interest of failing fast)? Feel free to ignore because you did add a test that makes sure it fails safe with zero handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep the signature error free (it does the right thing if there are no authenticators)
|
||
// NewFailOnError returns a request authenticator that validates credentials using a chain of authenticator.Request objects. | ||
// The first error short-circuits the chain. | ||
func NewFailOnError(authTokenHandlers ...authenticator.Token) authenticator.Token { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here (error if len(authTokenHandlers) == 0
).
return &unionAuthTokenHandler{Handlers: authTokenHandlers, FailOnError: true} | ||
} | ||
|
||
// AuthenticateRequest authenticates the request using a chain of authenticator.Request objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: godoc is wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
// NewTokenGroupAdder wraps a token authenticator, and adds the specified groups to the returned user when authentication succeeds | ||
func NewTokenGroupAdder(auth authenticator.Token, groups []string) authenticator.Token { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't expecting this. What is it for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mirrors the request group adder for assembling an auth chain and adding groups to one type of token auth (we'll use it downstream, but keeping the union/cache/adder implementations together made sense to me)
import "sync" | ||
|
||
// simple key/value cache with read/write locking | ||
type simpleCache struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't expecting this. Aren't there already ttl and LRU caches? I thought someone used one in the admission chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually want an LRU cache... just the TTL. An LRU cache turns every read into a write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually want an LRU cache... just the TTL. An LRU cache turns every read into a write.
That means I'm going to find later on in this pull a spot where you're pruning old entries? You really think that's better? Seems like it would be pretty trivial to explode memory size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will look at swapping to using an existing lru/ttl cache... with the striping lock, that's probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
func (c *simpleCache) set(key string, value *cacheRecord) *cacheRecord { | ||
c.lock.Lock() | ||
defer c.lock.Unlock() | ||
if existing, exists := c.data[key]; exists && existing.expires.After(value.expires) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems weird to sometimes manage the expiry and sometimes not. If you check the expiry here, why not also check it during a get and avoid returning stale data entirely? If you don't do it there and leave it up to the caller, why check here?
I'm ok doing it either way (if we can't re-use the existing thread-safe LRU stuff), but the asymmetry bothers me.
|
||
// If our object was the one stored, queue the removal | ||
if cachedValue == value { | ||
// TODO: batch removals to avoid a goroutine per cache item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhh.... this seems worse than the disease. Have you measured against the next best solution: the existing LRU cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the comments around reusing the existing cache implementations. Everything else lgtm.
func fnvKeyFunc(key string) int64 { | ||
f := fnv.New32() | ||
f.Write([]byte(key)) | ||
return int64(f.Sum32()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not use the 64 bit function?
215054a
to
b92efc3
Compare
comments addressed, PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment on the bearer token logic. Probably not the best person to review the striped cache code. Going to defer to @deads2k and @tallclair on that.
return record.user, record.ok, record.err | ||
} | ||
|
||
user, ok, err := a.authenticator.AuthenticateToken(token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this to check the err and fail fast before evaluating ok. It seems odd that we'd cache errors. Maybe that's just a bad distinction between ok == false
and err != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cache the complete results. failure cases are only cached if the failure ttl is set (which this PR does not do... it only caches success). If we wanted to in the future, we could set different ttls on error and non-error cases.
type keyFunc func(string) uint32 | ||
type newCacheFunc func() cache | ||
|
||
func newStripedCache(stripeCount int, keyFunc keyFunc, newCacheFunc newCacheFunc) cache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the underlying cache is using k8s.io/apimachinery/pkg/util/cache
, it feels like this striping code belongs there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to be able to hash the key... which you can't assume for interface{}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. Thanks.
@@ -83,7 +86,10 @@ type WebHookAuthenticationOptions struct { | |||
} | |||
|
|||
func NewBuiltInAuthenticationOptions() *BuiltInAuthenticationOptions { | |||
return &BuiltInAuthenticationOptions{} | |||
return &BuiltInAuthenticationOptions{ | |||
TokenSuccessCacheTTL: 10 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the way this default interacts with the webhook TTL configuration could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I can add a warning if webhook TTL is less than the overall success TTL, but we already have distributed use of token authentication that involves secondary caches (extension api servers and kubelets do tokenreviews with short ttl caches)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. A warning works for me, or just something in the docs for --authentication-token-webhook-cache-ttl
.
Another option might be to exclude webhookTokenAuth
from this new cache (leaving them with their own cache), or even to drop unionAuthTokenHandler
and use a separate cache for each token authorizer (where --authentication-token-webhook-cache-ttl
would become a flag that set the TTL on that particular cache).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just added the warning for now, TTL under 10s for webhook isn't recommended anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some nits.
|
||
type cacheRecord struct { | ||
user user.Info | ||
ok bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: document what these fields are
"k8s.io/apiserver/pkg/authentication/token/tokenfile" | ||
tokenunion "k8s.io/apiserver/pkg/authentication/token/union" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think you commits are out of order (not sure if you plan on squashing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the order is correct... github actually displays them out of order... I was planning on leaving the components distinct
/retest |
1 similar comment
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: liggitt, tallclair Assign the PR to them by writing No associated issue. Update pull-request body to add a reference to an issue, or get approval with The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 49488, 50407, 46105, 50456, 50258) |
@@ -166,7 +166,9 @@ func TestAddFlagsFlag(t *testing.T) { | |||
ServiceAccounts: &kubeoptions.ServiceAccountAuthenticationOptions{ | |||
Lookup: true, | |||
}, | |||
TokenFile: &kubeoptions.TokenFileAuthenticationOptions{}, | |||
TokenFile: &kubeoptions.TokenFileAuthenticationOptions{}, | |||
TokenSuccessCacheTTL: 10 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like there are any command-line flags on the API server to tune these two TTL parameters. Was that an anticipated feature to come? Was their omission deliberate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already control via flags over the TTLs for "expensive" webhook token validation operations. There's no current plan to expose these as tunable settings.
Resolves #50472
To support revocation of service account tokens, an etcd lookup of the token and service account is done by the token authenticator. Controllers that make dozens or hundreds of API calls per second (like the endpoints controller) cause this lookup to be done very frequently on the same objects.
This PR: