
Enable caching successful token authentication #50258

Merged
merged 4 commits into kubernetes:master from liggitt:token-cache on Aug 11, 2017

Conversation

liggitt
Member

@liggitt liggitt commented Aug 7, 2017

Resolves #50472

To support revocation of service account tokens, an etcd lookup of the token and service account is done by the token authenticator. Controllers that make dozens or hundreds of API calls per second (like the endpoints controller) cause this lookup to be done very frequently on the same objects.

This PR:

  • Implements a cached token authenticator that conforms to the authenticator.Token interface
  • Implements a union token authenticator (same approach as the union request authenticator, conforming to the authenticator.Token interface)
  • Cleans up the auth chain construction to group all token authenticators (means we only do bearer and websocket header parsing once)
  • Adds a 10-second TTL cache to successful token authentication
API server authentication now caches successful bearer token authentication results for a few seconds.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 7, 2017
@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-label-needed labels Aug 7, 2017
@liggitt
Member Author

liggitt commented Aug 7, 2017

cc @kubernetes/sig-auth-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Aug 7, 2017
}
if len(config.WebhookTokenAuthnConfigFile) > 0 {
webhookTokenAuth, err := newWebhookTokenAuthenticator(config.WebhookTokenAuthnConfigFile, config.WebhookTokenAuthnCacheTTL)
if err != nil {
return nil, nil, err
}
authenticators = append(authenticators, bearertoken.New(webhookTokenAuth), websocket.NewProtocolAuthenticator(webhookTokenAuth))
hasTokenAuth = true
tokenAuthenticators = append(tokenAuthenticators, webhookTokenAuth)
Contributor

This authenticator already does caching. Can we not double up?

r.Status = entry.(authentication.TokenReviewStatus)

Member Author

If we wanted to rework that cache, I'd rather do it in a follow up. I wouldn't special-case the webhook token authenticator in the chain here. I also think the config knob for the TTL on the external authenticator makes more sense to expose.

@liggitt liggitt added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Aug 7, 2017
@liggitt
Member Author

liggitt commented Aug 7, 2017

fixed bazel build file; unit test flaked on #50262

Contributor

@mattmoyer mattmoyer left a comment

Left a few minor comments, but this looks good to me.

The only other interaction I'm worried about is if a user has set --authentication-token-webhook-cache-ttl 0 (or any TTL < 10s), this would potentially break their expectations about how often their authentication webhook is called. This probably isn't a great configuration to start with, but maybe there's something we can do to detect this and make sure we don't cache longer if the user has set a short TTL?

import "sync"

// simple key/value cache with read/write locking
type simpleCache struct {
Contributor

@mattmoyer mattmoyer Aug 7, 2017

It's not that important but we could add a TODO to switch this to the new builtin sync.Map in Go 1.9.

Edit: once we are on 1.9, I mean.

Member Author

replaced with existing lru cache

value, ok := c.data[key]
return value, ok
}
func (c *simpleCache) set(key string, value *cacheRecord) *cacheRecord {
Contributor

@mattmoyer mattmoyer Aug 7, 2017

If failureTTL is non-zero, this will be caching failed attempts. This would allow DoS by filling all available memory with the underlying map here. Maybe it should have a maximum size? Maybe even separate bounds for successful and unsuccessful records?

Member Author

valid points, but I'd probably defer those features until we need them

Member Author

replaced with existing lru cache with bounded maxsize


// New returns a token authenticator that validates credentials using a chain of authenticator.Token objects.
// The entire chain is tried until one succeeds. If all fail, an aggregate error is returned.
func New(authTokenHandlers ...authenticator.Token) authenticator.Token {
Contributor

Should it be an error if len(authTokenHandlers) == 0 (just in the interest of failing fast)? Feel free to ignore because you did add a test that makes sure it fails safe with zero handlers.

Member Author

I'd rather keep the signature error-free (it does the right thing if there are no authenticators)


// NewFailOnError returns a request authenticator that validates credentials using a chain of authenticator.Request objects.
// The first error short-circuits the chain.
func NewFailOnError(authTokenHandlers ...authenticator.Token) authenticator.Token {
Contributor

Same here (error if len(authTokenHandlers) == 0).

return &unionAuthTokenHandler{Handlers: authTokenHandlers, FailOnError: true}
}

// AuthenticateRequest authenticates the request using a chain of authenticator.Request objects.
Contributor

Nit: godoc is wrong here.

Member Author

fixed

}

// NewTokenGroupAdder wraps a token authenticator, and adds the specified groups to the returned user when authentication succeeds
func NewTokenGroupAdder(auth authenticator.Token, groups []string) authenticator.Token {
Contributor

Wasn't expecting this. What is it for?

Member Author

@liggitt liggitt Aug 8, 2017

mirrors the request group adder for assembling an auth chain and adding groups to one type of token auth (we'll use it downstream, but keeping the union/cache/adder implementations together made sense to me)

import "sync"

// simple key/value cache with read/write locking
type simpleCache struct {
Contributor

wasn't expecting this. Aren't there already ttl and LRU caches? I thought someone used one in the admission chain.

Member Author

I don't actually want an LRU cache... just the TTL. An LRU cache turns every read into a write.

Contributor

I don't actually want an LRU cache... just the TTL. An LRU cache turns every read into a write.

That means I'm going to find later on in this pull a spot where you're pruning old entries? You really think that's better? Seems like it would be pretty trivial to explode memory size.

Member Author

ok, will look at swapping to using an existing lru/ttl cache... with the striping lock, that's probably fine.

Member Author

updated

func (c *simpleCache) set(key string, value *cacheRecord) *cacheRecord {
c.lock.Lock()
defer c.lock.Unlock()
if existing, exists := c.data[key]; exists && existing.expires.After(value.expires) {
Contributor

it seems weird to sometimes manage the expiry and sometimes not. If you check the expiry here, why not also check it during a get and avoid returning stale data entirely? If you don't do it there and leave it up to the caller, why check here?

I'm ok doing it either way (if we can't re-use the existing thread-safe LRU stuff), but the asymmetry bothers me.


// If our object was the one stored, queue the removal
if cachedValue == value {
// TODO: batch removals to avoid a goroutine per cache item
Contributor

uhh.... this seems worse than the disease. Have you measured against the next best solution: the existing LRU cache?

Member

@tallclair tallclair left a comment

Agree with the comments around reusing the existing cache implementations. Everything else lgtm.

func fnvKeyFunc(key string) int64 {
f := fnv.New32()
f.Write([]byte(key))
return int64(f.Sum32())
Member

nit: why not use the 64 bit function?

@liggitt liggitt force-pushed the token-cache branch 4 times, most recently from 215054a to b92efc3 Compare August 8, 2017 04:52
@liggitt liggitt added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Aug 8, 2017
@liggitt
Member Author

liggitt commented Aug 8, 2017

comments addressed, PTAL

Contributor

@ericchiang ericchiang left a comment

Just one comment on the bearer token logic. Probably not the best person to review the striped cache code. Going to defer to @deads2k and @tallclair on that.

return record.user, record.ok, record.err
}

user, ok, err := a.authenticator.AuthenticateToken(token)
Contributor

I'd expect this to check the err and fail fast before evaluating ok. It seems odd that we'd cache errors. Maybe that's just a bad distinction between ok == false and err != nil

Member Author

we cache the complete results. failure cases are only cached if the failure ttl is set (which this PR does not do... it only caches success). If we wanted to in the future, we could set different ttls on error and non-error cases.

type keyFunc func(string) uint32
type newCacheFunc func() cache

func newStripedCache(stripeCount int, keyFunc keyFunc, newCacheFunc newCacheFunc) cache {
Contributor

Now that the underlying cache is using k8s.io/apimachinery/pkg/util/cache, it feels like this striping code belongs there.

Member Author

you need to be able to hash the key... which you can't assume for interface{}

Contributor

Ah, that makes sense. Thanks.

@@ -83,7 +86,10 @@ type WebHookAuthenticationOptions struct {
}

func NewBuiltInAuthenticationOptions() *BuiltInAuthenticationOptions {
return &BuiltInAuthenticationOptions{}
return &BuiltInAuthenticationOptions{
TokenSuccessCacheTTL: 10 * time.Second,
Contributor

I still think the way this default interacts with the webhook TTL configuration could be confusing.

Member Author

I suppose I can add a warning if webhook TTL is less than the overall success TTL, but we already have distributed use of token authentication that involves secondary caches (extension api servers and kubelets do tokenreviews with short ttl caches)

Contributor

Thanks for clarifying. A warning works for me, or just something in the docs for --authentication-token-webhook-cache-ttl.

Another option might be to exclude webhookTokenAuth from this new cache (leaving them with their own cache), or even to drop unionAuthTokenHandler and use a separate cache for each token authorizer (where --authentication-token-webhook-cache-ttl would become a flag that set the TTL on that particular cache).

Member Author

@liggitt liggitt Aug 10, 2017

just added the warning for now, TTL under 10s for webhook isn't recommended anyway

Member

@tallclair tallclair left a comment

LGTM, just some nits.


type cacheRecord struct {
user user.Info
ok bool
Member

nit: document what these fields are

"k8s.io/apiserver/pkg/authentication/token/tokenfile"
tokenunion "k8s.io/apiserver/pkg/authentication/token/union"
Member

nit: I think your commits are out of order (not sure if you plan on squashing)

Member Author

@liggitt liggitt Aug 10, 2017

the order is correct... github actually displays them out of order... I was planning on leaving the components distinct

@liggitt
Member Author

liggitt commented Aug 10, 2017

/retest

@liggitt
Member Author

liggitt commented Aug 10, 2017

/retest

@tallclair
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liggitt, tallclair
We suggest the following additional approver: lavalamp

Assign the PR to them by writing /assign @lavalamp in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@liggitt liggitt added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 10, 2017
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 49488, 50407, 46105, 50456, 50258)

@k8s-github-robot k8s-github-robot merged commit 42adb9e into kubernetes:master Aug 11, 2017
@liggitt liggitt deleted the token-cache branch August 12, 2017 01:50
@@ -166,7 +166,9 @@ func TestAddFlagsFlag(t *testing.T) {
ServiceAccounts: &kubeoptions.ServiceAccountAuthenticationOptions{
Lookup: true,
},
TokenFile: &kubeoptions.TokenFileAuthenticationOptions{},
TokenFile: &kubeoptions.TokenFileAuthenticationOptions{},
TokenSuccessCacheTTL: 10 * time.Second,
Contributor

It doesn't look like there are any command-line flags on the API server to tune these two TTL parameters. Was that an anticipated feature to come? Was their omission deliberate?

Member Author

There's already control via flags over the TTLs for "expensive" webhook token validation operations. There's no current plan to expose these as tunable settings.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/security cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.