
Kubectl API discovery caching may be up to 10 minutes stale #47977

Closed
mbohlool opened this issue Jun 23, 2017 · 18 comments
Comments

@mbohlool
Contributor

mbohlool commented Jun 23, 2017

kubectl caches API discovery results without being aware of changes to them. Registering or removing an API service with the API aggregator changes the API discovery results, but kubectl has no way to know that its cached results are out of date.

Possible solutions:

Short term: implement ETag support for API discovery
Long term: move to OpenAPI for API discovery

This is possibly a release blocker.

cc @lavalamp @pwittrock @dchen1107

@kubernetes/sig-cli-misc @kubernetes/sig-api-machinery-misc
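
For illustration, here is a minimal sketch of the failure mode described above, assuming a hypothetical TTL-based discovery cache (this is not kubectl's actual code): the client keeps serving its cached group list for up to the TTL even after an APIService is registered or removed on the server.

```go
// Hypothetical TTL-based discovery cache illustrating the staleness problem;
// not kubectl's actual implementation.
package main

import (
	"fmt"
	"sync"
	"time"
)

type ttlDiscoveryCache struct {
	mu        sync.Mutex
	ttl       time.Duration
	fetchedAt time.Time
	groups    []string
	fetch     func() []string // stands in for a live GET /apis against the server
}

// Groups serves the cached list until the TTL expires, so an APIService that
// is registered or removed after fetchedAt stays invisible for up to ttl.
func (c *ttlDiscoveryCache) Groups() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.groups != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.groups // possibly stale
	}
	c.groups = c.fetch()
	c.fetchedAt = time.Now()
	return c.groups
}

func main() {
	live := []string{"apps", "batch"}
	cache := &ttlDiscoveryCache{
		ttl:   10 * time.Minute,
		fetch: func() []string { return append([]string(nil), live...) },
	}

	fmt.Println(cache.Groups()) // [apps batch]

	// A new aggregated API group appears on the server...
	live = append(live, "metrics.k8s.io")

	// ...but the cached answer does not change for up to 10 minutes.
	fmt.Println(cache.Groups()) // still [apps batch]
}
```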

@mbohlool mbohlool added kind/bug Categorizes issue or PR as related to a bug. release-blocker sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Jun 23, 2017
@mbohlool mbohlool added this to the v1.7 milestone Jun 23, 2017
@liggitt
Member

liggitt commented Jun 23, 2017

caches have a lifetime of 10 minutes, and can be cleared client-side.

it's not perfect, but this is no different than how kubectl behaved with dynamic discovery changes due to registered/unregistered TPRs in prior releases, so I don't consider it a release blocker.

@liggitt liggitt removed this from the v1.7 milestone Jun 23, 2017
@liggitt liggitt changed the title Kubectl API discovery caching may not be valid in a kube-aggregator world Kubectl API discovery caching may be up to 10 minutes stale Jun 23, 2017
@deads2k
Contributor

deads2k commented Jun 23, 2017

What are you doing to trigger this problem? `kubectl get foo` checks its cache, fails, re-primes the cache, checks again, and succeeds, all silently to the user.
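
As a rough sketch of that check / re-prime / retry flow, written against client-go's cached discovery interface (Invalidate, ServerResourcesForGroupVersion); the import paths reflect the current client-go layout and this is illustrative, not kubectl's exact code path:

```go
// Illustrative only: a miss against cached discovery data triggers a cache
// invalidation and one live retry, invisible to the caller.
package discoveryretry

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
)

// ResolveKind reports whether gv serves kind. On a miss it invalidates the
// cached discovery data and checks once more against the live server,
// mirroring the behavior described for `kubectl get foo`.
func ResolveKind(dc discovery.CachedDiscoveryInterface, gv schema.GroupVersion, kind string) (bool, error) {
	if found, err := hasKind(dc, gv, kind); err == nil && found {
		return true, nil // cached data already has it
	}
	dc.Invalidate() // drop cached data; the next call re-primes from the server
	return hasKind(dc, gv, kind)
}

func hasKind(dc discovery.DiscoveryInterface, gv schema.GroupVersion, kind string) (bool, error) {
	resources, err := dc.ServerResourcesForGroupVersion(gv.String())
	if err != nil {
		return false, err
	}
	for _, r := range resources.APIResources {
		if r.Kind == kind {
			return true, nil
		}
	}
	return false, nil
}
```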

@madhusudancs
Contributor

What should clients other than kubectl do? This is a breaking change for kubefed, for example. I am going to add the release-blocker label back until we decide it is not a blocker for clients other than kubectl.

@madhusudancs
Contributor

Btw, I forgot to mention that kubefed uses kubectl's cmd libraries.

@liggitt
Member

liggitt commented Jun 23, 2017

again, this behavior is no different than how prior releases handled dynamically added/removed APIs

@dchen1107
Member

@liggitt Do we understand why this causes ci-kubernetes-e2e-gce-federation to fail (#47737)?

@liggitt
Member

liggitt commented Jun 23, 2017

To my knowledge, nothing has changed in this area anytime recently. Didn't the Federation job just start failing a few days ago?

@caesarxuchao
Member

If the autoregister-completion health check has not completed, /healthz returns 500. Can we have kubectl check /healthz before visiting the discovery endpoints?

I0623 02:53:30.095497       5 wrap.go:42] GET /healthz: (571.071µs) 500
...[-]autoregister-completion failed: reason withheld\n
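
A minimal sketch of the suggested pre-check using plain net/http; the server address and the TLS/auth handling are placeholders, not what kubectl actually does:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// healthzOK returns true only if GET <server>/healthz answers 200. While the
// autoregister-completion check is still failing the server answers 500, so a
// client could defer (re)priming its discovery cache until this passes.
func healthzOK(server string) (bool, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(server + "/healthz")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	ok, err := healthzOK("http://localhost:8080") // placeholder server address
	if err != nil {
		fmt.Println("healthz check failed:", err)
		return
	}
	if !ok {
		fmt.Println("server not ready; skipping discovery cache refresh")
		return
	}
	fmt.Println("server healthy; safe to fetch discovery data")
}
```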

@dchen1107 dchen1107 added this to the v1.7 milestone Jun 23, 2017
@dchen1107 dchen1107 added approved-for-milestone priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jun 23, 2017
@deads2k
Contributor

deads2k commented Jun 23, 2017

This seems like much ado about a problem we've had since 1.2 (I think) when we got TPRs. The discovery doc changes in response to those and clients already had to deal with it. The same thing happens on upgrades and even server restarts since we added API groups years ago.

Claiming this as a new problem is a mischaracterisation since we've had it for years. In addition, the exact case described here is incorrect (kubectl live checks on a miss).

@dchen1107
Member

@deads2k I am not sure when we introduced the issue here, but what you described for upgrades matches what I observed while debugging the upgrade tests for 1.7.

The only reason we marked this as a release blocker is that the federation team thinks it is the root cause of #47737. We need to find a solution or workaround, or document it as a known issue for federation, upgrades, etc. A long-term fix is not required here.

@deads2k
Contributor

deads2k commented Jun 23, 2017

@deads2k I am not sure when we introduced the issue here, but what you described for upgrades matches what I observed while debugging the upgrade tests for 1.7.

@dchen1107 Then I would avoid trying to make rushed API changes to discovery or behavior modifications to kubectl. The aggregation behavior people seem focused on merged in March in #42911.

@lavalamp lavalamp assigned mbohlool and unassigned pwittrock Jun 24, 2017
@lavalamp
Member

This is a behavior change on the server that exposes a caching bug (which itself is probably both a server and client bug).

There is no reason for the apiserver to return a partial list of built-in resources at any point during startup. I consider that a bug; it exposes internal setup details to the world for no good reason. Since it is breaking clients and we aren't sure the ETag is set correctly to allow caches to work, I think we need to fix the server bug.

Caching for a fixed amount of time is also clearly wrong now that discovery is so dynamic; the server must present an ETag and the client needs to use the If-None-Match header to do it right.
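
For reference, a sketch of the ETag / If-None-Match flow being proposed, using plain net/http against the /apis discovery endpoint. At the time of this discussion the discovery endpoints did not serve ETags (tracked in #44957), so this shows the intended protocol rather than existing behavior; the server address is a placeholder.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchDiscovery performs a conditional GET of /apis. If the server's ETag
// matches cachedETag it answers 304 Not Modified and the cached body can be
// reused; otherwise the fresh body and new ETag are returned.
func fetchDiscovery(server, cachedETag, cachedBody string) (body, etag string, err error) {
	req, err := http.NewRequest(http.MethodGet, server+"/apis", nil)
	if err != nil {
		return "", "", err
	}
	if cachedETag != "" {
		req.Header.Set("If-None-Match", cachedETag)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		return cachedBody, cachedETag, nil // cache is still valid
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return string(data), resp.Header.Get("ETag"), nil
}

func main() {
	body, etag, err := fetchDiscovery("http://localhost:8080", "", "")
	if err != nil {
		fmt.Println("discovery fetch failed:", err)
		return
	}
	fmt.Printf("got %d bytes, ETag %q\n", len(body), etag)
}
```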

@mbohlool
Contributor Author

mbohlool commented Jun 26, 2017

In the burndown meeting, the decision was not to fix the API server, but only to document this as a known issue.

@liggitt
Member

liggitt commented Jun 26, 2017

the kubectl and kubefed bugs related to cache usage will be fixed in 1.7 (and picked back to 1.6.x)

kubectl:
master: #48016 (merged)
1.7 pick: #48067 (merged)
1.6 pick: #48070 (merged)

kubefed:
master: #48077 (merged)
1.7 pick: #48100 (merged)
1.6 pick: #48101 (merged)

@dchen1107
Member

xref: v1.7 known issue: #46733

k8s-github-robot pushed a commit that referenced this issue Jun 26, 2017
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

Retry finding RBAC version if not found in discovery cache

Alternate to #47995

xref #47977

The caching discovery client can indicate whether it used fresh discovery data. `kubefed init` should invalidate the cache and recheck if it doesn't find an RBAC API group.

```release-note
`kubefed init` correctly checks for RBAC API enablement.
```
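
A rough sketch of that invalidate-and-recheck pattern, written against client-go's CachedDiscoveryInterface (Fresh/Invalidate); this is illustrative only and not the exact code that merged in the PRs above.

```go
// Illustrative sketch: if the RBAC group is missing and the answer came from
// stale cached data, invalidate the cache and ask the live server once more.
package rbaccheck

import (
	"k8s.io/client-go/discovery"
)

const rbacGroupName = "rbac.authorization.k8s.io"

// RBACEnabled reports whether the RBAC API group is served.
func RBACEnabled(dc discovery.CachedDiscoveryInterface) (bool, error) {
	found, err := hasGroup(dc, rbacGroupName)
	if err != nil {
		return false, err
	}
	if found || dc.Fresh() {
		return found, nil // either found it, or the negative answer is based on fresh data
	}
	dc.Invalidate()
	return hasGroup(dc, rbacGroupName)
}

func hasGroup(dc discovery.DiscoveryInterface, name string) (bool, error) {
	groups, err := dc.ServerGroups()
	if err != nil {
		return false, err
	}
	for _, g := range groups.Groups {
		if g.Name == name {
			return true, nil
		}
	}
	return false, nil
}
```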
@marun
Contributor

marun commented Jun 28, 2017

@liggitt Given that the PRs you've linked have all merged, should this issue continue blocking 1.7? If not, please remove the 'approved-for-milestone' label.

@liggitt liggitt closed this as completed Jun 28, 2017
@liggitt
Member

liggitt commented Jun 28, 2017

The `kubectl api-versions` and kubefed cache usage has been fixed for 1.7.

@liggitt
Member

liggitt commented Jun 29, 2017

The issue tracking caching headers / ETag support is #44957.
