
Kubectl API discovery caching may be up to 10 minutes stale #47977

Closed
mbohlool opened this issue Jun 23, 2017 · 18 comments
Comments

@mbohlool
Contributor

mbohlool commented Jun 23, 2017

kubectl caches API discovery results without being aware of changes to them. Registering or removing an API service with the API aggregator changes the API discovery results, but kubectl has no way to know that its cached results are out of date.

Possible solutions:

Short term: implement ETag support for API discovery
Long term: move to OpenAPI for API discovery

This is possibly a release blocker.

cc @lavalamp @pwittrock @dchen1107

@kubernetes/sig-cli-misc @kubernetes/sig-api-machinery-misc
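
For illustration, here is a minimal sketch of the failure mode described above, assuming a hypothetical TTL-based discovery cache (this is not kubectl's actual code): the client keeps serving its cached group list for up to the TTL even after an APIService is registered or removed on the server.

```go
// Hypothetical TTL-based discovery cache illustrating the staleness problem;
// not kubectl's actual implementation.
package main

import (
	"fmt"
	"sync"
	"time"
)

type ttlDiscoveryCache struct {
	mu        sync.Mutex
	ttl       time.Duration
	fetchedAt time.Time
	groups    []string
	fetch     func() []string // stands in for a live GET /apis against the server
}

// Groups serves the cached list until the TTL expires, so an APIService that
// is registered or removed after fetchedAt stays invisible for up to ttl.
func (c *ttlDiscoveryCache) Groups() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.groups != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.groups // possibly stale
	}
	c.groups = c.fetch()
	c.fetchedAt = time.Now()
	return c.groups
}

func main() {
	live := []string{"apps", "batch"}
	cache := &ttlDiscoveryCache{
		ttl:   10 * time.Minute,
		fetch: func() []string { return append([]string(nil), live...) },
	}

	fmt.Println(cache.Groups()) // [apps batch]

	// A new aggregated API group appears on the server...
	live = append(live, "metrics.k8s.io")

	// ...but the cached answer does not change for up to 10 minutes.
	fmt.Println(cache.Groups()) // still [apps batch]
}
```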

@mbohlool mbohlool added kind/bug Categorizes issue or PR as related to a bug. release-blocker sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Jun 23, 2017
@mbohlool mbohlool added this to the v1.7 milestone Jun 23, 2017
@liggitt
Member

liggitt commented Jun 23, 2017

caches have a lifetime of 10 minutes, and can be cleared client-side.

it's not perfect, but this is no different than how kubectl behaved with dynamic discovery changes due to registered/unregistered TPRs in prior releases, so I don't consider it a release blocker.

@liggitt liggitt removed this from the v1.7 milestone Jun 23, 2017
@liggitt liggitt changed the title Kubectl API discovery caching may not be valid in a kube-aggregator world Kubectl API discovery caching may be up to 10 minutes stale Jun 23, 2017
@deads2k
Contributor

deads2k commented Jun 23, 2017

What are you doing to trigger this problem? `kubectl get foo` checks its cache, fails, re-primes the cache, checks again, and succeeds, all silently to the user.
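
As a rough sketch of that check / re-prime / retry flow, written against client-go's cached discovery interface (Invalidate, ServerResourcesForGroupVersion); the import paths reflect the current client-go layout and this is illustrative, not kubectl's exact code path:

```go
// Illustrative only: a miss against cached discovery data triggers a cache
// invalidation and one live retry, invisible to the caller.
package discoveryretry

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
)

// ResolveKind reports whether gv serves kind. On a miss it invalidates the
// cached discovery data and checks once more against the live server,
// mirroring the behavior described for `kubectl get foo`.
func ResolveKind(dc discovery.CachedDiscoveryInterface, gv schema.GroupVersion, kind string) (bool, error) {
	if found, err := hasKind(dc, gv, kind); err == nil && found {
		return true, nil // cached data already has it
	}
	dc.Invalidate() // drop cached data; the next call re-primes from the server
	return hasKind(dc, gv, kind)
}

func hasKind(dc discovery.DiscoveryInterface, gv schema.GroupVersion, kind string) (bool, error) {
	resources, err := dc.ServerResourcesForGroupVersion(gv.String())
	if err != nil {
		return false, err
	}
	for _, r := range resources.APIResources {
		if r.Kind == kind {
			return true, nil
		}
	}
	return false, nil
}
```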

@madhusudancs
Contributor

What should clients other than kubectl do? This is a breaking change for kubefed, for example. I am going to add the release-blocker label back until we decide it is not a blocker for clients other than kubectl.

@madhusudancs
Contributor

Btw, I forgot to mention that kubefed uses kubectl's cmd libraries.

@liggitt
Member

liggitt commented Jun 23, 2017

again, this behavior is no different than how prior releases handled dynamically added/removed APIs

@dchen1107
Member

@liggitt Do we understand why this causes ci-kubernetes-e2e-gce-federation to fail (#47737)?

@liggitt
Member

liggitt commented Jun 23, 2017

To my knowledge, nothing has changed in this area anytime recently. Didn't the Federation job just start failing a few days ago?

@caesarxuchao
Member

If the autoregister-completion health check has not completed, /healthz returns 500. Can we have kubectl check /healthz before visiting the discovery endpoints?

I0623 02:53:30.095497       5 wrap.go:42] GET /healthz: (571.071µs) 500
...[-]autoregister-completion failed: reason withheld\n
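
A minimal sketch of the suggested pre-check using plain net/http; the server address and the TLS/auth handling are placeholders, not what kubectl actually does:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// healthzOK returns true only if GET <server>/healthz answers 200. While the
// autoregister-completion check is still failing the server answers 500, so a
// client could defer (re)priming its discovery cache until this passes.
func healthzOK(server string) (bool, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(server + "/healthz")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	ok, err := healthzOK("http://localhost:8080") // placeholder server address
	if err != nil {
		fmt.Println("healthz check failed:", err)
		return
	}
	if !ok {
		fmt.Println("server not ready; skipping discovery cache refresh")
		return
	}
	fmt.Println("server healthy; safe to fetch discovery data")
}
```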

@dchen1107 dchen1107 added this to the v1.7 milestone Jun 23, 2017
@dchen1107 dchen1107 added approved-for-milestone priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jun 23, 2017
@deads2k
Contributor

deads2k commented Jun 23, 2017

This seems like much ado about a problem we've had since 1.2 (I think) when we got TPRs. The discovery doc changes in response to those and clients already had to deal with it. The same thing happens on upgrades and even server restarts since we added API groups years ago.

Claiming this as a new problem is a mischaracterisation since we've had it for years. In addition, the exact case described here is incorrect (kubectl live checks on a miss).

@dchen1107
Member

@deads2k I am not sure when we introduced the issue here, but what you described for upgrades matches what I observed while debugging the upgrade tests for 1.7.

The only reason we marked this as a release blocker is that the federation team thinks it is the root cause of #47737. We need to find a solution or workaround, or document it as a known issue for federation, upgrades, etc. A long-term fix is not required here.

@deads2k
Contributor

deads2k commented Jun 23, 2017

@deads2k I am not sure when we introduced the issue here, but what you described for upgrades matches what I observed while debugging the upgrade tests for 1.7.

@dchen1107 Then I would avoid trying to make rushed API changes to discovery or behavior modifications to kubectl. The aggregation behavior people seem focused on merged in March in #42911.

@lavalamp lavalamp assigned mbohlool and unassigned pwittrock Jun 24, 2017
@lavalamp
Member

This is a behavior change on the server that exposes a caching bug (which itself is probably both a server and client bug).

There is no reason for the apiserver to return a partial list of built-in resources at any point during startup. I consider that a bug; it exposes internal setup details to the world for no good reason. Since it is breaking clients and we aren't sure the ETag is set correctly to allow caches to work, I think we need to fix the server bug.

Caching for a fixed amount of time is also clearly wrong now that discovery is so dynamic; the server must present an ETag and the client needs to use the If-None-Match header to do it right.
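
For reference, a sketch of the ETag / If-None-Match flow being proposed, using plain net/http against the /apis discovery endpoint. At the time of this discussion the discovery endpoints did not serve ETags (tracked in #44957), so this shows the intended protocol rather than existing behavior; the server address is a placeholder.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchDiscovery performs a conditional GET of /apis. If the server's ETag
// matches cachedETag it answers 304 Not Modified and the cached body can be
// reused; otherwise the fresh body and new ETag are returned.
func fetchDiscovery(server, cachedETag, cachedBody string) (body, etag string, err error) {
	req, err := http.NewRequest(http.MethodGet, server+"/apis", nil)
	if err != nil {
		return "", "", err
	}
	if cachedETag != "" {
		req.Header.Set("If-None-Match", cachedETag)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		return cachedBody, cachedETag, nil // cache is still valid
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return string(data), resp.Header.Get("ETag"), nil
}

func main() {
	body, etag, err := fetchDiscovery("http://localhost:8080", "", "")
	if err != nil {
		fmt.Println("discovery fetch failed:", err)
		return
	}
	fmt.Printf("got %d bytes, ETag %q\n", len(body), etag)
}
```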

@mbohlool
Contributor Author

mbohlool commented Jun 26, 2017

In the burndown meeting, the decision was not to fix the API server, but only to document this as a known issue.

@liggitt
Member

liggitt commented Jun 26, 2017

the kubectl and kubefed bugs related to cache usage will be fixed in 1.7 (and picked back to 1.6.x)

kubectl:
master: #48016 (merged)
1.7 pick: #48067 (merged)
1.6 pick: #48070 (merged)

kubefed:
master: #48077 (merged)
1.7 pick: #48100 (merged)
1.6 pick: #48101 (merged)

@dchen1107
Member

xref: v1.7 known issue: #46733

k8s-github-robot pushed a commit that referenced this issue Jun 26, 2017
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

Retry finding RBAC version if not found in discovery cache

Alternate to #47995

xref #47977

The caching discovery client can indicate whether it used fresh discovery data. `kubefed init` should invalidate the cache and recheck if it doesn't find an RBAC API group.

```release-note
`kubefed init` correctly checks for RBAC API enablement.
```
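
A rough sketch of that invalidate-and-recheck pattern, written against client-go's CachedDiscoveryInterface (Fresh/Invalidate); this is illustrative only and not the exact code that merged in the PRs above.

```go
// Illustrative sketch: if the RBAC group is missing and the answer came from
// stale cached data, invalidate the cache and ask the live server once more.
package rbaccheck

import (
	"k8s.io/client-go/discovery"
)

const rbacGroupName = "rbac.authorization.k8s.io"

// RBACEnabled reports whether the RBAC API group is served.
func RBACEnabled(dc discovery.CachedDiscoveryInterface) (bool, error) {
	found, err := hasGroup(dc, rbacGroupName)
	if err != nil {
		return false, err
	}
	if found || dc.Fresh() {
		return found, nil // either found it, or the negative answer is based on fresh data
	}
	dc.Invalidate()
	return hasGroup(dc, rbacGroupName)
}

func hasGroup(dc discovery.DiscoveryInterface, name string) (bool, error) {
	groups, err := dc.ServerGroups()
	if err != nil {
		return false, err
	}
	for _, g := range groups.Groups {
		if g.Name == name {
			return true, nil
		}
	}
	return false, nil
}
```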
@marun
Contributor

marun commented Jun 28, 2017

@liggitt Given that the PRs you've linked have all merged, should this issue continue blocking 1.7? If not, please remove the 'approved-for-milestone' label.

@liggitt liggitt closed this as completed Jun 28, 2017
@liggitt
Member

liggitt commented Jun 28, 2017

The `kubectl api-versions` and kubefed cache usage has been fixed for 1.7.

@liggitt
Member

liggitt commented Jun 29, 2017

The issue tracking caching headers / ETag support is #44957.
