API Server returns HTTP 200 OK on "too old resource version" errors #35068
Comments
Looks like a straight bug to me.
Another thing I find odd is that the resource version stays too old, and there seems to be no way to retrieve a current resource version that will work with a future watch request. Hours later, with no changes to the nginx endpoints, I see the following. Get the current nginx endpoint and capture the resource version (250647):
Start a watch using the resource version (250647):
Is there another way to get the current resource version? My current workaround is to check whether the event type is ERROR and, if so, restart the watch at resourceVersion 0. The drawback is that you need to ignore the first response with type ADDED, or else you end up in a fast loop and DDoS the API server.
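The workaround described above can be sketched roughly as follows. This is only an illustration, not code from the thread: `resilient_watch`, `open_watch`, and `max_restarts` are hypothetical names, and `open_watch` is assumed to return the decoded watch events for one connection. The restart cap is an added assumption to keep a persistent failure from turning into a fast loop.

```python
from typing import Callable, Iterable, Iterator


def resilient_watch(
    open_watch: Callable[[str], Iterable[dict]],  # hypothetical: opens one watch connection
    start_version: str,
    max_restarts: int = 3,  # assumption: bound restarts to avoid hammering the API server
) -> Iterator[dict]:
    """Yield watch events, restarting at resourceVersion "0" after an ERROR event."""
    version = start_version
    restarts = 0
    skip_replayed_added = False
    while True:
        restarted = False
        for event in open_watch(version):
            if event.get("type") == "ERROR":
                if restarts >= max_restarts:
                    raise RuntimeError("watch kept failing: %r" % event.get("object"))
                restarts += 1
                version = "0"               # restart the watch at 0, as in the workaround
                skip_replayed_added = True  # the first ADDED after a restart is a replay
                restarted = True
                break
            if skip_replayed_added and event.get("type") == "ADDED":
                skip_replayed_added = False
                continue                    # ignore the replayed ADDED event
            skip_replayed_added = False
            yield event
        if not restarted:
            return  # the stream ended without an error
```

A real client would probably also sleep between restarts; that is left out here to keep the sketch short.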
Is this a bug in the
Oh, wait a second. I forgot how this worked. The 200 is because the connection was established; it did that before getting the error from etcd. There (unfortunately) have to be two distinct error mechanisms for streaming connections, since you only get one chance to return a status code but an error can happen at any point. It looks bad when an error is the first thing returned, but IIRC changing this is actually technically difficult. And since clients need to handle errors in this form anyway, it may actually be a good thing that it's easy to produce one, so people will be surprised long before they get to production.
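To make the two error mechanisms concrete, here is a minimal client-side sketch. It assumes a watch stream line is a JSON object with a `type` field and, for errors, a Status-like `object` carrying a numeric `code`. The function name `watch_failure_code` is hypothetical, and the sample payloads in the usage note are illustrative rather than captured from a real server.

```python
import json
from typing import Optional


def watch_failure_code(http_status: int, line: Optional[str]) -> Optional[int]:
    """Return the effective error code for a watch response, or None if it looks healthy."""
    # Mechanism 1: the error happened before streaming began, so the
    # HTTP status code itself carries it.
    if http_status >= 400:
        return http_status
    # Mechanism 2: the 200 was already sent, so the error arrives in-band
    # as an ERROR event whose object holds the real code.
    if line:
        event = json.loads(line)
        if event.get("type") == "ERROR":
            return event.get("object", {}).get("code")
    return None
```

With this shape, a body such as `{"type": "ERROR", "object": {"kind": "Status", "code": 410}}` on a 200 response is treated the same as an outright HTTP 410, which is what a client has to do regardless of how the server reports the first error.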
Just stumbled on this issue. Starting a watch from a resourceVersion that is too old is addressed in As for watching from the current resourceVersion: the resourceVersion returned in lists is always the current resourceVersion. Setting resourceVersion=0 in the watch also works.
I think this is fixed by #25369
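The list-then-watch approach mentioned above might look like the sketch below. It assumes the list response is JSON with the version under `metadata.resourceVersion` and that the watch is requested with the `watch=true` and `resourceVersion` query parameters. The helper name `watch_path_from_list` and the example path are illustrative only.

```python
import json
from urllib.parse import urlencode


def watch_path_from_list(list_body: str, collection_path: str) -> str:
    """Build a watch request path that starts at the version a list call returned."""
    # The list response carries the collection's current resourceVersion.
    version = json.loads(list_body)["metadata"]["resourceVersion"]
    query = urlencode({"watch": "true", "resourceVersion": version})
    return f"{collection_path}?{query}"
```

For example, given a list body whose `metadata.resourceVersion` is `"252000"` and the path `/api/v1/namespaces/default/endpoints`, the helper returns that path with `?watch=true&resourceVersion=252000` appended.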
Due to a regression in some versions of Kubernetes (kubernetes/kubernetes#35068), the "resource version too old" watch event sometimes has HTTP status code 200 rather than status code 410. This event is not a fatal error; it simply indicates that a watch should be restarted. k8s will fire this event for any watch that has been open for longer than thirty minutes.
In Linkerd, `Watchable` currently detects this event by matching the HTTP status code of the watch event, and restarts the watch when it occurs. However, when the event is fired with the incorrect status code, the error is not handled in `Watchable` and is passed downstream. In the case of issue #1636, it is passed to `EndpointsNamer`, which does not know how to handle this event. This leads to namers intermittently failing to resolve k8s endpoints.
Although this issue seems to have been fixed upstream in kubernetes/kubernetes#25369, many users of Linkerd are running versions of Kubernetes where it still occurs. Therefore, I've added a workaround in `Watchable` to detect "resource version too old" events with status code 200 and restart the watch rather than passing these events downstream. When this occurs, Linkerd now logs a warning indicating that, although the error was handled, Kubernetes behaved erroneously.
I've added a test to `v1.ApiTest` that replicates the Kubernetes bug.
Fixes #1636
References:
* ManageIQ/kubeclient#452 (comment)
* fabric8io/kubernetes-client#1800 (comment)
* kubernetes/kubernetes#25151 (comment)
* kubernetes/kubernetes#35068 (comment)
* https://github.com/kubernetes/kubernetes/blob/dde6e8e7465468c32642659cb708a5cc922add64/test/e2e/apimachinery/protocol.go#L68-L75
* https://kubernetes.io/docs/reference/using-api/api-concepts/#410-gone-responses
* https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
* https://www.baeldung.com/java-kubernetes-watch#1-resource-versions
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Kubernetes version (use kubectl version):
Environment:
What happened:
When the resource version is too old the API server returns HTTP 200 OK.
What you expected to happen:
When the resource version is too old the API server returns HTTP 410 to match the error code in the response body.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know:
No.