-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating apiserver to not return 500 when a service without pods is proxied #11076
Conversation
LGTM other than the choice of error code. From w3.org, 400 means "The request could not be understood by the server due to malformed syntax". I don't know that there's a clear choice, but something like a 409 or 412 might make more sense? |
GCE e2e build/test passed for commit 40a5a4766a63b8ead96c854dc264050a68a88007. |
Hm, bad request doesn't seem right, it's not the requester's fault that no pods are running. |
IRL discussion: 503 Service Unavailable seems best. |
It certainly can be the requester's fault that there aren't any pods running. If I create a service but it has a bad label selector on it, or I forgot to create pods for it, then it's my fault. It's also a case where simply retrying won't ever fix the problem, which is what 503 implies. |
9ad9a4e
to
4cb62f8
Compare
It can be, but it isn't necessarily. the person who made the service may not be the same person who made the http request. Anyway, the fix to this does not involve changing the HTTP request, but instead some kind of cluster state. Therefore I think 5xx is correct. |
I'm sorry, but I'm really not a fan of this being a 503 response. It's not a server error at all, it's not due to "a temporary overloading or maintenance of the server", and in many cases (particularly that of an incorrect label selector) it's not going to be "a temporary condition which will be alleviated after some delay". The fix to this doesn't involve changing the HTTP request, but it does involve changing server state by making other requests. 409 conflict somewhat matches up with this assumption, although I'm not sure there's anything that matches up perfectly. Really this is a matter of a failed precondition, but the HTTP 412 code refers to preconditions in the request header. Maybe 400 actually is the best choice for this. |
GCE e2e build/test passed for commit 9ad9a4ee99f2865666ea00b28457122ab2d0032f. |
GCE e2e build/test passed for commit 4cb62f8b240e724b34cb521df3b4da31b0998645. |
I am still working on getting the integration test to pass. Looks like this might miss 1.0, so we have more time to discuss the right response status :) |
409 conflict has a very specific meaning in our system (someone else modified the resource you're trying to modify), so I'm not a fan of reusing it for this. 412 Failed precondition is explicitly about the request header, as you note. We can't use that. 400 bad request is not true, because there's nothing wrong with the request. IMO, if the solution to the problem involves a change to the HTTP request, then you need a 4xx error code. If it involves a change to the server (which IMO includes the relevant cluster state for this purpose), then the error code should be 5xx. @smarterclayton can you break the tie here? |
Why can't it be 404? On Fri, Jul 10, 2015 at 3:54 PM, Daniel Smith notifications@github.com
|
Hm, OK, I could buy 404. On Fri, Jul 10, 2015 at 3:57 PM, Eric Tune notifications@github.com wrote:
|
503 is the canonical error for when a proxy has no available backends - if you set up Apache or HAProxy and they can't find a backend, they will default to 503. On the apiserver.... Since we are acting as a proxy for a service I would still say 503. If all the backends were down, it would be the same scenario as having none to start with. Part of the reasoning behind that is that the author expected there to be a backend (hence 404 is probably not relevant). we wouldn't want to flip between 404 and 503 based on whether there happen to be no pods now, because we don't know if there may be some in the future. |
Thanks for the explanation, @smarterclayton. SGTM |
LGTM, but @lavalamp knows this code much better than me |
We return 404 when the service doesnt exist. We want to return 503, when the service exists, but proxy fails because the pods are down or there are no pods. |
Fixed integration test. PTAL |
GCE e2e build/test passed for commit 3660faecad7ff6587088165d29934029b1c360c3. |
LGTM |
Looks fine-- do we need this for 1.0? |
It's more of a nice-to-have than a need from my perspective. Depends on how important having the correct return codes in place is for the v1 API, though. |
It's "service with no Pods matching the label selector" not "service without an rc", right? There's no check that there's an RC AFAICT. |
Yes, that's correct @davidopp |
Yes. The check is that endpoints.Subset is empty: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/registry/service/rest.go#L302 |
|
||
// StatusReasonServiceUnavailable means that the request itself was valid, | ||
// but the requested service is unavailable at this time. | ||
// Retrying the request after some time might succeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add:
// Status code 503
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
One nit, but LGTM. I think this is reasonable to merge for 1.0 if it helps our monitoring, e2e tests, or something. If we do merge it, please submit a cherrypick PR to the release branch. |
GCE e2e build/test passed for commit 1517b66. |
cc @vishh for merging |
@k8s-bot: ok to test |
@nikhiljindal: I am restarting the tests. Will merge once they are green again. |
GCE e2e build/test failed for commit 1517b66. |
e2e failed as it could not fetch the repo. |
GCE e2e build/test passed for commit 1517b66. |
The first run of shippable passed. The second one seems a flake. Rerunning. |
Updating apiserver to not return 500 when a service without pods is proxied
Fixes #11069
Updating the code to return a 503 when user tries to proxy a service, which doesnt have an rc backing it.
Before:
Now: