Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating apiserver to not return 500 when a service without pods is proxied #11076

Merged
merged 1 commit into from
Jul 24, 2015

Conversation

nikhiljindal
Copy link
Contributor

Fixes #11069

Updating the code to return a 503 when user tries to proxy a service, which doesnt have an rc backing it.

Before:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for \"frontend\"",
  "code": 500
}

Now:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"frontend\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

@nikhiljindal
Copy link
Contributor Author

cc @a-robinson @lavalamp

@a-robinson
Copy link
Contributor

LGTM other than the choice of error code. From w3.org, 400 means "The request could not be understood by the server due to malformed syntax". I don't know that there's a clear choice, but something like a 409 or 412 might make more sense?

@k8s-bot
Copy link

k8s-bot commented Jul 10, 2015

GCE e2e build/test passed for commit 40a5a4766a63b8ead96c854dc264050a68a88007.

@lavalamp
Copy link
Member

Hm, bad request doesn't seem right, it's not the requester's fault that no pods are running.

@lavalamp
Copy link
Member

IRL discussion: 503 Service Unavailable seems best.

@a-robinson
Copy link
Contributor

It certainly can be the requester's fault that there aren't any pods running. If I create a service but it has a bad label selector on it, or I forgot to create pods for it, then it's my fault. It's also a case where simply retrying won't ever fix the problem, which is what 503 implies.

@nikhiljindal nikhiljindal force-pushed the apiError branch 2 times, most recently from 9ad9a4e to 4cb62f8 Compare July 10, 2015 21:16
@lavalamp
Copy link
Member

It can be, but it isn't necessarily. the person who made the service may not be the same person who made the http request.

Anyway, the fix to this does not involve changing the HTTP request, but instead some kind of cluster state. Therefore I think 5xx is correct.

@a-robinson
Copy link
Contributor

I'm sorry, but I'm really not a fan of this being a 503 response. It's not a server error at all, it's not due to "a temporary overloading or maintenance of the server", and in many cases (particularly that of an incorrect label selector) it's not going to be "a temporary condition which will be alleviated after some delay".

The fix to this doesn't involve changing the HTTP request, but it does involve changing server state by making other requests. 409 conflict somewhat matches up with this assumption, although I'm not sure there's anything that matches up perfectly. Really this is a matter of a failed precondition, but the HTTP 412 code refers to preconditions in the request header. Maybe 400 actually is the best choice for this.

@k8s-bot
Copy link

k8s-bot commented Jul 10, 2015

GCE e2e build/test passed for commit 9ad9a4ee99f2865666ea00b28457122ab2d0032f.

@k8s-bot
Copy link

k8s-bot commented Jul 10, 2015

GCE e2e build/test passed for commit 4cb62f8b240e724b34cb521df3b4da31b0998645.

@nikhiljindal
Copy link
Contributor Author

I am still working on getting the integration test to pass. Looks like this might miss 1.0, so we have more time to discuss the right response status :)

@lavalamp
Copy link
Member

409 conflict has a very specific meaning in our system (someone else modified the resource you're trying to modify), so I'm not a fan of reusing it for this.

412 Failed precondition is explicitly about the request header, as you note. We can't use that.

400 bad request is not true, because there's nothing wrong with the request.

IMO, if the solution to the problem involves a change to the HTTP request, then you need a 4xx error code. If it involves a change to the server (which IMO includes the relevant cluster state for this purpose), then the error code should be 5xx.

@smarterclayton can you break the tie here?

@erictune
Copy link
Member

Why can't it be 404?
"The requested resource could not be found but may be available again in
the future. Subsequent requests by the client are permissible"

On Fri, Jul 10, 2015 at 3:54 PM, Daniel Smith notifications@github.com
wrote:

409 conflict has a very specific meaning in our system (someone else
modified the resource you're trying to modify), so I'm not a fan of reusing
it for this.

412 Failed precondition is explicitly about the request header, as you
note. We can't use that.

400 bad request is not true, because there's nothing wrong with the
request.

IMO, if the solution to the problem involves a change to the HTTP request,
then you need a 4xx error code. If it involves a change to the server
(which IMO includes the relevant cluster state for this purpose), then the
error code should be 5xx.

@smarterclayton https://github.com/smarterclayton can you break the tie
here?


Reply to this email directly or view it on GitHub
#11076 (comment)
.

@lavalamp
Copy link
Member

Hm, OK, I could buy 404.

On Fri, Jul 10, 2015 at 3:57 PM, Eric Tune notifications@github.com wrote:

Why can't it be 404?
"The requested resource could not be found but may be available again in
the future. Subsequent requests by the client are permissible"

On Fri, Jul 10, 2015 at 3:54 PM, Daniel Smith notifications@github.com
wrote:

409 conflict has a very specific meaning in our system (someone else
modified the resource you're trying to modify), so I'm not a fan of
reusing
it for this.

412 Failed precondition is explicitly about the request header, as you
note. We can't use that.

400 bad request is not true, because there's nothing wrong with the
request.

IMO, if the solution to the problem involves a change to the HTTP
request,
then you need a 4xx error code. If it involves a change to the server
(which IMO includes the relevant cluster state for this purpose), then
the
error code should be 5xx.

@smarterclayton https://github.com/smarterclayton can you break the
tie
here?


Reply to this email directly or view it on GitHub
<
#11076 (comment)

.


Reply to this email directly or view it on GitHub
#11076 (comment)
.

@smarterclayton
Copy link
Contributor

503 is the canonical error for when a proxy has no available backends - if you set up Apache or HAProxy and they can't find a backend, they will default to 503. On the apiserver.... Since we are acting as a proxy for a service I would still say 503. If all the backends were down, it would be the same scenario as having none to start with. Part of the reasoning behind that is that the author expected there to be a backend (hence 404 is probably not relevant). we wouldn't want to flip between 404 and 503 based on whether there happen to be no pods now, because we don't know if there may be some in the future.

@a-robinson
Copy link
Contributor

Thanks for the explanation, @smarterclayton. SGTM

@a-robinson
Copy link
Contributor

LGTM, but @lavalamp knows this code much better than me

@nikhiljindal
Copy link
Contributor Author

We return 404 when the service doesnt exist. We want to return 503, when the service exists, but proxy fails because the pods are down or there are no pods.

@nikhiljindal
Copy link
Contributor Author

Fixed integration test. PTAL

@k8s-bot
Copy link

k8s-bot commented Jul 14, 2015

GCE e2e build/test passed for commit 3660faecad7ff6587088165d29934029b1c360c3.

@smarterclayton
Copy link
Contributor

LGTM

@lavalamp
Copy link
Member

Looks fine-- do we need this for 1.0?

@a-robinson
Copy link
Contributor

It's more of a nice-to-have than a need from my perspective. Depends on how important having the correct return codes in place is for the v1 API, though.

@davidopp
Copy link
Member

It's "service with no Pods matching the label selector" not "service without an rc", right? There's no check that there's an RC AFAICT.

@a-robinson
Copy link
Contributor

Yes, that's correct @davidopp

@nikhiljindal
Copy link
Contributor Author

Yes. The check is that endpoints.Subset is empty: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/registry/service/rest.go#L302
Subsets can be empty if the pods havent come up yet, or no one created the required pods.

@nikhiljindal nikhiljindal assigned bgrant0607 and unassigned lavalamp Jul 15, 2015
@bgrant0607 bgrant0607 changed the title Updating apiserver to not return 500 when a service without an rc is proxied Updating apiserver to not return 500 when a service without pods is proxied Jul 15, 2015

// StatusReasonServiceUnavailable means that the request itself was valid,
// but the requested service is unavailable at this time.
// Retrying the request after some time might succeed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add:

// Status code 503

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bgrant0607 bgrant0607 added this to the v1.0 milestone Jul 15, 2015
@bgrant0607 bgrant0607 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 15, 2015
@bgrant0607
Copy link
Member

One nit, but LGTM. I think this is reasonable to merge for 1.0 if it helps our monitoring, e2e tests, or something. If we do merge it, please submit a cherrypick PR to the release branch.

@k8s-bot
Copy link

k8s-bot commented Jul 16, 2015

GCE e2e build/test passed for commit 1517b66.

@nikhiljindal
Copy link
Contributor Author

cc @vishh for merging

@vishh
Copy link
Contributor

vishh commented Jul 23, 2015

@k8s-bot: ok to test

@vishh
Copy link
Contributor

vishh commented Jul 23, 2015

@nikhiljindal: I am restarting the tests. Will merge once they are green again.

@k8s-bot
Copy link

k8s-bot commented Jul 23, 2015

GCE e2e build/test failed for commit 1517b66.

@nikhiljindal
Copy link
Contributor Author

e2e failed as it could not fetch the repo.
Trying again.
@k8s-bot ok to test

@k8s-bot
Copy link

k8s-bot commented Jul 23, 2015

GCE e2e build/test passed for commit 1517b66.

@nikhiljindal
Copy link
Contributor Author

The first run of shippable passed. The second one seems a flake. Rerunning.

@nikhiljindal nikhiljindal reopened this Jul 24, 2015
vishh added a commit that referenced this pull request Jul 24, 2015
Updating apiserver to not return 500 when a service without pods is proxied
@vishh vishh merged commit 1a908a2 into kubernetes:master Jul 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants