
API server should limit the number of concurrent requests it processes #5866

Closed
fabioy opened this issue Mar 24, 2015 · 11 comments · Fixed by #6207
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence.
Comments

@fabioy
Contributor

fabioy commented Mar 24, 2015

Related to issue #5865, the API server should have a cap on the number of outstanding operations/requests it's handling, and return an error once that cap is exceeded. A simple semaphore would be one way to do this.
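
For illustration only, here is a minimal sketch of the semaphore idea as a Go HTTP middleware, using a buffered channel as a counting semaphore. The limit value and the 429 response are assumptions, not the eventual apiserver implementation:

```go
package main

import "net/http"

// MaxInFlightLimit is a sketch: cap the number of in-flight requests with a
// buffered channel used as a counting semaphore, rejecting the excess rather
// than queueing it. The limit and the 429 status are illustrative choices.
func MaxInFlightLimit(limit int, handler http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			// Acquired a slot; release it when the request finishes.
			defer func() { <-sem }()
			handler.ServeHTTP(w, r)
		default:
			// Cap reached: fail fast instead of letting requests pile up.
			http.Error(w, "too many requests in flight", http.StatusTooManyRequests)
		}
	})
}
```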

@fabioy fabioy added the team/master and priority/backlog labels Mar 24, 2015
@ghost

ghost commented Mar 24, 2015

+1 !

@fabioy
Contributor Author

fabioy commented Mar 24, 2015

The good news/bad news on this is that there is a rate limiter in the API server (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/apiserver/handlers.go#L70), but the defaults for it seem too high: 10 qps average, 200 burst (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cmd/kube-apiserver/app/server.go).

We could start by lowering the burst level to something smaller, like 30. It may be interesting to also differentiate between GET and PUT/POST requests.
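
To make those numbers concrete, here is a hedged sketch in token-bucket terms. It uses golang.org/x/time/rate as a stand-in rather than the apiserver's own rate limiter linked above, and the separate, smaller burst for mutating verbs is just the idea floated here, not existing behavior:

```go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// RateLimitByMethod is a sketch: reads refill at 10 tokens/s with a burst of
// 200, while mutating verbs share the same sustained rate but a smaller burst
// of 30, so a flood of writes is rejected sooner than a flood of reads.
func RateLimitByMethod(handler http.Handler) http.Handler {
	readLimiter := rate.NewLimiter(rate.Limit(10), 200)
	writeLimiter := rate.NewLimiter(rate.Limit(10), 30)

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		limiter := writeLimiter
		if r.Method == http.MethodGet || r.Method == http.MethodHead {
			limiter = readLimiter
		}
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		handler.ServeHTTP(w, r)
	})
}
```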

@ghost

ghost commented Mar 24, 2015

I think we need limits on the number of outstanding requests, not on QPS.

@fabioy
Contributor Author

fabioy commented Mar 24, 2015

@quinton-hoole -> A max-outstanding limit would be useful. But it may force us to have separate buckets for different types of requests, e.g. a small number of long-running requests could otherwise cause the API server to reject short requests. A sketch of that idea follows below.

@smarterclayton -> Heard that you might be looking into adding support for flow control (or something like it?). We may need it soon to help avoid bugs like runaway node status updates (see issue #5864).
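
A rough sketch of the "separate buckets" idea, assuming long-running requests can be recognized by path (watch/proxy/log); the prefixes and limits are invented for illustration and are not how the apiserver classifies requests today:

```go
package main

import (
	"net/http"
	"strings"
)

// SplitInFlightLimit is a sketch: give long-running requests (watch, proxy,
// logs) their own in-flight budget so they cannot exhaust the cap that short
// requests depend on. Prefixes and limits here are hypothetical.
func SplitInFlightLimit(shortLimit, longLimit int, handler http.Handler) http.Handler {
	shortSem := make(chan struct{}, shortLimit)
	longSem := make(chan struct{}, longLimit)

	isLongRunning := func(r *http.Request) bool {
		return strings.Contains(r.URL.Path, "/watch") ||
			strings.Contains(r.URL.Path, "/proxy") ||
			strings.Contains(r.URL.Path, "/log")
	}

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		sem := shortSem
		if isLongRunning(r) {
			sem = longSem
		}
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			handler.ServeHTTP(w, r)
		default:
			http.Error(w, "too many requests in flight", http.StatusTooManyRequests)
		}
	})
}
```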

@smarterclayton
Contributor

I had not - the things we were looking at were the bastion changes to the apiserver, which will break logs and proxying out into their own REST resources so they can be controlled granularly. Being able to timebox and classify requests differently would be good.


@fabioy
Contributor Author

fabioy commented Mar 24, 2015

@smarterclayton - Ah, np, I probably misheard.

@gmarek
Contributor

gmarek commented Apr 7, 2015

What are our plans for handling this on the client side? Do we want clients to handle the "you're being throttled" error themselves, or do we plan to handle it in the client library?

I'm asking because the kubelet currently does not have a 'short' retry loop: it tries to report NodeStatus every X (= 2) seconds, whether it succeeds or not. Nor do we have a side channel for heartbeat messages.

This means that if the API server is under heavy load, and NodeStatus reporting is therefore throttled with non-negligible probability, we may incorrectly mark a Node as unreachable. Currently it's not a huge problem, because X << Y, where Y is the time we wait before deciding a Node is unreachable. But with the effort around #5864 it will probably get worse.

@gmarek gmarek reopened this Apr 7, 2015
@smarterclayton
Contributor

The client library should have options for retry on client.Config which include the ability to disable the behavior. Automatic retry is probably an "opt-out" behavior.
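
As a sketch of that opt-out retry behavior (the types and field names here are hypothetical, not the real client.Config API), a client wrapper might honor Retry-After on 429 responses like this:

```go
package main

import (
	"net/http"
	"strconv"
	"time"
)

// RetryConfig is a hypothetical stand-in for retry options on client.Config;
// the real knobs were still under discussion in this thread.
type RetryConfig struct {
	DisableRetry bool // opt-out: automatic retry is on unless explicitly disabled
	MaxRetries   int
}

// doWithRetry retries throttled (429) requests, honoring the Retry-After
// header when the server supplies one. It assumes a body-less request
// (e.g. GET) so the same *http.Request can be resent as-is.
func doWithRetry(c *http.Client, req *http.Request, cfg RetryConfig) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := c.Do(req)
		if err != nil {
			return nil, err
		}
		if cfg.DisableRetry || resp.StatusCode != http.StatusTooManyRequests || attempt >= cfg.MaxRetries {
			return resp, nil
		}
		resp.Body.Close()

		delay := time.Second // fallback when the server sends no Retry-After
		if s := resp.Header.Get("Retry-After"); s != "" {
			if secs, err := strconv.Atoi(s); err == nil {
				delay = time.Duration(secs) * time.Second
			}
		}
		time.Sleep(delay)
	}
}
```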


@davidopp
Member

davidopp commented Apr 7, 2015

@lavalamp says there are people working on rate limiting client connections to API server.

@satnam6502
Contributor

I have in the past adjusted the rate limiter to support the creation of large clusters. I also added the Retry-After logic. If you adjust the rate limiter settings, please make sure it is still possible to create large clusters.
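
For reference, the Retry-After piece is just a response header hint; a minimal sketch of attaching it to a throttled response (the one-second value is an arbitrary example, not what the apiserver uses):

```go
package main

import "net/http"

// tooManyRequests sketches the Retry-After hint: when a request is rejected,
// tell well-behaved clients how long to wait before retrying.
func tooManyRequests(w http.ResponseWriter) {
	w.Header().Set("Retry-After", "1")
	http.Error(w, "Too many requests, please try again later.", http.StatusTooManyRequests)
}
```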

@brendandburns
Contributor

I'm closing this as fixed, given the original subject. If we want to expand client-side support for retry, please open a dedicated issue. (Also note that QPS-based throttling was recently added to the client as well.)
