Add a limit to the number of in-flight requests that a server processes. #6207
Conversation
What happens when all the in-flight requests are consumed by watches? Starvation?
Yes. I suppose we might want to have a separate counter for watches?
Or we could just ignore watches for now. We managed to DoS an apiserver out of file descriptors by spamming it. That's bad :P hence this PR.
We are already talking about pushing some operations deeper into the apiserver, so if it's high level for now, when those other steps complete it will be easier to move this down.
@@ -66,6 +66,22 @@ func ReadOnly(handler http.Handler) http.Handler {
	})
}

// MaxInFlight limits the number of in flight requests to the number of
Typo?
fixed.
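For context, here is a minimal sketch of the channel-as-semaphore approach under discussion. The function name follows the diff above, but the body, the rejection status code, and the empty-struct channel are illustrative assumptions rather than the PR's exact code:

package filters

import "net/http"

// MaxInFlight limits the number of requests being served at once.
// A buffered channel acts as a counting semaphore: each request must
// claim a slot before being handled, and requests that find the
// channel full are rejected immediately instead of queueing.
func MaxInFlight(limit int, handler http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			handler.ServeHTTP(w, r)
		default:
			// Shed load rather than queue, so a request flood cannot
			// exhaust file descriptors.
			http.Error(w, "too many requests in flight", http.StatusServiceUnavailable)
		}
	})
}

Rejecting immediately rather than blocking is what keeps a spammed apiserver from accumulating open connections, which is the file-descriptor exhaustion failure mode described above.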
How do we monitor the number of in-flight requests over time? I'm guessing that the current answer is "we don't", in which case please add an issue requesting this, at least.
Assigning to @quinton-hoole, feel free to reassign.
Comments addressed, and #6227 filed for the issue Quinton asked for.
Getting back to smarterclayton@ 's comment above, surely with the low default of 20 in-flight requests that you've set in this PR, watch requests will pretty much always use up all of those, and we'll just serve errors for everything else? Am I missing something?
Also, clients can open watches pretty easily, so suddenly our watch count could be 50-100.
One option would be to set the default limit closer to the default per-process file handle limit of 1024. A better solution, as you mentioned, would be to split the limits for watches and other stuff into separate counters.
+1 to separate buckets for long-running requests vs. short ones. May also be worth having separate buckets for different operation types that have different dependencies -- e.g. reading from etcd vs. writing? Is there a separate issue to call Go's equivalent of setrlimit(RLIMIT_NOFILE) to something higher? File descriptors are relatively cheap and 1024 is tiny. See golang/go#5949 for sample code?
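On the setrlimit question, a Linux-only sketch of raising the process's file-descriptor limit from Go, along the lines of the sample code referenced in golang/go#5949 (the panic-on-error handling is a simplification for illustration):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Read the current soft/hard limits, then lift the soft limit
	// (Cur) up to the hard ceiling (Max).
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	rl.Cur = rl.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	fmt.Printf("RLIMIT_NOFILE soft limit raised to %d\n", rl.Cur)
}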
Ok, added a regexp to not count watch and proxy requests.
Please take another look.
block.Add(1)
sem := make(chan bool, Iterations)

re := regexp.MustCompile("[.*\\/watch][^\\/proxy.*]")
In v1beta3 we are not going to be using /watch - we'll be using a query param on the list response. I don't know where that leaves us.
To fix the implementation so the test passes when that goes live?
It's there now, it's just that no one is using it yet. It doesn't have to block this, but it's not clear to me how we'd tell that.
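To make the exemption concrete, here is a sketch of how the long-running check might wrap the limiter. The pattern below uses plain alternation rather than the PR's bracketed regexp, and all names are assumptions; note that once watches move to a query parameter in v1beta3, a purely path-based check like this would also need to inspect the query string:

package filters

import (
	"net/http"
	"regexp"
)

// longRunningRE spots watch and proxy paths, which hold their
// connections open and therefore should not count against the limit.
var longRunningRE = regexp.MustCompile(`/(watch|proxy)(/|$)`)

func MaxInFlight(limit int, handler http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Long-running requests bypass the semaphore entirely.
		if longRunningRE.MatchString(r.URL.Path) {
			handler.ServeHTTP(w, r)
			return
		}
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			handler.ServeHTTP(w, r)
		default:
			http.Error(w, "too many requests in flight", http.StatusServiceUnavailable)
		}
	})
}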
Fixed gofmt issues (and installed the git hooks on my new machine... ;)
Travis is still whining about a build failure. I don't know enough about Travis yet to know whether or not we can trust its judgement. Other than that, I think that this can be merged.
Travis kicked.
Travis still complaining :-(
PS: zmerlynn@ when you say "Travis kicked", what exactly do you kick?
There's a restart-build button in the Travis UI if you have commit rights on the underlying repo.
Travis is now happy. I can haz mergez?
(and you need to log in to Travis via GitHub)
Add a limit to the number of in-flight requests that a server processes.
Can folks please label this as perf for tracking.
@timothysc done.
Sorry - we have a holiday in Poland today - will take a look tomorrow.
A side effect of this PR is that the density test is creating pods much slower than earlier (~6 pods per second). For a 50-node cluster and an rc with 1500 replicas (30 pods per node), this gives 300 seconds just to create the rc.
Actually my previous comment should be in #6203
@fgrzadkowski thx, I figured this would be the case; we're currently working through issues with running load-balanced apiservers as a result. @abhgupta, @smarterclayton, @rrati ^ FYI.
Related?
The cluster response curves from this & #6203 are bad; imho this is not the correct route to take, or it should at least default to off until good defaults can be found. Time to fill a pool has regressed sharply: ~2.3 pods per second, which is a fairly drastic decrease from about a week ago.
@timothysc - I agree. I observed a very similar scheduling rate and indeed it's much worse than before.
@smarterclayton @alex-mohr @fabioy @quinton-hoole
Closes #5866