Make 99% of API calls return in less than 1s; constant time to number of nodes and pods #4521
I'm looking at the slow 'get pods' call #4196 and have a few observations about API server performance. I added a doc about turning on profiling to docs/devel, which should be helpful for trying to find bottlenecks. Sadly, AFAICT most of the time is spent in YAML handling, which I guess means that we need to reduce the number of etcd entries parsed in the normal workflow, try to parallelize some of the parsing, write a custom YAML parser, or change the way things are stored in etcd. As the first two options have no impact on anything outside the master, I'm trying to apply them to the get pods call. I've already tried parallelizing, but it does not give a big improvement. Also, I'm sometimes seeing slow responses from the server when checking metrics. Most of the time it's instantaneous, but once in a while it takes a few seconds. I don't know yet how metrics are stored/computed, but (I guess) it's not a master API call. This could mean that there may be a problem with the HTTP server being slow/overcommitted from time to time.
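For reference, turning on Go's built-in profiler over HTTP only requires importing the standard `net/http/pprof` package. This is a minimal sketch, not the apiserver's actual wiring; the port and the idea of a standalone listener are illustrative:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Illustrative port only; a real server would expose these handlers
	// on its existing mux, typically behind a profiling flag.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

With that in place, CPU profiles can be pulled with `go tool pprof` against the `/debug/pprof/profile` endpoint, which is what surfaces hot spots like the YAML decoding mentioned above.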
There's an issue with this goal, namely it doesn't say on what machine the API server should meet this <1s target. That matters, because on GCE we run the master on a single-core CPU by default, so there's not a lot of room for parallelizing anything.
Even though it's single-core, parallelizing should still help where there's network access involved. I think I already did that before I left a month ago in the places where it was appropriate, though. ...also it might be worth double-checking whether we ever allow more than one thread by setting GOMAXPROCS.
@lavalamp No, we haven't measured. It's possible that we already meet the goal. Also, as you noted, the goal here is under-defined, as it doesn't specify a workload. (I believe @satnam6502 is working on workload generators.) So part of this issue is to define a workload.
I added parallel decoding of the data extracted from etcd for get pods, as it takes 1.5 to 6 s when 50 pods are present (more like 1.2-4 s with an empty cluster). It did not help. Note that I traced only the apiserver part of the execution.
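A rough sketch of the kind of fan-out decoding described here, under the assumption that the storage layer hands back a slice of raw etcd values; the function and the `decode` callback are hypothetical, standing in for whatever deserializer the real code path uses:

```go
package storage

import "sync"

// decodeAllParallel decodes raw etcd values concurrently instead of one by
// one, keeping results in their original order. The decode callback is a
// placeholder for the storage layer's actual deserializer.
func decodeAllParallel(raw [][]byte, decode func([]byte) (interface{}, error)) ([]interface{}, []error) {
	out := make([]interface{}, len(raw))
	errs := make([]error, len(raw))
	var wg sync.WaitGroup
	for i, data := range raw {
		wg.Add(1)
		go func(i int, data []byte) {
			defer wg.Done()
			out[i], errs[i] = decode(data)
		}(i, data)
	}
	wg.Wait()
	return out, errs
}
```

On a single-core master (and with GOMAXPROCS=1) the goroutines mostly just interleave CPU-bound decode work, which is consistent with the observation that this did not help much.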
Rephrasing my previous comment: we should agree on some golden master/node configuration. I think it's important, because if we assume the master runs on a single core the architecture may be quite different than if we assume a multithreaded one. In addition, if we're aiming at a multithreaded master, I believe we should add a second core to the default master machine configuration for each provider, to make testing more realistic. The same applies to node machines. My personal opinion is that we should assume that both the master and the kubelet run on multi-core machines, and adjust our defaults accordingly.
As per @gmarek's comment, it seems reasonable to provide a minimum spec of X cores for our SLO of Y. It does seem unreasonable to expect a single-core machine to run 50 pods (of any size) in a timely manner.
When I was last doing experiments I used an n1-highmem-16 master, and I also tried an n1-highcpu-16 master, but still ran into lots of issues -- perhaps because our API server needs to be (more?) multi-threaded?
OK, I see we don't actually set GOMAXPROCS outside of our tests. I'm making a PR to add that, so at least we'll use more cores if we have them.
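The change is essentially a one-liner; a minimal sketch (on the Go versions in use at the time, pre-1.5, GOMAXPROCS defaulted to 1 unless set explicitly):

```go
package main

import "runtime"

func main() {
	// Before Go 1.5, goroutines ran on a single OS thread by default,
	// so a multi-core master was mostly idle unless this was set.
	runtime.GOMAXPROCS(runtime.NumCPU())
	// ... start the apiserver as usual ...
}
```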
While testing some changes I ran some experiments to see how far from this goal we are in two configurations (I'm sorry for the format, but GitHub does not allow PDFs :/ -- all results are in seconds):
- 1 master (highcpu-16), 4 nodes (standard-2), 50 1-pod RCs
- 1 master (highcpu-4), 10 nodes (standard-1), 200 1-pod RCs
@bprashanth is working on moving the replication controller status computation into the replication controller itself, which will have the side effect of making list rc no longer be O(#pods * #rc).
#4675 will add GOMAXPROCS to the scheduler, which I missed yesterday.
After #4429 lands, the times for replication controllers should drop down to be about the same as the pod times.
I'll paste my comment in this thread then: when checking the performance data on the apiserver under soak testing, it appears that there is a fair amount of GC work going on:
20.43% kube-apiserver [.] runtime.MSpan_Sweep
Seems related to golang/go#9265 (go version go1.3.3 linux/amd64).
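One cheap way to confirm how much of the apiserver's time goes to GC, without pulling a full profile, is to sample the runtime's memory stats periodically and correlate the pause totals with latency spikes. A rough sketch, assuming a standalone monitor; the interval and log format are made up:

```go
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	var prev runtime.MemStats
	runtime.ReadMemStats(&prev)
	for range time.Tick(10 * time.Second) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		// Log how many GC cycles ran and how much pause time accumulated
		// during the last interval.
		log.Printf("GCs=%d gcPause=%v heapAlloc=%d MiB",
			m.NumGC-prev.NumGC,
			time.Duration(m.PauseTotalNs-prev.PauseTotalNs),
			m.HeapAlloc>>20)
		prev = m
	}
}
```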
More color: #4862 shows an RC fill time of ~10 minutes. Part of that lag is the apiserver, but there are other issues in the system that we're tracing.
@timothysc what process are you using to generate those numbers? I think taking boundpods out will drastically speed up adding and removing pods; I think there's a lot of contention on updating that.
@fgrzadkowski There was a request to also test on g1-small, since that's the smallest master users might run in practice. There's no expectation that the performance will be good -- we just want to make sure it is not unusable.
@piosz has enabled a load test which creates:
Metrics from the first run:
Thanks for the update! Can you indicate which PRs related to this issue are still out for review (other than the CoreOS one), if any?
The CoreOS one is the only PR in flight. The numbers in my previous comment were taken from Jenkins.
Great, thanks! So the CoreOS PR will improve the above numbers further? (I assume "taken from Jenkins" means only using what is already in the codebase -- but wanted to make sure I am understanding correctly.)
@davidopp Yes. I'll send a PR to update the go-etcd client library version today.
@fgrzadkowski: Once you're confident we're meeting the goal described in the title of this issue, please move the issue to milestone "1.0-post". (We don't need to wait on the etcd PRs to merge if we're already meeting the goal.)
I enabled the load test. Currently it plays the following scenario:
The first run gave really good results (with an 8-core master). Load test:
Density, 30 pods per node:
I think we should pay more attention to the load test, which plays a more realistic scenario. Tomorrow I will set thresholds of 1 s for the load test and 2 s for the density test. @davidopp @brendanburns @wojtek-t Assuming the next runs give similar results, I suggest we announce victory. Does that sound reasonable to you?
@fgrzadkowski just FYI, with the load test we resize RCs quickly, so there's a chance you will run into #9147.
Yes.
@davidopp Results with a 4-core master. Density with 3 pods per node:
Density with 30 pods per node
Load test
NOTE: For the load tests, some requests were performed while we were creating/deleting RCs, so the cluster was not full (the full spectrum from 0 to 30 pods per node). @davidopp I suggest we keep the 4-core master but increase the threshold for density to 3 seconds. WDYT?
Sure, sounds good.
And thanks for collecting the additional data. That doesn't look too terrible vs. the 8-core data (i.e. it's not 2x the latency).
After merging #9862 the latency improved; see #9862 (comment). Results from the density test with 30 pods per node in a 100-node cluster with an n1-standard-4 master:
This is way better (roughly 25%) than the usual run for the density test.
@lavalamp would you be interested in taking this (it's from the v1.0 roadmap)?
Also cc'ing @roberthbailey and @satnam6502 for good measure.