
RFC: Determine the scalability goal for v1.2 kubelet #16943

Closed
yujuhong opened this issue Nov 6, 2015 · 49 comments

@yujuhong
Contributor

yujuhong commented Nov 6, 2015

The umbrella issue for kubelet's scalability: #12540

Kubelet manages and monitors all user pods on a node to ensure their container states meet the desired pod specifications. Every pod and/or container Kubelet manages incurs a certain resource overhead even in the absence of active events (i.e., no spec change or container lifecycle events). The exact per-pod overhead varies based on the detailed pod specification, but in general, such steady-state management overhead is proportional to the number of active pods/containers. When there are no active pods, the resource overhead should be negligible.

Below are some resource usage numbers for the v1.1 kubelet.

v1.1 Resource Usage (commit: dd1187c)

Cluster setup/workload:

  • Three n1-standard-2 nodes on GCE (i.e., 2-core)
  • Running all default cluster add-on pods
  • 0 and 40 single-container pause pods per node
  • 30-minute measurement period
  • The cpu usage numbers are aggregated over ~~10s~~ *1s* intervals
  • The working set memory size was hard to measure correctly (see Improved memory usage measurements #12422 for details). In the experiments, I set the logging level to 0 and lowered the cgroup memory limit of kubelet and docker to create constant memory pressure, then let them bounce back until the usage was stable. The numbers are approximate and should only be considered ballpark figures.

0 pause pods per node

             median cpu   95th% cpu   memory
kubelet      0.02         0.05        35MB
docker       0.01         0.03        35MB
kube-proxy   <0.001       0.01        <5MB*

40 pause pods per node

             median cpu   95th% cpu   memory
kubelet      0.18         0.64        55MB
docker       0.01         0.62        70MB
kube-proxy   <0.001       0.03        <5MB*

* kube-proxy's memory usage depends on the number of services in the cluster. In this test case, the number of services was 7.

Note that kubelet syncs all pods periodically; hence, even though the median cpu usage tends to be low, the cpu spikes are significant.

90 pause pods per node?

v1.1 does not support 90 pods per node, but I did some rough measurements: the median cpu usage of kubelet and docker combined varied from 0.35 to 1.2 cores -- it was right on the edge of a steep curve. Beyond the 70th percentile, it was consistently > 1.4 cores.

Reaction latency

The latency for Kubelet to notice a container event (e.g., a container dying) varies and can be 10+ seconds, since the sync period is 10s.

v1.2 Resource Usage Target

For v1.2 we want to set a goal so that we can work towards it. The goal will affect which approach we pick to tackle the issues.

Below are some tentative targets. Suggestions/comments are welcome.

  • Number of pods: 90 pause pods + all add-on pods
  • Reaction latency (for the majority of container events): within 3s
  • cpu usage for kubelet and docker combined:
    • median: 0.2 core (10% of the 2-core machine)
    • 95th%: 0.6 core
    • 99th%: TBD

Caveats

  • The detailed pod spec may affect the resource usage. For example, if every container defines a health check, the overhead of periodic probing may significantly increase the cpu usage. We should also publish the usage numbers for the worst-case scenario.
  • We focus on the steady-state resource usage. When there is non-monitoring work involved (e.g., container creation/deletion), the cpu usage will of course increase. We can further restrict the usage by either rate-limiting operations or setting a reasonable cpu share for kubelet and docker. Again, this is not the focus of this issue.
  • The built-in cadvisor constantly monitors all container stats. The cpu usage of cadvisor for 100 pods is not ideal (see Provide an option to disable/mock cadvisor in kubelet #16296 (comment)), but seems tolerable at the moment. We will optimize that if it becomes the bottleneck.
  • We will want to optimize the 99th% cpu usage later as well. The current measurements are not very meaningful because kubelet and docker each consume ~1 core at the 99th percentile.
  • If we use a smaller cpu measurement interval, we might be able to capture finer-grained, more dynamic behavior. On the other hand, the measurement overhead would increase.
  • The goal will be specifically for docker, but most of the improvements will apply to rkt as well.

/cc @kubernetes/goog-node

@yujuhong yujuhong added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Nov 6, 2015
@yujuhong yujuhong added this to the v1.2-candidate milestone Nov 6, 2015
@derekwaynecarr
Member

@yujuhong - for the 90 pause pods per node use case, do you have a breakdown between docker and kubelet for those numbers rather than just the combined figure?

@yujuhong
Contributor Author

yujuhong commented Nov 6, 2015

@derekwaynecarr, those are just combined. For the 50th percentile, the numbers vary a lot, as I mentioned. At the 70th percentile, kubelet is ~0.7 core, while docker is ~0.8 core.

@vishh
Contributor

vishh commented Nov 6, 2015

cpu usage for kubelet and docker combined:

Do we have an understanding of the performance bottlenecks in kubelet and docker? Will the current goals be achievable with just kubelet optimizations or will it require docker changes as well?

@yujuhong
Contributor Author

yujuhong commented Nov 6, 2015

Do we have an understanding of the performance bottlenecks in kubelet and docker? Will the current goals be achievable with just kubelet optimizations or will it require docker changes as well?

The problem is two-fold:

  1. Each pod worker polls docker directly.
  2. Pod workers periodically wake up to check even when there is no work to do.

(1) is the main problem as docker gets overwhelmed pretty quickly as we scale the number of pods. (2) is also significant, but not as much as (1).
To reach the goal, we need to tackle both:

  • Using a generic PLEG (with just relisting) would reduce both (1) and (2); a minimal sketch follows this list. However, we need to preserve the periodic sync as a safety net. Even at a 1-minute periodic sync, the 99th% cpu usage is still high (~1.4 cores).
  • Using a pod cache would reduce (1), and would also lower the 99th% cpu usage.
  • Using the docker container event stream would allow us to discover and react to events faster. If we are willing to tolerate a slower reaction time, we could rely on the purely-relisting generic PLEG.
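(For illustration, here is a minimal sketch of the relisting flavor of a generic PLEG: a single loop lists containers through a runtime-agnostic interface, diffs the result against the previous snapshot, and emits lifecycle events for pod workers to consume. The `ContainerLister` and `PodLifecycleEvent` names are invented for this sketch and are not the actual kubelet types.)

```go
package pleg

import "time"

// ContainerLister is a hypothetical, runtime-agnostic view of the container
// runtime (docker, rkt, ...); the relisting PLEG only needs listing.
type ContainerLister interface {
	// ListContainers returns container ID -> state (e.g. "running", "exited").
	ListContainers() (map[string]string, error)
}

// PodLifecycleEvent is a simplified stand-in for the events the PLEG emits
// so that pod workers wake up only when something actually changed.
type PodLifecycleEvent struct {
	ContainerID string
	Old, New    string
}

// GenericPLEG relists periodically and emits diffs, so pod workers no longer
// poll docker directly (problem 1) or wake up when nothing changed (problem 2).
type GenericPLEG struct {
	runtime  ContainerLister
	period   time.Duration
	events   chan PodLifecycleEvent
	lastSeen map[string]string
}

func NewGenericPLEG(r ContainerLister, period time.Duration) *GenericPLEG {
	return &GenericPLEG{
		runtime:  r,
		period:   period,
		events:   make(chan PodLifecycleEvent, 128),
		lastSeen: map[string]string{},
	}
}

// Events is consumed by the kubelet sync loop / pod workers.
func (g *GenericPLEG) Events() <-chan PodLifecycleEvent { return g.events }

// Run relists on every tick until stop is closed; one list call per period
// replaces N pod workers each inspecting the runtime on their own.
func (g *GenericPLEG) Run(stop <-chan struct{}) {
	ticker := time.NewTicker(g.period)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			g.relist()
		}
	}
}

func (g *GenericPLEG) relist() {
	current, err := g.runtime.ListContainers()
	if err != nil {
		return // try again on the next tick
	}
	for id, state := range current { // new or changed containers
		if old, ok := g.lastSeen[id]; !ok || old != state {
			g.events <- PodLifecycleEvent{ContainerID: id, Old: old, New: state}
		}
	}
	for id, old := range g.lastSeen { // containers that disappeared
		if _, ok := current[id]; !ok {
			g.events <- PodLifecycleEvent{ContainerID: id, Old: old, New: "removed"}
		}
	}
	g.lastSeen = current
}
```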

The goal is still quite aggressive. If we cannot reach the goal after implementing the solutions, I will look into kubelet's global cleanup routine, cadvisor's housekeeping, or even the docker client, etc. I haven't spent time digging into further/smaller improvements yet. However, I'd try to target the parts that are container-runtime-agnostic.

I would like feedback on a few questions though:

  • How important is it to have a super-fast reaction time? What's an acceptable range (ignoring the 99.9th percentiles)?
  • Do you think the current kubelet/docker resource usage is acceptable?

@dchen1107
Member

@yujuhong Can we convert the content of v1.1 Resource Usage to a doc (.md file) like what we discussed the other day? Thanks!

@timstclair

How important is it to have a super-fast reaction time? What's an acceptable range (ignoring the 99.9th percentiles)?

I think a super-fast reaction time is less important with the addition of ProbePeriod. If a specific pod has a high requirement on time to restart, then the user could just add a liveness probe with a short period. This approach would scale poorly, but I'm assuming most pods don't have that high of a reaction requirement. On a related note - how does probing factor into this? Do we have any stats or requirements around probed-pod resource usage and scalability?
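(For reference, a short per-pod probe period is just a field on the container's probe. Below is a hedged sketch using the current k8s.io/api/core/v1 Go types, which differ slightly from the 2015-era API; the image, path, and numbers are arbitrary examples, not a recommendation.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// A container that opts into a 1s liveness probe period, trading extra
	// probing overhead on the kubelet for faster detection and restart of
	// this particular container.
	c := corev1.Container{
		Name:  "webserver",
		Image: "gcr.io/google_containers/test-webserver",
		LivenessProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				HTTPGet: &corev1.HTTPGetAction{
					Path: "/", // hypothetical health endpoint
					Port: intstr.FromInt(80),
				},
			},
			PeriodSeconds:    1, // probe every second (the minimum unit)
			FailureThreshold: 3, // restart after ~3s of consecutive failures
		},
	}
	fmt.Printf("%s probes every %ds\n", c.Name, c.LivenessProbe.PeriodSeconds)
}
```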

@vishh
Contributor

vishh commented Nov 6, 2015

IIUC, the focus will be on kubelet optimizations first.

How important is it to have a super-fast reaction time? What's an acceptable range (ignoring the 99.9th percentiles)?

From a user perspective, low latency seems desirable. Based on the scalability blog, pod-startup times are expected to be less than 5 seconds. Would it be fair to expect status updates to be within the same bounds?

Do you think the current kubelet/docker resource usage is acceptable?

Yes. I think scalability (# of pods per node) is more important than optimized resource usage as of now.

@derekwaynecarr
Member

Yes. I think scalability is more important than optimized resource usage as of now.

@vishh - do you mean scalability in the number of nodes in the cluster rather than the number of containers per node? In my experience, users running smaller cluster sizes are surprised by the overhead of system daemons on the node.

@vishh
Contributor

vishh commented Nov 6, 2015

@derekwaynecarr: I meant the number of pods a node can run without compromising reliability and performance.

v1.1 includes quite a bit of optimization as illustrated by @yujuhong's data. Do you think those numbers are not acceptable for production scenarios?

By small clusters, are you referring to the node size or the number of nodes? Based on the current data, the 95th percentile CPU overhead on a node is ~1 cpu.

@dchen1107
Member

@yujuhong Please indicate that your memory stats for both Kubelet and Docker were obtained by applying a hard limit to their memcg cgroups, which forces dirty pages to be flushed to disk more frequently. In the 1.1 release, by default, there is no hard limit applied to those daemons' memcg cgroups. We plan to have this for the 1.2 release (aggressive goal). Please document those numbers so that we have a benchmark to compare against in future releases and can easily detect regressions or improvements.

To answer your above two questions:

  • How important is it to have a super-fast reaction time? What's an acceptable range (ignoring the 99.9th percentiles)?
    There are different reaction times: 1) pod first-start latency, 2) container failure detection and restart latency, 3) the container's probing response latency, etc. Each of them has a different set of bottlenecks and could be addressed separately. Today we don't have SLOs set for those different reaction times yet.

cc/ @wojtek-t who is in charge of performance in general.

  • Do you think the current kubelet/docker resource usage is acceptable?
    I personally think the current numbers are acceptable. Of course, we should optimize them, but that is not a high priority on our list.

@yujuhong
Contributor Author

yujuhong commented Nov 7, 2015

Please indicate that your memory stats for both Kubelet and Docker were obtained by applying a hard limit to their memcg cgroups, which forces dirty pages to be flushed to disk more frequently.

I did in my original comment :)

How important is it to have a super-fast reaction time? What's an acceptable range (ignoring the 99.9th percentiles)? There are different reaction times: 1) pod first-start latency, 2) container failure detection and restart latency, 3) the container's probing response latency, etc. Each of them has a different set of bottlenecks and could be addressed separately. Today we don't have SLOs set for those different reaction times yet.

  • For 1), kubelet tries to respond as quickly as possible after it sees the pod from watch.
  • For 3), we have per-container probing periods defined in the pod spec. They will be respected as much as possible.

Both (1) and (3) should be fine as long as kubelet doesn't go crazy with resource usage (which it might today).
My main problem is with (2), as it's not defined anywhere and the expectation is unclear. If we decide to optimize as much as possible, the container event stream is the way to go. If we can tolerate a certain delay, relisting may be enough to carry us.
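(To make the event-stream option concrete, here is a hedged sketch of consuming docker's container event stream via the Go client. It assumes a reasonably recent github.com/docker/docker client module -- the exact option types have moved around between versions -- and is not the kubelet's actual implementation.)

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Subscribe only to container lifecycle events (create, start, die, ...).
	f := filters.NewArgs()
	f.Add("type", "container")

	msgs, errs := cli.Events(context.Background(), types.EventsOptions{Filters: f})
	for {
		select {
		case m := <-msgs:
			// A kubelet would translate this into a pod lifecycle event and
			// wake only the affected pod worker, instead of printing it.
			fmt.Printf("container %s: %s\n", m.Actor.ID, m.Action)
		case err := <-errs:
			// The stream can break; a periodic relist is still needed as a
			// safety net, as discussed above.
			log.Printf("event stream error: %v; falling back to relist", err)
			return
		}
	}
}
```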

I personally think the current number is acceptable. Of course, we should optimize them, but that is not the high priority in our list.

@dchen1107, do you think they are acceptable for 40 pods or for 90 pods? There is a huge difference between the two sets of numbers.
IMHO, the 90-pod numbers are certainly too high. To what extent we should optimize is the question.

@vishh
Contributor

vishh commented Nov 7, 2015

My main problem is with (2), as it's not defined anywhere and the expectation is unclear. If we decide to optimize as much as possible, container event stream is the way to go. If we can tolerate a certain delay, relisting may just be enough to carry us.

AFAIK, we are only publishing SLIs. I don't think there are any SLOs documented.

IIUC, the improvements you have in the pipeline for node scalability should also result in reduced resource consumption, right?

@yujuhong
Contributor Author

yujuhong commented Nov 7, 2015

From a user perspective, low latency seems desirable. Based on the scalability blog, pod-startup times are expected to be less than 5 seconds. Would it be fair to expect status updates to be within the same bounds?

My focus is mainly on detecting container failures -- sorry if I didn't make it clear in the original comment. If all we need is ~5-second latency (from container failure to pod worker waking up, excluding the actual container creation time), we might even get by with a relisting PLEG + a dumb pod cache with a global update timestamp. Pod workers only have to wake up and check whether the timestamp is newer than the completion time of their last sync. The cache itself, other than being populated by the PLEG, will work similarly to the runtime cache: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/runtime_cache.go. Of course, it will include more information (from docker inspect), but the gist is the same. This means we don't need to record the pod worker expectations proposed in #12810 at all.

_UPDATE: Caveat: relisting may eventually become the bottleneck if we want to scale further_
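(To make that concrete, here is a minimal sketch of such a "dumb" cache with a single global update timestamp; the names and types are invented for illustration and are not the actual kubelet code.)

```go
package podcache

import (
	"sync"
	"time"
)

// PodStatus is a stand-in for the per-pod data the cache would hold
// (e.g. the result of docker inspect for each container).
type PodStatus struct {
	PodUID     string
	Containers map[string]string // container name -> state
}

// Cache is populated by the relisting PLEG and read by pod workers, similar
// in spirit to pkg/kubelet/container/runtime_cache.go but keyed per pod.
type Cache struct {
	mu          sync.RWMutex
	pods        map[string]PodStatus
	lastUpdated time.Time // single global timestamp, bumped on every relist
}

func NewCache() *Cache {
	return &Cache{pods: map[string]PodStatus{}}
}

// Set is called by the PLEG after each relist.
func (c *Cache) Set(statuses map[string]PodStatus) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods = statuses
	c.lastUpdated = time.Now()
}

// GetNewerThan returns the pod's status only if the cache has been refreshed
// after the worker's last completed sync; otherwise the worker goes back to
// sleep instead of hitting the runtime directly.
func (c *Cache) GetNewerThan(podUID string, lastSync time.Time) (PodStatus, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if !c.lastUpdated.After(lastSync) {
		return PodStatus{}, false
	}
	status, ok := c.pods[podUID]
	return status, ok
}
```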

Yes. I think scalability (# of pods per node) is more important that optimized resource usage as of now.

We are going to scale by reducing the resource usage footprint, which means these two things are essentially the same. Do you mean the resource usage is okay as long as we can scale to 90 pods and maintain the same resource usage as we have for 40 pods today?

@yujuhong
Contributor Author

yujuhong commented Nov 7, 2015

I think a super-fast reaction time is less important with the addition of ProbePeriod. If a specific pod has a high requirement on time to restart, then the user could just add a liveness probe with a short period. This approach would scale poorly, but I'm assuming most pods don't have that high of a reaction requirement. On a related note - how does probing factor into this? Do we have any stats or requirements around probed-pod resource usage and scalability?

Please see #16943 (comment) for the potential implementation if we can relax the delay :)
In short, pure relisting does not scale, but with relaxed reaction time, kubelet may just get by for our v1.2 goal. If we need to scale to more pods, this could become the bottleneck though.

Probing is the missing piece of the puzzle in my measurements, since users can customize the probing period. I think we should create workloads with different probing periods and see how much kubelet can withstand while still being performant. At the least, we should have a recommended probing period for N pods/containers on a node. Users can choose to set a smaller period for a select few containers, but hopefully not all of them.
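(As a rough back-of-the-envelope illustration of why the period matters -- my own numbers, not measured data -- the aggregate probe rate kubelet must sustain grows linearly with the container count and inversely with the period.)

```go
package main

import "fmt"

func main() {
	const probesPerContainer = 2 // e.g. one liveness + one readiness probe
	pods := []int{40, 100}
	periods := []int{1, 10, 30} // probing period in seconds

	for _, n := range pods {
		for _, p := range periods {
			// Probes the kubelet must execute per second across the node.
			rate := float64(n*probesPerContainer) / float64(p)
			fmt.Printf("%3d single-container pods, %2ds period: %6.1f probes/s\n", n, p, rate)
		}
	}
}
```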

@yujuhong
Contributor Author

yujuhong commented Nov 7, 2015

@vishh - do you mean scalability in the number of nodes in the cluster rather than the number of containers per node? In my experience, users running smaller cluster sizes are surprised by the overhead of system daemons on the node.

I agree with @derekwaynecarr on this. The management overhead for a node in the steady state is quite high today. Reducing kubelet's and docker's cpu usage would help us scale the #pods on the node, but more importantly, it would reduce the resource footprint even for fewer (<= 40) pods.

However, in quite a few user reports about smaller clusters, it was the monitoring/logging pods that were hogging the resources... :-(

@dchen1107
Member

@yujuhong yes, we only have an issue with case 2) I listed above. What relist interval do you have in mind? We can do some measurements with it.

Honestly I don't think there is a big issue, since we already introduced backoff on restarting dead containers. If some users really care about the reaction time for 2), we can add support for using the docker container event stream to cover it. We just need to do it step by step.

By the way, we can also listen to the kernel's netlink process/thread events directly to build our own process tracking system without depending on any container runtime's event stream. But that is a separate topic, and we won't get to it soon.

do you think they are acceptable for 40 pods or for 90 pods?
It is for 40 pods. For 90 pods, we should optimize, but that is a relatively lower priority than the scalability and performance work we are discussing here. Once you are done with your prototype, we can measure and understand where the resources go.

@bprashanth
Contributor

40 pause pods per node

             median cpu   95th% cpu   memory
kubelet      0.18         0.64        55MB
docker       0.01         0.62        70MB
kube-proxy   <0.001       0.03        <5MB*

Written as a semi-educated, casual user of Kubernetes: Isn't spiking to 65% of a core every 10s essentially as bad as using 65% all the time? From a capacity-planning perspective, I still have to shave off half a core per node for management. If we somehow saved half a core per node on a 250-node cluster, I'd use that budget to buy a 16-core master and run a 350-node cluster.

Written as myself: We argued a lot about whether it was ok to use 2/4 extra cores on the master for 100-250 nodes.

@yujuhong
Contributor Author

yujuhong commented Dec 7, 2015

Update:

The generic PLEG has been merged (#13571), and the excessive container listing has been disabled in #17545.
Using HEAD (b7d8221), the cpu usage of running 40 pause pods has improved.

kubelet:

  • median: 0.18 -> 0.09
  • 95th%: 0.64 -> 0.23
  • 99th%: ~1 -> 0.45

docker:

  • median: 0.01 -> 0.03
  • 95th%: 0.62 -> 0.17
  • 99th%: ~1 -> 0.48

Note that because the pod syncing period is 1 minute now, the peak cpu usage is affected by how many pods sync simultaneously. I picked the largest number among 10 runs for each percentile to account for this fluctuation.

@vishh
Contributor

vishh commented Dec 8, 2015

This is awesome 👏

@dchen1107
Member

Great progress! Looking forward to seeing the improvement once we have the runtime pod cache in.

@yujuhong
Contributor Author

yujuhong commented Jan 4, 2016

@luxas, I run this e2e test to get the cpu usage of kubelet and docker. The test queries cadvisor on each node for data over a preset period of time and calculates the percentiles, etc., locally.
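(For anyone reproducing the numbers without the e2e framework, the post-processing is essentially a percentile over the sampled per-interval usage. A minimal sketch -- not the actual test code, with made-up sample values -- is below.)

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of the samples
// using the nearest-rank method.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(float64(len(sorted))*p/100)) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

func main() {
	// Hypothetical per-interval cpu usage samples (in cores) for kubelet,
	// e.g. derived from successive cadvisor cumulative-usage readings.
	samples := []float64{0.05, 0.08, 0.09, 0.11, 0.10, 0.23, 0.09, 0.07}
	fmt.Printf("median: %.2f  95th%%: %.2f\n",
		percentile(samples, 50), percentile(samples, 95))
}
```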

@vishh
Contributor

vishh commented Jan 6, 2016

cc @timothysc

@vishh
Contributor

vishh commented Jan 6, 2016

cc @jeremyeder

@timothysc
Member

We should re-update and compare numbers once #19850 and accompanying bugz are fixed.

@yujuhong
Contributor Author

We should re-update and compare numbers once #19850 and accompanying bugz are fixed.

Yes. Hopefully it won't take long...

@yujuhong
Contributor Author


UPDATE:
Using b3bc741, I ran some more resource monitoring tests against a test cluster with 1s granularity. This includes #19750 and #20726 and uses docker v1.9.1 (as opposed to v1.8.3).
Note that I had to turn off the heapster/influxdb/grafana/elastic-search RCs to avoid some crash-looping problems.

40 pause pods:
kubelet

  • median: 0.18 (v1.1) -> 0.11
  • 95th%: 0.64 (v1.1) -> 0.19
  • 99th%: ~1 (v1.1) -> 0.26

docker:

  • median: 0.01 (v1.1) -> 0.02
  • 95th%: 0.62 (v1.1) -> 0.07
  • 99th%: ~1 (v1.1) -> 0.11

100 pause pods:
kubelet

  • median: 0.21
  • 95th%: 0.34
  • 99th%: 0.45

docker:

  • median: 0.05
  • 95th%: 0.10
  • 99th%: 0.16

As another reference point, I ran some stress tests with 100 single-container (gcr.io/google_containers/test-webserver) pods in which each container has an http liveness probe with a 1s period, and the same configuration for a readiness probe. Since the unit of the probing period is seconds, this is the most aggressive setting.
Below, only kubelet's cpu usage is shown because probing affects kubelet more significantly (unless the probe uses exec).
40 pods

  • median: 0.11 -> 0.23
  • 95th%: 0.19 -> 0.35
  • 99th%: 0.26 -> 0.42

100 pods:

  • median: 0.21 -> 0.56
  • 95th%: 0.34 -> 0.79
  • 99th%: 0.45 -> 1.08

Sorry for not generating pretty graphs. My attempts to collect more data in the past week were interrupted a few times due to various docker/kubelet issues.

In short, running 100 single-container pause pods on a single, reasonably-specced node looks OK as far as the cpu usage of kubelet and docker is concerned. The caveats remain:

  • Aggressive probing can significantly increase kubelet's cpu usage.
  • This measurement is focused on nodes in the steady state. During batch creation/deletion of pods, the node will consume more resources and will appear less responsive. We may consider rate-limiting such operations in the future.
  • If a user scales up the number of containers per pod, the usage will of course be higher (although it should have less impact compared with the number of pods).
  • cpu usage will be higher if containers need to be restarted.

@gmarek
Contributor

gmarek commented Feb 15, 2016

@yujuhong - Nice! Does this mean that we officially support 100 pods/node for 1.2?

@timothysc
Member

Could we update the defaults now?

/cc @kubernetes/sig-scalability

@luxas
Member

luxas commented Feb 15, 2016

100 pods per node support would be awesome
Really looking forward to even better performance since I run k8s on Raspberry Pis :)

@yujuhong
Contributor Author

@yujuhong - Nice! Does this mean that we officially support 100 pods/node for 1.2?

Could we update the defaults now?

I have some reservations about bumping max-pods to 100 because of some of the issues mentioned above (e.g., we don't rate-limit batch creation/deletion, and aggressive probing could really stress the node). More importantly, when under stress, the node is likely to encounter unrecoverable docker/kernel issues. I've run into the kernel bug #20096 (for which the cluster team has a workaround in progress), and moby/moby#18527 (with docker v1.9.1). I think we should still update the default, but perhaps we should be more conservative, or stress test more?

@luxas
Member

luxas commented Feb 16, 2016

@yujuhong Maybe 60 pods?

@dchen1107
Member

@yujuhong Here is my counter argument on #16943 (comment)

  • The issues of batch creation/deletion and the kernel bug (Mitigate impact of unregister_netdevice kernel race #20096) are more related to churn than to the number of pods. For batch creation/deletion, we could add rate-limiting to our system instead of capping pods by number.
  • The issue of aggressive probing could happen with a small max_pods but many containers within one pod.
  • The docker issue of running out of IPs (No available IPv4 addresses on this network's address pools: bridge moby/moby#18527) is clearly a bug, and I don't see that it is relevant to the number of pods. I saw this once before on the master node, which only has 5 pods running.
  • In reality, how often would we run into the issue caused by batch creation/deletion before adding a rate-limit feature in kubelet? Even with such an issue, how long would it take for kubelet and the node to stabilize in production? My observation is that unless we hit a real out-of-resource issue on a given node, or other kernel/docker issues, kubelet and the node will eventually stabilize.
  • Enabling 100 pods by default (on a big node) can help us find more issues/bottlenecks in our system.

@yujuhong
Contributor Author

@yujuhong Here is my counter argument on #16943 (comment)

In short, I discussed with @dchen1107 offline and we agreed that we'll update the max-pods to 100 pods and monitor our jenkins builds to see how stable node/kubelet/docker is.

The issues of batch creation/deletion and kernel bug #20096 are more related to churn than to the number of pods. For batch creation/deletion, we could add rate-limiting to our system instead of capping pods by number.

Yes, a high number of concurrent docker requests is the key. For reference, there is an old issue for throttling container startup (#3312), which did not receive much love. Perhaps it's time to re-evaluate. We probably won't do this for v1.2, though.
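(For illustration only, a hedged sketch of what such throttling could look like, using golang.org/x/time/rate; the numbers are arbitrary and this is not a concrete proposal for #3312.)

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// startContainer stands in for the docker "create + start" calls the kubelet
// issues per container; a real kubelet would go through the runtime client.
func startContainer(id int) {
	fmt.Printf("starting container %d at %s\n", id, time.Now().Format("15:04:05.000"))
}

func main() {
	// Hypothetical policy: at most 5 container starts per second with a small
	// burst, so a batch creation of 100 pods cannot monopolize docker.
	limiter := rate.NewLimiter(rate.Limit(5), 2)

	ctx := context.Background()
	for i := 0; i < 10; i++ {
		if err := limiter.Wait(ctx); err != nil {
			return
		}
		startContainer(i)
	}
}
```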

The issue of aggressive probing could happen with small max_pods but many containers within one pod.

Yes, if a user creates 100 containers per pod, it'd have the same effect as far as probing is concerned. However, this should be unusual. I was simply pointing out that our setting (single container per pod) is relatively conservative, and users may experience a less responsive kubelet if they keep 2~3 containers per pod.

In reality, how often would we run into the issue caused by batch creation/deletion before adding a rate-limit feature in kubelet? Even with such an issue, how long would it take for kubelet and the node to stabilize in production? My observation is that unless we hit a real out-of-resource issue on a given node, or other kernel/docker issues, kubelet and the node will eventually stabilize.

Besides stability, kubelet will become less responsive during batch pod creation/deletion. This duration will grow as the number of pods scales.

@gmarek
Contributor

gmarek commented Feb 17, 2016

@yujuhong - we're continuously running only 30 pods/node density tests. You can run the density test by hand with 100 pods/node saturation (--ginkgo.focus='starting 100 pods'). Can you run it on a 100-node cluster to make sure we're meeting our SLO?

cc @wojtek-t

@yujuhong
Contributor Author

@yujuhong - we're continuously running only 30 pods/node density tests. You can run the density test by hand with 100 pods/node saturation (--ginkgo.focus='starting 100 pods'). Can you run it on a 100-node cluster to make sure we're meeting our SLO?

I don't have a 100-node cluster. Do we have a 100-node cluster on jenkins that runs the density test?

@gmarek
Contributor

gmarek commented Feb 17, 2016

Kubernetes-scale

@yujuhong
Contributor Author

Kubernetes-scale

I will modify the density test to start 100 pods on a node and then test locally on my 3-node cluster until it passes stably.

As for running this in a 100-node cluster, I don't think it makes a huge difference for the nodes, but it would probably have more impact on the apiserver. It might not be crazy to let kubernetes-scale catch this, since kubemark should be a good indication that the apiserver can handle more?

@gmarek
Contributor

gmarek commented Feb 17, 2016

I'm not worried about Kubelet - I'm sure it'll handle. I just want to be sure that increasing pod density 3x won't expose some bugs in API server.

And yes - running 100 pod/node Kubemark will do.

@timothysc
Member

I'll find out soon-ish 🎱

@gmarek
Contributor

gmarek commented Feb 17, 2016

@timothysc - thanks!

@yujuhong
Contributor Author

I'm not worried about Kubelet - I'm sure it'll handle. I just want to be sure that increasing pod density 3x won't expose some bugs in API server.

And yes - running 100 pod/node Kubemark will do.

I ran the 100 pods/node density test in a 100-node kubemark cluster and it passed three times in a row. I used the same configuration as the jenkins kubemark suite: 10 n1-standard-2 gce instances.
I pasted one of the test logs in case any of you want to take a look.

@gmarek @wojtek-t

@yujuhong
Contributor Author

We have quite a few e2e tests that run 100 pods per node regularly (e.g., tracking resource usage and testing scheduler decisions).
kubernetes-kubemark-high-density-100-gce (thanks to @gmarek) runs 100 pods per node in a kubemark cluster. The only missing piece is running the density test with a real kubelet near its maximum capacity (100 pods per node). @gmarek, will we be able to convert one of the scalability suites to do it, or should I make the serial suite run this particular configuration?

@gmarek
Contributor

gmarek commented Mar 15, 2016

@yujuhong - yes I can. I plan on rethinking our scale testing this week, as those tests consume a lot of resources. I'll take high-density testing into account.

@yujuhong
Contributor Author

yujuhong commented Apr 8, 2016

Let's close this since v1.2 is out.

@yujuhong yujuhong closed this as completed Apr 8, 2016