
Enable dynamic allocation of heapster/eventer cpu request/limit #27185

Merged
merged 1 commit into kubernetes:master on Jun 19, 2016

Conversation

gmarek
Contributor

@gmarek gmarek commented Jun 10, 2016

@gmarek gmarek added this to the v1.3 milestone Jun 10, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Jun 10, 2016
{% set eventer_memory_per_node = 500 -%}
{% set num_nodes = pillar.get('num_nodes', -1) -%}
{% if num_nodes >= 0 -%}
{% set metrics_memory = (200 + num_nodes * metrics_memory_per_node)|string + "Mi" -%}
{% set metrics_cpu = (num_nodes + 80)|string + "m" -%}
Member
@zmerlynn @Q-Lee

This would mean that we would use 2.080 cores for 2000-node clusters. Zach said that would be painful, because heapster then wouldn't fit on 2-core machines.
On the other hand, 1 core seems to be good enough for heapster, but IIUC it's not possible to pass 0.5 millicore to the pod nanny, which is painful...
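For reference, a minimal sketch of where the 2.080-core figure comes from, assuming the (num_nodes + 80)m formula shown in the diff hunk above (illustrative only):

```bash
# Illustrative only: evaluate the current template formula for a 2000-node cluster.
num_nodes=2000
echo "$(( num_nodes + 80 ))m"   # prints "2080m", i.e. 2.08 cores
```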

Member

As an update - it is possible to pass 0.5 millicore to the nanny.
However, the nanny would treat it as 1 millicore anyway. We would need to change the nanny, which is not trivial, because resource.Quantity (which the nanny uses) doesn't support a "multiplication" operation... heh...

Member

OK - I think I know how to solve this relatively easily. We should add another flag to pod-nanny that lets us add a given number of millicores per given number of nodes. If we set the second flag to 2 and the existing flag to 1 millicore, this results in adding 1 millicore per 2 nodes. This should be easy to do - I can help with that tomorrow.
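A rough sketch of the proposed semantics, using hypothetical flag and variable names (the real pod-nanny flags may differ):

```bash
# Hypothetical: the existing flag set to 1 millicore, the proposed second flag set to 2,
# i.e. add 1 millicore for every 2 nodes (integer division).
num_nodes=2000
millicores_per_step=1
nodes_per_step=2
extra=$(( num_nodes / nodes_per_step * millicores_per_step ))
echo "${extra}m"   # 1000m (1 core) added for a 2000-node cluster
```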

Member

Actually, I have a PR in flight for pod-nanny that will enable passing less-than-millicore values to the nanny and have them handled correctly:
kubernetes-retired/contrib#1195

So once that is in, we should just change this to "0.5m" and we will all be happy, I believe.

Member

OK - I implemented and pushed the new version of pod-nanny.
I'm bumping its version in #27268, and in the same PR I'm using the new feature by passing 0.5 millicore per node. That is the most important part of this change, so maybe we can push this one out of 1.3?

Contributor

I need to add multiplication to Quantity anyway for another use case. I'll take a TODO.

@dchen1107
Member

@k8s-bot test this issue: #27179

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2016
{% set base_eventer_memory = "190Mi" -%}
{% set eventer_memory = base_eventer_memory -%}
{% set metrics_memory_per_node = 4 -%}
{% set metrics_cpu_per_node = 1 -%}
Member

This should be 0.5m per node.

Anyway - I think we should first merge #27268 and only then rebase this one to be compatible with it.

@wojtek-t wojtek-t added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jun 13, 2016
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2016
@gmarek gmarek force-pushed the heapster-cpu branch 2 times, most recently from c50b193 to f6c73c8 on June 14, 2016 10:09
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2016
@gmarek gmarek changed the title Adjust heapster cpu request/limit Dynamic allocation of heapster/eventer cpu request/limit Jun 14, 2016
@gmarek gmarek changed the title Dynamic allocation of heapster/eventer cpu request/limit Enable dynamic allocation of heapster/eventer cpu request/limit Jun 14, 2016
@gmarek gmarek added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jun 14, 2016
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2016
@gmarek gmarek removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jun 14, 2016
@Q-Lee
Contributor

Q-Lee commented Jun 14, 2016

Possible conflict with #27187 ?

{% set eventer_memory_per_node = 500 -%}
{% set num_nodes = pillar.get('num_nodes', -1) -%}
{% if num_nodes >= 0 -%}
{% set metrics_memory = (200 + num_nodes * metrics_memory_per_node)|string + "Mi" -%}
{% set metrics_cpu = (80 + num_nodes * metrics_cpu_per_node)|string + "m" -%}
Contributor

Is this equation sufficient? Looking at some sample num_nodes values:

1 node -> 80 + 1 * 0.5 = 80.5m
10 nodes -> 80 + 10 * 0.5 = 85m
100 nodes -> 80 + 100 * 0.5 = 130m
1000 nodes -> 80 + 1000 * 0.5 = 580m

The delta between 10 and 100 nodes is roughly 45m (is that enough?), while from 100 to 1000 nodes it is only 450m, which doesn't seem enough. I wonder if we need adjustments for various "tiers" of node counts.
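A small shell sketch reproducing the sample values above (80m base plus 0.5m per node); purely illustrative:

```bash
# Recompute the reviewer's sample points; awk handles the fractional per-node value.
for nodes in 1 10 100 1000; do
  awk -v n="$nodes" 'BEGIN { printf "n=%d -> %gm\n", n, 80 + n * 0.5 }'
done
```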

Member

Yes - it's good enough. In fact, we used to have 100m even for 1000-node clusters in the 1.2 release, and it was somehow working.

Contributor

Note that the limits were used for scheduling, but not strictly enforced (ContainerVM didn't support it). The new GCI nodes should be enforcing it, so this may need to be checked and perhaps revised.

@gmarek gmarek added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 14, 2016
@gmarek
Contributor Author

gmarek commented Jun 15, 2016

Resolved. @fabioy @Q-Lee @andyzheng0831 PTAL

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2016
local -r eventer_memory_per_node="500"
local -r nanny_memory_per_node="200"
if [[ -n "${NUM_NODES:-}" && "${NUM_NODES}" -ge 1 ]]; then
  num_kube_nodes="$((${NUM_NODES}+1))"
  metrics_memory="$((${num_kube_nodes} * ${metrics_memory_per_node} + 200))Mi"
  eventer_memory="$((${num_kube_nodes} * ${eventer_memory_per_node} + 200 * 1024))Ki"
  nanny_memory="$((${num_kube_nodes} * ${nanny_memory_per_node} + 90 * 1024))Ki"
  metrics_cpu=$(echo - | awk "{print ${num_kube_nodes} * ${metrics_cpu_per_node} + 80}")m

awk is a heavyweight tool. Can we calculate this variable the same way as the lines above?

Member

It's a float. We could redefine it here to metrics_millicpu_per_node and keep it in integer space, though?

Contributor Author

@gmarek gmarek Jun 15, 2016

The thing is that I want it to be 0.5. I can implement it as an ordinary fraction, but that seems even worse than awk. I can switch to bc if that's better for some reason.
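For comparison, a minimal sketch of the bc variant mentioned here, assuming metrics_cpu_per_node is the 0.5 value discussed above (illustrative only; the thread below settles on keeping awk):

```bash
# Hypothetical bc-based equivalent of the awk line in the diff above.
num_kube_nodes=2001
metrics_cpu_per_node=0.5
metrics_cpu="$(echo "${num_kube_nodes} * ${metrics_cpu_per_node} + 80" | bc)m"
echo "${metrics_cpu}"   # 1080.5m, the same result as the awk expression
```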

I didn't notice it was a float, so please ignore my previous comment about awk.

Member

@zmerlynn - it's already 0.5 millicpu (0.0005 cpu). So if we want to keep it integer, we should probably move to micro-cpus...
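A sketch of what an integer-only micro-cpu variant could look like, using a hypothetical metrics_microcpu_per_node variable (an assumption for illustration; not what the PR does):

```bash
# Hypothetical: stay in integer space by counting micro-cpus (0.5m == 500 micro-cpus),
# then convert back to a millicore string; sub-millicore remainders are dropped.
metrics_microcpu_per_node=500
num_kube_nodes=2001
metrics_microcpu=$(( num_kube_nodes * metrics_microcpu_per_node + 80 * 1000 ))
metrics_cpu="$(( metrics_microcpu / 1000 ))m"
echo "${metrics_cpu}"   # 1080m (the awk version would yield 1080.5m)
```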

Member

Math is hard.

Contributor Author

OK, not to complain about bash not having floats, but are we OK with awk, should we move to bc, use micro-cpus, or something else?

I am fine with awk.

@gmarek
Contributor Author

gmarek commented Jun 16, 2016

@fabioy @Q-Lee PTAL

@fabioy
Contributor

fabioy commented Jun 16, 2016

The change is OK; I just wanted to highlight a comment I made that may have gotten lost in the discussion: please retest these limits on GCI, where resource limits are enforced more strictly (ContainerVM didn't have all the proper flags set).

@fabioy fabioy added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2016
@wojtek-t
Member

@fabioy - this PR only changes the initial resources; pod-nanny adjusts them anyway, and pod-nanny was already changed a few days ago and works fine. So there isn't much risk in it.

@fabioy
Contributor

fabioy commented Jun 16, 2016

@wojtek-t Cool, that's good to know. Thanks.

@k8s-bot

k8s-bot commented Jun 19, 2016

GCE e2e build/test passed for commit 8617f70.

@goltermann
Contributor

@k8s-bot verify this issue: #IGNORE

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jun 19, 2016

GCE e2e build/test passed for commit 8617f70.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 368704d into kubernetes:master Jun 19, 2016
@gmarek gmarek deleted the heapster-cpu branch August 30, 2016 09:48