
Change kube-proxy & fluentd CPU request to 20m/80m. #23646

Merged Mar 31, 2016 (1 commit)

Conversation

cjcullen
Member

On 1.1.8, our per-node CPU overhead was 100m (fluentd: 100m, kube-proxy: unaccounted). To stay at that number, we should adjust fluentd to 80m and give kube-proxy 20m.

Fixes the "failure to schedule kube-dns" part of #23556.
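
For context, the end state described above maps onto the kube-proxy pod's resource requests roughly like this (an illustrative sketch; the manifest path, container name, and surrounding fields are assumptions, not the exact file touched by this PR):

```yaml
# Sketch of the kube-proxy static pod after this change (names assumed).
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  containers:
  - name: kube-proxy
    resources:
      requests:
        # 20m for kube-proxy + 80m for fluentd keeps the total per-node
        # daemon overhead at the same 100m it was on 1.1.8.
        cpu: 20m
```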

@cjcullen cjcullen changed the title Change kube-proxy & fluentd CPU request to 50m. WIP: Change kube-proxy & fluentd CPU request to 50m. Mar 30, 2016
@k8s-github-robot k8s-github-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 30, 2016
@k8s-github-robot

Labelling this PR as size/XS

@bprashanth
Contributor

I think this is a good compromise for right now. Unfortunately, kube-proxy will still get starved when all the addons land on the same node, because they all have higher limits. That leads to a bad UX, since services are a core part of our system, but this is CPU, so it should be OK.

We will have to bump up these limits at some point. I'm not sure yet how we'd do that.

This might also result in test flakes, but we can handle those as they come; we can easily settle on a higher timeout until a roll-forward solution presents itself.

@k8s-bot

k8s-bot commented Mar 30, 2016

GCE e2e build/test passed for commit 915b6c6513317078df500daa37d055646ebc9a97.

@bprashanth
Contributor

@k8s-bot unit test this github issue: #23647

@cjcullen
Member Author

I'm going to switch it to 80/20 for fluentd/kube-proxy. In 1.1.8, kube-proxy had no reservation, so this will still be strictly better for kube-proxy, while not constraining fluentd too much.

@cjcullen cjcullen force-pushed the kubeproxy branch 3 times, most recently from 9040ce4 to 20a51cb Compare March 30, 2016 22:28
@cjcullen cjcullen changed the title WIP: Change kube-proxy & fluentd CPU request to 50m. WIP: Change kube-proxy & fluentd CPU request to 25m/75m. Mar 30, 2016
@cjcullen cjcullen changed the title WIP: Change kube-proxy & fluentd CPU request to 25m/75m. Change kube-proxy & fluentd CPU request to 25m/75m. Mar 30, 2016
@cjcullen cjcullen changed the title Change kube-proxy & fluentd CPU request to 25m/75m. Change kube-proxy & fluentd CPU request to 20m/80m. Mar 30, 2016
@@ -17,7 +17,7 @@
     - makedirs: true
     - dir_mode: 755
     - context:
-        cpurequest: '200m'
+        cpurequest: '20m'
Contributor

please add a comment with something like:

// 20m might cause cpu starvation for kube-proxy, resulting in delayed service updates. However,
// we can't give it more cpu till we have a way to do so without affecting existing Kubernetes clusters.
// Any change made here must be reflected in a proportional change to the requests of other daemons on
// the node, e.g. fluentd.

Contributor

I meant # wherever I said //

Member Author

How's that look?
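
The resulting Salt context block would then read something like the following (a sketch based on the diff above, with the suggested note as `#` comments; exact indentation and surrounding keys are assumptions):

```yaml
- context:
    # 20m might cause cpu starvation for kube-proxy resulting in delayed
    # service updates. However we can't give it more cpu till we have a way
    # to do so without affecting existing Kubernetes clusters. Any change
    # made here must be matched by a proportional change to the requests of
    # other daemons on the node, e.g. fluentd.
    cpurequest: '20m'
```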

@alex-mohr
Contributor

And documenting for the record: @a-robinson mentioned that fluentd in 1.2 is more efficient than in 1.1, so we suspect there won't be much, if any, regression in user-visible logging performance from downsizing it from 100m to 80m.

@k8s-github-robot

Labelling this PR as size/S

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 30, 2016
@k8s-bot

k8s-bot commented Mar 30, 2016

GCE e2e build/test passed for commit 20a51cb0e1eebb6bf4bbba8c63a2e37610f24464.

@k8s-bot

k8s-bot commented Mar 30, 2016

GCE e2e build/test passed for commit 6386938d3e079994b142aa0c3a06aafadfb2a84f.

@@ -14,7 +14,7 @@ spec:
       limits:
         memory: 200Mi
       requests:
-        cpu: 100m
+        cpu: 80m
Contributor

please add the same note of caution here.

Member Author

Done.
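
For completeness, the fluentd side of the change would then look roughly like this (a sketch; the container name and surrounding fields are assumptions):

```yaml
spec:
  containers:
  - name: fluentd-cloud-logging   # assumed container name
    resources:
      limits:
        memory: 200Mi
      requests:
        # 80m (down from 100m) frees 20m for kube-proxy's new request,
        # keeping total per-node daemon overhead at 100m.
        cpu: 80m
```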

@k8s-bot

k8s-bot commented Mar 30, 2016

GCE e2e build/test failed for commit 4c18769caca655eb74d8180c9d08d1b0f892c754.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@bprashanth
Contributor

Your bug is #23659 in case you really need to push this out and can't re-up

@bprashanth bprashanth added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2016
@cjcullen
Member Author

Although that doesn't sound like a kube-proxy flake...

@cjcullen
Member Author

@k8s-bot test this github issue: #23668

1 similar comment
@cjcullen
Member Author

@k8s-bot test this github issue: #23668

@k8s-bot

k8s-bot commented Mar 31, 2016

GCE e2e build/test passed for commit 26a6c66.

@alex-mohr
Contributor

Mergebot is blocked on an unrelated kubemark test suite that has been broken for the past 5 hours. This PR passed tests, is low risk, and fixes a significant regression in the amount of CPU available to users on each node, so manually merging.

@alex-mohr alex-mohr merged commit 2e89f55 into kubernetes:master Mar 31, 2016
@alex-mohr alex-mohr added cherrypick-candidate cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. labels Mar 31, 2016
@alex-mohr alex-mohr added this to the v1.2 milestone Mar 31, 2016
@wojtek-t
Member

@gmarek - FYI

@david-mcmahon
Contributor

The cherry-pick process is defined here. Hoping to get this PR merged today, but the details are here. You need to run the cherry-pick tool yourself to get these onto the branch; @bgrant0607 will then approve them for merge.

@bgrant0607
Member

Every cherrypick needs a release note.

@bgrant0607 bgrant0607 added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Mar 31, 2016
@bgrant0607
Member

This PR has no description and no reference to a P0 issue.

@cjcullen
Member Author

Fixes the "failure to schedule kube-dns" part of #23556.

@cjcullen
Member Author

Internal bugref: 27924350

@bgrant0607 bgrant0607 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Mar 31, 2016
bgrant0607 added a commit that referenced this pull request Mar 31, 2016
…46-upstream-release-1.2

Automated cherry pick of #23646
@k8s-cherrypick-bot

Commit found in the "release-1.2" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.

@fagiani

fagiani commented Apr 2, 2016

@cjcullen thanks for this PR, it helps a lot to get smaller clusters running smoothly. I've tested with 1 master and 2 nodes on AWS m3.medium machines and it works great!

@thockin
Member

thockin commented Apr 4, 2016

@vishh @dchen1107 how far are we from having a way to use docker's --cgroup-parent logic? This might make a cool intern project if we can spec it enough. Then we could have a handful of special cgroups (let's call it "sys" for things that need to run on each machine) that some pods can ask to be contained in. I doubt that is the right API but we can work on that as we spec...

@vishh
Contributor

vishh commented Apr 4, 2016

@thockin that's the plan. My intern is most likely working on cgroups.

@vishh
Contributor

vishh commented Apr 14, 2016

BTW, we already use a cgroup parent, except that it is not configurable per pod.

@davidopp
Member

I don't think that will solve the problem of occasionally needing to increase the sys cgroup's size from time to time, though.

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…ck-of-#23646-upstream-release-1.2

Automated cherry pick of kubernetes#23646
shouhong pushed a commit to shouhong/kubernetes that referenced this pull request Feb 14, 2017
…ck-of-#23646-upstream-release-1.2

Automated cherry pick of kubernetes#23646