
Set memory limit #3332

Merged: 3 commits from the setMemoryLimits branch merged into knative:master on Mar 12, 2019

Conversation

@duglin duglin commented Feb 27, 2019

We're seeing OOM issues at the Node level when we don't set the limit. This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>

/lint

Release Note

NONE
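
For readers skimming this thread, the change amounts to adding a memory limit to the resources block of each affected container, so that when memory runs out the container is OOM-killed and restarted in place rather than taking the whole node down. A minimal sketch of the shape of the change follows; the elasticsearch-logging container name and the 1000Mi value come from the diff discussed later in the thread, and other containers or values in the PR may differ:

containers:
  - name: elasticsearch-logging
    resources:
      limits:
        memory: 1000Mi  # without a limit, this container can grow until the node itself runs out of memory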

@knative-prow-robot knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 27, 2019
Contributor

@markusthoemmes markusthoemmes left a comment

Not an Elasticsearch expert, but 500M sounds like quite a low limit for Elasticsearch to begin with. IIRC it's usually run with gigabytes of heap.

@markusthoemmes
Contributor

/assign @mdemirhan

@greghaynes
Contributor

AIUI this value won't prevent this process from OOMing - it is just a cgroups limit and scheduler hint, so the process will be OOM-killed when it hits this level. I believe you can set the memory limit for the Java heap ES uses this way:

env:
  - name: ES_JAVA_OPTS
    value: "-Xms500m -Xmx500m"

@duglin
Author

duglin commented Feb 27, 2019

I updated the first comment to better reflect the issue. I'm open to other values, but w/o this we're seeing our Nodes OOMing instead of just the Pod, which should just recycle nicely when it OOMs.

@duglin duglin changed the title Set memory limit [WIP] Set memory limit Feb 27, 2019
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2019
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 27, 2019
@duglin duglin changed the title [WIP] Set memory limit Set memory limit Feb 27, 2019
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2019
@duglin
Author

duglin commented Feb 27, 2019

There were a couple of other Deployments in serving that needed limits too. I raised the limit to 1G.
See what y'all think

@duglin
Author

duglin commented Feb 27, 2019

/test pull-knative-serving-integration-tests

@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 27, 2019
@duglin
Author

duglin commented Feb 28, 2019

/test pull-knative-serving-integration-tests
/test pull-knative-serving-upgrade-tests

@duglin
Author

duglin commented Feb 28, 2019

/test pull-knative-serving-upgrade-tests

@duglin
Author

duglin commented Feb 28, 2019

/test pull-knative-serving-integration-tests

@mdemirhan
Contributor

Changes in monitoring components seem fine, but we shouldn't really change Istio YAML files. Those are mostly generated from the latest Istio helm charts, and the next time we upgrade Istio they will be overridden. Istio changes should happen via Istio's GitHub project.

@duglin duglin force-pushed the setMemoryLimits branch from 861f408 to 07df2fb Compare March 6, 2019 16:29
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2019
@duglin
Author

duglin commented Mar 6, 2019

@mdemirhan ok I removed the Istio ones - thanks!

@duglin
Author

duglin commented Mar 7, 2019

ping @mdemirhan @markusthoemmes for review

@@ -94,6 +94,7 @@ spec:
        name: elasticsearch-logging
        resources:
          limits:
+           memory: 1000Mi
Contributor

I am worried that 1 gig might be too little for Elasticsearch. In these memory-constrained environments, it might be better to not install Elasticsearch at all and undo this change.

Author

I'm open to changing it - what value would you be more comfortable with?

Contributor

I don't have a good answer to that, but the recommendation from ES is to have as much as you can afford - https://qbox.io/support/article/choosing-a-size-for-nodes
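
Tying this thread back to the earlier ES_JAVA_OPTS suggestion: one way to reconcile a container limit with Elasticsearch's appetite for heap is to set the JVM heap well below the container limit (roughly half is a common rule of thumb, leaving headroom for off-heap memory). The 512m figure below is purely illustrative against the 1000Mi limit and is not a value settled on in this PR:

resources:
  limits:
    memory: 1000Mi
env:
  - name: ES_JAVA_OPTS
    value: "-Xms512m -Xmx512m"  # illustrative only: heap at roughly half the container limit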

third_party/config/build/release.yaml (review thread outdated, resolved)
Doug Davis added 3 commits March 7, 2019 17:24
We're seeing OOM issues when we don't set the limit

Signed-off-by: Doug Davis <dug@us.ibm.com>
Signed-off-by: Doug Davis <dug@us.ibm.com>
Signed-off-by: Doug Davis <dug@us.ibm.com>
@duglin duglin force-pushed the setMemoryLimits branch from 999d32f to 431f7e1 Compare March 8, 2019 01:28
@knative-prow-robot knative-prow-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 8, 2019
@duglin
Author

duglin commented Mar 8, 2019

@mdemirhan I removed third_party/config/monitoring/logging/elasticsearch/elasticsearch.yaml and third_party/config/monitoring/logging/elasticsearch/kibana.yaml thinking that I should edit those some place else too, but I'm not seeing where those might be coming from. Are those files generated too or hand-crafted?

duglin pushed a commit to duglin/build that referenced this pull request Mar 8, 2019
This is the "build" version of knative/serving#3332

We're seeing OOM issues at the Node level when we don't set the limit.
This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>
knative-prow-robot pushed a commit to knative/build that referenced this pull request Mar 8, 2019
* Set memory limits

This is the "build" version of knative/serving#3332

We're seeing OOM issues at the Node level when we don't set the limit.
This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>

* Fix indentation
@mdemirhan
Contributor

@mdemirhan I removed third_party/config/monitoring/logging/elasticsearch/elasticsearch.yaml and third_party/config/monitoring/logging/elasticsearch/kibana.yaml thinking that I should edit those some place else too, but I'm not seeing where those might be coming from. Are those files generated too or hand-crafted?

Those files are mostly hand crafted and we don't continuously update them.

@mdemirhan
Contributor

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: duglin, mdemirhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 8, 2019
@duglin
Author

duglin commented Mar 9, 2019

/test pull-knative-serving-upgrade-tests

@mdemirhan
Contributor

/test pull-knative-serving-upgrade-tests
/test pull-knative-serving-integration-tests

@mdemirhan
Contributor

/test pull-knative-serving-integration-tests

@knative-prow-robot knative-prow-robot merged commit a6854d6 into knative:master Mar 12, 2019
duglin pushed a commit to duglin/eventing that referenced this pull request Mar 19, 2019
We're seeing OOM issues at the Node level when we don't set the limit on some pods. This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

See knative/serving#3332 for the serving equivalent.

Signed-off-by: Doug Davis <dug@us.ibm.com>
knative-prow-robot pushed a commit to knative/eventing that referenced this pull request Mar 19, 2019
We're seeing OOM issues at the Node level when we don't set the limit on some pods. This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

See knative/serving#3332 for the serving equivalent.

Signed-off-by: Doug Davis <dug@us.ibm.com>
vdemeester pushed a commit to vdemeester/knative-build that referenced this pull request Apr 3, 2019
* Set memory limits

This is the "build" version of knative/serving#3332

We're seeing OOM issues at the Node level when we don't set the limit.
This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>

* Fix indentation
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm Indicates that a PR is ready to be merged.
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
5 participants