
Set memory limit #3332

Merged: 3 commits from the setMemoryLimits branch merged into knative:master on Mar 12, 2019

Conversation

@duglin duglin commented Feb 27, 2019

We're seeing OOM issues at the Node level when we don't set the limit. This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>

/lint

Release Note

NONE
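
For readers skimming this thread, the change amounts to adding a memory limit to the resources block of each affected container, so that when memory runs out the container is OOM-killed and restarted in place rather than taking the whole node down. A minimal sketch of the shape of the change follows; the elasticsearch-logging container name and the 1000Mi value come from the diff discussed later in the thread, and other containers or values in the PR may differ:

containers:
  - name: elasticsearch-logging
    resources:
      limits:
        memory: 1000Mi  # without a limit, this container can grow until the node itself runs out of memory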

@knative-prow-robot knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 27, 2019
Contributor

@markusthoemmes markusthoemmes left a comment

Not an Elasticsearch expert, but 500M sounds like quite a low limit for Elasticsearch to begin with. IIRC it's usually run with gigabytes of heap.

@markusthoemmes
Contributor

/assign @mdemirhan

@greghaynes
Contributor

AIUI this value won't prevent this process from OOMing - it is just a cgroups limit and scheduler hint, so the process will be OOM-killed when it hits this level. I believe you can set the memory limit for the Java heap ES uses this way:

env:
  - name: ES_JAVA_OPTS
    value: "-Xms500m -Xmx500m"

@duglin
Author

duglin commented Feb 27, 2019

I updated the first comment to better reflect the issue. I'm open to other values, but w/o this we're seeing our Nodes OOMing instead of just the Pod, which should just recycle nicely when it OOMs.

@duglin duglin changed the title Set memory limit [WIP] Set memory limit Feb 27, 2019
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2019
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 27, 2019
@duglin duglin changed the title [WIP] Set memory limit Set memory limit Feb 27, 2019
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2019
@duglin
Author

duglin commented Feb 27, 2019

There were a couple of other Deployments in serving that needed limits too. I raised the limit to 1G.
See what y'all think

@duglin
Author

duglin commented Feb 27, 2019

/test pull-knative-serving-integration-tests

@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 27, 2019
@duglin
Author

duglin commented Feb 28, 2019

/test pull-knative-serving-integration-tests
/test pull-knative-serving-upgrade-tests

@duglin
Author

duglin commented Feb 28, 2019

/test pull-knative-serving-upgrade-tests

@duglin
Author

duglin commented Feb 28, 2019

/test pull-knative-serving-integration-tests

@mdemirhan
Contributor

Changes in monitoring components seem fine, but we shouldn't really change Istio YAML files. Those are mostly generated from the latest Istio helm charts, and the next time we upgrade Istio they will be overridden. Istio changes should happen via Istio's GitHub project.

@duglin duglin force-pushed the setMemoryLimits branch from 861f408 to 07df2fb Compare March 6, 2019 16:29
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2019
@duglin
Author

duglin commented Mar 6, 2019

@mdemirhan ok I removed the Istio ones - thanks!

@duglin
Author

duglin commented Mar 7, 2019

ping @mdemirhan @markusthoemmes for review

@@ -94,6 +94,7 @@ spec:
        name: elasticsearch-logging
        resources:
          limits:
+           memory: 1000Mi
Contributor

I am worried that 1 gig might be too little for Elasticsearch. In these memory-constrained environments, it might be better to not install Elasticsearch at all and undo this change.

Author

I'm open to changing it - what value would you be more comfortable with?

Contributor

I don't have a good answer to that, but the recommendation from ES is to have as much as you can afford - https://qbox.io/support/article/choosing-a-size-for-nodes
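
Tying this thread back to the earlier ES_JAVA_OPTS suggestion: one way to reconcile a container limit with Elasticsearch's appetite for heap is to set the JVM heap well below the container limit (roughly half is a common rule of thumb, leaving headroom for off-heap memory). The 512m figure below is purely illustrative against the 1000Mi limit and is not a value settled on in this PR:

resources:
  limits:
    memory: 1000Mi
env:
  - name: ES_JAVA_OPTS
    value: "-Xms512m -Xmx512m"  # illustrative only: heap at roughly half the container limit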

third_party/config/build/release.yaml (review thread outdated, resolved)
Doug Davis added 3 commits March 7, 2019 17:24
We're seeing OOM issues when we don't set the limit

Signed-off-by: Doug Davis <dug@us.ibm.com>
Signed-off-by: Doug Davis <dug@us.ibm.com>
Signed-off-by: Doug Davis <dug@us.ibm.com>
@duglin duglin force-pushed the setMemoryLimits branch from 999d32f to 431f7e1 Compare March 8, 2019 01:28
@knative-prow-robot knative-prow-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 8, 2019
@duglin
Author

duglin commented Mar 8, 2019

@mdemirhan I removed third_party/config/monitoring/logging/elasticsearch/elasticsearch.yaml and third_party/config/monitoring/logging/elasticsearch/kibana.yaml thinking that I should edit those some place else too, but I'm not seeing where those might be coming from. Are those files generated too or hand-crafted?

duglin pushed a commit to duglin/build that referenced this pull request Mar 8, 2019
This is the "build" version of knative/serving#3332

We're seeing OOM issues at the Node level when we don't set the limit.
This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>
knative-prow-robot pushed a commit to knative/build that referenced this pull request Mar 8, 2019
* Set memory limits

This is the "build" version of knative/serving#3332

We're seeing OOM issues at the Node level when we don't set the limit.
This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>

* Fix indentation
@mdemirhan
Contributor

@mdemirhan I removed third_party/config/monitoring/logging/elasticsearch/elasticsearch.yaml and third_party/config/monitoring/logging/elasticsearch/kibana.yaml thinking that I should edit those some place else too, but I'm not seeing where those might be coming from. Are those files generated too or hand-crafted?

Those files are mostly hand crafted and we don't continuously update them.

@mdemirhan
Contributor

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: duglin, mdemirhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 8, 2019
@duglin
Author

duglin commented Mar 9, 2019

/test pull-knative-serving-upgrade-tests

@mdemirhan
Contributor

/test pull-knative-serving-upgrade-tests
/test pull-knative-serving-integration-tests

@mdemirhan
Contributor

/test pull-knative-serving-integration-tests

@knative-prow-robot knative-prow-robot merged commit a6854d6 into knative:master Mar 12, 2019
duglin pushed a commit to duglin/eventing that referenced this pull request Mar 19, 2019
We're seeing OOM issues at the Node level when we don't set the limit on some pods. This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

See knative/serving#3332 for the serving equivalent.

Signed-off-by: Doug Davis <dug@us.ibm.com>
knative-prow-robot pushed a commit to knative/eventing that referenced this pull request Mar 19, 2019
We're seeing OOM issues at the Node level when we don't set the limit on some pods. This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

See knative/serving#3332 for the serving equivalent.

Signed-off-by: Doug Davis <dug@us.ibm.com>
vdemeester pushed a commit to vdemeester/knative-build that referenced this pull request Apr 3, 2019
* Set memory limits

This is the "build" version of knative/serving#3332

We're seeing OOM issues at the Node level when we don't set the limit.
This will allow the pod to restart gracefully w/o taking down the entire node.

I'm not stuck on the specific value so we can pick a different one.

Signed-off-by: Doug Davis <dug@us.ibm.com>

* Fix indentation
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm Indicates that a PR is ready to be merged.
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
5 participants