Node e2e memory eviction test #28693
Conversation
@k8s-bot node e2e test this github issue #IGNORE
Force-pushed from 489514e to d88a696
@vishh Reassigning to you since I have a PR backlog
Force-pushed from 08015ae to b0682a3
@vishh PTAL
func createMemhogPod(f *framework.Framework, genName string, ctnName string, res api.ResourceRequirements) *api.Pod {
So I have an image that uses memhog itself (derekwaynecarr/memhog). You can see some sample pods using it here:
https://github.com/derekwaynecarr/kubernetes/tree/examples-eviction/demo/kubelet-eviction
Maybe we can just have a dedicated image we share?
On that note, I created vish/stress because memhog was a large image and did not let us control the rate of memory consumption.
$ docker run --rm vish/stress --help
Usage of /stress:
-alsologtostderr
log to standard error as well as files
-cpus int
total number of CPUs to utilize
-log_backtrace_at value
when logging hits line file:N, emit a stack trace (default :0)
-log_dir string
If non-empty, write log files in this directory
-logtostderr
log to standard error instead of files
-mem-alloc-size string
amount of memory to be consumed in each allocation (default "4Ki")
-mem-alloc-sleep duration
duration to sleep between allocations (default 1ms)
-mem-total string
total memory to be consumed. Memory will be consumed via multiple allocations.
-stderrthreshold value
logs at or above this threshold go to stderr
-v value
log level for V logs
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
@mtaufen If you can address the pod image comments, this PR can go in.
Rebased and transitioned to @vishh's image. But I just saw a burstable fail before a best-effort while running the test. I used to set …
I've run a few scenarios to start getting a ballpark for the thresholds where I see proper eviction order and where I don't (I'm operating under the assumption that when I don't, it's due to an OOM-kill, but I haven't actually investigated that yet):
It seems that failures start to crop up somewhere between a threshold of …
I'm ok with the thresholds being higher for now. We can lower them in a …
This test creates three pods with QoS of besteffort, burstable, and guaranteed, respectively, which each contain a container that tries to consume almost all the available memory at a rate of about 12Mi/10sec. The expectation is that eviction will be initiated when the hard memory.available<250Mi threshold is triggered, and that eviction will proceed in the order of besteffort, then burstable. Since guaranteed pods should only be evicted if something charged to the host uses more resources than were reserved for it, we currently end the test when besteffort and burstable have both been evicted. Note that this commit also sets --eviction-hard=memory.available<250Mi to enable eviction during tests.
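To make the pod setup in that description concrete, here is a minimal sketch in the style of the createMemhogPod helper visible in the diff, assuming the pre-1.6 k8s.io/kubernetes/pkg/api package paths; the function names, the placeholder resource values, and the stress arguments are illustrative, not the PR's actual code. The QoS class of each pod falls out of its ResourceRequirements alone: no requests or limits gives besteffort, requests without matching limits gives burstable, and requests equal to limits gives guaranteed.

package eviction

import (
	"k8s.io/kubernetes/pkg/api"
	"k8s.io/kubernetes/pkg/api/resource"
)

// newMemhogPod builds a pod whose single container slowly allocates memory
// (about 12Mi every 10s) until the node crosses its eviction threshold.
func newMemhogPod(genName, ctnName string, res api.ResourceRequirements) *api.Pod {
	return &api.Pod{
		ObjectMeta: api.ObjectMeta{GenerateName: genName},
		Spec: api.PodSpec{
			RestartPolicy: api.RestartPolicyNever,
			Containers: []api.Container{{
				Name:      ctnName,
				Image:     "vish/stress",
				Args:      []string{"-mem-alloc-size", "12Mi", "-mem-alloc-sleep", "10s", "-mem-total", "10000Mi"},
				Resources: res,
			}},
		},
	}
}

// examplePods shows how the three QoS tiers are expressed purely through
// resource requirements; the 100Mi values are placeholders.
func examplePods() []*api.Pod {
	return []*api.Pod{
		// besteffort: no requests or limits; expected to be evicted first.
		newMemhogPod("besteffort-", "besteffort", api.ResourceRequirements{}),
		// burstable: requests set but no matching limits; expected to be evicted second.
		newMemhogPod("burstable-", "burstable", api.ResourceRequirements{
			Requests: api.ResourceList{api.ResourceMemory: resource.MustParse("100Mi")},
		}),
		// guaranteed: requests equal to limits; should survive unless node-level
		// usage exceeds what was reserved for it.
		newMemhogPod("guaranteed-", "guaranteed", api.ResourceRequirements{
			Requests: api.ResourceList{api.ResourceMemory: resource.MustParse("100Mi")},
			Limits:   api.ResourceList{api.ResourceMemory: resource.MustParse("100Mi")},
		}),
	}
}

One sizing caveat: the guaranteed pod's memory limit has to be large enough that its stress container is not OOM-killed inside its own cgroup before the node-level memory.available<250Mi threshold trips; otherwise the test would observe a cgroup OOM kill rather than a kubelet eviction.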
Ok, set to …
GCE e2e build/test passed for commit 736f1cb.
@vishh PTAL this can probably go in.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
@k8s-bot test this issue: #IGNORE Tests have been pending for 24 hours
GCE e2e build/test passed for commit 736f1cb.
GCE e2e build/test passed for commit 736f1cb.
Automatic merge from submit-queue
This tests memory evictions.
See related issue #28619 and fix to cadvisor google/cadvisor#1380.
cc @vishh @derekwaynecarr @timstclair