-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[kubelet] Fix oom-score-adj policy in kubelet #32251
Conversation
The tests need some more work. Posting the PR for feedback while I fix the tests. |
4ebae4e
to
39e99c1
Compare
39e99c1
to
5cbd607
Compare
@dchen1107 The tests have been fixed. I verified the fix manually as well. Improved logging along the way. PTAL. This PR is ready to go! |
Review status: 0 of 10 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. test/e2e_node/container_manager_test.go, line 55 [r1] (raw file):
Can I have one more test case: validate docker's oom_score_adj, then restart docker daemon, revalidate? Comments from Reviewable |
Review status: 0 of 10 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. test/e2e_node/container_manager_test.go, line 55 [r1] (raw file):
|
Ok, could you please file a separate issue for node e2e test too? The test are failed but not due to your pr. LGTM |
@k8s-bot test this github issue #IGNORE |
@vishh I think some recent test-infra changes completely break builds related to GCI (also coreos). |
@k8s-bot test this github issue #IGNORE |
The node e2e tests are passing when I run manually. Puzzling. Looking into cluster level tests now |
5cbd607
to
ff2920f
Compare
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Signed-off-by: Vishnu kannan <vishnuk@google.com>
7b18b24
to
ba6feb2
Compare
GCE e2e build/test passed for commit ba6feb2. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
@vishh This seems pretty risky for this late in the cycle |
GCE e2e build/test passed for commit ba6feb2. |
I'm not comfortable with it really. Why did we change from using mnt namespace to using pid namespace to determine 'is in container' ? In any case, this does not seem to be fixing something I'd consider a blocker bug. Yes, it stinks if docker get oomed, and it'll render the node pretty much useless, but the system as a whole will recover. This is not a regression vs 1.3. I suggest we either don't fix in 1.4 or we punt to 1.4.1. |
We have been having this regression since v1.3. It is critical for GKE/GCE deployments of k8s because docker daemon has a high likelihood of being OOM killed which will end up nuking all containers. |
@pwittrock This is fixing a scale reliability regression that will be visible to customers. The priority call belongs to you and @eparis. |
I'm oging to remove the cherrypick-candidate label since this broke master. You can suggest a pick of the replacement, but my fears are not abated. |
Yeah. SGTM! On Thu, Sep 15, 2016 at 8:50 PM, Eric Paris notifications@github.com
|
Automatic merge from submit-queue Add separate build process for node test. This PR is part of #31093. However, because currently node e2e is built on `KUBE_TEST_PLATFORMS`, which includes linux/amd64, darwin/amd64, windows/amd64 and linux/arm, it caused #32251 to fail. In fact, node e2e is running on the same node with kubelet, and it also has built-in apiserver, etcd and namespace controller. All of them are only built on `KUBE_SERVER_PLATFORMS`, so node e2e should also only be built on those platforms. ``` KUBE_SERVER_PLATFORMS=( linux/amd64 linux/arm linux/arm64 ) ``` This PR added a separate build process for node e2e to address this. @vishh Do you need this for v1.4? because this blocks your #32251. /cc @dchen1107
…r-node-e2e Automatic merge from submit-queue Add separate build process for node test. This PR is part of kubernetes#31093. However, because currently node e2e is built on `KUBE_TEST_PLATFORMS`, which includes linux/amd64, darwin/amd64, windows/amd64 and linux/arm, it caused kubernetes#32251 to fail. In fact, node e2e is running on the same node with kubelet, and it also has built-in apiserver, etcd and namespace controller. All of them are only built on `KUBE_SERVER_PLATFORMS`, so node e2e should also only be built on those platforms. ``` KUBE_SERVER_PLATFORMS=( linux/amd64 linux/arm linux/arm64 ) ``` This PR added a separate build process for node e2e to address this. @vishh Do you need this for v1.4? because this blocks your kubernetes#32251. /cc @dchen1107 (cherry picked from commit dae3bdd)
…r-node-e2e Automatic merge from submit-queue Add separate build process for node test. This PR is part of kubernetes#31093. However, because currently node e2e is built on `KUBE_TEST_PLATFORMS`, which includes linux/amd64, darwin/amd64, windows/amd64 and linux/arm, it caused kubernetes#32251 to fail. In fact, node e2e is running on the same node with kubelet, and it also has built-in apiserver, etcd and namespace controller. All of them are only built on `KUBE_SERVER_PLATFORMS`, so node e2e should also only be built on those platforms. ``` KUBE_SERVER_PLATFORMS=( linux/amd64 linux/arm linux/arm64 ) ``` This PR added a separate build process for node e2e to address this. @vishh Do you need this for v1.4? because this blocks your kubernetes#32251. /cc @dchen1107 (cherry picked from commit dae3bdd)
Docker daemon and kubelet needs to be protected by setting oom-score-adj to -999.
Fixes #32238
This change is