
[k8s.io] Restart [Disruptive] should restart all nodes and ensure all nodes and pods recover #37202

Closed
Random-Liu opened this issue Nov 21, 2016 · 2 comments · Fixed by #37203
Assignees: Random-Liu
Labels: area/kubelet, area/test, kind/flake, release-blocker, sig/node
Milestone: v1.5

Comments

Random-Liu (Member) commented Nov 21, 2016

The restart test is broken by #37070.

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/restart.go:124
Expected error:
    <*errors.errorString | 0xc420cd42a0>: {
        s: "couldn't find 28 pods within 5m0s; last error: expected to find 28 pods but found only 29",
    }
    couldn't find 28 pods within 5m0s; last error: expected to find 28 pods but found only 29
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/restart.go:119
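
For context, the failing check is essentially a poll that waits for an exact pod count and reports the last mismatch on timeout. Here is a minimal sketch of that pattern (the function name, signature, and poll interval are assumptions for illustration, not the actual restart.go code):

```go
package e2e

import (
	"fmt"
	"time"

	"k8s.io/kubernetes/pkg/util/wait"
)

// waitForNPods polls listPods until it reports exactly `expect` pods,
// remembering the last mismatch so it can be surfaced if the timeout expires.
// Hypothetical helper, sketched to match the error text above.
func waitForNPods(listPods func() int, expect int, timeout time.Duration) error {
	var last error
	err := wait.Poll(10*time.Second, timeout, func() (bool, error) {
		if got := listPods(); got != expect {
			last = fmt.Errorf("expected to find %d pods but found only %d", expect, got)
			return false, nil // keep polling until timeout
		}
		return true, nil
	})
	if err != nil {
		return fmt.Errorf("couldn't find %d pods within %v; last error: %v", expect, timeout, last)
	}
	return nil
}
```

Because the check requires an exact match, a stray recreated mirror pod (29 pods instead of the expected 28) fails the test just as a missing pod would.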

The original issue is #34003, and the root cause is explained in #34003 (comment):

The mirror pod of the e2e-image-puller pod was deleted by the node controller in the network partition test. kubelet currently doesn't try to recreate a mirror pod (or even sync the pod) if the pod has already terminated. In the restart test, kubelet got restarted and its in-memory status cache was cleared, so kubelet synced the pod once to regenerate the status, which led to the mirror pod being recreated.

We fixed this by filtering out RestartNever mirror pods in the test. However, #37070 changed the image puller to RestartOnFailure, which broke the workaround.

A quick fix is to filter out non-RestartAlways pods: both RestartNever and RestartOnFailure pods can terminate successfully, and we cannot handle terminated mirror pods very well right now.
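
A minimal sketch of that filter, assuming the 1.5-era `pkg/api` types (the helper name here is hypothetical; the actual change is in #37203):

```go
package e2e

import "k8s.io/kubernetes/pkg/api"

// filterRestartablePods drops pods whose RestartPolicy is not Always.
// RestartNever and RestartOnFailure pods (including mirror pods of such
// static pods) may terminate successfully and never come back after a
// node restart, so the test should not count them in the expected total.
func filterRestartablePods(pods []*api.Pod) []*api.Pod {
	var filtered []*api.Pod
	for _, p := range pods {
		if p.Spec.RestartPolicy == api.RestartPolicyAlways {
			filtered = append(filtered, p)
		}
	}
	return filtered
}
```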

@yujuhong @gmarek
/cc @kubernetes/sig-node

Random-Liu added the area/test, kind/flake, sig/node, and area/kubelet labels Nov 21, 2016
Random-Liu added this to the v1.5 milestone Nov 21, 2016
Random-Liu self-assigned this Nov 21, 2016
gmarek (Contributor) commented Nov 21, 2016

Yeah... I see there are a number of problems here. I looked at the history of the image puller config, and it looked to me like it was 'Never' from the beginning...

Thanks for the fix though.

calebamiles (Contributor) commented

@Random-Liu @yujuhong, @gmarek is this a release blocker for 1.5? Please update the issue ASAP, thanks!

cc: @kubernetes/sig-node, @saad-ali, @dims

k8s-github-robot pushed a commit that referenced this issue Nov 21, 2016
Automatic merge from submit-queue

Filter out non-RestartAlways mirror pod in restart test.

Fixes #37202.

> A quick fix is to filter out non-RestartAlways pods: both RestartNever and RestartOnFailure pods can terminate successfully, and we cannot handle terminated mirror pods very well right now.

@yujuhong @gmarek 
/cc @kubernetes/sig-node