
[k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite} #33882

Closed
k8s-github-robot opened this issue Oct 1, 2016 · 73 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@k8s-github-robot

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gci-gke-reboot-release-1.4/356/

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:98
Oct  1 04:43:30.313: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:158
k8s-github-robot assigned ghost Oct 1, 2016
k8s-github-robot added the kind/flake and priority/backlog labels Oct 1, 2016
@k8s-github-robot
Author

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gci-gke-reboot/246/

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:98
Oct  3 12:11:33.161: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:158

@ghost

ghost commented Oct 3, 2016

Duplicate of #33874

@k8s-github-robot
Author

k8s-github-robot commented Oct 7, 2016

Builds:
kubernetes-e2e-gci-gce-reboot-release-1.4 655 689 694 1225
kubernetes-e2e-gci-gce-reboot 764 771 876 919 1210 1315
kubernetes-e2e-gci-gke-reboot-release-1.4 722

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:98
Oct  6 17:02:19.898: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:158

k8s-github-robot added the priority/important-soon label and removed the priority/backlog label Oct 7, 2016
@k8s-github-robot
Author

k8s-github-robot commented Oct 7, 2016

Builds:
kubernetes-e2e-gci-gce-reboot-release-1.4 685 1567

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:134
Oct  7 11:17:36.388: All nodes should be ready after test, Not ready nodes: [&TypeMeta{Kind:,APIVersion:,}]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:418

k8s-github-robot added the priority/critical-urgent label and removed the priority/important-soon label Oct 7, 2016
@k8s-github-robot
Author

[FLAKE-PING] @quinton-hoole

This flaky-test issue would love to have more attention.

@saad-ali
Member

saad-ali commented Dec 1, 2016

Ack, thanks for the investigation, guys. Will stop tracking this for the 1.5 release.

@k8s-github-robot
Author

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke-reboot-release-1.5/338/

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:99
Dec  1 14:56:26.683: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:159

@adityakali
Contributor

@vishh the patch in moby/moby#25523 does not apply cleanly to our docker v1.11.2, because the file it modifies, pkg/ioutils/fswriters.go, was only introduced in a later release.

@vishh
Contributor

vishh commented Dec 2, 2016

@adityakali Would it help if we come up with a patch for docker v1.11.2?

@vishh
Contributor

vishh commented Dec 2, 2016

@adityakali The cherry-pick is pretty straightforward, even though there is a conflict. pkg/ioutils/fswriters.go is just a utility file, and adding it should not cause any other issues.

@vishh
Contributor

vishh commented Dec 2, 2016

@mtaufen as for #33882 (comment), the serial console shows that the node initialized successfully.

[  OK  ] Started Google Compute Engine user startup scripts.
[ 4.518309] cloud-init[623]: Cloud-init v. 0.7.6 running 'init-local' at Thu, 01 Dec 2016 22:50:59 +0000. Up 4.48 seconds.
[  OK  ] Started Initial cloud-init job (pre-networking).
         Starting Initial cloud-init job (metadata service crawler)...
[  OK  ] Started Initialize device policy.
[  OK  ] Started GCI Device Policy Service.
[  OK  ] Started Chromium OS system update service.
[  OK  ] Started Metrics Daemon.
[ 5.097754] cloud-init[906]: Cloud-init v. 0.7.6 running 'init' at Thu, 01 Dec 2016 22:51:00 +0000. Up 5.07 seconds.
[ 5.118548] cloud-init[906]: ci-info: ++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++
[ 5.122559] cloud-init[906]: ci-info: +--------+------+------------+-----------------+-------------------+
[ 5.126536] cloud-init[906]: ci-info: | Device |  Up  | Address    | Mask            | Hw-Address        |
[ 5.130358] cloud-init[906]: ci-info: +--------+------+------------+-----------------+-------------------+
[ 5.134401] cloud-init[906]: ci-info: | lo:    | True | 127.0.0.1  | 255.0.0.0       | .                 |
[ 5.138431] cloud-init[906]: ci-info: | eth0:  | True | 10.240.0.2 | 255.255.255.255 | 42:01:0a:f0:00:02 |
[ 5.142388] cloud-init[906]: ci-info: +--------+------+------------+-----------------+-------------------+
[ 5.146372] cloud-init[906]: ci-info: ++++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++++
[ 5.150357] cloud-init[906]: ci-info: +-------+-------------+------------+-----------------+-----------+-------+
[ 5.154355] cloud-init[906]: ci-info: | Route | Destination | Gateway    | Genmask         | Interface | Flags |
[ 5.158382] cloud-init[906]: ci-info: +-------+-------------+------------+-----------------+-----------+-------+
[ 5.162538] cloud-init[906]: ci-info: | 0     | 0.0.0.0     | 10.240.0.1 | 0.0.0.0         | eth0      | UG    |
[ 5.166437] cloud-init[906]: ci-info: | 1     | 0.0.0.0     | 10.240.0.1 | 0.0.0.0         | eth0      | UG    |
[ 5.170642] cloud-init[906]: ci-info: | 2     | 10.240.0.1  | 0.0.0.0    | 255.255.255.255 | eth0      | UH    |
[ 5.175477] cloud-init[906]: ci-info: | 3     | 10.240.0.1  | 0.0.0.0    | 255.255.255.255 | eth0      | UH    |
[ 5.179474] cloud-init[906]: ci-info: +-------+-------------+------------+-----------------+-----------+-------+
[  OK  ] Started Initial cloud-init job (metadata service crawler).
[  OK  ] Reached target Cloud-config availability.
         Starting Apply the settings specified in cloud-config...
[ 5.623712] cloud-init[932]: Cloud-init v. 0.7.6 running 'modules:config' at Thu, 01 Dec 2016 22:51:00 +0000. Up 5.57 seconds.
[  OK  ] Started Apply the settings specified in cloud-config.
         Starting Execute cloud user/final scripts...
[ 5.931904] cloud-init[947]: Cloud-init v. 0.7.6 running 'modules:final' at Thu, 01 Dec 2016 22:51:00 +0000. Up 5.88 seconds.
<14>Dec 1 22:51:00 ec2:
<14>Dec 1 22:51:00 ec2: #############################################################
<14>Dec 1 22:51:00 ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
<14>Dec 1 22:51:00 ec2: -----END SSH HOST KEY FINGERPRINTS-----
<14>Dec 1 22:51:00 ec2: #############################################################
-----BEGIN SSH HOST KEY KEYS-----
-----END SSH HOST KEY KEYS-----
[ 5.969856] cloud-init[947]: Cloud-init v. 0.7.6 finished at Thu, 01 Dec 2016 22:51:00 +0000. Datasource DataSourceGCE. Up 5.96 seconds
[  OK  ] Started Execute cloud user/final scripts.
[ 9.811552] Bridge firewalling registered
[ 10.277742] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[  OK  ] Started Docker Application Container Engine.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.

gke-bootstrap-e2e-default-pool-912eb71d-jktv login:

However, I don't see any logs from docker, kubelet, or any other service from around that time. As part of the e2e infra, maybe we should perform a manual healthz check on kubelet and docker in addition to collecting the logs?

@k8s-github-robot
Author

k8s-github-robot commented Dec 2, 2016

Builds:
ci-kubernetes-e2e-gci-gce-reboot-release-1.4 632 649 695 852
ci-kubernetes-e2e-gci-gce-reboot 854 1026
ci-kubernetes-e2e-gci-gke-reboot 476

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:98
Dec  2 08:04:17.931: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:158

@dims
Member

dims commented Dec 9, 2016

@mtaufen Is it appropriate to move this to the next milestone? (and remove the non-release-blocker tag as well)

@k8s-github-robot
Author

k8s-github-robot commented Dec 10, 2016

Builds:
ci-kubernetes-e2e-gci-gce-reboot 1047 1050 1053 1072 1076 1085 1105 1109 1113
ci-kubernetes-e2e-gci-gke-reboot 590

Failed: [k8s.io] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:102
Dec  9 17:31:48.947: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/reboot.go:168


9 participants