Replace containervm with GCI as default master image for GCE clusters #26197

Merged
merged 1 commit into from
May 24, 2016

Conversation

wonderfly
Contributor

@wonderfly wonderfly commented May 24, 2016

GCE clusters start using GCI as the default OS image for masters

fixes #25977

I ran the default e2e test suites (the same tests that kubernetes-e2e-gce-master runs) with this change and 165 out of 178 passed. The failed tests might be flaky. I'm taking a look at the failures now and will update the PR once I have more details. I'd also like it to run through the slow and serial tests before merge, but wanted to give you an idea of what the change looks like.

@roberthbailey @dchen1107 @andyzheng0831 Can you take a look?

cc/ @lavalamp @kubernetes/goog-image

@wonderfly wonderfly added area/platform/gce sig/node Categorizes an issue or PR as relevant to SIG Node. area/os/gci labels May 24, 2016
@wonderfly wonderfly force-pushed the update_default_master_image branch from 032f756 to 5e1dc24 Compare May 24, 2016 19:00
@dchen1107 dchen1107 self-assigned this May 24, 2016
@dchen1107 dchen1107 added this to the v1.3 milestone May 24, 2016
@dchen1107 dchen1107 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 24, 2016
@dchen1107
Member

I added the release-note label for this one, but it is up to @roberthbailey to make the final decision.

@wonderfly
Contributor Author

I took a look at the failures. 10 of the 13 failures have been seen constantly in recent kubernetes-e2e-gke runs and will hopefully be fixed by #26190. I am re-running the three new ones and will update soon.

| Test | Flaky in kubernetes-e2e-gke |
| --- | --- |
| [Fail] [k8s.io] Kubectl client [k8s.io] Kubectl cluster-info [It] should check if Kubernetes master services is included in cluster-info [Conformance] | Yes |
| [Fail] [k8s.io] DNS [It] should provide DNS for pods for Hostname and Subdomain Annotation | Yes |
| [Fail] [k8s.io] Kubernetes Dashboard [It] should check that the kubernetes-dashboard instance is alive | Yes |
| [Fail] [k8s.io] DNS [It] should provide DNS for services [Conformance] | Yes |
| [Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance] | Yes |
| [Fail] [k8s.io] Networking [It] should provide Internet connection for containers [Conformance] | Yes |
| [Fail] [k8s.io] Services [It] should create endpoints for unready pods | Yes |
| [Fail] [k8s.io] Horizontal pod autoscaling (scale resource: CPU) [k8s.io] ReplicationController light [It] Should scale from 1 pod to 2 pods | Yes |
| [Fail] [k8s.io] Kubectl client [k8s.io] Guestbook application [It] should create and stop a working application [Conformance] | Yes |
| [Fail] [k8s.io] Horizontal pod autoscaling (scale resource: CPU) [k8s.io] ReplicationController light [It] Should scale from 2 pods to 1 pod | Yes |
| [Fail] [k8s.io] Monitoring [It] should verify monitoring pods and all cluster nodes are available on influxdb using heapster. | No |
| [Fail] [k8s.io] Kibana Logging Instances Is Alive [It] should check that the Kibana logging instance is alive | No |
| [Fail] [k8s.io] Addon update [It] should propagate add-on file changes | No |

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 24, 2016
@roberthbailey
Contributor

@wonderfly can you rebase on top of #26195? I expect that will allow the e2e tests to pass.

# TODO(#26183): Provide a way to differentiate master OS distro and node OS
# distro.
OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-gci}
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-}
Contributor

Can you add a comment that for gci leaving this blank will auto-select an appropriate image?

Contributor Author

Done
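
The behavior discussed in this thread relies on the shell's `${VAR:-default}` expansion. The sketch below is a minimal illustration, not the actual contents of `config-default.sh`: it shows how an unset `KUBE_OS_DISTRIBUTION` yields `gci`, while an unset `KUBE_GCE_MASTER_IMAGE` leaves `MASTER_IMAGE` empty so that later logic can treat it as "auto-select". The `describe_master_image` helper is hypothetical and exists only for this example.

```shell
# Sketch only: mirrors the two assignments quoted above.
unset KUBE_OS_DISTRIBUTION KUBE_GCE_MASTER_IMAGE

OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-gci}   # unset or empty -> "gci"
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-}        # unset or empty -> ""

# Hypothetical helper: reports whether an image was pinned explicitly.
describe_master_image() {
  if [ -z "${MASTER_IMAGE}" ]; then
    echo "auto-select for ${OS_DISTRIBUTION}"
  else
    echo "pinned: ${MASTER_IMAGE}"
  fi
}

describe_master_image
```

Exporting `KUBE_GCE_MASTER_IMAGE` before these assignments would take the "pinned" branch instead.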

@roberthbailey
Contributor

@dchen1107 This should definitely have the release-note label set (as you've done).

@wonderfly wonderfly force-pushed the update_default_master_image branch from 5e1dc24 to 36cebbb Compare May 24, 2016 20:43
@wonderfly
Contributor Author

> This should definitely have the release-note label set (as you've done).

Added release notes. Let me know if you have any suggestions.

@lavalamp lavalamp added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels May 24, 2016
@lavalamp
Member

We had another multi-hour submit queue blockage due to this issue this morning. P0 to get GCI tested in presubmits.

@wonderfly
Contributor Author

The good news is that most tests started to pass after the rebase. The only one failing is a known flake (#26131). The unit/integration test failure is a false alert too.

The bad news is that my desktop died and I am still waiting for it to come back...

@k8s-bot test this issue #26131

@roberthbailey
Contributor

integration tests are failing:

Verifying ./hack/../hack/verify-flags-underscore.py
Found illegal 'flag' usage. If these are false positives you should run `hack/verify-flags-underscore.py -e > hack/verify-flags/exceptions.txt` to update the list.
cluster/gce/config-default.sh:# reloads <os_distro>/helper.sh in the gap between when the master is created
FAILED   ./hack/../hack/verify-flags-underscore.py  57s
Build step 'Execute shell' marked build as failure

@wonderfly
Contributor Author

Yes, that's what I referred to as a false alert. I'm updating the exception list as suggested.
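
To illustrate why a comment line can trip this kind of check, here is a rough approximation of the idea, not the logic of `hack/verify-flags-underscore.py` itself: flag names are expected to be spelled with dashes, so any underscore spelling of a known flag is reported. The function name `find_underscore_flags` and the flag `os-distro` passed to it are assumptions made for this example.

```shell
# Sketch: report underscore spellings of dash-separated flag names in a file.
# The flag list passed in is a hypothetical stand-in for the real known-flags list.
find_underscore_flags() {
  file="$1"; shift
  for flag in "$@"; do
    bad=$(printf '%s' "$flag" | tr '-' '_')
    # Flags without a dash have no underscore spelling to look for.
    [ "$bad" = "$flag" ] && continue
    grep -n -F -e "$bad" "$file" || true
  done
}

# Reproduce the reported line in a scratch file and scan it.
sample=$(mktemp)
echo '# reloads <os_distro>/helper.sh in the gap between when the master is created' > "$sample"
hits=$(find_underscore_flags "$sample" os-distro)
echo "$hits"
rm -f "$sample"
```

A match inside a comment is still a match for a text-based scan like this, which is why the exceptions list has to be regenerated rather than the failure ignored.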

@wonderfly wonderfly force-pushed the update_default_master_image branch from 36cebbb to 3d95151 Compare May 24, 2016 22:31
@k8s-bot

k8s-bot commented May 24, 2016

GCE e2e build/test passed for commit 3d95151.

@roberthbailey roberthbailey added lgtm "Looks good to me", indicates that a PR is ready to be merged. e2e-not-required labels May 24, 2016
@lavalamp
Member

@wonderfly: thanks for fixing that-- for future reference, that wasn't a "false alert", if that had been merged we'd have been broken at head.

@@ -347,7 +347,7 @@ function find-release-tars() {

 # This tarball is used by GCI, Ubuntu Trusty, and CoreOS.
 KUBE_MANIFESTS_TAR=
-if [[ "${KUBE_OS_DISTRIBUTION:-}" == "trusty" || "${KUBE_OS_DISTRIBUTION:-}" == "gci" || "${KUBE_OS_DISTRIBUTION:-}" == "coreos" ]]; then
+if [[ "${OS_DISTRIBUTION:-}" == "trusty" || "${OS_DISTRIBUTION:-}" == "gci" || "${OS_DISTRIBUTION:-}" == "coreos" ]]; then
Contributor

Why change this?

Contributor Author

@wonderfly wonderfly May 25, 2016

Because now that OS_DISTRIBUTION defaults to gci, one doesn't have to set KUBE_OS_DISTRIBUTION any more to start a GCI cluster. In that case, the old logic would fail.

The other distros on this list all have to set KUBE_OS_DISTRIBUTION as before, but OS_DISTRIBUTION will always equal KUBE_OS_DISTRIBUTION, so it should work for them too. Are you thinking of anything that could potentially break because of this change?

Contributor Author

Now that OS_DISTRIBUTION defaults to gci, one doesn't have to set KUBE_OS_DISTRIBUTION=gci any more to start a GCI cluster, and the old logic will start to fail in this case.

Contributor

Thanks for explaining. I was concerned it wasn't set in all paths that call this, but on looking more it looks fine.
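
The reasoning in this thread can be checked with a small sketch. It assumes the user sets nothing and relies on the new default. The helper `uses_manifests_tar` is hypothetical; it performs the same membership test as the diff, written as a `case` statement for brevity instead of the `[[ ... || ... ]]` chain.

```shell
# Sketch: why the check had to switch variables once the default became gci.
unset KUBE_OS_DISTRIBUTION                      # user relies on the new default
OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-gci}

# Hypothetical helper: same membership test as the diff.
uses_manifests_tar() {
  case "$1" in
    trusty|gci|coreos) return 0 ;;
    *) return 1 ;;
  esac
}

# Old logic tested the raw user-facing variable, which is empty here.
if uses_manifests_tar "${KUBE_OS_DISTRIBUTION:-}"; then old_check=yes; else old_check=no; fi
# New logic tests the resolved value, which already carries the default.
if uses_manifests_tar "${OS_DISTRIBUTION:-}"; then new_check=yes; else new_check=no; fi

echo "old check (KUBE_OS_DISTRIBUTION): ${old_check}"
echo "new check (OS_DISTRIBUTION): ${new_check}"
```

With `KUBE_OS_DISTRIBUTION` set explicitly to `trusty`, `gci`, or `coreos`, both checks agree, which matches the explanation that the other distros keep working.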

@fgrzadkowski
Contributor

@mwielgus Is it possible that this PR breaks our deployment scripts? For some reason we don't create the manifest for cluster autoscaler any more. How can I see the startup script logs? I don't see anything in /var/log/

@wonderfly
Contributor Author

> @mwielgus Is it possible that this PR breaks our deployment scripts? For some reason we don't create the manifest for cluster autoscaler any more. How can I see the startup script logs? I don't see anything in /var/log/

Can you give more details on how your scripts might be broken? If it is the same way as @euank's, it should be fixed by my follow-up PR. Note that this only changes the master to GCI. If you want to find the logs from the master, you can use the journalctl command, as GCI aggregates all logs into the systemd journal (journald). More general instructions on how to use GCI: https://cloud.google.com/compute/docs/containers/vm-image

cc/ @adityakali cc/ @kubernetes/goog-image
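
As a hedged sketch of the journalctl suggestion above: the wrapper below reads one unit's logs from the current boot. The function name `master_logs` is hypothetical, and the unit name you pass (for example `kubelet`) is an assumption; the real unit names on a GCI master can be listed with `systemctl list-units`.

```shell
# Sketch: read logs on a GCI master via the systemd journal.
# The unit name passed in is an assumption; check the real units on the machine.
master_logs() {
  unit="$1"
  if ! command -v journalctl >/dev/null 2>&1; then
    echo "journalctl not found; run this on the GCI master" >&2
    return 1
  fi
  # -b: current boot only; -u: restrict to one unit; --no-pager: plain output.
  journalctl -b -u "$unit" --no-pager
}
```

For example, `master_logs kubelet | tail -n 50` would show the most recent entries for that unit, and plain `journalctl -b` shows everything logged since boot, including startup-time messages.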

@mwielgus
Contributor

I confirm, the salt-based manifest for cluster-autoscaler is not added anymore. Previously we were adding it with https://github.com/kubernetes/kubernetes/blob/master/cluster/saltbase/salt/top.sls. How should we add it now?

@girishkalele

I am investigating #26239, and the salt-based kube-dns is no longer getting deployed on the GCI master. My issue looks similar to @mwielgus's.

@adityakali
Contributor

@mwielgus It seems change ac4b380 was not ported to the GCI-specific setup. Most of the code is common, but I think we need to update cluster/gce/gci/configure-helper.sh to do the equivalent of the salt part (cluster/saltbase/salt/cluster-autoscaler/init.sls).
Would you mind creating a separate issue?
Also, is there a test that verifies this functionality?

Also, I just started looking at this in detail, and I understand what @ihmccreery meant on #26235 a bit better. I think we need a better way going forward to verify consistency between the existing salt configs and the GCI (non-salt-based) setup, and preferably a way for future changes to move away from adding more salt configs.

k8s-github-robot pushed a commit that referenced this pull request May 27, 2016
Automatic merge from submit-queue

Support for cluster autoscaler in GCE Trusty and GCI images

Fixes: #26346
Ref: #26197

cc: @fgrzadkowski  @vulpecula @piosz @jszczepkowski
wonderfly added a commit to wonderfly/kubernetes that referenced this pull request Jun 1, 2016
This change recovers some of the side effects of
kubernetes#26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
k8s-github-robot pushed a commit that referenced this pull request Jun 2, 2016
Automatic merge from submit-queue

Move the defaults setting of GCI to util.sh

fixes #26291 

This change recovers some of the side effects of
#26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
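
One possible reading of this description, sketched under stated assumptions rather than copied from `cluster/gce/util.sh`: GCI-specific master defaults are applied only when `OS_DISTRIBUTION` is `gci`, and the node image settings fall back to the master's. The function name `set_image_defaults`, the placeholder values `HYPOTHETICAL_GCI_IMAGE` and `HYPOTHETICAL_GCI_PROJECT`, and every `KUBE_GCE_*` override other than `KUBE_GCE_MASTER_IMAGE` are assumptions for illustration.

```shell
# Sketch of the described behavior; not the actual contents of cluster/gce/util.sh.
# HYPOTHETICAL_GCI_IMAGE and HYPOTHETICAL_GCI_PROJECT are placeholders.
set_image_defaults() {
  OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-gci}
  if [ "${OS_DISTRIBUTION}" = "gci" ]; then
    # Only GCI gets an automatically chosen master image.
    MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-HYPOTHETICAL_GCI_IMAGE}
    MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-HYPOTHETICAL_GCI_PROJECT}
  fi
  # Backward compatibility: nodes follow the master unless overridden.
  NODE_IMAGE=${KUBE_GCE_NODE_IMAGE:-${MASTER_IMAGE:-}}
  NODE_IMAGE_PROJECT=${KUBE_GCE_NODE_PROJECT:-${MASTER_IMAGE_PROJECT:-}}
}

unset KUBE_OS_DISTRIBUTION KUBE_GCE_MASTER_IMAGE KUBE_GCE_MASTER_PROJECT
unset KUBE_GCE_NODE_IMAGE KUBE_GCE_NODE_PROJECT
set_image_defaults
echo "distro=${OS_DISTRIBUTION} master=${MASTER_IMAGE} node=${NODE_IMAGE}"
```

Keeping the distro-specific branch in one function means a caller that sets a different `KUBE_OS_DISTRIBUTION` never picks up GCI image names by accident.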

@euank @roberthbailey Can you review?
mtaufen pushed a commit to mtaufen/kubernetes that referenced this pull request Jun 6, 2016
This change recovers some of the side effects of
kubernetes#26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
@wonderfly wonderfly deleted the update_default_master_image branch June 8, 2016 21:30
mikedanese added a commit to mikedanese/kubernetes that referenced this pull request Feb 4, 2020
I have 140 commits in this directory and I get a lot of cleanup reviews
and want to be able to approve changes to hack/.golint_failures.

0e69316 delete unused cache
b9c7007 enable token review when openapi is generated
d5bbc35 make deps-approvers the approvers of sample-cli-plugin/Godeps
4186abf bzl: fix update-bazel.sh
7b47229 remove deprecated /proxy paths
b973840 gke-certificates-controller: rm -rf
4961065 cluster: remove unused functions
1e2b644 cluster: move logging library to hack/
bef68f7 cluster: build gci mounter like other go binaries
fe7ba9e kubeadm: use kubelet bootstrap instead of reimplementing
3c39173 fixit: break sig-cluster-lifecycle tests into subpackage
64f77eb enable race detection on integration tests
cdcfa35 promote tls-bootstrap to beta
ff4a814 migrate set generation to go genrule
3600d49 delete benchmark integration tests that don't work at all
21617a6 don't use build tags to mark integration tests
59fc948 bump rules_go and go version for bazel builds
ba5c285 bazel: implement git build stamping
ad42b42 move kubeadm api group testing to kubeadm package
c8ce55f Revert "Merge pull request kubernetes#41132 from kubernetes/revert-40893-kubelet-auth"
cbe5bd9 bump gazel to v14
86d9493 remove second CA used for kubelet auth in favor of webhook auth
04a7880 update repo local config to allow redirects from gopkg.in
44b7246 autogenerated
96c146c promote certificates.k8s.io to beta
087016d update gazel to v8
837eee4 pin gazel to v3
e225625 add a configuration for kubelet to register as a node with taints
584689f implement kubectl procelain csr commands
93f737e fix verify-bazel.sh on mac and windows
5dc7554 bazel: implement set-gen as a bazel genrule
61bd6aa remove docs/user-guide from bindata search path
224e32b make godep licenses/copyright check case insensitive
1cd2968 godep: vendor go-bindata
d380cb1 fix realpath issue on mac
ea632fa Revert "disable bazel build"
27116c6 rename build/ to build-tools/
ee15c80 disable bazel build
999c967 ignore BUILD in the flags-underscore.py validation
b250a88 don't check BUILD file when verifying godeps
a2eec91 add bazel presubmits to verify BUILD files are up to date
c17a8a7 kubectl: apply prune should fallback to basic delete when a resource has no reaper
25e4dcc kubeadm: fix conversion macros and add kubeadm to round trip testing
6d17a87 kubectl: add two more test of kubectl apply --prune
62960aa add a test for kubectl apply --prune
6339d91 add a test to test-cmd.sh for apply -f with label selector
b421bf4 build kube-discovery and kubeadm with release
0c76cf5 fix hack/verify-codegen.sh
9f379df add an option to controller-manager to auto approve all CSRs
95e2e29 move kube-dns to the cluster/addons/ directory
f3de21b move integration tests into individual pacakges
af0177e cleanup hack/verify-govet.sh to throttle process creation
2c93ea5 Merge pull request kubernetes#27289 from mikedanese/split-verify
ee34c76 split verify out of unit/integration suite
d046275 now that go test runs iteration loops, use that instead of custom executor
1ef1906 Merge pull request kubernetes#26197 from wonderfly/update_default_master_image
fbf6bbc Merge pull request kubernetes#25596 from derekparker/inotify
3e1c0b5 run kube-addon-manager in a pod
c5cc0c3 Merge pull request kubernetes#24277 from ihmccreery/upgrade-timeout
132c427 add linux fastbuild option to ./build/release.sh
2857baa use defaults in test-dockerized for etcd prefix and api versions
695211e Merge pull request kubernetes#21105 from caesarxuchao/watchCacheForIntegration
2172e0d Merge pull request kubernetes#21108 from mml/slow-flake
1478cf3 Merge pull request kubernetes#21090 from ihmccreery/feature-reboot
b3172a4 kubelet: add a pidfile
b1743a6 this is a manual reversion of kubernetes#20702
5b27055 Merge pull request kubernetes#19378 from ihmccreery/remove-update-jobs
b743827 Merge pull request kubernetes#19659 from ihmccreery/timeout-reboot
a6589f7 hack: ignore cluster/env.sh in boilerplate check
f71657d retrofit the scheduler with the leader election client.
bf763bb Merge pull request kubernetes#19498 from pwittrock/nodelabels
22cfa5e build: move some of hack/lib/ into a new cluster/lib/
b174fc9 Merge pull request kubernetes#18994 from bprashanth/flannel_suite
a09d85b expose master count configuration in a cli option on apiserver
c2753d7 bump ci go version to 1.5.2
0655e65 fall back to old behavior when deciding mem availablity during build
1d9d11c run kube-proxy in a static pod
91de3a1 cleanup some nits in hack/get-build.sh
cd79c6c fix unbound variable error in hace/get-build.sh
5e64590 renable enable var to correct name and only use it when needed
9bdb860 add apigroup installer and tests
e6d3b47 add componentconfig api group to autogen stuff
88008de Merge pull request kubernetes#16459 from mikedanese/enable-exp
d28d134 Merge pull request kubernetes#16533 from ihmccreery/upgrade-test-fixes
3343522 enable deployment and daemonset in gce upgrade tests
7cbf249 Merge pull request kubernetes#15836 from wojtek-t/codecgen_from_godeps
92404e7 add upgrade test between 1.0 and 1.1 for gce
95b8394 Merge pull request kubernetes#15861 from mikedanese/upgrade-num-minion
ece5779 increase NUM_MINIONS for jenkins gce upgrade test
b8b35af actually promote daemonset simple test out of flaky and skip all daemonset tests in gke
d379a36 copy directory not contents of directory
402e68e add slow test for terminated pod garbage collection
c0943f1 add intermediate e2e runs to gce upgrade
10d56ff promote simple daemonset test out of flaky
b635fc5 Merge pull request kubernetes#15228 from mesosphere/sttts-conformance-tags
392f33e Merge pull request kubernetes#14054 from mikedanese/register-master
fa60bbe add flag to kubelet to ignore the cidr passed down by the apiserver on the master
53e14c7 diff all of pkg/ when verifying swagerspec instead of just pkg/api/
05ef8ed Merge pull request kubernetes#15104 from mikedanese/ds-e2e
fe820fc break up daemonset test into two tests
833be48 enable all experimental flags with one controller
905e971 be explicit about minion group size in upgrade test
ae7d3d5 add gce-upgrade to jenkins/e2e.sh
376faea add pod garbage collection
b0457be Merge pull request kubernetes#13058 from mvdan/go1.5
a48f218 Merge pull request kubernetes#13754 from tummychow/labels-deps
1fec199 Merge pull request kubernetes#13824 from kubernetes/revert-13547-hpa-kubeup
fa40ced move contrib/for-tests to test/images
f061875 updating all references in .sh scripts
8326697 rewrite all links to prs to k8s links
fb02b33 fix build
8e48431 Revert "demote to flaky tests from parallel e2e"
b56edd1 Merge pull request kubernetes#11727 from ZJU-SEL/build-nonstatic-hyperkube
cf4cb1a Merge pull request kubernetes#10474 from kargakis/scale-multiple-controllers
e376a09 demote to flaky service tests from parallel e2e
7c47d6b Merge pull request kubernetes#12009 from smarterclayton/fix_cmd_config
0269e2b Merge pull request kubernetes#11941 from GoogleCloudPlatform/enact_version_md
94a387d Revert "Improve conversion to support multiple packages"
1a613c4 Merge pull request kubernetes#9971 from smarterclayton/make_conversion_more_flexible
0ae48c4 Merge pull request kubernetes#11927 from wojtek-t/remove_shell_services
59a1dd4 Merge pull request kubernetes#11789 from mbforbes/nodesNetwork
6294070 Merge pull request kubernetes#11803 from wojtek-t/move_back_from_flaky
daa6d4d Merge pull request kubernetes#11285 from liggitt/ca
9f16fd9 Merge pull request kubernetes#11860 from ingvagabund/delimiter-for-X-option-eparis
c0acfbc Merge pull request kubernetes#11421 from nikhiljindal/exposeServcPort
ae1c8e5 Merge pull request kubernetes#11737 from thockin/cleanup-remove-v1beta3
01ee1b8 Merge pull request kubernetes#10840 from jbeda/master
d4d99de make mungedoc exit 1 if manual changes are needed and wire up erro message.
337772a fix all tests
055115a fake realpath, and standardize treatment of trailing / of dirs in gendoc
b4514ee fix run-gendocs to point to new repo location
c053b9a add documentation and script on how to get recent and "nightly" builds
719870f add publishing of latest-green.txt to jenkins e2e tests on success
1e130e0 remove --machines from code and docs
dbb47fe remove e2e run before cluster upgrade
de55e17 e2e test cluster stability during upgrade
c9fcf45 fix bad cmd-test for patch.
9f91532 fix error where we can't use patch and add cmd-test for patch and file update
Labels
area/platform/gce lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Switch master node on GCE to GCI image by default