Prepare kube-system pods manifest for trusty nodes. #18115
Conversation
Labelling this PR as size/L
GCE e2e test build/test passed for commit 3f66fd1746a10585016550976e9d93c7d5aa6076.
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?
Are those manifest files versioned? How do you make sure the checked in version is always the latest one (if you want the latest)?
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
  rm -f $1
Nit: you could do rm -f $1 >& /dev/null to suppress the warning "xxx does not exist" in case $1 does not exist.
nice catch
There is no way to make sure everything is up to date, unless we simply copy the files from the salt tarball, which was the old approach I took. But now, considering the master side change, it is very hard to keep doing it that way. Some master manifest files contain salt configuration and variables; the variable values need to be determined dynamically through salt and substituted in before the manifest files are put under /etc/kubernetes/manifests. As long as this kind of dynamic variable evaluation exists, copying files from the salt tarball does not work for trusty. This is also a problem for running k8s on CoreOS. That leaves two possible ways: (1) write a script to parse and evaluate the variables, which is quite error-prone because the manifest files do not all follow exactly the same style; (2) simply keep a copy of the manifest files that have no salt config. Either way, a change to a manifest file under the salt dir can break the usage on trusty, so why not use the simple way.
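For concreteness, a rough sketch of what option (2) could look like on the node side, assuming the salt-free manifests have already been packed into a tarball and downloaded; the paths and names below are illustrative, not the actual code:

# Illustrative sketch only: place pre-rendered (salt-free) node manifests
# verbatim, with no salt variable evaluation. The tarball path, unpack dir,
# and destination are assumptions for illustration.
place_node_manifests() {
  local -r tarball="/tmp/kube-manifests.tar.gz"
  local -r unpack_dir="/tmp/kube-manifests"
  local -r dest="/etc/kubernetes/manifests"
  mkdir -p "${unpack_dir}" "${dest}"
  tar xzf "${tarball}" -C "${unpack_dir}"
  cp "${unpack_dir}"/* "${dest}/"
}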
GCE e2e test build/test passed for commit 7c9e71646107a52ec7a8762c77edf66d2b9232ba.
@@ -0,0 +1,29 @@
apiVersion: v1
Why do we need an extra copy here for trusty? Shouldn't they be the same for all OS distros?
An alternative: for the manifest files without salt configuration, such as these three files, we can simply copy them from cluster/saltbase/salt when creating kube-manifest.tar.gz. Then the dir cluster/gce/trusty/kube-manifest only contains the manifests that do carry salt configuration, such as etcd and api-server. That dir should be usable by both CoreOS and trusty, so I am fine with moving it to cluster/gce/kube-manifest. Does that address your concern?
Can you bring up a cluster with the trusty image and run the e2e tests against it? Please copy & paste the test results here too. Thanks!
@dchen1107, I revised the code to simply copy the three manifests from the salt dir, because we can use them directly on trusty. The master side manifests will follow the same framework, i.e., make copies in a tmp dir, pack them, and upload the tarball to GCS. If a master manifest contains salt config, we cannot use it directly on trusty/coreos. Maybe we should make a dir under cluster/gce/ to host the copies, but that is left to later PRs to keep this one self-contained and clean. I will paste the e2e test results later; they are still running.
GCE e2e build/test failed for commit 59f3bdea651eb8e08674fe9aa20851efb7e4cf8f.
@dchen1107 here are the e2e test results. There is one failure, but I see it is listed among the GCE flaky tests. I did not blacklist it when running e2e; I ran this particular test 4 times and it passed every time.
Summarizing 1 Failure:
[Fail] NodeOutOfDisk [BeforeEach] runs out of disk space
Ran 128 of 206 Specs in 4530.573 seconds
@k8s-bot test this please
GCE e2e build/test failed for commit 59f3bdea651eb8e08674fe9aa20851efb7e4cf8f.
#
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
The version in configure-vm.sh uses an until loop to actually keep trying. The comment here says it retries, but it doesn't seem as though that is implemented.
Ah, yes, the code doesn't match the comment. Which do you think is more reasonable: loop permanently until the download succeeds, or try several times and then fail? Originally I simply copied the code from configure-vm.sh, but later I thought a permanent loop might not make much sense.
Why wouldn't a permanent loop make sense? What is the behavior of the node if one of these downloads fails? Can it ever be a useful node in the cluster? Continuing to retry, in the hope that the underlying problem will be solved, seems preferable to giving up permanently.
I just remembered there is a timeout for nodes to finish registering with the master, so it is not actually retrying forever, but retrying within a minutes-level timeout. In that light it makes sense. I will restore the original logic.
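For reference, a minimal sketch of the keep-retrying style used by configure-vm.sh that is being restored here; the curl flags and the file test are illustrative assumptions, not the exact upstream code:

# Illustrative sketch of a keep-retrying download helper.
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
  rm -f "$1"
  # Keep retrying until the file exists and is non-empty; in practice the loop
  # is bounded by the node-registration timeout mentioned above.
  until [[ -s "$1" ]]; do
    echo "Downloading $2"
    curl --ipv4 -Lo "$1" --connect-timeout 20 --retry 6 --retry-delay 10 "$2" || sleep 5
  done
}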
This change refactors the code that prepares the kube-system pod manifests for trusty-based clusters. The manifests used by nodes do not contain salt configuration, so we can simply copy them from the directory cluster/saltbase/salt, make a tarball, and upload it to Google Storage.
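A rough sketch of that flow; the staging paths, the example manifest, and the GCS bucket name are assumptions for illustration, not the actual release scripts:

# Illustrative sketch only: copy the salt-free node manifests, pack them into
# a tarball, and upload it to Google Cloud Storage.
salt_dir="cluster/saltbase/salt"
staging_dir="$(mktemp -d)"
mkdir -p "${staging_dir}/kube-manifests"
# Example manifest; the real list of copied files may differ.
cp "${salt_dir}/fluentd-gcp/fluentd-gcp.yaml" "${staging_dir}/kube-manifests/"
tar czf "${staging_dir}/kube-manifests.tar.gz" -C "${staging_dir}" kube-manifests
# "my-kube-staging" is a placeholder bucket name.
gsutil cp "${staging_dir}/kube-manifests.tar.gz" "gs://my-kube-staging/kube-manifests.tar.gz"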
@roberthbailey, please see the new version which addresses your comments. Thanks!
GCE e2e test build/test passed for commit 2bdb85f16ca3a30f5e16a6929b012ccfe9a1f169.
GCE e2e test build/test passed for commit 816b295.
@dchen1107 @roberthbailey please add "ok-to-merge" if you think it is ready for that
@k8s-bot test this Tests are more than 48 hours old. Re-running tests.
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this. Hard to believe that a trusty change would hurt tests on containervm.
GCE e2e test build/test passed for commit 816b295.
Is there anything missing for the PR to be merged?
Nope, the submit queue has been stuck for almost a day.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this
GCE e2e test build/test passed for commit 816b295.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e test build/test passed for commit 816b295.
Automatic merge from submit-queue
Auto commit by PR queue bot
The current trusty node support does not rely on salt, but it still downloads the salt tarball to extract the needed kube-system pod manifest files, for simplicity. However, this method will not work for the master on trusty, because several master manifest files contain salt configuration. This change gets rid of using the salt tarball on trusty nodes. Instead, we put the manifest files under the directory cluster/gce/trusty/kube-manifest. The ongoing work on master for trusty will follow the same logic to get the kube-system pod manifests.