
Prepare kube-system pods manifest for trusty nodes. #18115

Merged: 1 commit, merged on Dec 9, 2015

Conversation

andyzheng0831

The current trusty node support does not rely on salt, but for simplicity it still downloads the salt tarball to extract the needed kube-system pod manifest files. However, this method will not work for the master on trusty, because several of the master manifest files contain salt configuration. This change gets rid of the salt tarball on trusty nodes; instead, we put the manifest files under the directory cluster/gce/trusty/kube-manifest. The ongoing work on master support for trusty will follow the same logic to obtain the kube-system pod manifests.

@k8s-github-robot

Labelling this PR as size/L

k8s-github-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Dec 3, 2015
@k8s-bot

k8s-bot commented Dec 3, 2015

GCE e2e test build/test passed for commit 3f66fd1746a10585016550976e9d93c7d5aa6076.

@k8s-github-robot

The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?

@wonderfly
Contributor

This change gets rid of the salt tarball on trusty nodes; instead, we put the manifest files under the directory cluster/gce/trusty/kube-manifest.

Are those manifest files versioned? How do you make sure the checked in version is always the latest one (if you want the latest)?

# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
rm -f $1
Contributor

Nit: you could do rm -f $1 >& /dev/null to suppress the warning xxx does not exist in case $1 does not exist.

Author

nice catch

@andyzheng0831
Author

There is no way to make sure everything is up to date unless we simply copy the files from the salt tarball, which was the old approach. But now, considering the master side change, that approach is very hard to keep. Some master manifest files contain salt configuration and variables, which require salt to dynamically determine the variable values and then substitute them when placing the manifest files under /etc/kubernetes/manifest. As long as this kind of dynamic variable evaluation exists, copying files from the salt tarball does not work for trusty. This is also a problem when running k8s on CoreOS.

That leaves two possible approaches: (1) write a script to parse and evaluate the variables, which is error-prone because the manifest files do not all follow exactly the same style; or (2) simply keep a copy of the manifest files without the salt config. Either way, a change to a manifest file under the salt dir can break its usage on trusty, so we might as well use the simple way.
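To illustrate why approach (1) is brittle, here is a hedged sketch (all names are hypothetical, not from this PR): a salt/jinja-templated manifest line such as --etcd-servers={{ pillar['etcd_servers'] }} would need a substitution script along these lines, and every stylistic variation in the templates breaks the pattern:

```shell
# Hypothetical sketch of approach (1): naive substitution of a salt/jinja
# variable in a manifest. The variable name and the template style are
# invented; real manifests do not all follow the same style, which is
# exactly what makes this approach error-prone.
render_manifest() {
  local -r src="$1"
  local -r dst="$2"
  # Replace {{ pillar['etcd_servers'] }} with the value of $ETCD_SERVERS.
  sed -e "s|{{ *pillar\['etcd_servers'\] *}}|${ETCD_SERVERS}|g" "${src}" > "${dst}"
}
```

A template that writes the variable as {{pillar.etcd_servers}} instead would slip straight past this pattern, which is the failure mode described above.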

@k8s-bot

k8s-bot commented Dec 3, 2015

GCE e2e test build/test passed for commit 7c9e71646107a52ec7a8762c77edf66d2b9232ba.

dchen1107 assigned roberthbailey and unassigned davidopp on Dec 3, 2015
@@ -0,0 +1,29 @@
apiVersion: v1
Member

Why do we need an extra copy here for trusty? Shouldn't they be the same for all OS distros?

Author

An alternative: for the manifest files without salt configuration, such as these three files, we can simply copy them from cluster/saltbase/salt when creating kube-manifest.tar.gz. Then the dir cluster/gce/trusty/kube-manifest would only contain the manifests with salt configuration, such as etcd and api-server. Those would be used by both CoreOS and trusty, so I am fine moving the dir to cluster/gce/kube-manifest. Does that address your concern?

@dchen1107
Member

Can you bring up a cluster with trusty image and run e2e test against it? Please copy & paste the test result here too. Thanks!

@andyzheng0831
Author

@dchen1107 , I revised the code to simply copy the three manifests from the salt dir, because we can use them directly on trusty. The master side manifests will follow the same framework, i.e., make copies in a tmp dir, pack them, and upload the tarball to GCS. If a master manifest contains salt config, we cannot use it directly on trusty/CoreOS. Maybe we should make a dir under cluster/gce/ to host copies, but that is left to later PRs to keep this one self-contained and clean.

I will paste the e2e test results later; the run is still in progress.

@k8s-bot

k8s-bot commented Dec 4, 2015

GCE e2e build/test failed for commit 59f3bdea651eb8e08674fe9aa20851efb7e4cf8f.

@andyzheng0831
Author

@dchen1107 here are the e2e test results. One failure, but I see it is listed among the GCE flaky tests. I did not blacklist it when running e2e. I ran this particular test 4 times on its own, and it passed every time.

Summarizing 1 Failure:

[Fail] NodeOutOfDisk [BeforeEach] runs out of disk space
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/nodeoutofdisk.go:252

Ran 128 of 206 Specs in 4530.573 seconds
FAIL! -- 127 Passed | 1 Failed | 2 Pending | 76 Skipped --- FAIL: TestE2E (4531.52s)
FAIL

@andyzheng0831
Author

@k8s-bot test this please

@k8s-bot

k8s-bot commented Dec 4, 2015

GCE e2e build/test failed for commit 59f3bdea651eb8e08674fe9aa20851efb7e4cf8f.

#
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
Contributor

The version in configure-vm.sh uses an until to actually keep trying. The comment here says it retries but it doesn't seem as though that is implemented.

Author

Ah, yes, the code does not match the comment. Which do you think is more reasonable: loop permanently until the download succeeds, or try several times and then fail? Originally I simply copied the code from configure-vm.sh, but later I thought a permanent loop might not make much sense.

Contributor

Why wouldn't a permanent loop make sense? What is the behavior of the node if one of these downloads fails? Can it ever be a useful node in the cluster? Continuing to retry, in the hope that the underlying problem will be solved, seems preferable to giving up permanently.

Author

I just remembered there is a timeout for nodes to finish registering with the master, so it is not retrying forever but retrying under a minutes-level timeout. In that light, I think it makes sense. I will restore the original logic.
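For reference, the restored retry logic, modeled on the until-loop in configure-vm.sh, might look like the sketch below. The curl flags and message text are assumptions, not the exact code from this PR:

```shell
# Sketch of download_or_bust with retries, modeled on configure-vm.sh.
# The curl flags here are assumptions, not the PR's exact code.
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
  local -r file="$1"
  local -r url="$2"
  rm -f "${file}"
  # Keep retrying until the download succeeds; in practice the node
  # registration timeout discussed above bounds how long this can run.
  until curl --ipv4 -Lo "${file}" --connect-timeout 20 --retry 6 --retry-delay 10 "${url}"; do
    echo "Failed to download ${url}. Retrying."
  done
}
```

The loop is unbounded on purpose: a node that cannot fetch its artifacts is useless to the cluster anyway, and the external registration timeout acts as the overall deadline.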

This change refactors the code of preparing kube-system manifests
for trusty based cluster. The manifests used by nodes do not contain
salt configuration, so we can simply copy them from the directory
cluster/saltbase/salt, make a tarball, and upload to Google Storage.
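The copy-and-pack flow the commit message describes could be sketched roughly as follows; the function name, its arguments, and the upload step are illustrative assumptions, not the PR's actual script:

```shell
# Hypothetical sketch of the packaging flow: copy salt-free node manifests
# out of the salt tree, tar them up, and (in the real scripts) upload the
# tarball to Google Storage. Function name and arguments are made up.
build_kube_manifest_tarball() {
  local -r salt_dir="$1"      # e.g. cluster/saltbase/salt
  local -r out_tar="$2"       # e.g. kube-manifest.tar.gz
  shift 2
  local -r staging="$(mktemp -d)"
  local manifest
  for manifest in "$@"; do    # manifest paths relative to salt_dir
    cp "${salt_dir}/${manifest}" "${staging}/"
  done
  tar -czf "${out_tar}" -C "${staging}" .
  rm -rf "${staging}"
  # The real flow would then push ${out_tar} to GCS, e.g. with gsutil cp.
}
```

Because the node manifests carry no salt configuration, a plain copy into the staging dir is sufficient; no variable substitution is needed.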
@andyzheng0831
Author

@roberthbailey , please see the new version which addresses your comments. Thanks

@k8s-bot

k8s-bot commented Dec 5, 2015

GCE e2e test build/test passed for commit 2bdb85f16ca3a30f5e16a6929b012ccfe9a1f169.

roberthbailey added the lgtm label ("Looks good to me": indicates that a PR is ready to be merged) on Dec 5, 2015
@k8s-bot

k8s-bot commented Dec 5, 2015

GCE e2e test build/test passed for commit 816b295.

@andyzheng0831
Author

@dchen1107 @roberthbailey please add "ok-to-merge" if you think it is ready for that

@k8s-github-robot

@k8s-bot test this

Tests are more than 48 hours old. Re-running tests.

@k8s-bot

k8s-bot commented Dec 7, 2015

GCE e2e build/test failed for commit 816b295.

@andyzheng0831
Author

@k8s-bot test this

@k8s-bot

k8s-bot commented Dec 7, 2015

GCE e2e build/test failed for commit 816b295.

@andyzheng0831
Author

@k8s-bot test this

@k8s-bot

k8s-bot commented Dec 7, 2015

GCE e2e build/test failed for commit 816b295.

@andyzheng0831
Author

@k8s-bot test this

Hard to believe that a trusty change could break tests on containervm.

@k8s-bot

k8s-bot commented Dec 7, 2015

GCE e2e test build/test passed for commit 816b295.

@andyzheng0831
Author

Is there anything missing for the PR to be merged?

@roberthbailey
Contributor

Nope, the submit queue has been stuck for almost a day.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Dec 9, 2015

GCE e2e build/test failed for commit 816b295.

@andyzheng0831
Author

@k8s-bot test this

@k8s-bot

k8s-bot commented Dec 9, 2015

GCE e2e test build/test passed for commit 816b295.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Dec 9, 2015

GCE e2e test build/test passed for commit 816b295.

@k8s-github-robot

Automatic merge from submit-queue
