Prepare kube-system pods manifest for trusty nodes. #18115
Conversation
Labelling this PR as size/L
GCE e2e test build/test passed for commit 3f66fd1746a10585016550976e9d93c7d5aa6076.
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?
Are those manifest files versioned? How do you make sure the checked in version is always the latest one (if you want the latest)?
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
  rm -f $1
Nit: you could do rm -f $1 >& /dev/null to suppress the warning "xxx does not exist" in case $1 does not exist.
nice catch
There is no way to make sure everything is up to date, unless we simply copy the files from the salt tarball, which was the old approach I took. But now, considering the master side change, it is very hard to keep doing it that way. Some master manifest files contain salt configuration and variables; the variable values need to be determined dynamically through salt and substituted in before the manifest files are put under /etc/kubernetes/manifests. As long as this kind of dynamic variable evaluation exists, copying files from the salt tarball does not work for trusty. This is also a problem for running k8s on CoreOS. That leaves two possible ways: (1) write a script to parse and evaluate the variables, which is quite error-prone because the manifest files do not all follow exactly the same style; (2) simply keep a copy of the manifest files that have no salt config. Either way, a change to a manifest file under the salt dir can break the usage on trusty, so why not use the simple way.
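For concreteness, a rough sketch of what option (2) could look like on the node side, assuming the salt-free manifests have already been packed into a tarball and downloaded; the paths and names below are illustrative, not the actual code:

# Illustrative sketch only: place pre-rendered (salt-free) node manifests
# verbatim, with no salt variable evaluation. The tarball path, unpack dir,
# and destination are assumptions for illustration.
place_node_manifests() {
  local -r tarball="/tmp/kube-manifests.tar.gz"
  local -r unpack_dir="/tmp/kube-manifests"
  local -r dest="/etc/kubernetes/manifests"
  mkdir -p "${unpack_dir}" "${dest}"
  tar xzf "${tarball}" -C "${unpack_dir}"
  cp "${unpack_dir}"/* "${dest}/"
}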
GCE e2e test build/test passed for commit 7c9e71646107a52ec7a8762c77edf66d2b9232ba.
@@ -0,0 +1,29 @@
apiVersion: v1
Why do we need an extra copy here for trusty? Shouldn't they be the same for all OS distros?
An alternative: for the manifest files without salt configuration, such as these three files, we can simply copy them from cluster/saltbase/salt when creating kube-manifest.tar.gz. Then the dir cluster/gce/trusty/kube-manifest only contains the manifests that do carry salt configuration, such as etcd and api-server. That dir should be usable by both CoreOS and trusty, so I am fine with moving it to cluster/gce/kube-manifest. Does that address your concern?
Can you bring up a cluster with the trusty image and run the e2e tests against it? Please copy & paste the test results here too. Thanks!
@dchen1107, I revised the code to simply copy the three manifests from the salt dir, because we can use them directly on trusty. The master side manifests will follow the same framework, i.e., make copies in a tmp dir, pack them, and upload the tarball to GCS. If a master manifest contains salt config, we cannot use it directly on trusty/coreos. Maybe we should make a dir under cluster/gce/ to host the copies, but that is left to later PRs to keep this one self-contained and clean. I will paste the e2e test results later; they are still running.
GCE e2e build/test failed for commit 59f3bdea651eb8e08674fe9aa20851efb7e4cf8f.
@dchen1107 here are the e2e test results. There is one failure, but I see it is listed among the GCE flaky tests. I did not blacklist it when running e2e; I ran this particular test 4 times and it passed every time.
Summarizing 1 Failure:
[Fail] NodeOutOfDisk [BeforeEach] runs out of disk space
Ran 128 of 206 Specs in 4530.573 seconds
@k8s-bot test this please
GCE e2e build/test failed for commit 59f3bdea651eb8e08674fe9aa20851efb7e4cf8f.
#
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
The version in configure-vm.sh uses an until loop to actually keep trying. The comment here says it retries, but it doesn't seem as though that is implemented.
Ah, yes, the code doesn't match the comment. Which do you think is more reasonable: loop permanently until the download succeeds, or try several times and then fail? Originally I simply copied the code from configure-vm.sh, but later I thought a permanent loop might not make much sense.
Why wouldn't a permanent loop make sense? What is the behavior of the node if one of these downloads fails? Can it ever be a useful node in the cluster? Continuing to retry, in the hope that the underlying problem will be solved, seems preferable to giving up permanently.
I just remembered there is a timeout for nodes to finish registering with the master, so it is not actually retrying forever, but retrying within a minutes-level timeout. In that light it makes sense. I will restore the original logic.
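For reference, a minimal sketch of the keep-retrying style used by configure-vm.sh that is being restored here; the curl flags and the file test are illustrative assumptions, not the exact upstream code:

# Illustrative sketch of a keep-retrying download helper.
# $1 is the file to create
# $2 is the URL to download
download_or_bust() {
  rm -f "$1"
  # Keep retrying until the file exists and is non-empty; in practice the loop
  # is bounded by the node-registration timeout mentioned above.
  until [[ -s "$1" ]]; do
    echo "Downloading $2"
    curl --ipv4 -Lo "$1" --connect-timeout 20 --retry 6 --retry-delay 10 "$2" || sleep 5
  done
}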
This change refactors the code that prepares the kube-system pod manifests for trusty-based clusters. The manifests used by nodes do not contain salt configuration, so we can simply copy them from the directory cluster/saltbase/salt, make a tarball, and upload it to Google Storage.
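A rough sketch of that flow; the staging paths, the example manifest, and the GCS bucket name are assumptions for illustration, not the actual release scripts:

# Illustrative sketch only: copy the salt-free node manifests, pack them into
# a tarball, and upload it to Google Cloud Storage.
salt_dir="cluster/saltbase/salt"
staging_dir="$(mktemp -d)"
mkdir -p "${staging_dir}/kube-manifests"
# Example manifest; the real list of copied files may differ.
cp "${salt_dir}/fluentd-gcp/fluentd-gcp.yaml" "${staging_dir}/kube-manifests/"
tar czf "${staging_dir}/kube-manifests.tar.gz" -C "${staging_dir}" kube-manifests
# "my-kube-staging" is a placeholder bucket name.
gsutil cp "${staging_dir}/kube-manifests.tar.gz" "gs://my-kube-staging/kube-manifests.tar.gz"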
@roberthbailey, please see the new version which addresses your comments. Thanks!
GCE e2e test build/test passed for commit 2bdb85f16ca3a30f5e16a6929b012ccfe9a1f169.
GCE e2e test build/test passed for commit 816b295.
@dchen1107 @roberthbailey please add "ok-to-merge" if you think it is ready for that
@k8s-bot test this Tests are more than 48 hours old. Re-running tests.
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this. Hard to believe that a trusty change would hurt tests on containervm.
GCE e2e test build/test passed for commit 816b295.
Is there anything missing for the PR to be merged?
Nope, the submit queue has been stuck for almost a day.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test failed for commit 816b295.
@k8s-bot test this
GCE e2e test build/test passed for commit 816b295.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e test build/test passed for commit 816b295.
Automatic merge from submit-queue
Auto commit by PR queue bot
The current trusty node support does not rely on salt, but it still downloads the salt tarball to extract the needed kube-system pod manifest files, for simplicity. However, this method will not work for the master on trusty, because several master manifest files contain salt configuration. This change gets rid of using the salt tarball on trusty nodes. Instead, we put the manifest files under the directory cluster/gce/trusty/kube-manifest. The ongoing work on master for trusty will follow the same logic to get the kube-system pod manifests.