-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trusty: Avoid reaching GCE custom metadata size limit #22818
Conversation
@zmerlynn @dchen1107 @roberthbailey cc/ @fabioy and @wonderfly FYI. |
Labelling this PR as size/XXL |
GCE e2e build/test passed for commit 7f204951d90d7e751b6cd563a5b83477eb0677cd. |
The previous version failed on building. I just update hack/verify-flags/exceptions.txt to fix it. |
GCE e2e build/test passed for commit b08eb995fc70c581ff41129868ec046484fc3659. |
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label? |
@roberthbailey I just made a critical fix PR #22882. I need to merge that one first and cherry pick in 1.2 branch. Then, this PR will need to rebase. So, please wait for a while to review this PR. Thanks |
@cjcullen and I were chatting about exceeding metadata size limits with startup scripts today and he found that by stripping comments we cut down the number of bytes by 25%. Have you considered stripping comments before sticking the script into metadata? We could also consider applying more aggressive minification to the scripts. |
I had thought about ways like stripping comments, refactoring code, or tar the file. Stripping comment can alleviate it for some time. But I expected we would still face this problem in near future. So I decided to make a long term solution by moving most part in a file and put the file in kube-manifest tarball. After all, we already put two kube-addon shell files there. This change should be transparent to GKE, as it does not change the metadata key and mapped file name. The 32768 bytes limit is for GCE custom metadata (https://cloud.google.com/compute/docs/metadata#custom). The "startup-script" is a reserved key, so I am not sure if the limit also applies to ContainerVM's configure-vm.sh |
@andyzheng0831 PR needs rebase |
@roberthbailey The PR #22882 blocking this one was merged, and this PR is rebased. Please feel free to continue to review and comment |
GCE e2e build/test passed for commit 1b622cdaf50e3bcdcad26c6f9c16a557fb95d554. |
Someone also reported this problem in #22842 (comment) |
@andyzheng0831 Thanks. Will this in v1.2? |
Probably not in 1.2 as it is very unlikely to cherry pick non critical fix now. But I expect this will be in 1.2.1 |
@andyzheng0831 OK. Thanks |
GCE e2e build/test passed for commit b80a4764ecd5ebb21c4bf7be38ad5ad2a15a14ef. |
@andyzheng0831 PR needs rebase |
All rebases are done, and all PRs blocking this one were merged. This PR needs to be merged first and then we can add support of regional registry. If the regional registry support in ContainerVM will be cherry picked in 1.2 release branch, we should do it too, so please mark this PR as cherry pick candidate. |
GCE e2e build/test passed for commit 0a8e68f. |
@andyzheng0831: Regional registry support shipped with 1.2.0, and 1.2.0 GKE clusters on ContainerVM will launch with fallback releases. We'll get this cherry-picked soon. |
Ah, I mixed the regional registry and fallback support together. The regional registry is supported in Trusty in PR #22616, but we did not mark it as cherry-pick candidate. Please add the label to that PR. Thanks. |
I believe #22616 shipped with the release? |
Yes, you are right. #22616 is in 1.2.0. |
GCE e2e build/test passed for commit 0a8e68f. |
@roberthbailey I think it would be better to merge this change ASAP for several reasons. (1) This PR has to be rebased with manual work after every of my other changes is merged. Although it is just about code refactoring, rebasing with manual work is error-prone. (2) I just hit the size limit again when writing code for another fix. As a result, I had to delete some comments in configure.sh to reduce the size. This is not a good practice. (3) This PR needs be cherry picked in release-1.2 branch, so that we can add other feature supports in 1.2 branch, such as the regional fallback. But currently I am working on a fix of heapster in trusty, which will not be cherry picked in 1.2 (the PR causing the breakage is not cherry picked). It means that we have to first merge this change and then my fix to the heapster. If the order is reversed, we will have a new trouble when cherry pick this change in 1.2 |
@k8s-bot test this Tests are more than 48 hours old. Re-running tests. |
GCE e2e build/test passed for commit 0a8e68f. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
GCE e2e build/test passed for commit 0a8e68f. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
GCE e2e build/test passed for commit 0a8e68f. |
Automatic merge from submit-queue |
Auto commit by PR queue bot (cherry picked from commit f8bb10b)
Commit 3e9daa2 found in the "release-1.2" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this s an error find help to get your PR picked. |
Auto commit by PR queue bot (cherry picked from commit f8bb10b)
Auto commit by PR queue bot (cherry picked from commit f8bb10b)
Auto commit by PR queue bot (cherry picked from commit f8bb10b)
Auto commit by PR queue bot (cherry picked from commit f8bb10b)
GCE custom metadata has a size limit of 32 KB. The file cluster/gce/trusty/configure.sh is uploaded as GCE custom metadata, but its size is already very close to the limit. Recently I tried to add code in the file to support a new feature, but cluster creation failed due to exceeding metadata size. So, it is the time to fix this problem.
Basically, most of this PR is code refactoring:
(1) I split the origin configure.sh into two file. The functions for installing k8s binary and configurations files are left in configure.sh, which will be uploaded as GCE metadata as before. I try to avoid changing instance metadata keys. All other functions are moved to configure-helper.sh, and this file is packed in kube-manifest.tar.gz. In future development, most likely we only need to change configure-helper.sh.
(2) Small changes in instance initialization sequence. Originally, we first prepare environment and then download k8s binary/configuration files. With this change, downloading becomes the first step, because we need to source configure-helper.sh for most remaining steps, which is packed in kube-manifest tarball.
I have run e2e tests on Trusty and no breakage caused by this PR.