-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
addon updater should not retry too many times - specs may be invalid #10237
addon updater should not retry too many times - specs may be invalid #10237
Conversation
GCE e2e build/test failed for commit 68ffa0bd5792279f051c66a01897cf1282bceda2. |
68ffa0b
to
4980d87
Compare
@@ -181,7 +181,7 @@ spec: | |||
` | |||
|
|||
var addonTestPollInterval = 3 * time.Second | |||
var addonTestPollTimeout = 3 * time.Minute | |||
var addonTestPollTimeout = 1 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rollback of my previous "fix".
@zmerlynn , you reviewed the add-on updater. Could you also be the first reviewer of this fix? |
GCE e2e build/test failed for commit 4980d8786b894462f20ab5682b5cf1e3a80d1e85. |
@k8s-bot test this please |
|
@k8s-bot test this please |
GCE e2e build/test failed for commit 4980d8786b894462f20ab5682b5cf1e3a80d1e85. |
Investigating errors - may be patch-related. Do not merge. |
Indeed, failures are patch-related. The API server may be down and creating addons may fail. Retrying after 10 minutes is too late then. Conclusion - we'd better retry more often, But it doesn't make any sense to retry a single execution of kube-addon-updater.sh than the interval between consecutive invocations of this script. |
4980d87
to
ac74376
Compare
GCE e2e build/test passed for commit ac7437644d43e3a4fa39d0244d43532ff6bbe922. |
@@ -266,6 +271,12 @@ function create-objects() { | |||
local -r file_paths=$2 | |||
local file_path | |||
for file_path in ${file_paths}; do | |||
# Remember that the file may have disappear by now | |||
# But we don't want to check it here because | |||
# such race concition may always happen after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*condition
LGTM modulo nit. Needs rebase? |
ac74376
to
b59477b
Compare
rebased |
b59477b
to
7873e25
Compare
and typo fixed |
LGTM |
cc @alex-mohr @quinton-hoole for rubber stamps. |
GCE e2e build/test passed for commit b59477b8494f4fc2dd029a54283f8f810cc59084. |
GCE e2e build/test failed for commit 7873e25. |
@k8s-bot: retest this please |
@k8s-bot: ok to test |
ONCALL VIOLATING PROCESS: After careful consideration, this is at the bottom of a chain of 3 PRs related to a heavily triaged issue, and all the PRs need some incremental soak time today, in advance of the holiday weekend. If you wish to take my badge for violating process, you may. Merging "to be blessed". |
…too-long addon updater should not retry too many times - specs may be invalid
GCE e2e build/test passed for commit 7873e25. |
Fixes #10136 (error was found in production code, not in the test)
Risk of regression: small (e2e test passes; this is only retry on error; another retry is performed in 10 minutes, so cluster should recover even if these retries fail)
The patch is not so important for production code, but our e2e test fails because of a race condition (that may appear in production too)