Self image lookup: Retry on empty string #1628
Occasionally this appears to return an empty string, which then makes things fail down the line, as an empty string is not a valid image. Retry if that happens. Output for when this happened:
```
{"level":"info","ts":"2022-07-28T16:55:17Z","logger":"setup","msg":"using operator image","image":""}
```
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alvaroaleman
operatorImage, err = lookupOperatorImage(opts.ControlPlaneOperatorImage)
if err != nil {
	return false, err
}
Is this because container.Status needs some time to be populated?
I think so. I can't tell exactly why this happens; I observed it only once, and it leads to hard-to-understand issues down the line (Deployment invalid: "" is not a valid image). I thought this would be the best workaround, as it is simple and stupid.
	return false, err
}
// Apparently this is occasionally set to an empty string
if hostedClusterConfigOperatorImage == "" {
Given lookupOperatorImage must find an image, I think it should return an error when it doesn't, e.g. when hostedClusterConfigOperatorImage == "".
I.e., modify lookupOperatorImage to check that container.ImageID is not empty here, and fail otherwise:
for _, container := range me.Status.ContainerStatuses {
	// TODO: could use downward API for this too, overkill?
	if container.Name == "control-plane-operator" {
		return strings.TrimPrefix(container.ImageID, "docker-pullable://"), nil
	}
}
Then we poll here and retry if lookupOperatorImage errors.
wdyt?
Right now we stop retrying if we encounter an error, because we assume that an error is fatal and not recoverable. The "we got an empty string back" case is likely recoverable, so we retry.
/lgtm
/hold cancel
/retest-required
/retest-required
/cherrypick release-4.11
@alvaroaleman: new pull request created: #1695