PLEG: reinspect pods that failed prior inspections #25077
Merged
Fix the following sequence of events:

1. relist call 1 successfully inspects a pod (just has infra container)
2. relist call 2 gets an error inspecting the same pod (has infra container and a transient container that failed to create) and doesn't update the old/new pod records
3. relist calls 3+ don't inspect the pod any more (just has infra container so it doesn't look like anything changed)

This change adds a new list that keeps track of pods that failed inspection and retries them the next time relist is called. Without this change, a pod in this state would never be inspected again, its entry in the status cache would never be updated, and the pod worker would never call syncPod again because the most recent entry in the status cache has an error associated with it. Pods in this state would be stuck Terminating forever, unless the user issued a deletion with a grace period value of 0.
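The retry behavior described above can be illustrated with a minimal, self-contained Go sketch. This is not the kubelet's code: `podID`, `pleg`, `records`, `toReinspect`, `inspect`, and `runScenario` are hypothetical names chosen for illustration, and the container state of a pod is reduced to a plain string. The sketch only shows the idea of remembering pods whose inspection failed and inspecting them again on the next relist, even when their observed state matches the last recorded one.

```go
package main

import (
	"errors"
	"fmt"
)

// podID is a hypothetical stand-in for a pod identifier.
type podID string

// pleg is a simplified, illustrative event generator (not the real kubelet type).
type pleg struct {
	records     map[podID]string    // last successfully recorded state per pod
	toReinspect map[podID]bool      // pods whose previous inspection failed
	inspect     func(podID) error   // hypothetical inspection hook
}

func newPLEG(inspect func(podID) error) *pleg {
	return &pleg{
		records:     make(map[podID]string),
		toReinspect: make(map[podID]bool),
		inspect:     inspect,
	}
}

// relist inspects pods whose state changed, plus pods that failed inspection
// in an earlier call. It returns the set of pods inspected in this call.
func (p *pleg) relist(current map[podID]string) map[podID]bool {
	inspected := make(map[podID]bool)
	failed := make(map[podID]bool)

	// Inspect pods whose observed state differs from the recorded state.
	for id, state := range current {
		if old, ok := p.records[id]; ok && old == state {
			continue
		}
		inspected[id] = true
		if err := p.inspect(id); err != nil {
			failed[id] = true // leave the record untouched on failure
			continue
		}
		p.records[id] = state
	}

	// Retry pods that failed before and were not already handled above.
	for id := range p.toReinspect {
		if inspected[id] {
			continue
		}
		inspected[id] = true
		if err := p.inspect(id); err != nil {
			failed[id] = true
		}
	}

	p.toReinspect = failed
	return inspected
}

// runScenario replays the three relist calls from the description and reports
// whether the third call inspected the pod, the total number of inspections,
// and how many pods are still pending reinspection.
func runScenario() (bool, int, int) {
	calls := 0
	failNext := false
	g := newPLEG(func(podID) error {
		calls++
		if failNext {
			return errors.New("transient inspection error")
		}
		return nil
	})

	g.relist(map[podID]string{"pod-a": "infra"}) // call 1: inspection succeeds
	failNext = true
	g.relist(map[podID]string{"pod-a": "infra+transient"}) // call 2: inspection fails
	failNext = false
	third := g.relist(map[podID]string{"pod-a": "infra"}) // call 3: state looks unchanged

	return third["pod-a"], calls, len(g.toReinspect)
}

func main() {
	inspectedOnThird, calls, pending := runScenario()
	fmt.Println(inspectedOnThird, calls, pending)
}
```

In this sketch, dropping the second loop in `relist` reproduces the reported problem: the third call sees the same state as the stored record and skips the pod, so the failed inspection is never retried.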
ncdc added the release-note and sig/node labels May 3, 2016
k8s-github-robot added the size/L label May 3, 2016
@yujuhong PTAL, thanks!
LGTM. Thanks! kubelet will try to create the container infinitely with this PR, but I think that's expected and is consistent with other types of failures.
yujuhong added the lgtm label May 4, 2016
GCE e2e build/test passed for commit 3a87bfb.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test passed for commit 3a87bfb.
Automatic merge from submit-queue
Labels
lgtm
"Looks good to me", indicates that a PR is ready to be merged.
release-note
Denotes a PR that will be considered when it comes time to generate release notes.
sig/node
Categorizes an issue or PR as relevant to SIG Node.
size/L
Denotes a PR that changes 100-499 lines, ignoring generated files.
Fixes #24819
cc @kubernetes/rh-cluster-infra @kubernetes/sig-node