Don't create a new sandbox for pod with RestartPolicyOnFailure if all containers succeeded #92614
Conversation
Hi @tnqn. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
^ please add a release note that explains the change in the kubelet
(diff context)
```go
}
changes.ContainersToStart = append(changes.ContainersToStart, idx)
}
changes.ContainersToStart = containersToStart
```
another nit: wouldn't it be more logical to move the loop back here, after the Init containers check?
I think it's important to change the ordering back to make sure that any changes related to container start/stop are only applied after all Init containers complete, just to avoid potential issues in the future if more logic is added.
containersToStart is also used to determine whether there's a need to create a sandbox. Wouldn't moving the loop back here cause repeated computation?
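For context, the dual use looks roughly like this (a simplified sketch, not the exact kubelet code; `containerSucceeded` here is an assumed helper that reports whether a container already exited successfully according to the pod status):

```go
// Build the list of containers that still need to start; with
// RestartPolicy=OnFailure, containers that already succeeded are skipped.
containersToStart := []int{}
for idx, c := range pod.Spec.Containers {
	if pod.Spec.RestartPolicy == v1.RestartPolicyOnFailure && containerSucceeded(&c, podStatus) {
		continue
	}
	containersToStart = append(containersToStart, idx)
}

// The same slice is consulted twice: once to decide whether a new sandbox
// is needed at all, and once as the final list of containers to start.
if len(containersToStart) == 0 {
	changes.CreateSandbox = false
}
changes.ContainersToStart = containersToStart
```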
From the logic here: if `len(pod.Spec.InitContainers) != 0`, then we need to start an Init container, so this condition check should go before any container-related ones. I was imagining a situation where `len(pod.Spec.Containers)` is 0, i.e. the pod has no containers but has Init containers. In that case, if the order of checks stays as written, the Init containers wouldn't run. A pod with no containers seems to be impossible today, so this is just a recommendation to swap these checks, in anticipation that one day more conditions may be added above which semantically need to run only after all Init container logic completes.
Thanks for the explanation. I understand the concern now, but swapping them back could cause the sandbox to still be recreated whenever an initContainer is present. The current workflow of a Pod with an initContainer is, and would be kept as, the following:
- create sandbox, create init container, create container (expected actions)
- all containers succeeded, delete sandbox (expected actions)
- create sandbox, create init container (unexpected actions)
If we take the potential no-container case into consideration, could we add a condition to ensure this is not the first time the sandbox is created, like:
```go
if !shouldRestartOnFailure(pod) && attempt != 0 && len(podStatus.ContainerStatuses) != 0 {
```
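In context, that guard would sit at the top of the (re-)create-sandbox branch, roughly like this (a simplified sketch of computePodActions with the surrounding details elided, not the verbatim code):

```go
if createPodSandbox {
	// Only skip recreating the sandbox when the pod's containers must not be
	// restarted on failure, this is not the very first sandbox attempt, and we
	// have already observed container statuses (i.e. the pod has run before).
	if !shouldRestartOnFailure(pod) && attempt != 0 && len(podStatus.ContainerStatuses) != 0 {
		changes.CreateSandbox = false
		return changes
	}
	// ... compute which containers to start, handle init containers, etc. ...
}
```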
w.r.t. Sergey's comment, is it possible that a pod has:
- some init containers (> 0)
- no containers
- some Ephemeral Containers (> 0)
To ensure all initContainers can run once successfully regardless of whether there are containers to start, I added one more condition for not creating the sandbox. It should address the potential case of "init containers (> 0), no containers".
For the case of Ephemeral Containers (> 0), I think it's not affected by this PR. An ephemeral container was never created along with (re-)creating the sandbox, and was only handled when createPodSandbox is false, after an initContainer or normal container became running. That behavior is kept as it is.
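Roughly, the added condition could look like this (a simplified sketch; `findNextInitContainerToRun` is the kubelet helper whose last return value reports whether initialization is complete — treat the exact wiring as an approximation, not the verbatim diff):

```go
// Only skip creating a new sandbox when there is nothing left to start AND
// initialization has already completed; a pod that still has init containers
// to run (even with no regular containers to start) keeps getting a sandbox.
if len(containersToStart) == 0 {
	_, _, initDone := findNextInitContainerToRun(pod, podStatus)
	if initDone {
		changes.CreateSandbox = false
		return changes
	}
}
```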
I made a comment above. The suggestion is to minimize the introduced changes.
Also, maybe introducing a test for the situation @tedyu outlined would be beneficial, so future changes will take it into account.
There are already some tests covering such a Pod in TestComputePodActionsWithInitAndEphemeralContainers, for example:

"Kill pod and do not restart ephemeral container if the pod sandbox is dead": {

And I have added
"Create a new pod sandbox if the pod sandbox is dead, init container failed and RestartPolicy == OnFailure"
to it, to ensure the sandbox can be recreated as long as initialization is not done, regardless of the regular containers' and ephemeral containers' status. I didn't actually delete the regular containers from the pod spec of the tests, since it seems strange to have a case with illegal input, but I think they should cover the situation @tedyu outlined implicitly.
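Roughly, the added case follows the same table-driven shape as the existing ones (field names mirror the existing test table; the status mutation and expected actions below are approximated for illustration, not copied from the actual diff):

```go
"Create a new pod sandbox if the pod sandbox is dead, init container failed and RestartPolicy == OnFailure": {
	mutatePodFn: func(pod *v1.Pod) { pod.Spec.RestartPolicy = v1.RestartPolicyOnFailure },
	mutateStatusFn: func(status *kubecontainer.PodStatus) {
		// Dead sandbox plus a failed init container: initialization is not
		// done, so computePodActions must still recreate the sandbox.
		status.SandboxStatuses[0].State = runtimeapi.PodSandboxState_SANDBOX_NOTREADY
	},
	actions: podActions{
		KillPod:       true,
		CreateSandbox: true,
		// ...expected sandbox attempt, init container to start, etc. elided...
	},
},
```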
Done, thanks.
Is it possible to add an e2e node test (under test/e2e_node) that covers the scenario? Thanks
/retest
/retest |
/retest |
I don't have permissions to lgtm, but for what it's worth: /lgtm
/test pull-kubernetes-e2e-kind-ipv6
@SergeyKanzhelev @dashpole @dims could this PR get into the 1.20 milestone?
/milestone v1.20
If you want 1.19.1, the cutoff is Friday I think, so it needs to be done today.
The failure doesn't seem to be related.
/retest
@alex-vmw @SergeyKanzhelev I just created #94725 to backport to 1.19; hopefully it can catch 1.19.2.
@tnqn How about for 1.18.x? Any plans to backport there?
@tnqn @SergeyKanzhelev is this only needed for 1.18 and 1.19, not for 1.17?
@cpanato I just checked: the issue can be reproduced on 1.17.12 as well, so I created #95508 for 1.17.
…pstream-release-1.19 Automated cherry pick of #92614: Don't create a new sandbox for pod with
…pstream-release-1.18 Automated cherry pick of #92614: Don't create a new sandbox for pod with
…pstream-release-1.17 Automated cherry pick of #92614: Don't create a new sandbox for pod with
What type of PR is this?
/kind bug
What this PR does / why we need it:
The kubelet would attempt to create a new sandbox for a pod whose RestartPolicy is OnFailure even after all of its containers succeeded. This caused unnecessary CRI and CNI calls, confusing logs, and conflicts between the routine that creates the new sandbox and the routine that kills the Pod.
This patch checks the containers to start and skips creating the sandbox if no container is supposed to start.
Which issue(s) this PR fixes:
Fixes #92613
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: