Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate restartable init containers ignoring not-started containers #125935

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

gjkim42
Copy link
Member

@gjkim42 gjkim42 commented Jul 7, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, the restartable init container (native sidecar container) may not receive SIGTERM if the last restartable init container hasn't run.

This PR ensures that the restartable init containers receive a termination signal even if there are any not-started restartable init containers, by ignoring the not-running containers.

Which issue(s) this PR fixes:

Fixes #125880

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a bug that init containers with `Always` restartPolicy may not terminate gracefully if the pod hasn't initialized yet.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 7, 2024
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 7, 2024
@gjkim42
Copy link
Member Author

gjkim42 commented Jul 7, 2024

/cc @SergeyKanzhelev @tzneal

@gjkim42
Copy link
Member Author

gjkim42 commented Jul 7, 2024

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 7, 2024
test/e2e_node/container_lifecycle_test.go Outdated Show resolved Hide resolved
return false, nil
}
containerStatus := pod.Status.InitContainerStatuses[1]
return containerStatus.State.Running != nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also want to check if the startup probe failed.

Suggested change
return containerStatus.State.Running != nil, nil
return containerStatus.State.Running != nil && containerStatus.Ready == false, nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed to check that the containerStatus.Started is not true

test/e2e_node/container_lifecycle_test.go Outdated Show resolved Hide resolved
test/e2e_node/container_lifecycle_test.go Show resolved Hide resolved
test/e2e_node/container_lifecycle_test.go Outdated Show resolved Hide resolved
@gjkim42
Copy link
Member Author

gjkim42 commented Jul 8, 2024

/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features

@gjkim42
Copy link
Member Author

gjkim42 commented Jul 8, 2024

/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features

@k8s-ci-robot
Copy link
Contributor

@gjkim42: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-node-e2e-containerd-features db6eca4 link false /test pull-kubernetes-node-e2e-containerd-features

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

This ensures that the restartable init containers receive a termination
signal even if there are any not-started restartable init containers, by
ignoring the not-running containers.
@gjkim42
Copy link
Member Author

gjkim42 commented Jul 9, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features


// if it's not a running container, pre-close the channel so nothing
// waits on it
if _, isRunning := runningContainers[ic.Name]; !isRunning {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Contributor

@matthyx matthyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 9, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 93ff42ea9e1ee9003dcf14fd15b677da28dfcaf9

@Stono
Copy link

Stono commented Jul 15, 2024

Hello, forgive my nativity in the release process for kubernetes!
Will this be back ported to 1.29?
I'm trying to work out when this will land on GKE vs working around it as we probably hit it a couple of times a week. It's usually when a workload is rolling out but a pod also gets pre-empted

@matthyx
Copy link
Contributor

matthyx commented Jul 15, 2024

Hello, forgive my nativity in the release process for kubernetes! Will this be back ported to 1.29? I'm trying to work out when this will land on GKE vs working around it as we probably hit it a couple of times a week. It's usually when a workload is rolling out but a pod also gets pre-empted

for 1.30 it will probably be cherry-picked
for 1.29 there is little chance: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/cherry-picks.md#what-kind-of-prs-are-good-for-cherry-picks

@Stono
Copy link

Stono commented Jul 15, 2024

OK thanks @matthyx - will look into a workaround then!

@matthyx
Copy link
Contributor

matthyx commented Jul 15, 2024

OK thanks @matthyx - will look into a workaround then!

sorry, I replied too quickly, in 1.29 we were already beta: https://github.com/kubernetes/enhancements/blob/eb0b9a29ecf680fde9a02fdb79aa4ce89bb4dd0a/keps/sig-node/753-sidecar-containers/kep.yaml#L39C7-L39C16

so 1.30 and 1.29 should be cherry-picked

@bart0sh
Copy link
Contributor

bart0sh commented Jul 18, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 18, 2024
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@yujuhong
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjkim42, matthyx, SergeyKanzhelev, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2024
@k8s-ci-robot k8s-ci-robot merged commit fa4b8f3 into kubernetes:master Jul 23, 2024
16 of 17 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jul 23, 2024
@gjkim42 gjkim42 deleted the fix-125880 branch July 24, 2024 01:18
@matthyx
Copy link
Contributor

matthyx commented Jul 24, 2024

we need a cherry pick for 1.29 and 1.30 here

@gjkim42
Copy link
Member Author

gjkim42 commented Jul 24, 2024

we need a cherry pick for 1.29 and 1.30 here

I'll make PRs for them.

k8s-ci-robot added a commit that referenced this pull request Jul 31, 2024
…935-upstream-release-1.29

Automated cherry pick of #125935: Terminate restartable init containers ignoring not-started containers
k8s-ci-robot added a commit that referenced this pull request Jul 31, 2024
…935-upstream-release-1.30

Automated cherry pick of #125935: Terminate restartable init containers ignoring not-started containers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

Termination signals not sent to Native Sidecars when multiple Native Sidecars
8 participants