-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that DaemonSet respects termination #51279
Ensure that DaemonSet respects termination #51279
Conversation
/retest |
2fc0eae
to
f2c3153
Compare
@enisoc #43077 doesn't allow any Pods with a DeletionTimestamp to be considered with by ControllerRefManager. #44150 filters Pods with a DeletionTimestamp, both are referenced in #44144, which is the actual issue. #43077 introduces the code that makes the test in this PR fail. This line makes it impossible for the DaemonSet Controller to be responsive to terminated Pods. If you're claiming this is unnecessary, I can buy that, but we most certainly need to change |
/retest |
/test pull-kubernetes-unit |
The logic added in #44150 only applies to orphans. kubernetes/pkg/controller/controller_ref_manager.go Lines 69 to 80 in 05e7f6d
So your change to add back |
f2c3153
to
4248f2a
Compare
Do we also need to bring back protection against double deletion? https://github.com/kubernetes/kubernetes/pull/43077/files#diff-cb9055723bef8b239f9df77ac704ffacL556 Your PR is now basically a straight revert of #43077. I still think they intended that to have some effect unrelated to the problem of adopting terminating orphans (which they themselves fixed separately in #44150). We should make sure we understand what problem #43077 was trying to fix before we revert it. |
@@ -1816,12 +1831,6 @@ func TestGetNodesToDaemonPods(t *testing.T) { | |||
newPod("non-matching-owned-0-", "node-0", simpleDaemonSetLabel2, ds), | |||
newPod("non-matching-orphan-1-", "node-1", simpleDaemonSetLabel2, nil), | |||
newPod("matching-owned-by-other-0-", "node-0", simpleDaemonSetLabel, ds2), | |||
func() *v1.Pod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this up to wantedPods
so we are clear to future readers that this behavior is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
@kow3ns It seems like the intent of #43077 was to omit terminating-but-owned Pods from the "available" count. However, for some reason no test case was added for such a Pod: https://github.com/kubernetes/kubernetes/pull/43077/files#diff-7f51010c4c3f3050dcce5c9d916fc3eeR138 Could you add a test case for a terminating-but-owned Pod in that function? I expect it will fail now that you've reverted the change from #43077. To avoid regressing on the bug that was meant to fix, we may have to add a new, more targeted fix in Also, you mentioned offline that you brought back the protection against double deletion. Did you not push that commit yet? |
manager.dsStore.Add(ds) | ||
|
||
syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 0) | ||
} | ||
} | ||
|
||
// DaemonSet should launch a pod on a not ready node with taint notReady:NoExecute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment needs to be updated
I suggest we revert #43077 but keep the test cases. If we want to count terminating pods as unavailable (what #43077 originally tried to fix), we should update below code to not increment kubernetes/pkg/controller/daemon/daemon_controller.go Lines 1033 to 1035 in 05e7f6d
I originally suggested that We should probably revisit |
4248f2a
to
010ef98
Compare
/retest |
010ef98
to
e892999
Compare
The goal of #43077 was to make daemon controller behavior similar to replicaset controller where pods marked for deletion are also ignored. |
@enisoc it's handled here https://github.com/kubernetes/kubernetes/pull/43077/files#diff-0dc431d36cd3969089646d1bd4640d1eR1270 |
For DaemonSets:
|
/retest |
1 similar comment
/retest |
/lgtm |
// Skip terminating pods | ||
if pod.DeletionTimestamp != nil { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add these back to dsc.manage()
(where it's deleted)?
… while waiting for a Pod that it has previously created to terminate.
e892999
to
8ad18bf
Compare
/lgtm |
/approve |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enisoc, janetkuo, kargakis, kow3ns Associated issue: 43077 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302) |
Since the regression was introduced in 1.7, we should probably cherrypick this one into 1.7 |
What this PR does / why we need it:
#43077 correctly prevents the DaemonSet controller from adopting deleted Pods, but, as pointed out in #50477, the controller now has no sensitivity to the termination lifecycle (i.e TerminationGracePeriodSeconds) of the Pods it creates. This PR attempts to balance the two. DaemonSet controller will now consider deleted Pods owned by a DaemonSet during creation, but it will not consider deleted Pods as targets for adoption.
fixes #50477