
Stop deleting DaemonSet pods since they will be recreated immediately, and also require a flag when there are DaemonSet pods on the node. #20232

Merged: 1 commit into kubernetes:master on Feb 6, 2016

Conversation

@mml (Contributor) commented Jan 27, 2016

It took some work to get all the cases right in terms of error and warning messages, but the transcript below shows that it works. I will refactor that part and squash the commits, but I'm putting the PR out now for a little scrutiny of the semantics.

% echo $m
kubernetes-minion-c9db
% _output/local/bin/linux/amd64/kubectl drain $m
node "kubernetes-minion-c9db" cordoned
error: pods managed by neither a ReplicationController nor a DaemonSet: naked (use --force to override) and DaemonSet-managed pods: mmlds-0w8zz (use --ignore-daemonsets to ignore)

% _output/local/bin/linux/amd64/kubectl drain $m --force
node "kubernetes-minion-c9db" already cordoned
error: DaemonSet-managed pods: mmlds-0w8zz (use --ignore-daemonsets to ignore)

% _output/local/bin/linux/amd64/kubectl drain $m --ignore-daemonsets
node "kubernetes-minion-c9db" already cordoned
error: pods managed by neither a ReplicationController nor a DaemonSet: naked (use --force to override)

% _output/local/bin/linux/amd64/kubectl drain $m --force --ignore-daemonsets
node "kubernetes-minion-c9db" already cordoned
WARNING: About to delete these pods managed by neither a ReplicationController nor a DaemonSet: naked
pod "naked" deleted
pod "kube-dns-v10-luuw6" deleted
node "kubernetes-minion-c9db" drained

@mml (Contributor, Author) commented Jan 27, 2016

@davidopp

@k8s-github-robot

Labelling this PR as size/M

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 27, 2016
@k8s-bot commented Jan 27, 2016

GCE e2e test build/test passed for commit 1ca5a82de75531a79a99c0a803172c158f3287fc.

@davidopp davidopp assigned davidopp and unassigned brendandburns Jan 27, 2016
@davidopp (Member)

Yeah, figuring out the right semantics here is tough. I think what you chose is pretty reasonable (I can think of other semantics, but none are obviously better).

I have two suggestions:
(1) Maybe call it "--nowarn-daemonsets" instead of "--ignore-daemonsets", since I think "nowarn" is a little clearer about what the flag does.
(2) When all the requirements pass to do a drain and you actually do it (e.g. the last case in your sample output above), print "skipping DaemonSet <name>" whenever there is a DaemonSet on the node (one for each DaemonSet you find), to make it a little clearer what is going on.

Does that sound reasonable?

@mml (Contributor, Author) commented Jan 27, 2016

I have two suggestions
(1) maybe call it "--nowarn-daemonsets" instead of "--ignore-daemonsets" since I think "nowarn" is a little clearer about what the flag does.

The flag doesn't just turn off a warning. In fact, without it we won't drain a node with a DaemonSet pod. With it, we literally skip over those pods, not deleting them but still saying "drained". There may be a better name, but "nowarn" makes it sound like it only toggles diagnostics and not behavior.

(2) When all the requirements pass to do a drain and you actually do it (e.g. the last case in your sample output above), print "skipping DaemonSet <name>" whenever there is a DaemonSet on the node (one for each DaemonSet you find), to make it a little clearer what is going on.

This I can add.
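The flag semantics described in this thread can be sketched in Go. This is a hypothetical illustration, not the PR's actual code: podsToDelete and its signature are invented here. Unreplicated pods block the drain unless --force is given; DaemonSet pods block the drain unless --ignore-daemonsets is given, and even then they are skipped rather than deleted, since the DaemonSet controller would recreate them immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// podsToDelete (hypothetical) decides which pods a drain would delete.
// unreplicated and daemonSet are the names of pods in those categories.
func podsToDelete(pods, unreplicated, daemonSet []string, force, ignoreDaemonSets bool) ([]string, error) {
	if len(unreplicated) > 0 && !force {
		return nil, errors.New("unreplicated pods present (use --force to override)")
	}
	if len(daemonSet) > 0 && !ignoreDaemonSets {
		return nil, errors.New("DaemonSet-managed pods present (use --ignore-daemonsets to ignore)")
	}
	// DaemonSet-managed pods are skipped, never deleted.
	skip := map[string]bool{}
	for _, name := range daemonSet {
		skip[name] = true
	}
	out := []string{}
	for _, p := range pods {
		if !skip[p] {
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	pods := []string{"naked", "kube-dns-v10-luuw6", "mmlds-0w8zz"}
	if _, err := podsToDelete(pods, []string{"naked"}, []string{"mmlds-0w8zz"}, false, false); err != nil {
		fmt.Println("error:", err)
	}
	deleted, _ := podsToDelete(pods, []string{"naked"}, []string{"mmlds-0w8zz"}, true, true)
	fmt.Println("deleting:", deleted) // the DaemonSet pod is skipped
}
```

Note the asymmetry the thread settles on: --force changes what gets deleted, while --ignore-daemonsets only changes whether the drain proceeds at all.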

@davidopp (Member)

The flag doesn't just turn off a warning. In fact, without it we won't drain a node with a DaemonSet pod. With it, we literally skip over those pods, not deleting them but still saying "drained". There may be a better name, but "nowarn" makes it sound like it only toggles diagnostics and not behavior.

Ah yes, I was just thinking about the fact that we never actually kill DaemonSet pods under any circumstances, hence the flag's behavior with respect to DaemonSet pods is really just a warning. But you're right, it changes the behavior with respect to machines (i.e. whether we drain other pods on the machine or skip the machine). So current flag name sounds good.

@k8s-bot commented Jan 27, 2016

GCE e2e test build/test passed for commit 8df7861555caffd1814b36678f38931847d1da5e.

@k8s-bot commented Jan 27, 2016

GCE e2e build/test failed for commit b9fd103dc718244027adfb102950add449a3edca.

@mml (Contributor, Author) commented Jan 27, 2016

@davidopp here's the new output.

% _output/local/bin/linux/amd64/kubectl drain kubernetes-minion-c9db --ignore-daemonsets --force
node "kubernetes-minion-c9db" already cordoned
WARNING: About to delete these pods managed by neither a ReplicationController nor a DaemonSet: naked and DaemonSet-managed pods: mmlds-l1ekr
pod "naked" deleted
pod "kube-dns-v10-sp51v" deleted
node "kubernetes-minion-c9db" drained

@mml (Contributor, Author) commented Jan 27, 2016

NM. Just realized I printed exactly the wrong thing, fixing it now.

@mml (Contributor, Author) commented Jan 27, 2016

WARNING: About to delete these pods managed by neither a ReplicationController nor a DaemonSet: naked
WARNING: Skipping DaemonSet-managed pods: mmlds-bmee1
pod "naked" deleted
pod "kube-dns-v10-rlohl" deleted
node "kubernetes-minion-c9db" drained

@mml force-pushed the daemonset-warn branch 2 times, most recently from 5e89921 to 91d8f7b on January 28, 2016 at 22:09
@mml (Contributor, Author) commented Jan 28, 2016

@davidopp PTAL

@k8s-bot commented Jan 28, 2016

GCE e2e test build/test passed for commit 91d8f7bc535eda0c671038472c8d202061b3ab06.

pods unless you use \-\-force.
the API server) and DaemonSet\-managed pods. If there are DaemonSet\-managed
pods, drain will not proceed without \-\-ignore\-daemonsets, since those pods will
be immediately replaced by the NodeController. If there are any pods that are
Review comment (Member):

we never actually moved this to the NodeController; the daemons would be replaced by the DaemonSet controller.

@mml (Contributor, Author) commented Jan 29, 2016

I think we also want to print this message in the case where len(unreplicatedPodNames) == 0 && o.IgnoreDaemonsets && len(daemonSetPodNames) > 0, but if I'm understanding the logic correctly, you won't do that.

Correct. Fixed. Also removed all mention of NodeController in favor of DaemonSet Controller. PTAL

@k8s-bot commented Jan 29, 2016

GCE e2e test build/test passed for commit fa0a01173e5a7ba28c448c1b229281c740b3b1ee.

managed by a ReplicationController or DaemonSet, then drain will not delete any
pods unless you use \-\-force.
the API server) and DaemonSet\-managed pods. If there are DaemonSet\-managed
pods, drain will not proceed without \-\-ignore\-daemonsets, since those pods will
Review comment (Member):

The "those pods will be immediately replaced by the DaemonSet Controller" thing is a little confusing because drain actually won't delete the DaemonSet pods, even if you pass --ignore-daemonsets. I would rephrase as something like "If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately replaced by the DaemonSet controller, which ignores unschedulable markings."

Of course for all of this stuff we need to make sure your Job PR merges properly, i.e. ReplicationController -> ReplicationController, Job

}
fmt.Fprintf(o.out, "WARNING: About to delete these pods managed by neither a ReplicationController nor a DaemonSet: %s\n", joined)
} else if len(unreplicatedPodNames) > 0 {
Review comment (Member):

else not necessary

Review comment (Member):

sorry, ignore that

@bgrant0607 (Member)

Have we effectively reverted #17318 -- will daemons land on unschedulable nodes?

@davidopp (Member)

Have we effectively reverted #17318 -- will daemons land on unschedulable nodes?

Yes, it's one of the changes in #20061 which is LGTMd and waiting to merge.

@k8s-github-robot

PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 2, 2016
We do this because they will be recreated immediately by the
DaemonSet Controller.  In addition, we also require a specific flag
(--ignore-daemonsets) when there are DaemonSet pods on the node.
@mml (Contributor, Author) commented Feb 2, 2016

I didn't quite do 16 cases, but I think it's pretty clear now. LMK

@k8s-bot commented Feb 2, 2016

GCE e2e test build/test passed for commit 10164659ef077d956ab92dd9116edcf7a845d692.

@k8s-bot commented Feb 2, 2016

GCE e2e test build/test passed for commit f2d5375.

func unmanagedMsg(unreplicatedNames []string, daemonSetNames []string, include_guidance bool) string {
	msgs := []string{}
	if len(unreplicatedNames) > 0 {
		msg := fmt.Sprintf("pods not managed by ReplicationController, Job, or DaemonSet: %s", strings.Join(unreplicatedNames, ","))
Review comment (Member):

This should not say DaemonSet, right?

Review comment (Contributor, Author):

Wrong. These are pods with no created-by ref, so they aren't managed by RC, Job, or DS.

Review comment (Member):

Oh yeah, the thing I missed was that you only append to unreplicatedPodNames in a case branch that comes after daemonset_pods, so a DaemonSet-managed pod never ends up in unreplicatedPodNames.

@davidopp (Member) commented Feb 2, 2016

LGTM

@davidopp davidopp added lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-merge labels Feb 2, 2016
@k8s-github-robot

@k8s-bot test this issue: #IGNORE

Tests have been pending for 24 hours

1 similar comment
@k8s-github-robot

@k8s-bot test this issue: #IGNORE

Tests have been pending for 24 hours

@k8s-bot commented Feb 5, 2016

GCE e2e test build/test passed for commit f2d5375.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot commented Feb 6, 2016

GCE e2e test build/test passed for commit f2d5375.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot commented Feb 6, 2016

GCE e2e test build/test passed for commit f2d5375.

@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Feb 6, 2016
@k8s-github-robot k8s-github-robot merged commit 6c5540b into kubernetes:master Feb 6, 2016
@mml mml deleted the daemonset-warn branch April 5, 2016 23:24
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
7 participants