Stop 'kubectl drain' deleting pods with local storage. #26667

Merged (1 commit) Jun 9, 2016

Conversation

@mml (Contributor) commented Jun 1, 2016

Kubectl drain will not continue if there are pods with local storage unless
forced with --delete-local-data.

Fixes #23972
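
For orientation, a minimal sketch of the gate this change adds, patterned after the daemonset filter quoted later in this review. Only hasLocalStorage appears in the diff; the function name, the DeleteLocalData field, and the kLocalStorage* constants are assumptions:

// Hypothetical sketch; names other than hasLocalStorage are assumed.
func (o *DrainOptions) localStorageFilter(pod api.Pod) (bool, *warning, *fatal) {
    if !hasLocalStorage(pod) {
        return true, nil, nil // no emptyDir volumes: nothing blocks deletion
    }
    if !o.DeleteLocalData {
        // Draining would destroy the pod's emptyDir data: refuse unless forced.
        return false, nil, &fatal{kLocalStorageFatal}
    }
    // --delete-local-data was passed: delete the pod, but warn about the data.
    return true, &warning{kLocalStorageWarning}, nil
}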

@mml added this to the v1.3 milestone Jun 1, 2016
@mml added the priority/important-soon and release-note labels Jun 1, 2016
@k8s-github-robot added the kind/old-docs and size/L labels Jun 1, 2016
@mml force-pushed the skip-local-data branch from 80f6d41 to b418a91 on June 2, 2016 19:18
@mml (Contributor, Author) commented Jun 2, 2016

*    782 Failed to create firewall e2e-gce-master-0-minion-e2e-gce-master-0-nodeports in kubernetes-jenkins-pull
     783 2016/06/02 13:12:32 e2e.go:218: Error running up: exit status 1
     784 2016/06/02 13:12:32 e2e.go:214: Step 'up' finished in 24m19.816424886s
     785 2016/06/02 13:12:32 e2e.go:114: Error starting e2e cluster. Aborting.
     786 exit status 1
     787 + dump_cluster_logs_and_exit
     788 + local -r exit_status=1
     789 + dump_cluster_logs
     790 + [[ -x cluster/log-dump.sh ]]
     791 + ./cluster/log-dump.sh /var/lib/jenkins/jobs/kubernetes-pull-build-test-e2e-gce/workspace@2/_artifacts
     792 ERROR: gcloud crashed (error): [Errno 111] Connection refused
     793 
     794 If you would like to report this issue, please run the following command:
     795   gcloud feedback
     796 + chmod -R o+r /var/lib/jenkins/jobs/kubernetes-pull-build-test-e2e-gce/workspace@2/_artifacts
     797 + rc=1
     798 + [[ 1 -ne 0 ]]
     799 + [[ -x cluster/log-dump.sh ]]
     800 + [[ -d _artifacts ]]
     801 + echo 'Dumping logs for any remaining nodes'
     802 Dumping logs for any remaining nodes
     803 + ./cluster/log-dump.sh _artifacts
     804 ERROR: gcloud crashed (error): [Errno 111] Connection refused
     805 
     806 If you would like to report this issue, please run the following command:
     807   gcloud feedback
*    808 Build step 'Execute shell' marked build as failure
     809 Recording test results

@mml (Contributor, Author) commented Jun 2, 2016

k8s-bot test this please: issue #20576

@mml (Contributor, Author) commented Jun 2, 2016

Fatally missing @ sign. 👎

@k8s-bot test this: issue #20576

@mml (Contributor, Author) commented Jun 2, 2016

Well, at least this one's more time efficient.

      64 + go run ./hack/e2e.go -v --build
      65 2016/06/02 14:44:43 e2e.go:212: Running: build-release
      66 ERROR: gcloud crashed (error): [Errno 111] Connection refused

@mml (Contributor, Author) commented Jun 2, 2016

@k8s-bot test this: issue #20576

@spxtr (Contributor) commented Jun 2, 2016

@k8s-bot e2e test this issue: #23545

@spxtr (Contributor) commented Jun 2, 2016

@k8s-bot e2e test this issue: #23545

@mml (Contributor, Author) commented Jun 2, 2016

@k8s-bot e2e test this issue: #23545

if !o.IgnoreDaemonsets {
    return false, nil, &fatal{kDaemonsetFatal}
}
return false, &warning{kDaemonsetWarning}, nil

Member:

should this be true?

mml (Contributor, Author) replied:

No. No matter what, we never delete the daemonset pods because the DS controller doesn't respect unschedulable. The flag only toggles whether we consider their presence an error or not.

I will add a comment that clarifies this.
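
For reference, the promised clarification might read like this, folded into the hunk above (the comment wording is a summary of the reply; the code lines are unchanged):

// We never delete daemonset pods: the DS controller doesn't respect
// unschedulable and would recreate them immediately. --ignore-daemonsets
// only toggles whether their presence is a fatal error or a warning;
// either way the pod is not selected for deletion.
if !o.IgnoreDaemonsets {
    return false, nil, &fatal{kDaemonsetFatal}
}
return false, &warning{kDaemonsetWarning}, nil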

@davidopp (Member) commented Jun 3, 2016

Sorry for the delay in reviewing this. I like the simplification. I just had a few comments.


// getPodsForDeletion returns all the pods we're going to delete. If there are
// any unmanaged pods and the user didn't pass --force, we return that list in
// an error.
mml (Contributor, Author):

Note to self: update this comment.
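
One plausible update, reflecting the new flag (a guess at the intent, not the merged wording):

// getPodsForDeletion returns all the pods we're going to delete. If any
// pods would require a flag the user didn't pass (--force for unmanaged
// pods, --delete-local-data for pods using emptyDir), we return that
// list in an error instead.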

@mml force-pushed the skip-local-data branch from b418a91 to cabb457 on June 3, 2016 22:23
@@ -33,6 +33,10 @@ will make the node schedulable again.

.SH OPTIONS
.PP
\fB\-\-delete\-local\-data\fP=false
Continue even if there are pods with local data (EmptyDir).
Member:
I don't think this should just say "local data" since HostDir is also local data. Maybe say "Continue even if there are pods using emptyDir (local data that will be deleted when the node is drained)."

Member:

Note: I think the flag name is fine, I'm just referring to the way you described it (here and elsewhere in the PR).

@mml force-pushed the skip-local-data branch from cabb457 to 4b2fe3e on June 6, 2016 18:39
@mml (Contributor, Author) commented Jun 6, 2016

@davidopp fixed docs PTAL

@davidopp (Member) commented Jun 7, 2016

LGTM

Please squash the commits and then apply the LGTM label.

@mml force-pushed the skip-local-data branch from 4b2fe3e to f211336 on June 7, 2016 19:34
@mml added the lgtm label Jun 7, 2016
@k8s-github-robot commented:

@mml
You must link to the test flake issue which caused you to request this manual re-test.
Re-test requests should be in the form of: k8s-bot test this issue: #<number>
Here is the list of open test flakes.

@k8s-github-robot removed the lgtm label Jun 8, 2016
@davidopp (Member) commented Jun 8, 2016

The last message makes no sense, as @mml didn't actually request a re-test.

@k8s-bot test this: issue #IGNORE

@mml added the lgtm label Jun 8, 2016
@mml (Contributor, Author) commented Jun 8, 2016

@spxtr any idea what's going on with the bot? It's talking nonsense about a re-test, and then it took the lgtm label off without saying why.

@k8s-github-robot added the needs-rebase label Jun 8, 2016
Unless forced with --delete-local-data.  Also a refactoring of the
kubectl drain logic that selects/rejects pods and produces error/warning
messages.
@mml force-pushed the skip-local-data branch from f211336 to d09af4a on June 8, 2016 21:59
@k8s-github-robot removed the lgtm and needs-rebase labels Jun 8, 2016
@spxtr (Contributor) commented Jun 8, 2016

> @spxtr any idea what's going on with the bot? It's talking nonsense about a re-test, and then it took the lgtm label off without saying why.

No idea. Note that if you ask for a retest without including "issue: #<number>", it will delete your comment.

@mml added the lgtm label Jun 8, 2016
@k8s-bot commented Jun 8, 2016

GCE e2e build/test passed for commit d09af4a.

@k8s-github-robot commented:

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot commented Jun 9, 2016

GCE e2e build/test passed for commit d09af4a.

@k8s-github-robot commented:

Automatic merge from submit-queue


func hasLocalStorage(pod api.Pod) bool {
    for _, volume := range pod.Spec.Volumes {
        if volume.EmptyDir != nil {
            return true
        }
    }
    return false
}

Contributor:

What about hostPath?

Member:

Matt and I discussed this and eventually decided to leave it out. We don't know what people are going to do after the drain. They might be doing something that doesn't delete all the data (for example, upgrade OS and reboot). Arguably we should have yet another flag to control whether to consider hostPath, but for simplicity we decided to just block on the case we know for sure will be deleted, namely emptyDir.

For cluster scale-down, the node goes away "forever" so it makes sense to block on hostPath. But for kubectl drain I think it's not necessary.
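
As an illustration of the "yet another flag" idea mentioned above, a hypothetical variant (not in this PR) that could also block on hostPath:

func hasLocalStorage(pod api.Pod, includeHostPath bool) bool {
    for _, volume := range pod.Spec.Volumes {
        // emptyDir data is always destroyed when the pod is deleted.
        if volume.EmptyDir != nil {
            return true
        }
        // Hypothetical: hostPath data survives pod deletion but not node
        // deletion, so it would be gated behind its own flag.
        if includeHostPath && volume.HostPath != nil {
            return true
        }
    }
    return false
}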

Labels

- lgtm: "Looks good to me", indicates that a PR is ready to be merged.
- priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
- release-note: Denotes a PR that will be considered when it comes time to generate release notes.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.