
Add --force to kubectl delete and explain force deletion #35484

Merged — 1 commit merged into kubernetes:master on Nov 5, 2016

Conversation

@smarterclayton (Contributor) commented Oct 25, 2016

--force is required for --grace-period=0; --now is equivalent to --grace-period=1.
Improve command help to explain what graceful deletion is and warn about
force deletion.

Part of #34160 & #29033

In order to bypass graceful deletion of pods (to immediately remove the pod from the API) the user must now provide the `--force` flag in addition to `--grace-period=0`.  This prevents users from accidentally force deleting pods without being aware of the consequences of force deletion.  Force deleting pods for resources like StatefulSets can result in multiple pods with the same name having running processes in the cluster, which may lead to data corruption or data inconsistency when using shared storage or common API endpoints.
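
A quick sketch of the resulting CLI usage (the pod name web-0 is hypothetical; the flags are the ones described above):

    # Rejected by the new client: a zero grace period without --force returns an error
    kubectl delete pod web-0 --grace-period=0

    # Force deletion must now be requested explicitly
    kubectl delete pod web-0 --grace-period=0 --force

    # --now is shorthand for --grace-period=1 and is still a graceful deletion
    kubectl delete pod web-0 --now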


@smarterclayton (Contributor Author)

@kubernetes/kubectl see referenced proposal for description of changes, @foxish

@smarterclayton added this to the v1.5 milestone Oct 25, 2016
@smarterclayton added the sig/cli label Oct 25, 2016
@k8s-github-robot added the size/M and release-note-label-needed labels Oct 25, 2016
gracePeriod = 1
}
if gracePeriod == 0 && !cmdutil.GetFlagBool(cmd, "force") {
return fmt.Errorf("Immediate deletion may result in multiple copies of your resources running at the same time. You must pass --force to delete with grace period 0.")
Contributor

I think this error message may not be clear to someone who isn't familiar with the entire issue. Also, if it's force deleting a simple pod object, the multiple copies comment does not apply. Maybe we could rephrase it to say something along the lines of:

Immediate deletion does not wait for confirmation that the running resource has been terminated and may leave it running in case of a network partition.

down or unable to communicate to the server. To force delete a resource, you must pass a grace
period of 0 and specify the --force flag.

IMPORTANT: Force deleting pods may result in multiple instances of that pod running at the same
Contributor

Maybe add a clarification that multiple instances may be left running if the pod is part of a collection managed by a higher-level controller (RS/RC/PS).

@smarterclayton (Contributor Author)

Updated text (but not generated files)

@smarterclayton (Contributor Author)

@foxish @vishh @bprashanth if you have more opinions on the text here please let me know; otherwise I'm going to generate docs and make sure the tests are right.

@smarterclayton (Contributor Author)

Also looking for any opinionated kubectl reviewers on the delete messages. This is an explicit CLI API change made for safety purposes, and it can break scripts that were doing unsafe things before.

@vishh self-assigned this Nov 1, 2016
gracePeriod = 1
}
if gracePeriod == 0 && !cmdutil.GetFlagBool(cmd, "force") {
return fmt.Errorf("Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.")
Contributor

nit: I'd place a WARNING: prefix on this message!

Contributor Author

Agree, but we can't do that today without changes to the command stuff. I'll open a follow up.

Contributor

Excuse my ignorance, why is it not possible to add a string prefix to the error message here?

Contributor Author

All errors are prefixed with error:. So if we added a prefix it would be error: WARNING: blah which doesn't make much sense.

before they are forcibly terminated (the grace period) but you may override that value with
the --grace-period flag, or pass --now to set a grace-period of 1. Because these resources often
represent entities in the cluster, deletion may not be acknowledged immediately. If a resource
like a pod has not been deleted within the grace period, the node that hosts that pod may be
Contributor

Grace period can be as low as 1 second. We should not place expectations on the system to gracefully delete pod objects in 1 second. Instead, let's say that if a pod isn't deleted in a reasonable amount of time, the node might be down and/or unreachable. In such scenarios, it might be necessary to forcibly delete pods to free up cluster resources.

Contributor Author

Ok

period of 0 and specify the --force flag.

IMPORTANT: Force deleting pods does not wait for confirmation that the pod's processes have been
terminated, which means the pod may still be running on the cluster. This may result in
Contributor

It's not just concurrent access; it will also lead to resource leakage on the nodes, which over time can cause unexpected evictions.

Contributor Author

I don't think most users care about that - that's a kubelet problem. Most users care about "my database is now corrupted" >>>> "nodes may not have as many resources as they said".

Contributor

Well, users don't care about resources up and until their pods start getting evicted. Would it hurt conveying this additional information?

Contributor Author

I'm just trying not to make this a novel. Let me see if I can work in a sentence.

Contributor Author

Added a sentence at the end.

@foxish (Contributor) commented Nov 1, 2016

@smarterclayton Looks fine to me.
cc @pwittrock

@k8s-ci-robot (Contributor)

Jenkins GKE smoke e2e failed for commit 9a8d8016f25263645d2df42cc23f38d396a11743. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins GCI GKE smoke e2e failed for commit 9a8d8016f25263645d2df42cc23f38d396a11743. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins GCI GCE e2e failed for commit 9a8d8016f25263645d2df42cc23f38d396a11743. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins GCE etcd3 e2e failed for commit 9a8d8016f25263645d2df42cc23f38d396a11743. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins GCE e2e failed for commit 9a8d8016f25263645d2df42cc23f38d396a11743. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins unit/integration failed for commit 46972ec0c54381910fbe8e492d16423ae9aca5e0. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins GCE Node e2e failed for commit 46972ec0c54381910fbe8e492d16423ae9aca5e0. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@smarterclayton (Contributor Author)

node e2e looks like a flake (or maybe needs a --force and is failing hard somewhere)

I1101 20:16:11.753] Connection to 104.154.151.234 closed by remote host.
I1101 20:16:11.753] 
I1101 20:16:11.753] Failure Finished Test Suite on Host tmp-node-e2e-4254901c-gci-dev-55-8872-18-0
I1101 20:16:11.754] [command [ssh -i /home/jenkins/.ssh/google_compute_engine -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o CheckHostIP=no -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -o LogLevel=ERROR 104.154.151.234 -- sh -c 'cd /tmp/gcloud-e2e-404546779 && timeout -k 30s 2700.000000s ./ginkgo --nodes=8 --skip="\[Flaky\]|\[Slow\]|\[Serial\]" --flakeAttempts=2 ./e2e_node.test -- --logtostderr --v 4 --node-name=tmp-node-e2e-4254901c-gci-dev-55-8872-18-0 --report-dir=/tmp/gcloud-e2e-404546779/results --report-prefix=gci-family-gci-dev-55-8872-18-0 --experimental-mounter-path=/tmp/gcloud-e2e-404546779/cluster/gce/gci/mounter/mounter --experimental-mounter-rootfs-path=/media/root '] failed with error: exit status 255 and output:

@soltysh (Contributor) commented Nov 2, 2016

LGTM

@smarterclayton added the release-note label and removed the release-note-label-needed label Nov 2, 2016
@smarterclayton (Contributor Author)

Any other comments or is this good enough for government work?

@vishh (Contributor) commented Nov 2, 2016

@smarterclayton two more open comments.

--force is required for --grace-period=0; --now is equivalent to --grace-period=1.
Improve command help to explain what graceful deletion is and warn about
force deletion.
@smarterclayton added the lgtm and priority/awaiting-more-evidence labels Nov 3, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit 4751996 into kubernetes:master Nov 5, 2016
@krousey (Contributor) commented Nov 18, 2016

@smarterclayton
This is causing upgrade tests to fail. 1.3 e2e tests running with a 1.5 kubectl. I'm open to clever solutions.

cc @saad-ali

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-container_vm-1.3-container_vm-1.5-upgrade-cluster/143?log

Expected error:
    <*errors.errorString | 0xc820019820>: {
        s: "Error running &{/workspace/kubernetes_skew/cluster/kubectl.sh [/workspace/kubernetes_skew/cluster/kubectl.sh --server=https://199.223.235.243 --kubeconfig=/workspace/.kube/config delete --grace-period=0 -f - --namespace=e2e-tests-kubectl-up6nc] []  0xc8207b9b60  error: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.\n [] <nil> 0xc8206fe440 exit status 1 <nil> true [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4720 0xc820ea4738 0xc820ea4748] [0xa97840 0xa979a0 0xa979a0] 0xc82027b800}:\nCommand stdout:\n\nstderr:\nerror: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.\n\nerror:\nexit status 1\n",
    }
    Error running &{/workspace/kubernetes_skew/cluster/kubectl.sh [/workspace/kubernetes_skew/cluster/kubectl.sh --server=https://199.223.235.243 --kubeconfig=/workspace/.kube/config delete --grace-period=0 -f - --namespace=e2e-tests-kubectl-up6nc] []  0xc8207b9b60  error: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.
     [] <nil> 0xc8206fe440 exit status 1 <nil> true [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4720 0xc820ea4738 0xc820ea4748] [0xa97840 0xa979a0 0xa979a0] 0xc82027b800}:
    Command stdout:

    stderr:
    error: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.

    error:
    exit status 1

not to have occurred
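
For reference, the failing invocation above would satisfy the new client-side check if it also supplied --force, shown here only as an illustration of the version skew (handling old tests run against a 1.5 kubectl was taken up in the follow-up issue below):

    kubectl delete --grace-period=0 --force -f - --namespace=e2e-tests-kubectl-up6nc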

@saad-ali (Member)

@smarterclayton This is causing upgrade tests to fail. 1.3 e2e tests running with a 1.5 kubectl. I'm open to clever solutions.

Thanks for raising this issue @krousey! Can you create a bug with the 1.5 milestone to track this? @smarterclayton, any thoughts on how to handle this? Could the server side check the client version before enforcing the new behavior?

@foxish (Contributor) commented Nov 18, 2016

This change is entirely client-side and is a departure from the old behavior. I imagine one could check the server version and then enforce the use of the --force flag, but that sounds icky.
Is there another option here?

@krousey (Contributor) commented Nov 18, 2016

@foxish I just created an issue. Please move discussion to there.

@bgrant0607 (Member)

This breaks all previous uses of --grace-period=0? If so, we should revert this PR.

We should not break scripts that use kubectl, since there was no well-defined way for them to ensure forward compatibility and avoid this kind of breakage.

@smarterclayton (Contributor Author)

The break was intentional, and the reasoning for why allowing this to continue is unsafe is laid out in the proposal.

@smarterclayton (Contributor Author)

Let's continue discussion in #37117

Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.