Fix logic error in graceful deletion #37721

derekwaynecarr · 2016-11-30T22:59:42Z

If a resource meets both of the following criteria:

- its deletion timestamp is not nil
- its deletion grace period seconds, as persisted in storage, is 0

the resource could never be deleted, because we always reported a pending graceful deletion.

derekwaynecarr · 2016-11-30T23:03:01Z

I need to add a test case for this tomorrow, but this came up in a scenario where we create hundreds of namespaces, each with 100 pods, and then delete all the namespaces en masse. Out of about 10k pods, we ended up with ~70 pods that could never be deleted. Inspecting their state showed that each pod had gracePeriodSeconds=0 and a non-nil deletionTimestamp. Once a pod was in this state, forceful deletion would never occur. Why the resources ended up in this state still needs a separate investigation, but it appears that when a kubelet force-deleted a pod, the deletion got an OK response, yet the object was not actually removed; instead, its stored state was updated as described above. Once this happened, the pod could not be removed without direct access to etcd.

I think this is a release blocker for 1.5, and it will probably need to be cherry-picked to 1.4.x as well.

/cc @smarterclayton @ingvagabund @sjenning @caesarxuchao @deads2k @saad-ali @eparis

smarterclayton · 2016-11-30T23:29:31Z

Logically this PR makes sense; verifying it should only take a small change to pkg/api/rest/resttest.

In 1.7, once we move to etcd3, I will open a follow-up item to set the grace period to 0 and then delete the pod in a single transaction, which should avoid the potential race here (we want consumers to see the grace period of 0 and observe the deletion as discrete steps).

derekwaynecarr · 2016-11-30T23:44:11Z

Marking do-not-merge pending unit tests

k8s-ci-robot · 2016-12-01T00:26:38Z

Jenkins GCE e2e failed for commit 1ec5411. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-12-01T00:30:07Z

Jenkins GCE etcd3 e2e failed for commit 1ec5411. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

derekwaynecarr · 2016-12-01T19:28:46Z

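@smarterclayton, added a test case.

This happened because graceful deletion is a two-step operation: an update that sets the deletion timestamp and grace period, followed by the delete itself. If the update succeeded but the delete failed (because etcd was temporarily unreachable or returned an internal error), the object was left in a non-recoverable state.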
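The failure mode described above can be reproduced with a toy model. The sketch below is hypothetical throughout: `record`, `store`, `forceDelete`, and the `fixed` flag are illustrative names, and an in-memory map stands in for etcd. Phase 1 persists a zero grace period, phase 2 removes the object, and phase 2 is made to fail once to mimic a transient storage error.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical stored object reduced to the fields that matter here.
type record struct {
	deleting    bool  // stands in for a non-nil deletionTimestamp
	gracePeriod int64 // stands in for the stored deletion grace period
}

// store is a hypothetical in-memory store whose next delete can be made to
// fail, mimicking etcd being briefly unreachable.
type store struct {
	objects    map[string]*record
	failDelete bool
}

var errUnavailable = errors.New("storage temporarily unavailable")

// forceDelete requests deletion with a grace period of 0. fixed selects
// whether a stored grace period of 0 is treated as "delete now" (the fix)
// or as a pending graceful deletion (the old behaviour).
func (s *store) forceDelete(name string, fixed bool) (deleted bool, err error) {
	obj, ok := s.objects[name]
	if !ok {
		return true, nil // already gone
	}
	// Old behaviour: a requested period of 0 is not shorter than a stored
	// period of 0, so the request is reported as pending and nothing happens.
	if obj.deleting && obj.gracePeriod == 0 && !fixed {
		return false, nil
	}
	// Phase 1: persist the deletion marker with a zero grace period.
	obj.deleting = true
	obj.gracePeriod = 0
	// Phase 2: remove the object; this can fail after phase 1 succeeded.
	if s.failDelete {
		s.failDelete = false
		return false, errUnavailable
	}
	delete(s.objects, name)
	return true, nil
}

func main() {
	for _, fixed := range []bool{false, true} {
		s := &store{
			objects:    map[string]*record{"pod-a": {deleting: true, gracePeriod: 30}},
			failDelete: true,
		}
		_, err := s.forceDelete("pod-a", fixed)     // phase 2 fails
		deleted, _ := s.forceDelete("pod-a", fixed) // retry
		fmt.Printf("fixed=%v first error=%v retry deleted=%v\n", fixed, err, deleted)
	}
}
```

With `fixed` set to false, the retry is reported as a pending graceful deletion and the object is never removed; with `fixed` set to true, the retry removes it.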

k8s-ci-robot · 2016-12-01T21:09:42Z

Jenkins GCI GCE e2e failed for commit 1473084. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

derekwaynecarr · 2016-12-01T21:22:03Z

@k8s-bot gci gce e2e test this

k8s-github-robot · 2016-12-02T13:44:58Z

Automatic merge from submit-queue

@yarntime

#37834-#37723-#37668-#37721-#37381-#37944-#37997-#37939-#37990-upstream-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #35272 #37834 #37723 #37668 #37721 #37381 #37944 #37997 #37939 #37990 upstream release 1.5

Batch cherry pick PRs #35272 #37834 #37723 #37668 #37721 #37381 #37944 #37997 #37939 #37990 from master to release-1.5 branch. PR #37997 had merge conflicts that needed to be resolved (due to large PRs that merged to master but not 1.5, see this for details).

CC PR Authors: @yarntime @ixdy @mtaufen @ymqytw @derekwaynecarr @jszczepkowski @Kargakis @foxish @jingxu97

k8s-cherrypick-bot · 2016-12-03T08:50:27Z

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

k8s-ci-robot added the cncf-cla: yes label Nov 30, 2016

derekwaynecarr added this to the v1.5 milestone Nov 30, 2016

derekwaynecarr added cherrypick-candidate release-blocker labels Nov 30, 2016

k8s-github-robot added size/XS release-note-label-needed labels Nov 30, 2016

smarterclayton self-assigned this Nov 30, 2016

derekwaynecarr added the do-not-merge label Nov 30, 2016

Fix logic in graceful deletion if existing grace period was 0

1473084

derekwaynecarr force-pushed the fix-graceful-delete branch from 1ec5411 to 1473084 (December 1, 2016 19:25)

derekwaynecarr added release-note and removed do-not-merge release-note-label-needed labels Dec 1, 2016

smarterclayton added the lgtm label Dec 1, 2016

k8s-github-robot added size/M and removed size/XS labels Dec 1, 2016

derekwaynecarr mentioned this pull request Dec 1, 2016

Automated cherry pick of #37721 #37849

Closed

k8s-github-robot merged commit 05af3ab into kubernetes:master Dec 2, 2016

saad-ali mentioned this pull request Dec 3, 2016

Automated cherry pick of #35272 #37834 #37723 #37668 #37721 #37381 #37944 #37997 #37939 #37990 #37945 upstream release 1.5 #38035

Closed

saad-ali added the cherry-pick-approved label Dec 3, 2016

saad-ali mentioned this pull request Dec 3, 2016

Automated cherry pick of #35272 #37834 #37723 #37668 #37721 #37381 #37944 #37997 #37939 #37990 upstream release 1.5 #38038

Merged

k8s-cherrypick-bot removed the cherrypick-candidate label Dec 3, 2016

derekwaynecarr mentioned this pull request Dec 5, 2016

Namespace stuck on terminating on delete #37554

Closed
