Fix logic error in graceful deletion #37721

Merged

derekwaynecarr
Member

If a resource meets both of the following criteria:

  1. its deletion timestamp is not nil
  2. its deletion grace period seconds, as persisted in storage, is 0

the resource could never be deleted, because we always returned "pending graceful".
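
To make the condition concrete, here is a minimal sketch of the check described above. It is not the code changed by this PR: `objectMeta`, `beingDeleted`, and `alreadyPendingGraceful` are hypothetical names used only for illustration, and treating a nil grace period the same as 0 is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// objectMeta is a hypothetical, trimmed-down stand-in for the deletion-related
// metadata fields discussed here; it is not the real Kubernetes type.
type objectMeta struct {
	DeletionTimestamp          *time.Time
	DeletionGracePeriodSeconds *int64
}

// beingDeleted builds metadata for an object that already carries a deletion
// timestamp and the given persisted grace period (illustrative helper).
func beingDeleted(graceSeconds int64) objectMeta {
	now := time.Now()
	return objectMeta{DeletionTimestamp: &now, DeletionGracePeriodSeconds: &graceSeconds}
}

// alreadyPendingGraceful reports whether a delete request should be answered
// with "graceful deletion already pending" (true) or should fall through to an
// immediate removal from storage (false).
func alreadyPendingGraceful(meta objectMeta) bool {
	if meta.DeletionTimestamp == nil {
		// The object is not being deleted yet, so nothing is pending.
		return false
	}
	// Assumption: a nil grace period is treated like 0. The case from the PR
	// description is the explicit 0: the stored object says "no grace left",
	// so reporting "pending" would leave it stuck forever.
	if meta.DeletionGracePeriodSeconds == nil || *meta.DeletionGracePeriodSeconds == 0 {
		return false
	}
	return true
}

func main() {
	fmt.Println("grace=0: ", alreadyPendingGraceful(beingDeleted(0)))
	fmt.Println("grace=30:", alreadyPendingGraceful(beingDeleted(30)))
}
```

Without the zero check, an object with a deletion timestamp and a persisted grace period of 0 would always take the "pending" branch, which matches the stuck state described above.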

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 30, 2016
@derekwaynecarr
Member Author

I need to write a test case for this tomorrow, but this came up in a scenario where we create hundreds of namespaces, each with 100 pods, and then delete all of the namespaces en masse. With about 10k pods, we would end up with roughly 70 pods that could never be deleted. Inspection of their state showed that each such pod had gracePeriodSeconds=0 and deletionTimestamp=some_value. Once a pod was in this state, forceful deletion would never occur. Why the resource ended up in this state still needs a separate investigation, but it appears that when a kubelet did a force deletion, the deletion was given an OK response, yet the object was not truly removed and instead had its local state updated as above. Once this happened, the pod could not be removed without direct access to etcd.

I think this is a release blocker for 1.5, and I think I will need to cherry-pick it to 1.4.x as well.

/cc @smarterclayton @ingvagabund @sjenning @caesarxuchao @deads2k @saad-ali @eparis

@derekwaynecarr derekwaynecarr added this to the v1.5 milestone Nov 30, 2016
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Nov 30, 2016
@smarterclayton
Contributor

Logically this PR makes sense; it should take only a small change to pkg/api/rest/resttest to verify it.

In 1.7, once we move to etcd3, I will open a follow-up item to set the grace period to 0 and then delete the pod in a single transaction, which should avoid the potential race error here (we want consumers to see 0 and observe the deletion as discrete steps).

@smarterclayton smarterclayton self-assigned this Nov 30, 2016
@derekwaynecarr derekwaynecarr added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Nov 30, 2016
@derekwaynecarr
Member Author

Marking do-not-merge pending unit tests

@k8s-ci-robot
Contributor

Jenkins GCE e2e failed for commit 1ec5411. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE etcd3 e2e failed for commit 1ec5411. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@derekwaynecarr derekwaynecarr added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. release-note-label-needed labels Dec 1, 2016
@derekwaynecarr
Member Author

derekwaynecarr commented Dec 1, 2016

@smarterclayton -- added test case.

This happened because, if the update succeeded but the delete failed (due to etcd being temporarily unreachable or having an internal error), we would find ourselves in a non-recoverable state.
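
The failure sequence can be sketched with a toy in-memory store. This is only an illustration under stated assumptions, not the real registry or etcd code: `record`, `store`, `forceDelete`, and the `honorZeroGrace` switch are all hypothetical names, and `failRemove` simulates storage being temporarily unavailable.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical stored object holding only the relevant fields.
type record struct {
	deleting     bool  // stands in for a non-nil deletionTimestamp
	graceSeconds int64 // stands in for deletionGracePeriodSeconds
}

// store is a toy in-memory store whose remove step can be made to fail,
// mimicking storage being temporarily unreachable.
type store struct {
	objects    map[string]*record
	failRemove bool
}

var errUnavailable = errors.New("storage unavailable")

func (s *store) remove(name string) error {
	if s.failRemove {
		return errUnavailable
	}
	delete(s.objects, name)
	return nil
}

// forceDelete runs the two phases: (1) persist the deletion markers with a
// grace period of 0, (2) remove the object. honorZeroGrace toggles the
// recovery check so both behaviours can be compared.
func (s *store) forceDelete(name string, honorZeroGrace bool) (bool, error) {
	obj, ok := s.objects[name]
	if !ok {
		return false, errors.New("not found")
	}
	if obj.deleting {
		if !(honorZeroGrace && obj.graceSeconds == 0) {
			// Without the check: always report "pending graceful".
			return false, nil
		}
	} else {
		// Phase 1: the update that marks the object as being deleted.
		obj.deleting = true
		obj.graceSeconds = 0
	}
	// Phase 2: the actual removal, which may fail independently of phase 1.
	if err := s.remove(name); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	s := &store{objects: map[string]*record{"pod-a": {}}, failRemove: true}
	_, err := s.forceDelete("pod-a", false)
	fmt.Println("first attempt error:", err)

	s.failRemove = false // storage is healthy again
	removed, _ := s.forceDelete("pod-a", false)
	fmt.Println("retry without check removed:", removed)
	removed, _ = s.forceDelete("pod-a", true)
	fmt.Println("retry with check removed:", removed)
}
```

In this sketch the first attempt persists the markers and then fails to remove the object. A retry without the zero-grace check keeps answering "pending", while a retry with the check proceeds to the removal.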

@smarterclayton smarterclayton added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 1, 2016
@k8s-ci-robot
Contributor

Jenkins GCI GCE e2e failed for commit 1473084. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@derekwaynecarr
Member Author

@k8s-bot gci gce e2e test this

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 05af3ab into kubernetes:master Dec 2, 2016
@saad-ali saad-ali added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 3, 2016
k8s-github-robot pushed a commit that referenced this pull request Dec 3, 2016
#37834-#37723-#37668-#37721-#37381-#37944-#37997-#37939-#37990-upstream-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #35272 #37834 #37723 #37668 #37721 #37381 #37944 #37997 #37939 #37990 upstream release 1.5

Batch cherry pick PRs #35272 #37834 #37723 #37668 #37721 #37381 #37944 #37997 #37939 #37990 from master to release-1.5 branch.

PR #37997 had merge conflicts that needed to be resolved (due to large PRs that merged to master but not to 1.5; see this for details)

CC PR Authors: @yarntime @ixdy @mtaufen @ymqytw @derekwaynecarr @jszczepkowski @Kargakis @foxish @jingxu97
@k8s-cherrypick-bot

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.

Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
release-blocker
release-note Denotes a PR that will be considered when it comes time to generate release notes.
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

7 participants