Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default value of deleting-pods-burst to 1 #27422

Merged
merged 1 commit into from
Jun 15, 2016

Conversation

gmarek
Copy link
Contributor

@gmarek gmarek commented Jun 15, 2016

Fix. #27413

@gmarek gmarek added this to the v1.3 milestone Jun 15, 2016
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Jun 15, 2016
@k8s-bot
Copy link

k8s-bot commented Jun 15, 2016

GCE e2e build/test passed for commit aec5dfb.

@gmarek gmarek added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jun 15, 2016
@davidopp
Copy link
Member

LGTM

Thanks!

@davidopp davidopp added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. cherrypick-candidate labels Jun 15, 2016
@k8s-github-robot
Copy link

Automatic merge from submit-queue

@piosz
Copy link
Member

piosz commented Jun 16, 2016

This PR probably broke gce-serial and gce-serail suites. Both fail with:

[k8s.io] DaemonRestart [Disruptive] Kubelet should not restart containers across restart

Reverting it.

@gmarek
Copy link
Contributor Author

gmarek commented Jun 16, 2016

I'm investigating. I found a problem with graceful deletion of Pods on a Kubelet side (run 1500):

...
I0615 21:20:04.025942   10821 kubelet.go:2516] SyncLoop (UPDATE, "api"): "daemonrestart10-b1b0f766-3335-11e6-8a92-0242ac110002-g4icu_e2e-tests-daemonrestart-u5hdb(6636171a-333e-11e6-9d07-42010af00002):DeletionTimestamp=2016-06-15T21:20:34Z"
...
I0615 21:20:12.560890   10821 kubelet.go:2516] SyncLoop (UPDATE, "api"): "daemonrestart10-b1b0f766-3335-11e6-8a92-0242ac110002-g4icu_e2e-tests-daemonrestart-u5hdb(6636171a-333e-11e6-9d07-42010af00002):DeletionTimestamp=2016-06-15T21:20:42Z"
...

I.e. deletion timestamp is updated in every "step". Those logs seem correlated with DELETE calls from namespacecontroller:

...
I0615 21:20:04.054745       5 handlers.go:165] DELETE /api/v1/namespaces/e2e-tests-daemonrestart-u5hdb/pods: (11.206196ms) 200 [[kube-controller-manager/v1.4.0 (linux/amd64) kubernetes/6209b1b/namespace-controller] 127.0.0.1:60266]
...
I0615 21:20:12.586205       5 handlers.go:165] DELETE /api/v1/namespaces/e2e-tests-daemonrestart-u5hdb/pods: (53.886941ms) 200 [[kube-controller-manager/v1.4.0 (linux/amd64) kubernetes/6209b1b/namespace-controller] 127.0.0.1:60358]
...

I have no idea how default value bump might cause something like this. I suspect another PR that was merged in that time: #26801 @saad-ali @caesarxuchao @yujuhong

@gmarek
Copy link
Contributor Author

gmarek commented Jun 16, 2016

I still don't know why this change might have this effect, but I found a problem with graceful deletion code (I'll send a fix in a sec), that should solve this particular problem. It's kind of strange that it didn't bit us earlier.

The problem itself appeared now because for some reason pods we scheduled didn't finish when received SIGTERM from Kubelet. It earlier runs all daemons were deleted quickly, so there were no SIGKILLs required. Now pods refuse to die (which I think is regression - possibly caused by #26801) - @saad-ali to verify - and the bug in graceful deletion logic prevented Kubelet from issueing SIGKILL.

@yujuhong
Copy link
Contributor

Without reading the details, the daemonrestart test is failing because of the new behavior introduced by #26801. The explanations are in the comments of #26290

@wojtek-t
Copy link
Member

We should revert that PR then to unblock submit-queue.
@lavalamp @k8s-oncall ^^

k8s-github-robot pushed a commit that referenced this pull request Jul 5, 2016
Automatic merge from submit-queue

Fix logic of consecutive DELETE calls when gracefull deletion is enabled

Previously it was possible to extend delete timestamp to infinity by issuing repeated DELETE calls. This PR prevents that. Ref. discussion in #27422

cc @wojtek-t @smarterclayton @lavalamp @piosz @dchen1107
@gmarek gmarek deleted the burst branch August 30, 2016 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants