
Fix logic of consecutive DELETE calls when graceful deletion is enabled #27539

Merged: 1 commit merged into kubernetes:master from the graceful branch on Jul 5, 2016

Conversation

@gmarek (Contributor) commented Jun 16, 2016

Previously it was possible to extend the deletion timestamp indefinitely by issuing repeated DELETE calls. This PR prevents that. Ref. the discussion in #27422.

cc @wojtek-t @smarterclayton @lavalamp @piosz @dchen1107
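
For illustration, a minimal, self-contained Go sketch of the failure mode described above, assuming the pre-fix behavior of recomputing the deadline from the current time on every DELETE (see the old newDeletionTimestamp line in the diff below); the variable names and the loop here are invented for this sketch and are not the API server code:

package main

import (
	"fmt"
	"time"
)

func main() {
	grace := 30 * time.Second

	// First DELETE: deadline = now + grace period.
	deadline := time.Now().Add(grace)

	// Sketch of the old behavior: each repeated DELETE with an allowed grace
	// period recomputed the deadline from "now", so the advisory deadline kept
	// moving later instead of staying fixed.
	for i := 0; i < 3; i++ {
		time.Sleep(50 * time.Millisecond) // stand-in for time passing between calls
		deadline = time.Now().Add(grace)
	}

	fmt.Println("deadline is still", time.Until(deadline).Round(time.Second), "away after repeated DELETEs")
}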

@gmarek added the priority/important-soon label Jun 16, 2016
@gmarek added this to the v1.3 milestone Jun 16, 2016
@gmarek added the kind/bug and release-note-none labels Jun 16, 2016
@k8s-github-robot added the size/S label Jun 16, 2016
@gmarek (Contributor, Author) commented Jun 16, 2016

Failures are strange:

Jun 16 07:53:42.596: INFO: Pod e2e-tests-job-pl5jn scale-down-6fwje on node e2e-gce-agent-pr-13-0-minion-group-8ljf remains, has deletion timestamp 2016-06-16T07:48:41-07:00
Jun 16 07:53:42.596: INFO: Pod e2e-tests-job-pl5jn scale-down-min8x on node e2e-gce-agent-pr-13-0-minion-group-vefa remains, has deletion timestamp 2016-06-16T07:49:34-07:00

I.e., the deletion timestamp was correct, but for some reason we didn't delete anything.

@smarterclayton (Contributor):

Please add a test case (since I'm pretty sure this is a regression in 1.3).

@@ -69,14 +69,13 @@ func BeforeDelete(strategy RESTDeleteStrategy, ctx api.Context, obj runtime.Object
	// only a shorter grace period may be provided by a user
	if options.GracePeriodSeconds != nil {
		period := int64(*options.GracePeriodSeconds)
		if period > *objectMeta.DeletionGracePeriodSeconds {
			return false, true, nil
		}
		newDeletionTimestamp := unversioned.NewTime(unversioned.Now().Add(time.Second * time.Duration(*options.GracePeriodSeconds)))

Review comment (Contributor):

Wait. How is it possible to extend?

Review comment (Contributor):

Setting grace period seconds should not reset the deletion timestamp.

Review comment from @smarterclayton (Contributor), Jun 16, 2016:

It should always be possible to shorten the grace period seconds, regardless of where the timestamp is. I.e., a user who does:

DELETE 30s at t1
DELETE 15s at t2

should end up with a resource whose grace period is 15s and whose deletion timestamp is t1 + 30s - 15s.

Logic I would expect here is:

if period > *objectMeta.DeletionGracePeriodSeconds {
  return
}
delta := time.Duration(*objectMeta.DeletionGracePeriodSeconds-period) * time.Second
objectMeta.DeletionGracePeriodSeconds = &period
if objectMeta.DeletionTimestamp == nil {
  // set deletion timestamp to now + grace period
  return
}
newDeletionTimestamp := unversioned.NewTime(objectMeta.DeletionTimestamp.Add(-delta))
objectMeta.DeletionTimestamp = &newDeletionTimestamp

Review comment (Contributor):

Resetting the timestamp should only be done based on the delta, not on a new timestamp.

Review comment (Contributor, PR author):

We've discussed this with @wojtek-t, and basically there are two ways of seeing it. Mine was the same as yours (that the second delete is basically an update of the grace period). Wojtek's was that it's a second, completely independent delete, and we should proceed with the one that will happen sooner. After some consideration I think the latter is better.

Review comment (Contributor):

That's not the guarantee that we designed the graceful deletion path for, so I'm not in favor of making a change like that right now without a longer discussion. The shorter grace period duration must be recorded because an external client may not observe the first request before starting the second. The timestamp is advisory - the grace period is authoritative.

Review comment (Contributor, PR author):

To make it crystal clear: you're saying that if we have a deletion at t1 with period p1, and a deletion at t2 < t1 + p1 with p2 < t2 - t1, we should delete the object instantly. Right?

Review comment (Contributor):

There are two cases for a user invoking two deletes in order (30s and 15s).

  1. The owner of graceful deletion (in this case, for a pod, the kubelet) sees the first request, acts on it, then sees the second.
  2. The owner of graceful deletion sees both requests at the same time (or within the same interval).

For the first case, the graceful deleter should trigger a 30s delete; then, when it sees the shorter interval, it needs to make a decision in its own timeframe (in the causal sense) about whether to terminate or to shorten. In the second case, the initial deletion would simply be 15s.
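
As a hedged illustration of the two cases above, here is a small, self-contained Go sketch of an owner of graceful deletion keying off the observed DeletionGracePeriodSeconds on its own clock rather than off the advisory timestamp; the podUpdate and gracefulDeleter types are invented for this sketch and are not kubelet APIs:

package main

import (
	"fmt"
	"time"
)

type podUpdate struct {
	Name                       string
	DeletionGracePeriodSeconds *int64
}

type gracefulDeleter struct {
	deadlines map[string]time.Time // per-pod deadline on the owner's own clock
}

// observe handles one observed update. A later update carrying a shorter grace
// period only ever pulls the local deadline earlier, never later.
func (d *gracefulDeleter) observe(u podUpdate, now time.Time) {
	if u.DeletionGracePeriodSeconds == nil {
		return
	}
	proposed := now.Add(time.Duration(*u.DeletionGracePeriodSeconds) * time.Second)
	if current, ok := d.deadlines[u.Name]; !ok || proposed.Before(current) {
		d.deadlines[u.Name] = proposed
	}
}

func main() {
	d := &gracefulDeleter{deadlines: map[string]time.Time{}}
	t0 := time.Now()
	thirty, fifteen := int64(30), int64(15)

	// Case 1: the owner sees the 30s request first, then the 15s one 2s later.
	d.observe(podUpdate{Name: "pod-a", DeletionGracePeriodSeconds: &thirty}, t0)
	d.observe(podUpdate{Name: "pod-a", DeletionGracePeriodSeconds: &fifteen}, t0.Add(2*time.Second))

	// Case 2: the owner sees only the final state, already carrying 15s.
	d.observe(podUpdate{Name: "pod-b", DeletionGracePeriodSeconds: &fifteen}, t0)

	fmt.Println(d.deadlines["pod-a"].Sub(t0)) // 17s: shortened on the owner's own timeline
	fmt.Println(d.deadlines["pod-b"].Sub(t0)) // 15s
}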

@gmarek (Contributor, Author) commented Jun 16, 2016

I don't think it's a regression; judging by git blame, it has always been this way...

@lavalamp (Member):

Needs a test.

@smarterclayton (Contributor):

It's a regression in intent (introduced by me) :). The change in 02dbb95 to let the DeletionTimestamp get pushed was incorrect.

@gmarek (Contributor, Author) commented Jun 16, 2016

OK, I can't argue with this definition of regression :). I'll add a test once we agree on the semantics.

@caesarxuchao (Member):

I think the value of DeletionTimestamp is not that critical; the behavior of the kubelet (and, I believe, of other components as well) is driven by the grace period, not by DeletionTimestamp. I think our components only check whether DeletionTimestamp is nil. @smarterclayton, did I get that right?

@smarterclayton (Contributor) commented Jun 16, 2016

@caesarxuchao Yes - DeletionTimestamp is advisory for naive clients, but not for the owners of those components (since clients are not expected to share a common timeline). All critical components depend on DeletionGracePeriodSeconds as it applies to their own timeline (like the kubelet and the namespace controller).

@smarterclayton (Contributor):

So no matter what, we always need to record changes to DeletionGracePeriodSeconds, and the Timestamp should be best effort, based on what we know at the current time.
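
For concreteness, a hedged, self-contained Go sketch of that rule on the API-server side: always record the shorter grace period, and only pull the advisory timestamp earlier by the delta. This is the review pseudocode above made runnable, not the PR's actual change; objectMeta here uses plain time.Time instead of unversioned.Time, and applyGracePeriod is an invented stand-in for the logic in pkg/api/rest/delete.go:

package main

import (
	"fmt"
	"time"
)

type objectMeta struct {
	DeletionGracePeriodSeconds *int64
	DeletionTimestamp          *time.Time
}

// applyGracePeriod records the requested grace period. It only ever shortens
// the recorded period, and it moves DeletionTimestamp earlier by the delta
// rather than recomputing it from the current time.
func applyGracePeriod(meta *objectMeta, requested int64, now time.Time) {
	if meta.DeletionGracePeriodSeconds == nil || meta.DeletionTimestamp == nil {
		// First DELETE: record the period and set deadline = now + period.
		meta.DeletionGracePeriodSeconds = &requested
		ts := now.Add(time.Duration(requested) * time.Second)
		meta.DeletionTimestamp = &ts
		return
	}
	if requested >= *meta.DeletionGracePeriodSeconds {
		// A repeated DELETE can never extend the recorded grace period.
		return
	}
	delta := time.Duration(*meta.DeletionGracePeriodSeconds-requested) * time.Second
	ts := meta.DeletionTimestamp.Add(-delta) // pull the advisory deadline earlier
	meta.DeletionTimestamp = &ts
	meta.DeletionGracePeriodSeconds = &requested
}

func main() {
	t1 := time.Date(2016, 6, 16, 7, 48, 0, 0, time.UTC)
	var meta objectMeta

	applyGracePeriod(&meta, 30, t1)                    // DELETE 30s at t1
	applyGracePeriod(&meta, 15, t1.Add(5*time.Second)) // DELETE 15s at t2 = t1 + 5s

	fmt.Println(*meta.DeletionGracePeriodSeconds)            // 15
	fmt.Println(meta.DeletionTimestamp.Format(time.RFC3339)) // t1 + 30s - 15s = t1 + 15s
}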

@smarterclayton (Contributor):

So the description of this issue is not correct - I don't see how you can infinitely extend. Why is this a P1?

@gmarek (Contributor, Author) commented Jun 16, 2016

My impression was that the deletion timestamp was the source of truth. If that's not the case, then the situation I observed was caused by something else. What made me think so was that I couldn't see the Kubelet deleting a pod, while at the same time the deletion timestamp was being extended indefinitely.

@smarterclayton (Contributor):

There should be no code in the Kubernetes repo that depends on DeletionTimestamp (other than printing it out). All components (node controller, namespace controller, and kubelet) use a local clock that begins when they observe the event.

@gmarek (Contributor, Author) commented Jun 16, 2016

You know which word worries me, right?

@smarterclayton (Contributor):

Doing a quick search:

$ grep -R ".DeletionTimestamp" --include=*.go --exclude=generated.pb.go --exclude=*_generated.go --exclude=types.generated.go --exclude=*_test.go * | grep -v "DeletionTimestamp != nil" | grep -v "DeletionTimestamp == nil"
pkg/api/meta/interfaces.go: GetDeletionTimestamp() *unversioned.Time
pkg/api/meta/interfaces.go: SetDeletionTimestamp(timestamp *unversioned.Time)
pkg/api/meta/meta.go:func (a genericAccessor) GetDeletionTimestamp() *unversioned.Time {
pkg/api/meta/meta.go:func (a genericAccessor) SetDeletionTimestamp(timestamp *unversioned.Time) {
pkg/api/meta.go:func (meta *ObjectMeta) GetDeletionTimestamp() *unversioned.Time { return meta.DeletionTimestamp }
pkg/api/meta.go:func (meta *ObjectMeta) SetDeletionTimestamp(deletionTimestamp *unversioned.Time) {
pkg/api/meta.go:    meta.DeletionTimestamp = deletionTimestamp
pkg/api/rest/create.go: objectMeta.DeletionTimestamp = nil
pkg/api/rest/delete.go:         objectMeta.DeletionTimestamp = &now
pkg/api/rest/delete.go: objectMeta.DeletionTimestamp = &now
pkg/api/types.go:   // DeletionTimestamp is the time after which this resource will be deleted. This
pkg/api/types.go:   DeletionTimestamp *unversioned.Time `json:"deletionTimestamp,omitempty"`
pkg/api/v1/types.go:    // DeletionTimestamp is RFC 3339 date and time at which this resource will be deleted. This
pkg/api/v1/types.go:    DeletionTimestamp *unversioned.Time `json:"deletionTimestamp,omitempty" protobuf:"bytes,9,opt,name=deletionTimestamp"`
pkg/api/validation/validation.go:       if !oldMeta.DeletionTimestamp.IsZero() {
pkg/api/validation/validation.go:           newMeta.DeletionTimestamp = oldMeta.DeletionTimestamp
pkg/api/validation/validation.go:       allErrs = append(allErrs, field.Invalid(fldPath.Child("deletionTimestamp"), newMeta.DeletionTimestamp, "field is immutable; may only be changed via deletion"))
pkg/api/validation/validation.go:   if newNamespace.DeletionTimestamp.IsZero() {
pkg/controller/controller_utils.go:// DeletionTimestamp on an object as a delete. To do so consistenly, one needs
pkg/controller/controller_utils.go:             p.Namespace, p.Name, p.Status.Phase, p.DeletionTimestamp)
pkg/controller/namespace/namespace_controller_utils.go: if namespace.DeletionTimestamp.IsZero() || namespace.Status.Phase == api.NamespaceTerminating {
pkg/controller/namespace/namespace_controller_utils.go: estimate, err := deleteAllContent(kubeClient, clientPool, opCache, groupVersionResources, namespace.Name, *namespace.DeletionTimestamp)
pkg/controller/petset/fakes.go:func (f *fakePetClient) setDeletionTimestamp(index int) error {
pkg/controller/petset/fakes.go: f.pets[index].pod.DeletionTimestamp = &unversioned.Time{Time: time.Now()}
pkg/controller/petset/pet.go:   glog.Infof("PetSet %v waiting on pet %v to die in %v", pet.parent.Name, realPet.pod.Name, realPet.pod.DeletionTimestamp)
pkg/controller/replicaset/replica_set.go:   glog.V(4).Infof("Pod %s/%s deleted through %v, timestamp %+v: %+v.", pod.Namespace, pod.Name, utilruntime.GetCaller(), pod.DeletionTimestamp, pod)
pkg/controller/replication/replication_controller.go:   glog.V(4).Infof("Pod %s/%s deleted through %v, timestamp %+v, labels %+v.", pod.Namespace, pod.Name, utilruntime.GetCaller(), pod.DeletionTimestamp, pod.Labels)
pkg/kubectl/describe.go:            fmt.Fprintf(out, "Status:\tTerminating (expires %s)\n", pod.DeletionTimestamp.Time.Format(time.RFC1123Z))
pkg/kubelet/config/config.go:       reflect.DeepEqual(existing.DeletionTimestamp, ref.DeletionTimestamp) &&
pkg/kubelet/config/config.go:   existing.DeletionTimestamp = ref.DeletionTimestamp
pkg/kubelet/kubelet.go:         // handle the work item. Check if the DeletionTimestamp has been
pkg/registry/generic/registry/store.go: accessor.SetDeletionTimestamp(nil)
pkg/registry/namespace/etcd/etcd.go:    if namespace.DeletionTimestamp.IsZero() {
pkg/registry/namespace/etcd/etcd.go:                if existingNamespace.DeletionTimestamp.IsZero() {
pkg/registry/namespace/etcd/etcd.go:                    existingNamespace.DeletionTimestamp = &now
pkg/registry/pod/strategy.go:   newPod.DeletionTimestamp = nil
pkg/runtime/types.go:func (u *Unstructured) GetDeletionTimestamp() *unversioned.Time {
pkg/runtime/types.go:func (u *Unstructured) SetDeletionTimestamp(timestamp *unversioned.Time) {

I think we're good.

@gmarek removed the priority/important-soon label Jun 17, 2016
@gmarek force-pushed the graceful branch 2 times, most recently from 7f8b4d6 to 6102d2f on June 17, 2016 08:17
@gmarek (Contributor, Author) commented Jun 17, 2016

Added test and changed the logic according to what @smarterclayton wrote.

@k8s-github-robot added the size/M label and removed the size/S label Jun 17, 2016
@smarterclayton (Contributor):

Thanks, LGTM. Do we want to risk this for 1.3 or leave it as is? I'd be inclined to punt it.

@gmarek (Contributor, Author) commented Jun 17, 2016

@k8s-bot test this issue: #IGNORE
no test results

@k8s-bot commented Jun 17, 2016

GCE e2e build/test passed for commit 6102d2f27e61647946d4f55a8354f1f7bf3c08af.

@goltermann (Contributor):

What's the call here? In or out for 1.3?

@smarterclayton (Contributor):

My preference is out - the bug it fixes is not material to the operation of a 1.3 system.

@goltermann (Contributor):

Removing v1.3 milestone - speak up if you disagree

@goltermann modified the milestones: next-candidate, v1.3 Jun 20, 2016
@gmarek (Contributor, Author) commented Jul 4, 2016

@lavalamp PTAL, as master is open now.

@smarterclayton (Contributor):

Changes LGTM

@smarterclayton added the lgtm label Jul 5, 2016
@k8s-bot commented Jul 5, 2016

GCE e2e build/test passed for commit 6102d2f27e61647946d4f55a8354f1f7bf3c08af.

@k8s-bot commented Jul 5, 2016

GCE e2e build/test failed for commit 6102d2f27e61647946d4f55a8354f1f7bf3c08af.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@gmarek (Contributor, Author) commented Jul 5, 2016

@k8s-bot e2e test this issue: #IGNORE
github query failed.

@k8s-bot commented Jul 5, 2016

GCE e2e build/test passed for commit 6102d2f27e61647946d4f55a8354f1f7bf3c08af.

@k8s-bot commented Jul 5, 2016

GCE e2e build/test failed for commit 6102d2f27e61647946d4f55a8354f1f7bf3c08af.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@gmarek (Contributor, Author) commented Jul 5, 2016

No-op rebase. Reapplying lgtm.

@gmarek removed and re-added the lgtm label Jul 5, 2016
@k8s-bot commented Jul 5, 2016

GCE e2e build/test passed for commit 2a5914d.

@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot merged commit b37a5de into kubernetes:master Jul 5, 2016
@gmarek deleted the graceful branch August 30, 2016 09:48