Kubectl delete should handle stopping an rc while it's starting replicas #9147
Comments
We've encountered this race previously in other contexts. See also #7328.
So kubectl will compare the rv it got via its resize update with the rv in the rc's status? That would work, but I think we've stayed away from comparing rvs for various reasons in many other clients. FYI, I think we should fix this for 1.0, either hacky or clean, or people will be confused. We started rate limiting the controller-manager to 20 qps, so it takes 10s to create an rc with 200 pods. Stopping during those 10s has undefined consequences.
The proposal in #7328 would add a new sequence number, not use resourceVersion. If we don't want to bite the bullet and properly implement graceful termination (#1535) in the server, then another alternative would be to just delete the replication controller, and then the pods matching its selector. Rolling update already has such a cleanup loop.
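The actual rolling-update cleanup isn't reproduced here, but a minimal sketch of the delete-rc-then-pods idea, written against today's client-go API (the client in use at the time looked different), with the clientset, namespace, rc name, and label selector as assumed inputs:

```go
package rcstop

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteRCAndPods deletes the replication controller, then deletes any pods
// that still match its label selector (a one-shot cleanup, not a loop).
func deleteRCAndPods(ctx context.Context, c kubernetes.Interface, ns, rcName, selector string) error {
	if err := c.CoreV1().ReplicationControllers(ns).Delete(ctx, rcName, metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("deleting rc %s: %w", rcName, err)
	}
	pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, p := range pods.Items {
		if err := c.CoreV1().Pods(ns).Delete(ctx, p.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("deleting pod %s: %w", p.Name, err)
		}
	}
	return nil
}
```

As the next comment points out, a one-shot pass like this can still race with the rc manager.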
Just cleaning up orphaned pods won't cut it, because deleting the rc != the rc manager noticing the delete. So if we clean up orphaned pods while the rc manager is still processing the rc, it will still create pods. Looping around until there are none left might work, but it feels dirty. I'm going to take a stab at a few experimental implementations and see what works best given our 1.0 constraints.
I think I'm going to stick with watching pods instead of status.Replicas in kubectl; that'll fix a majority of the confusion and confine the changes to kubectl, though some races will still be possible. In theory graceful termination is a feature, and we have feature freeze on Friday, so that's probably not getting in.
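A rough sketch of what watching pods (instead of polling status.Replicas) could look like, again in present-day client-go terms and with the namespace and selector as assumed inputs, not kubectl's actual implementation:

```go
package rcstop

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// waitForPodsGone waits until no pods matching the rc's selector remain,
// using pod watch events rather than the rc's status.Replicas.
func waitForPodsGone(ctx context.Context, c kubernetes.Interface, ns, selector string) error {
	pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	remaining := len(pods.Items)
	if remaining == 0 {
		return nil
	}
	// Watch from the list's resourceVersion so no deletions are missed.
	w, err := c.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
		LabelSelector:   selector,
		ResourceVersion: pods.ResourceVersion,
	})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		switch ev.Type {
		case watch.Deleted:
			remaining--
			if remaining <= 0 {
				return nil
			}
		case watch.Added:
			// The rc manager may still be creating pods; count them too,
			// which is one of the races that remain possible.
			remaining++
		}
	}
	return ctx.Err()
}
```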
This is too error-prone to do without some changes to the rc manager. The best change discussed so far (and one that will be helpful long term) is to add a sequence number to the spec and status of the rc. Every time the rc manager takes any action, it mirrors the sequence number from the spec into the rc's status, and kubectl compares the sequence it got via its resize update with the one in the rc's status before trusting status.Replicas.
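None of these fields exist in the real API; a sketch of the handshake with hypothetical Sequence / ObservedSequence names might look like:

```go
package rcstop

// Hypothetical fields for the proposed handshake; the names are invented
// for illustration and are not part of the real rc API.
type rcSpec struct {
	Replicas int32
	Sequence int64 // bumped by kubectl on every spec change, e.g. the resize to 0
}

type rcStatus struct {
	Replicas         int32
	ObservedSequence int64 // mirrored from spec.Sequence by the rc manager once it has acted
}

// statusIsCurrent reports whether status.Replicas reflects the resize kubectl
// just issued: only true once the rc manager has observed that sequence number.
func statusIsCurrent(spec rcSpec, status rcStatus) bool {
	return status.ObservedSequence >= spec.Sequence
}

// kubectl's wait then becomes: poll until statusIsCurrent(...) holds, and only
// then treat status.Replicas == 0 as "all replicas are gone" before deleting the rc.
```

This is essentially the observedGeneration pattern that Kubernetes status objects use today.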
Fixed via #9739
Since kubectl stop currently polls status.Replicas, if one creates an rc and stops it soon after, there's a chance the polling hits status.Replicas=0 when in fact the rc manager has not updated status.Replicas yet.
An elegant solution to this problem probably involves kubectl not deleting the rc, but somehow signaling to the rc manager that it wants the rc gone. A halfway solution involves kubectl not deleting the rc while it's working. One solution is as follows (hacks like defaulting status.replicas to -1 aside):
The current stop flow is roughly:
1. Resize the rc to 0 replicas.
2. Poll status.Replicas until it reads 0 (sketched below; this is where the race lives).
3. Delete the rc.
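For concreteness, a sketch of step 2 in today's client-go terms (the code at the time differed); the comment marks where the race described above bites:

```go
package rcstop

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForZeroReplicas is the polling stage of the stop flow (step 2).
func waitForZeroReplicas(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	for {
		rc, err := c.CoreV1().ReplicationControllers(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// Race: on a freshly created rc the manager may not have written
		// status.Replicas yet, so this can read 0 while replicas are still
		// being created, and step 3 then deletes the rc out from under them.
		if rc.Status.Replicas == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```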
At the time of the resize, the rc could be in 3 states:
- stable (status.Replicas == spec.Replicas)
- dormant (waiting on watch events from previously created/deleted replicas)
- working (currently creating/deleting replicas)

It is not safe to poll status.Replicas while the rc is in working. If we create it in working and move it to dormant/stable after updating status.replicas, we can add a stage between 1 and 2 that blocks till this happens (see the sketch below).

@lavalamp @davidopp
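A sketch of that extra stage, with the three states modeled as a hypothetical status field (nothing here is real API):

```go
package rcstop

import "time"

// Hypothetical rc states as described above; not real API fields.
type rcState string

const (
	stateStable  rcState = "stable"  // status.Replicas == spec.Replicas
	stateDormant rcState = "dormant" // waiting on watch events for in-flight replicas
	stateWorking rcState = "working" // currently creating/deleting replicas
)

// blockUntilQuiescent is the stage inserted between steps 1 and 2: refuse to
// trust status.Replicas until the rc manager reports it is no longer working.
func blockUntilQuiescent(pollState func() rcState) {
	for pollState() == stateWorking {
		time.Sleep(time.Second)
	}
}
```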