
Kubectl delete should handle stopping an rc while it's starting replicas #9147

Closed
bprashanth opened this issue Jun 2, 2015 · 7 comments
Assignees
Labels
area/kubectl kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@bprashanth
Contributor

Since kubectl stop currently polls status.Replicas, if one creates an rc and stops it soon after, there's a chance the polling sees status.Replicas == 0 when in fact the rc manager simply hasn't updated status.replicas yet.

An elegant solution to this problem probably involves kubectl not deleting the rc, but somehow signaling to the rc manager that it wants the rc gone. A halfway solution involves kubectl not deleting the rc while it's working. One such solution is as follows (setting aside hacks like defaulting status.replicas to -1):

The current stop flow is:

  1. resize
  2. poll on status.Replicas
  3. delete rc

At the time of the resize, the rc could be in 3 states:

  • stable (status.Replicas == spec.Replicas)
  • dormant (waiting on watch events from previously created/deleted replicas)
  • working (currently creating/deleting replicas)

It is not safe to poll status.Replicas while the rc is in the working state. If we create the rc in working and only move it to dormant/stable after status.replicas has been updated, we can add a stage between steps 1 and 2 that blocks until that transition happens; a rough sketch follows.
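
A minimal sketch of that extra stage, assuming a hypothetical status state field ("working"/"dormant"/"stable") that does not exist in the current API, with plain functions standing in for the real client:

```go
package sketch

import (
	"fmt"
	"time"
)

// Hypothetical view of an rc, reduced to what this flow needs. The
// StatusState field ("working", "dormant", "stable") is assumed, not real.
type rcView struct {
	StatusReplicas int
	StatusState    string
}

// pollUntil is a stand-in for kubectl's polling helper.
func pollUntil(done func() bool) error {
	for i := 0; i < 300; i++ {
		if done() {
			return nil
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("timed out")
}

// stopRC resizes the rc to 0, blocks until the rc manager has left the
// "working" state (the proposed new stage), then polls status.Replicas
// down to 0 and finally deletes the rc.
func stopRC(get func() rcView, resizeToZero func(), del func()) error {
	resizeToZero() // 1. resize

	// 1.5. new stage: only once the rc manager is dormant/stable does
	// status.Replicas reflect reality.
	if err := pollUntil(func() bool { return get().StatusState != "working" }); err != nil {
		return err
	}

	// 2. poll status.Replicas.
	if err := pollUntil(func() bool { return get().StatusReplicas == 0 }); err != nil {
		return err
	}

	del() // 3. delete the rc
	return nil
}
```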

@lavalamp @davidopp

@bprashanth bprashanth self-assigned this Jun 2, 2015
@bprashanth bprashanth added this to the v1.0-candidate milestone Jun 2, 2015
@goltermann goltermann modified the milestones: v1.0-post, v1.0-candidate Jun 2, 2015
@bgrant0607
Member

We've encountered this race previously in other contexts. See also #7328.

@bgrant0607 bgrant0607 added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. priority/backlog Higher priority than priority/awaiting-more-evidence. kind/bug Categorizes issue or PR as related to a bug. labels Jun 2, 2015
@bprashanth
Contributor Author

So kubectl will compare the resourceVersion it got back from its resize update with the resourceVersion in the rc's status? That would work, but I think we've deliberately avoided comparing resourceVersions in many other clients for various reasons. FYI, I think we should fix this for 1.0, either hacky or clean, or people will be confused. We started rate limiting the controller-manager to 20 qps, so it takes 10s to create an rc with 200 pods. Stopping during those 10s has undefined consequences.

@bgrant0607
Member

The proposal in #7328 would add a new sequence number, not use resourceVersion.

If we don't want to bite the bullet and properly implement graceful termination (#1535) in the server, then another alternative would be to just delete the replication controller, and then the pods matching its selector. Rolling update already has such a cleanup loop:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/kubectl/rolling_updater.go#L244
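
For illustration only, a rough sketch of that alternative written against today's client-go (which postdates this thread); the clientset, namespace, and label selector are assumed to be in hand:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteRCAndOrphanedPods deletes the rc first, then sweeps up any pods
// still matching its selector, much like the rolling updater's cleanup loop.
func deleteRCAndOrphanedPods(ctx context.Context, c kubernetes.Interface, ns, rcName, selector string) error {
	if err := c.CoreV1().ReplicationControllers(ns).Delete(ctx, rcName, metav1.DeleteOptions{}); err != nil {
		return err
	}
	pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if err := c.CoreV1().Pods(ns).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```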

@bprashanth
Contributor Author

Just cleaning up orphaned pods won't cut it, because deleting the rc != the rc manager noticing the delete. So if we clean up orphaned pods while the rc manager is still processing the rc, it will keep creating pods. Looping around until there are none left might work, but it feels dirty. I'm going to take a stab at a few experimental implementations and see what works best given our 1.0 constraints.

@bprashanth
Contributor Author

I think I'm going to stick with watching pods instead of status.Replicas in kubectl. That'll fix a majority of the confusion and confine the changes to kubectl, though some races will still be possible; a rough sketch of the approach follows. In theory graceful termination is a feature and we have feature freeze on Friday, so that's probably not getting in.
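
A minimal sketch of the pod-watching approach, again against today's client-go for illustration only; it simply polls the pod list rather than setting up a watch, and the clientset, namespace, and selector are assumed:

```go
package sketch

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForPodsGone polls the pods matching the rc's selector until none are
// left, instead of trusting status.Replicas. kubectl would resize the rc to
// 0 first and delete it once this returns.
func waitForPodsGone(ctx context.Context, c kubernetes.Interface, ns, selector string) error {
	deadline := time.Now().Add(5 * time.Minute)
	for time.Now().Before(deadline) {
		pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
		if err != nil {
			return err
		}
		if len(pods.Items) == 0 {
			return nil // no replicas left; safe to delete the rc
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("timed out waiting for pods matching %q to terminate", selector)
}
```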

@bprashanth
Contributor Author

This is too error-prone to do without some changes to the rc manager. The best change discussed so far (and one that will be helpful long term) is to add a sequence number to the spec and status of the rc. Every time the rc manager takes any action it mirrors the sequence number from the spec into the status, and kubectl does the following (sketched after the list):

  • resize to 0
  • poll on status.sequenceNumber from the previous step so it knows the rc manager has seen the resize
  • poll on status.Replicas
  • delete the rc
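
A rough sketch of that flow, assuming hypothetical spec.sequenceNumber / status.sequenceNumber fields (essentially the generation/observedGeneration pattern Kubernetes later adopted); nothing here is real API:

```go
package sketch

import (
	"fmt"
	"time"
)

// Hypothetical rc status view; the sequence-number field does not exist today.
type rcStatus struct {
	SequenceNumber int64 // mirrored from spec.sequenceNumber by the rc manager
	Replicas       int
}

// waitFor polls the condition once a second until it holds or we time out.
func waitFor(cond func() bool) error {
	for i := 0; i < 300; i++ {
		if cond() {
			return nil
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("timed out")
}

// stopWithSequenceNumber resizes to 0 (which bumps spec.sequenceNumber and
// returns it), waits until the rc manager has mirrored that number into
// status, then polls status.Replicas and deletes the rc.
func stopWithSequenceNumber(getStatus func() rcStatus, resizeToZero func() int64, del func()) error {
	seq := resizeToZero() // 1. resize to 0

	// 2. the rc manager has definitely seen the resize once status catches up.
	if err := waitFor(func() bool { return getStatus().SequenceNumber >= seq }); err != nil {
		return err
	}
	// 3. status.Replicas is now trustworthy.
	if err := waitFor(func() bool { return getStatus().Replicas == 0 }); err != nil {
		return err
	}
	del() // 4. delete the rc
	return nil
}
```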

@bprashanth
Contributor Author

Fixed via #9739
