Kubectl delete should handle stopping an rc while it's starting replicas #9147
Comments
We've encountered this race previously in other contexts. See also #7328.
So kubectl will compare the rv it got via its resize update with the rv in the rc's status? That would work, but I think we've stayed away from comparing rvs for various reasons in many other clients. FYI, I think we should fix this for 1.0, either hacky or clean, or people will be confused. We started rate limiting the controller-manager to 20 qps, so it takes 10s to create an rc with 200 pods. Stopping during those 10s has undefined consequences.
The proposal in #7328 would add a new sequence number, not use resourceVersion. If we don't want to bite the bullet and properly implement graceful termination (#1535) in the server, then another alternative would be to just delete the replication controller, and then the pods matching its selector. Rolling update already has such a cleanup loop.
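The actual rolling-update cleanup isn't reproduced here, but a minimal sketch of the delete-rc-then-pods idea, written against today's client-go API (the client in use at the time looked different), with the clientset, namespace, rc name, and label selector as assumed inputs:

```go
package rcstop

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteRCAndPods deletes the replication controller, then deletes any pods
// that still match its label selector (a one-shot cleanup, not a loop).
func deleteRCAndPods(ctx context.Context, c kubernetes.Interface, ns, rcName, selector string) error {
	if err := c.CoreV1().ReplicationControllers(ns).Delete(ctx, rcName, metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("deleting rc %s: %w", rcName, err)
	}
	pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, p := range pods.Items {
		if err := c.CoreV1().Pods(ns).Delete(ctx, p.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("deleting pod %s: %w", p.Name, err)
		}
	}
	return nil
}
```

As the next comment points out, a one-shot pass like this can still race with the rc manager.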
Just cleaning up orphaned pods won't cut it, because deleting the rc != the rc manager noticing the delete. So if we clean up orphaned pods while the rc manager is still processing the rc, it will still create pods. Looping around until there are none left might work, but it feels dirty. I'm going to take a stab at a few experimental implementations and see what works best given our 1.0 constraints.
I think I'm going to stick with watching pods instead of status.Replicas in kubectl; that'll fix a majority of the confusion and confine the changes to kubectl, though some races will still be possible. In theory graceful termination is a feature, and we have feature freeze on Friday, so that's probably not getting in.
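A rough sketch of what watching pods (instead of polling status.Replicas) could look like, again in present-day client-go terms and with the namespace and selector as assumed inputs, not kubectl's actual implementation:

```go
package rcstop

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// waitForPodsGone waits until no pods matching the rc's selector remain,
// using pod watch events rather than the rc's status.Replicas.
func waitForPodsGone(ctx context.Context, c kubernetes.Interface, ns, selector string) error {
	pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	remaining := len(pods.Items)
	if remaining == 0 {
		return nil
	}
	// Watch from the list's resourceVersion so no deletions are missed.
	w, err := c.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
		LabelSelector:   selector,
		ResourceVersion: pods.ResourceVersion,
	})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		switch ev.Type {
		case watch.Deleted:
			remaining--
			if remaining <= 0 {
				return nil
			}
		case watch.Added:
			// The rc manager may still be creating pods; count them too,
			// which is one of the races that remain possible.
			remaining++
		}
	}
	return ctx.Err()
}
```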
This is too error-prone to do without some changes to the rc manager. The best change discussed so far (and one that will be helpful long term) is to add a sequence number to the spec and status of the rc. Every time the rc manager takes any action, it mirrors the sequence number from the spec into the rc's status, and kubectl compares the sequence it got via its resize update with the one in the rc's status before trusting status.Replicas.
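None of these fields exist in the real API; a sketch of the handshake with hypothetical Sequence / ObservedSequence names might look like:

```go
package rcstop

// Hypothetical fields for the proposed handshake; the names are invented
// for illustration and are not part of the real rc API.
type rcSpec struct {
	Replicas int32
	Sequence int64 // bumped by kubectl on every spec change, e.g. the resize to 0
}

type rcStatus struct {
	Replicas         int32
	ObservedSequence int64 // mirrored from spec.Sequence by the rc manager once it has acted
}

// statusIsCurrent reports whether status.Replicas reflects the resize kubectl
// just issued: only true once the rc manager has observed that sequence number.
func statusIsCurrent(spec rcSpec, status rcStatus) bool {
	return status.ObservedSequence >= spec.Sequence
}

// kubectl's wait then becomes: poll until statusIsCurrent(...) holds, and only
// then treat status.Replicas == 0 as "all replicas are gone" before deleting the rc.
```

This is essentially the observedGeneration pattern that Kubernetes status objects use today.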
Fixed via #9739
Since kubectl stop currently polls status.Replicas, if one creates an rc and stops it soon after, there's a chance the polling hits status.Replicas=0 when in fact the rc manager has not updated status.Replicas yet.
An elegant solution to this problem probably involves kubectl not deleting the rc, but somehow signaling to the rc manager that it wants the rc gone. A halfway solution involves kubectl not deleting the rc while it's working. One solution is as follows (hacks like defaulting status.replicas to -1 aside):
The current stop flow is roughly:
1. Resize the rc to 0 replicas.
2. Poll status.Replicas until it reads 0 (sketched below; this is where the race lives).
3. Delete the rc.
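For concreteness, a sketch of step 2 in today's client-go terms (the code at the time differed); the comment marks where the race described above bites:

```go
package rcstop

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForZeroReplicas is the polling stage of the stop flow (step 2).
func waitForZeroReplicas(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	for {
		rc, err := c.CoreV1().ReplicationControllers(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// Race: on a freshly created rc the manager may not have written
		// status.Replicas yet, so this can read 0 while replicas are still
		// being created, and step 3 then deletes the rc out from under them.
		if rc.Status.Replicas == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```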
At the time of the resize, the rc could be in 3 states:
- stable (status.Replicas == spec.Replicas)
- dormant (waiting on watch events from previously created/deleted replicas)
- working (currently creating/deleting replicas)

It is not safe to poll status.Replicas while the rc is in working. If we create it in working and move it to dormant/stable after updating status.replicas, we can add a stage between 1 and 2 that blocks till this happens (see the sketch below).

@lavalamp @davidopp
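A sketch of that extra stage, with the three states modeled as a hypothetical status field (nothing here is real API):

```go
package rcstop

import "time"

// Hypothetical rc states as described above; not real API fields.
type rcState string

const (
	stateStable  rcState = "stable"  // status.Replicas == spec.Replicas
	stateDormant rcState = "dormant" // waiting on watch events for in-flight replicas
	stateWorking rcState = "working" // currently creating/deleting replicas
)

// blockUntilQuiescent is the stage inserted between steps 1 and 2: refuse to
// trust status.Replicas until the rc manager reports it is no longer working.
func blockUntilQuiescent(pollState func() rcState) {
	for pollState() == stateWorking {
		time.Sleep(time.Second)
	}
}
```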