Make the status.Replica count useful to watchers #5745

bprashanth · 2015-03-20T23:49:24Z

Currently the status.Replica count is not very useful to watchers (like stop rc, which actually doesn't watch but should):

1. It's updated fillCurrentState-style: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/registry/etcd/etcd.go#L114
2. If we remove that (we should, in any case), it will only get updated once every 10 seconds.

The quick and dirty solutions are:

1. Decrease the polling interval. This will probably increase load on the apiserver.
   - We can probably tolerate a shorter interval if we list pods once in the manager instead of once per rc.
2. Wait on the goroutines that create/delete replicas, treat a 200 response as success, and update the count right then. The catch: if a pod is stuck in a create->death loop, status.Replicas would never reflect it, because creating the pod always succeeds (and so updates status.Replicas); the pod just dies afterwards. The watcher would never find out.
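A minimal sketch of the "list pods once in the manager" idea: a single pod list feeds the counts for every rc, instead of each rc issuing its own list call. `Pod`, `RC`, `matches`, and `countReplicas` are hypothetical stand-ins, not the real Kubernetes types or replication manager code, and skipping pods in a terminal phase (Failed/Succeeded) is an assumption about what should count toward status.Replicas.

```go
package main

import "fmt"

// Pod is a hypothetical, pared-down stand-in for the real API type.
type Pod struct {
	Name   string
	Labels map[string]string
	Phase  string // e.g. "Running", "Pending", "Failed", "Succeeded"
}

// RC is a hypothetical, pared-down replication controller.
type RC struct {
	Name     string
	Selector map[string]string
}

// matches reports whether every selector key/value pair is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// countReplicas computes a replica count for every rc from one pod list,
// rather than listing pods separately for each rc. Terminated pods are skipped
// (an assumption), so a crash-looping pod does not inflate the count.
func countReplicas(rcs []RC, pods []Pod) map[string]int {
	counts := make(map[string]int, len(rcs))
	for _, rc := range rcs {
		counts[rc.Name] = 0
		for _, p := range pods {
			if p.Phase == "Failed" || p.Phase == "Succeeded" {
				continue
			}
			if matches(rc.Selector, p.Labels) {
				counts[rc.Name]++
			}
		}
	}
	return counts
}

func main() {
	rcs := []RC{
		{Name: "frontend", Selector: map[string]string{"app": "frontend"}},
		{Name: "backend", Selector: map[string]string{"app": "backend"}},
	}
	pods := []Pod{
		{Name: "fe-1", Labels: map[string]string{"app": "frontend"}, Phase: "Running"},
		{Name: "fe-2", Labels: map[string]string{"app": "frontend"}, Phase: "Failed"},
		{Name: "be-1", Labels: map[string]string{"app": "backend"}, Phase: "Pending"},
	}
	fmt.Println(countReplicas(rcs, pods)) // prints map[backend:1 frontend:1]
}
```

The cost of one list is shared across all rcs, which is what would make a shorter poll interval more tolerable for the apiserver; the per-rc matching then happens in memory.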

@lavalamp, thoughts on just watching the status field of all pods in the manager through the controller framework? I think we either need to do that, or list once in the manager and decrease the poll interval.


lavalamp · 2015-03-26T23:38:54Z

That code linked in 1. looks like it should have been deleted when we changed controller manager to fill in the replicas field. Can we take that out ASAP?

I think having watchers get updated "only" every 10 seconds is not actually a problem at this point in time.

bprashanth · 2015-03-26T23:44:28Z

That happened automatically when we started using generic etcd: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/master/master.go#L380. I forgot to delete the code in registry/etcd, will do.

lavalamp · 2015-03-26T23:54:45Z

Ah.

So is this a fair TL;DR: "Is a 10 second refresh for rc.status.replicas fast enough?"

bprashanth · 2015-03-26T23:59:43Z

Yes. I suspect it isn't, because right now it can take up to 20s before the count is accurate, which slows down stop (even though stop will soon use watch, eliminating its own 3s poll interval).

We can probably make it as fast as we want when we stop listing all the things from the apiserver.

Correction: despite the 20s lag in the general case, stop will only be slowed down by 10s, because resizing the rc to 0 triggers a sync loop.

lavalamp · 2015-04-27T19:57:54Z

To answer my own question: Yes, 10 seconds sounds totally reasonable. I think this is working now, and the parts that aren't perfect are covered by other issues.

bprashanth added area/api team/master labels Mar 20, 2015

bprashanth added this to the v1.0 milestone Mar 20, 2015

bprashanth mentioned this issue Mar 21, 2015

Migrate replication controllers to generic etcd #5746

Merged

mbforbes added the priority/backlog label Mar 21, 2015

bprashanth self-assigned this Mar 23, 2015

lavalamp closed this as completed Apr 27, 2015
