Improve replication controller manager performance #5884
@bgrant0607 can you remind me which of these you were telling me yesterday you thought was the most important (or maybe it's something that isn't on the list)? I've forgotten.
There's also an alternative to 3, which is to list all pods once per tick instead of once per controller per tick. Feels like we're close enough to the watch-pod-status solution that we should just go for that (we default to using the podcache today, but it's feature flagged: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/master/master.go#L384).
Whether indexed or not, we shouldn't be doing a list of all matching pods per replication controller. Whether watching pod status or listing pods once per tick, we need to move to a model where the appropriate replication controller's count is updated based on the existence (or deletion) of a particular pod. @fgrzadkowski Is posting of pod status from kubelet really conditional?
@bprashanth is right. Use of pod_cache is conditional, but kubelet already updates pod status when it changes (PR #5714). I've already sent PR #5854 to delete pod_cache completely.
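The incremental model described above (update a controller's count from individual pod create/delete events rather than relisting) can be sketched roughly as follows. This is a toy Go model with hypothetical names (`PodEvent`, `applyEvent`), not the actual controller-manager code.

```go
package main

import "fmt"

// PodEvent is a hypothetical, simplified watch event: a pod was created or deleted.
type PodEvent struct {
	Labels  map[string]string
	Deleted bool // true for a deletion, false for a creation
}

// matches reports whether a controller's label selector matches a pod's labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// applyEvent incrementally adjusts the replica count of every controller whose
// selector matches the pod, instead of re-listing all matching pods per rc.
func applyEvent(counts map[string]int, selectors map[string]map[string]string, ev PodEvent) {
	for name, sel := range selectors {
		if !matches(sel, ev.Labels) {
			continue
		}
		if ev.Deleted {
			counts[name]--
		} else {
			counts[name]++
		}
	}
}

func main() {
	selectors := map[string]map[string]string{"frontend": {"app": "frontend"}}
	counts := map[string]int{"frontend": 0}
	// Two creations followed by one deletion leaves a net count of 1.
	applyEvent(counts, selectors, PodEvent{Labels: map[string]string{"app": "frontend"}})
	applyEvent(counts, selectors, PodEvent{Labels: map[string]string{"app": "frontend"}})
	applyEvent(counts, selectors, PodEvent{Labels: map[string]string{"app": "frontend"}, Deleted: true})
	fmt.Println(counts["frontend"]) // prints 1
}
```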
A strawman for the third problem:
Avoiding the periodic sync of the controller manager will make the system incremental, and thereby more sensitive to things like network flakes.
I already started working on this, so assigning to myself.
So am I (I'd just forgotten to assign it to myself); I'm just waiting on @lavalamp's reflector framework to start on this :) Though if you really want this, go ahead and I can find something else, or the other way around. Edit: I had assigned it to myself previously, so just taking it back now.
I hoped to finish this today, but didn't make it. Since I'll be OOO until -- sent from a mobile device
Yeah, I'm doing it. The controller framework mentioned isn't merged yet (#6546), so unless there's some urgency to get this in, we might as well work on different things and wait till the framework is in to reduce churn.
@bprashanth What's the ETA for this change? Without this change it's hard for me to move forward with profiling bottlenecks for large clusters.
@fgrzadkowski got sidetracked writing benchmarks to prove this isn't going to have a negative impact on perf. I have a PR ready and will upload it tomorrow (at least for the pod part; it seems like the controller framework still has some kinks that need ironing out).
@bprashanth what is the current status of this PR?
@bprashanth - is it fixed now?
Yep.
This is an umbrella issue to address replication controller/manager performance that might matter for 1.0. In broad strokes:

1. `stop` is limited by a 3s polling interval. This needs to be a watch. We cannot currently watch specific fields of an rc (Make the status.Replica count useful to watchers #5745). We could also watch a subresource endpoint (Create subresource end points for replication controllers #4909).
2. `status.Replicas` … reflects this
3. `stop` … notices anything

@lavalamp thoughts (on specifically the last one, since it overlaps with your controller PR)?
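The polling-vs-watch distinction for `stop` can be sketched as follows. This is a toy model, not the real client API: `waitPolling`, `waitWatching`, and the channel of replica counts are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// waitPolling mirrors the current behavior: re-fetch the rc's replica count
// every interval until it reaches the desired value.
func waitPolling(getReplicas func() int, want int, interval time.Duration) {
	for getReplicas() != want {
		time.Sleep(interval) // e.g. the 3s polling interval
	}
}

// waitWatching is the watch-based alternative: block on a stream of replica
// count updates and return as soon as the desired count is observed.
// It returns the number of updates it consumed before matching.
func waitWatching(updates <-chan int, want int) int {
	n := 0
	for got := range updates {
		n++
		if got == want {
			return n
		}
	}
	return n
}

func main() {
	// Simulated status updates as an rc is scaled down to zero.
	updates := make(chan int, 3)
	updates <- 2
	updates <- 1
	updates <- 0
	close(updates)
	fmt.Println("updates consumed:", waitWatching(updates, 0)) // prints: updates consumed: 3
}
```

The watch version returns as soon as the matching update arrives, instead of sleeping out the rest of a fixed polling interval.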