
Improve replication controller manager performance #5884

Closed
bprashanth opened this issue Mar 24, 2015 · 15 comments
Assignees
Labels
area/controller-manager priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@bprashanth
Contributor

This is an umbrella issue to address replication controller/manager performance that might matter for 1.0. In broad strokes:

@lavalamp thoughts (on specifically the last one, since it overlaps with your controller PR)?

@davidopp
Member

@bgrant0607 can you remind me which of these you were telling me yesterday you thought was the most important (or maybe it's something that isn't on the list). I've forgotten.

@bprashanth
Contributor Author

There's also an alternative to 3, which is to list all pods once per tick instead of once per controller per tick. It feels like we're close enough to the watch-pod-status solution that we should just go for that (we default to using the podcache today, but it's feature-flagged: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/master/master.go#L384).
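As a rough illustration of the once-per-tick idea (hypothetical minimal types, not the real Kubernetes code), a single pod LIST can be bucketed across all controllers in one pass, instead of issuing one LIST per controller per tick:

```go
package main

import "fmt"

// Hypothetical stand-ins for the real API objects.
type Pod struct {
	Labels map[string]string
}

type Controller struct {
	Name     string
	Selector map[string]string // equality-based label selector
}

// matches reports whether every key/value in sel appears in labels.
func matches(sel, labels map[string]string) bool {
	for k, v := range sel {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// countPerController takes the result of a single pod LIST and returns
// the matching-pod count for each controller in one pass, rather than
// re-listing pods once per controller per sync tick.
func countPerController(pods []Pod, rcs []Controller) map[string]int {
	counts := make(map[string]int, len(rcs))
	for _, rc := range rcs {
		counts[rc.Name] = 0
	}
	for _, p := range pods {
		for _, rc := range rcs {
			if matches(rc.Selector, p.Labels) {
				counts[rc.Name]++
			}
		}
	}
	return counts
}

func main() {
	pods := []Pod{
		{Labels: map[string]string{"app": "web"}},
		{Labels: map[string]string{"app": "web"}},
		{Labels: map[string]string{"app": "db"}},
	}
	rcs := []Controller{
		{Name: "web-rc", Selector: map[string]string{"app": "web"}},
		{Name: "db-rc", Selector: map[string]string{"app": "db"}},
	}
	fmt.Println(countPerController(pods, rcs)["web-rc"]) // 2
}
```

This trades one API round-trip per controller for a single list plus an in-memory join, which is the essence of the alternative being discussed.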

@bgrant0607
Member

Whether indexed or not, we shouldn't be doing a list of all matching pods per replication controller.

Whether watching pod status or listing pods once per tick, we need to move to a model where the appropriate replication controller's count is updated based on the existence (or deletion) of a particular pod.
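The event-driven model described here can be sketched as follows (hypothetical types and names, not the actual controller-manager code): each pod creation or deletion notification is routed to the matching controllers, whose observed counts are adjusted incrementally instead of being recomputed by a full list.

```go
package main

import "fmt"

// podEvent is a stand-in for one watch notification about a pod.
type podEvent struct {
	Labels  map[string]string
	Deleted bool
}

// rcState holds a controller's selector and its observed replica count
// (analogous to Status.Replicas).
type rcState struct {
	Selector map[string]string
	Replicas int
}

// matchesSel reports whether every key/value in sel appears in labels.
func matchesSel(sel, labels map[string]string) bool {
	for k, v := range sel {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// apply routes a single pod event to every matching controller and
// updates its count based on the existence (or deletion) of that one
// pod, with no re-listing of all pods.
func apply(rcs map[string]*rcState, ev podEvent) {
	for _, rc := range rcs {
		if !matchesSel(rc.Selector, ev.Labels) {
			continue
		}
		if ev.Deleted {
			rc.Replicas--
		} else {
			rc.Replicas++
		}
	}
}

func main() {
	rcs := map[string]*rcState{
		"web-rc": {Selector: map[string]string{"app": "web"}},
	}
	apply(rcs, podEvent{Labels: map[string]string{"app": "web"}})
	apply(rcs, podEvent{Labels: map[string]string{"app": "web"}})
	apply(rcs, podEvent{Labels: map[string]string{"app": "web"}, Deleted: true})
	fmt.Println(rcs["web-rc"].Replicas) // 1
}
```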

@fgrzadkowski Is posting of pod status from kubelet really conditional?

@bprashanth
Contributor Author

Status.Replicas is already updated upon creation/deletion of a replica in the controller manager. The posting of the pod status isn't conditional if I've understood this PR correctly (https://github.com/fgrzadkowski/kubernetes/blob/336525a27d3a815954bf67792ae83d88c5e33096/pkg/kubelet/kubelet.go#L571); the use of the pod cache is (@fgrzadkowski correct me if I'm wrong). I'd rather not have to guess whether our match filters are applied before or after decorating the status, so if removing status from the pod cache isn't on someone's plate, I'll get to it first.

@fgrzadkowski
Contributor

@bprashanth is right. Use of pod_cache is conditional, but kubelet already updates pod status when it changes (PR #5714). I've already sent PR #5854 to delete pod_cache completely.

@bprashanth
Contributor Author

A strawman for the third problem:

  • Use the controller framework to watch for updates to replication controllers
  • Also watch for updates to all pods, through a separate reflector (memory issues?)
  • Periodically wake up the controller manager:
    • list pods from local store (based off Phase index?)
    • build a map[selector]{InActive pod-count}
    • list all rcs from the local store, get each pod count from the map, and do what we do today (i.e. schedule/kill pods and update Status.Replicas)

Avoiding the periodic sync of the controller manager would make the system incremental, and thereby more vulnerable to things like network flakes.
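The strawman above could be sketched roughly like this (hypothetical types, and assuming an RC's selector equals a pod's full label set, which simplifies real label-selector matching):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical stand-ins for the real API objects.
type Pod struct {
	Labels map[string]string
	Active bool // stand-in for "not in a terminal phase"
}

type RC struct {
	Name            string
	Selector        map[string]string
	DesiredReplicas int
}

// selectorKey flattens a label set into a stable string so it can be
// used as a map key; the real code compares selectors structurally.
func selectorKey(sel map[string]string) string {
	parts := make([]string, 0, len(sel))
	for k, v := range sel {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// syncOnce models one periodic wakeup: list pods from the local store,
// build a selector -> active-pod-count map, then walk all RCs and
// compute the diff each one needs to act on. A real manager would
// create/kill pods and update Status.Replicas from these diffs.
func syncOnce(pods []Pod, rcs []RC) map[string]int {
	counts := map[string]int{}
	for _, p := range pods {
		if p.Active {
			counts[selectorKey(p.Labels)]++
		}
	}
	diffs := map[string]int{}
	for _, rc := range rcs {
		have := counts[selectorKey(rc.Selector)]
		diffs[rc.Name] = rc.DesiredReplicas - have // >0: create pods, <0: kill pods
	}
	return diffs
}

func main() {
	pods := []Pod{
		{Labels: map[string]string{"app": "web"}, Active: true},
		{Labels: map[string]string{"app": "web"}, Active: true},
		{Labels: map[string]string{"app": "web"}, Active: false},
	}
	rcs := []RC{{Name: "web-rc", Selector: map[string]string{"app": "web"}, DesiredReplicas: 3}}
	fmt.Println(syncOnce(pods, rcs)["web-rc"]) // 1: one pod to create
}
```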

@bprashanth bprashanth added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 31, 2015
@fgrzadkowski
Contributor

I've already started working on this, so I'm assigning it to myself.

@bprashanth
Contributor Author

So am I (I'd just forgotten to assign it to myself); I'm just waiting on @lavalamp's reflector framework to start on this :)

Though if you really want this, go ahead and I can find something else, or the other way around.

Edit: I had assigned it to myself previously, so I'm just taking it back now.

@bprashanth bprashanth assigned bprashanth and unassigned fgrzadkowski Apr 9, 2015
@fgrzadkowski
Contributor

I hoped to finish this today, but didn't make it. Since I'll be OOO until Monday, feel free to jump in. Just let me know.


@bprashanth
Contributor Author

Yeah, I'm doing it. The controller framework mentioned isn't merged yet (#6546), so unless there's some urgency to get this in we might as well work on different things, and wait until the framework is in to reduce churn.

@fgrzadkowski
Contributor

@bprashanth What's the ETA for this change? Without this change it's hard for me to move forward with profiling bottlenecks for large clusters.

@bprashanth
Contributor Author

@fgrzadkowski I got sidetracked writing benchmarks to prove this isn't going to have a negative impact on perf. I have a PR ready and will upload it tomorrow (at least for the pod part; it seems the controller framework still has some kinks that need ironing out).

@wojtek-t
Member

@bprashanth what is the current status of this PR?

@davidopp davidopp added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Apr 28, 2015
@wojtek-t
Member

wojtek-t commented May 5, 2015

@bprashanth - is it fixed now?

@bprashanth
Contributor Author

Yep. Stop is still slow because kubectl uses polling, but that has nothing to do with the replication manager; I'll spin off another bug for that.
