iptables proxy could prefer local containers #24300

Closed
emaildanwilson opened this issue Apr 14, 2016 · 7 comments
Labels
area/kube-proxy sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@emaildanwilson
Contributor

Currently, when kube-proxy sets up iptables rules it round-robins requests for a specific service across all containers, on all minion nodes in the cluster, that match the selector for that service. This improves resiliency for services and reduces latency, as described in issue #3760.

A further improvement to this design would be to prioritize local containers over remote containers, so that most requests don't need to hit the wire and traverse the private underlay network. I think this would also significantly improve the scalability of services on clusters with a large number of minions.
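For concreteness, here is a rough sketch of what the rules do today and one possible shape a local preference could take. The chain and endpoint names below are made up for illustration, not real kube-proxy output:

```
# Current behavior (simplified): new connections are split across ALL
# endpoints with random probabilities, regardless of node locality.
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333 -j KUBE-SEP-LOCAL-1
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000 -j KUBE-SEP-REMOTE-1
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-REMOTE-2

# One possible local preference: jump first to a per-node chain holding only
# the local endpoints; if that chain is empty (no local pods), the packet
# falls through to the remote endpoints below.
-A KUBE-SVC-EXAMPLE -j KUBE-SVC-EXAMPLE-LOCAL
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000 -j KUBE-SEP-REMOTE-1
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-REMOTE-2
-A KUBE-SVC-EXAMPLE-LOCAL -j KUBE-SEP-LOCAL-1
```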

There is some concern that preferring local containers would create issues with external load balancers when containers are not spread evenly across the cluster, and that we would then need to set priorities or weights correctly on the external load balancers. I believe trying to tackle that aspect now is too much of a rabbit hole, because each vendor handles it differently with various load-balancing algorithms, and an overloaded node running many containers could still receive too many requests and cause a performance problem. Instead, this can be handled by load-balancer algorithms that balance traffic automatically based on the performance of each node. Nodes with additional local containers for the service would complete requests more quickly, so even a simple least-connections algorithm would send them more requests; likewise, an overloaded node completing requests more slowly would receive less traffic.

One additional, optional optimization would be to make the service's node port inactive when there are no containers for the service on that node. A simple port check would then let the load balancer remove that node from the pool entirely until a container exists on it again.
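For example, a plain TCP check against the node port would be enough for most external load balancers to add or drop a node. The hostname and port below are placeholders:

```
# Sketch: decide whether a node should stay in the pool with a bare TCP
# connect to its node port (30080 is illustrative; use the service's port).
nc -z -w 2 node-1.example.com 30080 \
  && echo "keep node-1 in the pool" \
  || echo "drop node-1 from the pool"
```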

@emaildanwilson
Contributor Author

Ping @thockin, you might be interested in commenting on this based on your previous work in this area.

@thockin
Member

thockin commented Apr 21, 2016

I swear I opened a bug on this but I can't find it. I'm retitling this for clarity.

What does "prefer" mean? Always use? 90%? Even in the face of overload? Once you go down this route, you need a more concrete heuristic than "prefer".

To say "There is some concern" about imbalance is being polite. This is fundamentally the main reason we DON'T implement this today. At least GCE doesn't have a notion of number of connections when it comes to L3 load-balancing, at least not that is visible or configurable. I can't say for sure whether it is smart enough to do something behind the scenes.

The other reason we have not implemented it is because it changes the synchronization model between Kubernetes and the external load-balancers. Today that is pretty static - program all nodes and let the fast-updating kube-proxy handle updating itself. This necessarily means that the client IP gets hidden as we bounce traffic around. This is bad, and we know it. If we always use local backends, we don't need to hide the client IP anymore - win! But now we can ONLY route traffic to nodes with backends on them. This is an ever-changing set, which means we need to be programming the cloud APIs much more frequently, and those APIs are not known for being "fast".
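Roughly what that hiding looks like today (simplified sketch; the node port and service chain are illustrative, though the mark-then-masquerade pattern is what kube-proxy programs): node-port traffic is marked, and marked traffic is SNAT'd on the way out, so a backend on another node sees the forwarding node's IP instead of the original client IP.

```
# Node-port traffic is marked for masquerade, then SNAT'd in POSTROUTING.
-A KUBE-NODEPORTS -p tcp --dport 30080 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp --dport 30080 -j KUBE-SVC-EXAMPLE
-A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE
```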

We could, perhaps, do as you suggest with port checks (pretty clever actually :), but that requires some thought in the face of UDP services.

All that said, I am not against doing this - it's just a LOT of work and it's sort of a tradeoff - having load-balancers be balanced vs preserving client IP. We'd have to run some experiments before fully committing. It's a good project. I'd like to work on it, but I have no time right now.

Is someone willing to work on it? Step 0 is a design doc.

@thockin thockin added sig/network Categorizes an issue or PR as relevant to SIG Network. area/kube-proxy labels Apr 21, 2016
@thockin thockin changed the title Improve e2e latency and cluster scalability by routing requests to local containers first iptables proxy could prefer local containers Apr 21, 2016
@thockin
Member

thockin commented Jun 9, 2016

Dup of #19754

@minjs

minjs commented Sep 21, 2016

@thockin, as I understand it, #19754 only fixes part of this issue. Can we set kube-proxy to redirect traffic only to containers on the local node instead of load-balancing across all endpoints?

@thockin
Member

thockin commented Sep 22, 2016

That is a much more delicate question because it is very likely to make for imbalanced performance. Nodes with more frontends have fewer backends, but pound on them harder...


@minjs

minjs commented Sep 22, 2016

@thockin, I guess this is more about the NodePort service type. In our case, we have to connect an external load balancer even with a NodePort service, and we cannot just point the external load balancer at a single node's IP and port. That means two layers of load balancing are running, and the kube-proxy load balancing really just adds latency to each request.

@thockin
Member

thockin commented Sep 22, 2016

Yes, NodePort support will be a followup to the work that went in for 1.4.

