
Initial design doc for load-balancers & public-port services #7447

Closed
wants to merge 11 commits

Conversation

justinsb
Member

FAO: @thockin @smarterclayton @quinton-hoole @ArtfulCoder

(Probably of interest also to @a-robinson @benmccann and @bakins)

Initial design doc for public-port services & load-balancers.

I have tried to synthesize all the discussion in #6910, but am also mindful of keeping this to something that we can hope to achieve in the V1 timeframe.

A big simplification was that @a-robinson pointed out that the cloudprovider implementation can return the IPs that are associated with a load balancer (and that this is a good change long-term for reconciling failed create operations). That moves the reported IPs from Spec to Status, which then means we don't need to address the object-update-scoping problems now. I think.

I kept publicIPs more-or-less as-is, but they move from Spec to Status. Does this address your concerns with them @thockin? And then do we need loadbalancerIPs?
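To make the Spec-vs-Status split concrete, here is a rough Go sketch of a reconciling controller that asks the cloud provider which IPs are bound to the balancer and records them in Status. All names here (LoadBalancerProvider, GetLoadBalancerIPs, Status.PublicIPs) are invented for illustration and are not the real cloudprovider interface:

```go
package lbstatus

// Minimal stand-in types for illustration only; not the real API objects.
type ServiceStatus struct{ PublicIPs []string }
type Service struct {
	Name   string
	Status ServiceStatus
}

// LoadBalancerProvider is a hypothetical slice of the cloud-provider
// interface: it can report which IPs are bound to a service's balancer.
type LoadBalancerProvider interface {
	GetLoadBalancerIPs(region, serviceName string) ([]string, error)
}

// reconcile asks the cloud provider for the balancer's IPs and copies them
// into Status. Because the system writes only Status (never Spec), a failed
// or repeated create can simply be reconciled on the next pass.
func reconcile(p LoadBalancerProvider, region string, svc *Service) error {
	ips, err := p.GetLoadBalancerIPs(region, svc.Name)
	if err != nil {
		return err
	}
	svc.Status.PublicIPs = ips
	return nil
}
```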

There has been some discussion of whether we should rename portalIP to something more self-evident. I think that is unrelated (although my vote is for clusterIP). While we're talking naming, I think I prefer publicPort to hostPort. And is sidecar-pod the right word?

Also not in-scope: adding phases / lifecycle events to services.

I can imagine that we might want to configure a system-pod that runs haproxy and listens on ports 80 & 443. This would be very useful for DigitalOcean. I think this would be specially configured, and would bypass the API checks. I don't think we need to do anything for that just yet, so it is also out-of-scope.

@ghost

ghost commented Apr 28, 2015

On Tue, Apr 28, 2015 at 9:56 AM, Justin Santa Barbara <notifications@github.com> wrote:

> A big simplification was that @a-robinson pointed out that the cloudprovider implementation can return the IPs that are associated with a load balancer (and that this is a good change long-term for reconciling failed create operations). That moves the reported IPs from Spec to Status, which then means we don't need to address the object-update-scoping problems now. I think.

Cool. I like.

> I kept publicIPs more-or-less as-is, but they move from Spec to Status. Does this address your concerns with them @thockin? And then do we need loadbalancerIPs?

I'd vote for moving publicIPs from Spec to Status (and, if necessary, renaming them to externalIPs to make the transition easier). And yes, I vote for getting rid of loadBalancerIPs.

> There has been some discussion of whether we should rename portalIP to something more self-evident. I think that is unrelated (although my vote is for clusterIP).

Yes, I agree, clusterIP works for me.

> While we're talking naming, I think I prefer publicPort to hostPort.

Agree. PublicPort tells application deployers what the thing is, which is what we want. HostPort tells cluster builders how it works under the covers, which is not what we want here.

> Also not in-scope: adding phases / lifecycle events to services.

It breaks my heart, but I agree with limiting scope for v1.0. Adding it afterwards seems fairly straightforward.

I'll read your doc today, and comment there too :-)

Q


@@ -0,0 +1,278 @@
# Load balancing & publicly accessible services
Member

should be docs/design/public_ips.md or similar.

@smarterclayton
Contributor

I spent some time thinking about the general service status issue. I think we've got two parts of the story for services mostly thought through: identity -> (name, selector) and backends -> (endpoints, ports). The other part, though, is ingress: how does someone consuming a service find out the different ways that the service can be reached?

There are a bunch of ways:

  • Direct to the host port
  • To a load balancer port set up in the infrastructure
  • To a load balancing router somewhere (multiplexed for HTTP/HTTPS on a single IP/Port)
  • Via DNS; potentially multiple - one or more internal names, possibly one or more internal domains, and then each of the others might also have DNS entries tied to them

Today in Kube there's no way to find out what the DNS name of your service is - we could define it by convention, but it's not discoverable by tools or end users.

In OpenShift we have a route; one or more DNS names (plus port or path or protocol) that themselves are located on an IP or a stable DNS name. We point to the service, but there's no way the service can describe the different ways it is currently being pointed to.

Why not create an ingress list? It's possibly more varied than endpoints, and it could be a separate resource, but at least right now it feels more like:

service:
  status:
    portals:
    - protocol: tcp
      type: InternalServicePortal
      dns:
      - kubernetes.default.k8s.io
      - kubernetes.default.mycompany.com
      ips:
      - 172.0.0.1
    - protocol: http
      type: ExternalRoute
      ref:
        kind: RouteCreatedInOpenShift
        name: some-route
        namespace: bar
      dns:
      - mycompany.foo.bar.com
      ips:
      - ip1.of.router1
      - ip2.of.router2

It's a fairly annoying problem that a client can't find out which internal point to talk to, or discover external paths. Exposing some sort of structured object like this (as service status or as a separate resource) would allow clients to determine how to talk to them.
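A minimal Go sketch of how a status structure like the YAML above might be modeled as API types; every type and field name here is hypothetical and not part of the actual Kubernetes or OpenShift API:

```go
// Hypothetical API types mirroring the YAML sketch above; names are
// illustrative only and do not exist in the Kubernetes API.
package api

// ServicePortalType distinguishes the different kinds of ingress points.
type ServicePortalType string

const (
	InternalServicePortal ServicePortalType = "InternalServicePortal"
	ExternalRoute         ServicePortalType = "ExternalRoute"
)

// ObjectReference points at the object (e.g. an OpenShift Route) that
// provides this ingress point.
type ObjectReference struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

// ServicePortal describes one way a service can be reached.
type ServicePortal struct {
	Protocol string            `json:"protocol"`
	Type     ServicePortalType `json:"type"`
	Ref      *ObjectReference  `json:"ref,omitempty"`
	DNS      []string          `json:"dns,omitempty"`
	IPs      []string          `json:"ips,omitempty"`
}

// ServiceStatus would then carry the list of portals as the discovery
// surface for clients.
type ServiceStatus struct {
	Portals []ServicePortal `json:"portals,omitempty"`
}
```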

@smarterclayton
Copy link
Contributor

And to be clear, what is above is a sketch of the discovery side of services - actual objects (GCE API load balancer, OpenShift Route, future FloatingIP object) would still exist and have controllers that register/announce themselves into the service.

1. API server assigns a hostPort (if the load balancer needs one), accepts and persists
1. LB Controller wakes up, allocates a load-balancer, and `POST`s `/api/v1/namespaces/foo/services/bar` `Service{ Status{ publicIps: [ "1.2.3.4" ] }}`
1. Service REST sets `Service.Status.publicIps = [ "1.2.3.4" ]`
1. kube-proxy wakes up and sets iptables to receive on 1.2.3.4
Contributor

Except when the load balancer allocates a DNS name, and it needs to listen on a host port? As written, this seems muddled together with the public case, rather than one being nested in the other.

Member Author

Yes, I was definitely muddled here. Making loadbalancer => public simplifies this, I think. I wrote out the two main cases (GCE, AWS) separately.
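For concreteness, a minimal sketch of step 2 of the flow quoted above (the LB controller reporting its IPs back to the API server), written against the path and payload shape in the quoted text. Treat it purely as an illustration: the design here uses a POST of a partial Service, while the API that eventually shipped updates Service.Status.LoadBalancer via the regular update path instead.

```go
package lbcontroller

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// reportPublicIPs sketches the LB controller writing the allocated IPs
// into the service's Status, following the quoted design steps.
func reportPublicIPs(apiServer, namespace, service string, ips []string) error {
	// Payload shape from the quoted sketch: Service{ Status{ publicIps: [...] } }.
	payload := map[string]interface{}{
		"status": map[string]interface{}{"publicIps": ips},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}

	url := fmt.Sprintf("%s/api/v1/namespaces/%s/services/%s", apiServer, namespace, service)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("status update failed: %s", resp.Status)
	}
	return nil
}
```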

@justinsb
Member Author

@smarterclayton I agree that figuring out the ingress method for a service is too complicated right now. I like the very structured method you suggested, though I don't think I'll have enough time to implement it.

I think the two main use cases here are an external program consuming a service, and a service published through a load-balancer for general Internet usage.

  • The load-balancer case is straightforward because k8s does not do a lot here: the operator currently has to set up a CNAME or A record after they create a service, pointing to the load balancer. We could make this better.
  • For another cluster consuming a service, that cluster can call the k8s API to list the service to get the hostPorts, and then list the minions to get the list of IPs (sketched below). Not trivial, but it will always have to query the API anyway, so this is not too bad. The simplicity of the visibility-nesting model makes this tolerable, I think: anything that is not 'cluster' will have a hostPort, and anything that has a hostPort is reachable.
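A rough sketch of what that external consumer would do, expressed as two raw API reads. The paths, the service name, and the API server address are assumptions for illustration (and at the time nodes were still exposed as "minions"), not a statement of the v1 API:

```go
// Sketch: an external consumer discovering how to reach a service by
// reading the service (for its host ports) and the node list (for IPs).
package main

import (
	"fmt"
	"io"
	"net/http"
)

// get fetches one API path and returns the raw JSON; a real client would
// decode it and pull out the service's host ports and the node addresses.
func get(apiServer, path string) (string, error) {
	resp, err := http.Get(apiServer + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	api := "http://localhost:8080" // hypothetical API server address

	// 1. Read the service to learn which host ports it is published on.
	svc, _ := get(api, "/api/v1/namespaces/default/services/my-service")
	// 2. Read the node list to learn which IPs those ports are open on.
	nodes, _ := get(api, "/api/v1/nodes")

	fmt.Println(svc)
	fmt.Println(nodes)
}
```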

Definitely something to improve in V2.

What I suspect we'll end up doing in V1 or V1.1 is publishing some helpful records into skydns, and setting visibility==public on skydns.

@smarterclayton
Contributor

My primary concern is not underestimating this problem - I'm fine with small tactical changes that get us to a better place for 1.0, but I'd rather not make dramatic changes to services if we already know that larger work is coming.

The two things I think we have to do for 1.0 are:

  • have a method for external traffic to ingress (variants on service public ip)
  • move portalIP allocation to allow multi-master setups (clients have to tolerate portalIP being empty) because we can't make that change after v1 is set



@justinsb
Member Author

@quinton-hoole I don't think I understand "move portalIP allocation to allow multi-master setups"

I believe this could be done synchronously, by essentially storing the currently-in-memory allocation pool in etcd. We could pack the data into one key and use a compare-and-swap, or we could have one key per portalIP. In one of my PRs I created a pool helper class (pool_allocator.go) - I had in the back of my mind that we would likely change the mechanism and wanted to avoid copying & pasting:
https://github.com/GoogleCloudPlatform/kubernetes/pull/6684/files

Is this the problem you're talking about, and do you think we could use etcd in this way?
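A minimal sketch of the packed-single-key compare-and-swap variant described above, written against today's etcd clientv3 API purely for illustration (the etcd v2 client of that era exposed an equivalent compare-and-swap operation); the key name and the pick callback are hypothetical:

```go
// Sketch of a multi-master-safe portalIP pool: the whole allocation state
// is packed into a single etcd key and updated with compare-and-swap, so
// if two API servers race to allocate, one of them simply retries.
package allocator

import (
	"context"
	"errors"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

const poolKey = "/registry/portalip-pool" // hypothetical key

// Allocate picks an IP via pick() and commits the updated pool with CAS,
// retrying if another master modified the pool concurrently.
func Allocate(ctx context.Context, cli *clientv3.Client,
	pick func(pool string) (ip string, updated string, err error)) (string, error) {

	for attempt := 0; attempt < 10; attempt++ {
		// Read the current serialized pool (assumed to be pre-created).
		get, err := cli.Get(ctx, poolKey)
		if err != nil {
			return "", err
		}
		if len(get.Kvs) == 0 {
			return "", errors.New("portalIP pool key not initialized")
		}
		current := string(get.Kvs[0].Value)

		ip, updated, err := pick(current)
		if err != nil {
			return "", err
		}

		// Compare-and-swap: only write if nobody else changed the pool.
		txn, err := cli.Txn(ctx).
			If(clientv3.Compare(clientv3.Value(poolKey), "=", current)).
			Then(clientv3.OpPut(poolKey, updated)).
			Commit()
		if err != nil {
			return "", err
		}
		if txn.Succeeded {
			return ip, nil
		}
		// Lost the race; re-read the pool and try again.
	}
	return "", fmt.Errorf("too much contention allocating from %s", poolKey)
}
```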

@smarterclayton
Contributor

I'll assume you meant me.

Basically, yes. We've talked about it in 6 or 7 different issues - the current thinking is a CAS from a controller, because we don't want to waste time solving contention issues right now (we want to get it out of the critical path faster). If you want to tackle it (in whatever way you want), please do - it's on the short list of 1.0 issues on the Red Hat side, but we haven't gotten to it yet.


@justinsb
Member Author

Oops, sorry Clayton (& Quinton!). That'll teach me to read threads on my phone...

I don't think moving to etcd is necessary for this particular piece of work, but it sounds like if I allocate a public-port, I should use something like that utility class, so that we can more easily swap in an etcd-backed pool for multi-master later.


1. **namespace** The service is only accessible to pods belonging to the same namespace. This is similar
to _cluster_, but with additional firewall rules.
1. **dmz** The service is accessible to pods that are in the DMZ. This is similar to _cluster_,
Member

Could someone point me to prior discussion on dmz? I don't understand why this isn't namespace-scoped visibility, plus another mechanism to restrict access to services outside the namespace, which seems like something that belongs in SecurityContext.

@thockin
Member

thockin commented May 21, 2015

Given how much the implementation drifted from this, I think we should nix this doc and instead update services.md to detail the final implementation. Killing this PR.

@thockin closed this May 21, 2015