
Initial design doc for load-balancers & public-port services #7447

Closed
wants to merge 11 commits

Conversation

justinsb
Member

FAO: @thockin @smarterclayton @quinton-hoole @ArtfulCoder

(Probably of interest also to @a-robinson @benmccann and @bakins)

Initial design doc for public-port services & load-balancers.

I have tried to synthesize all the discussion in #6910, but am also mindful of keeping this to something that we can hope to achieve in the V1 timeframe.

A big simplification was that @a-robinson pointed out that the cloudprovider implementation can return the IPs that are associated with a load balancer (and that this is a good change long-term for reconciling failed create operations). That moves the reported IPs from Spec to Status, which then means we don't need to address the object-update-scoping problems now. I think.

I kept publicIPs more-or-less as-is, but they move from Spec to Status. Does this address your concerns with them @thockin? And then do we need loadbalancerIPs?
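To make the Spec-vs-Status split concrete, here is a rough Go sketch of a reconciling controller that asks the cloud provider which IPs are bound to the balancer and records them in Status. All names here (LoadBalancerProvider, GetLoadBalancerIPs, Status.PublicIPs) are invented for illustration and are not the real cloudprovider interface:

```go
package lbstatus

// Minimal stand-in types for illustration only; not the real API objects.
type ServiceStatus struct{ PublicIPs []string }
type Service struct {
	Name   string
	Status ServiceStatus
}

// LoadBalancerProvider is a hypothetical slice of the cloud-provider
// interface: it can report which IPs are bound to a service's balancer.
type LoadBalancerProvider interface {
	GetLoadBalancerIPs(region, serviceName string) ([]string, error)
}

// reconcile asks the cloud provider for the balancer's IPs and copies them
// into Status. Because the system writes only Status (never Spec), a failed
// or repeated create can simply be reconciled on the next pass.
func reconcile(p LoadBalancerProvider, region string, svc *Service) error {
	ips, err := p.GetLoadBalancerIPs(region, svc.Name)
	if err != nil {
		return err
	}
	svc.Status.PublicIPs = ips
	return nil
}
```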

There has been some discussion of whether we should rename portalIP to something more self-evident. I think that is unrelated (although my vote is for clusterIP). While we're talking naming, I think I prefer publicPort to hostPort. And is sidecar-pod the right word?

Also not in-scope: adding phases / lifecycle events to services.

I can imagine that we might want to configure a system-pod that runs haproxy and listens on ports 80 & 443. This would be very useful for DigitalOcean. I think this would be specially configured, and would bypass the API checks. I don't think we need to do anything for that just yet, so it is also out-of-scope.

@ghost

ghost commented Apr 28, 2015

On Tue, Apr 28, 2015 at 9:56 AM, Justin Santa Barbara <notifications@github.com> wrote:

> A big simplification was that @a-robinson pointed out that the cloudprovider implementation can return the IPs that are associated with a load balancer (and that this is a good change long-term for reconciling failed create operations). That moves the reported IPs from Spec to Status, which then means we don't need to address the object-update-scoping problems now. I think.

Cool. I like.

> I kept publicIPs more-or-less as-is, but they move from Spec to Status. Does this address your concerns with them @thockin? And then do we need loadbalancerIPs?

I'd vote for moving publicIPs from Spec to Status (and, if necessary, renaming them to externalIPs to make the transition easier). And yes, I vote for getting rid of loadBalancerIPs.

> There has been some discussion of whether we should rename portalIP to something more self-evident. I think that is unrelated (although my vote is for clusterIP).

Yes, I agree, clusterIP works for me.

> While we're talking naming, I think I prefer publicPort to hostPort.

Agree. PublicPort tells application deployers what the thing is, which is what we want. HostPort tells cluster builders how it works under the covers, which is not what we want here.

> Also not in-scope: adding phases / lifecycle events to services.

It breaks my heart, but I agree with limiting scope for v1.0. Adding it afterwards seems fairly straightforward.

I'll read your doc today, and comment there too :-)

Q


@@ -0,0 +1,278 @@
# Load balancing & publicly accessible services
Member

should be docs/design/public_ips.md or similar.

@smarterclayton
Contributor

I spent some time thinking about the general service status issue. I think we've got two parts of the story for services mostly thought through: identity -> (name, selector) and backends -> (endpoints, ports). The other part, though, is ingress: how does someone consuming a service find out the different ways that the service can be reached?

There are a bunch of ways:

  • Direct to the host port
  • To a load balancer port set up in the infrastructure
  • To a load balancing router somewhere (multiplexed for HTTP/HTTPS on a single IP/Port)
  • Via DNS; potentially multiple - one or more internal names, possibly one or more internal domains, and then each of the others might also have DNS entries tied to them

Today in Kube there's no way to find out what the DNS name of your service is - we could define it by convention, but it's not discoverable by tools or end users.

In OpenShift we have a route; one or more DNS names (plus port or path or protocol) that themselves are located on an IP or a stable DNS name. We point to the service, but there's no way the service can describe the different ways it is currently being pointed to.

Why not create an ingress list? It's possibly more varied than endpoints, and it could be a separate resource, but at least right now it feels more like:

service:
  status:
    portals:
    - protocol: tcp
      type: InternalServicePortal
      dns:
      - kubernetes.default.k8s.io
      - kubernetes.default.mycompany.com
      ips:
      - 172.0.0.1
    - protocol: http
      type: ExternalRoute
      ref:
        kind: RouteCreatedInOpenShift
        name: some-route
        namespace: bar
      dns:
      - mycompany.foo.bar.com
      ips:
      - ip1.of.router1
      - ip2.of.router2

It's a fairly annoying problem that a client can't find out which internal point to talk to, or discover external paths. Exposing some sort of structured object like this (as service status or as a separate resource) would allow clients to determine how to talk to them.
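A minimal Go sketch of how a status structure like the YAML above might be modeled as API types; every type and field name here is hypothetical and not part of the actual Kubernetes or OpenShift API:

```go
// Hypothetical API types mirroring the YAML sketch above; names are
// illustrative only and do not exist in the Kubernetes API.
package api

// ServicePortalType distinguishes the different kinds of ingress points.
type ServicePortalType string

const (
	InternalServicePortal ServicePortalType = "InternalServicePortal"
	ExternalRoute         ServicePortalType = "ExternalRoute"
)

// ObjectReference points at the object (e.g. an OpenShift Route) that
// provides this ingress point.
type ObjectReference struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

// ServicePortal describes one way a service can be reached.
type ServicePortal struct {
	Protocol string            `json:"protocol"`
	Type     ServicePortalType `json:"type"`
	Ref      *ObjectReference  `json:"ref,omitempty"`
	DNS      []string          `json:"dns,omitempty"`
	IPs      []string          `json:"ips,omitempty"`
}

// ServiceStatus would then carry the list of portals as the discovery
// surface for clients.
type ServiceStatus struct {
	Portals []ServicePortal `json:"portals,omitempty"`
}
```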

@smarterclayton
Copy link
Contributor

And to be clear, what is above is a sketch of the discovery side of services - actual objects (GCE API load balancer, OpenShift Route, future FloatingIP object) would still exist and have controllers that register/announce themselves into the service.

1. API server assigns a hostPort (if the load balancer needs one), accepts and persists
1. LB Controller wakes up, allocates a load-balancer, and `POST`s `/api/v1/namespaces/foo/services/bar` `Service{ Status{ publicIps: [ "1.2.3.4" ] }}`
1. Service REST sets `Service.Status.publicIps = [ "1.2.3.4" ]`
1. kube-proxy wakes up and sets iptables to receive on 1.2.3.4
Contributor

Except when the load balancer allocates a DNS name, and it needs to listen on a host port? As written, this seems muddled together with the public case, rather than one being nested in the other.

Member Author

Yes, I was definitely muddled here. Making loadbalancer => public simplifies this, I think. I wrote out the two main cases (GCE, AWS) separately.
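For concreteness, a minimal sketch of step 2 of the flow quoted above (the LB controller reporting its IPs back to the API server), written against the path and payload shape in the quoted text. Treat it purely as an illustration: the design here uses a POST of a partial Service, while the API that eventually shipped updates Service.Status.LoadBalancer via the regular update path instead.

```go
package lbcontroller

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// reportPublicIPs sketches the LB controller writing the allocated IPs
// into the service's Status, following the quoted design steps.
func reportPublicIPs(apiServer, namespace, service string, ips []string) error {
	// Payload shape from the quoted sketch: Service{ Status{ publicIps: [...] } }.
	payload := map[string]interface{}{
		"status": map[string]interface{}{"publicIps": ips},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}

	url := fmt.Sprintf("%s/api/v1/namespaces/%s/services/%s", apiServer, namespace, service)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("status update failed: %s", resp.Status)
	}
	return nil
}
```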

@justinsb
Member Author

@smarterclayton I agree that figuring out the ingress method for a service is too complicated right now. I like the very structured method you suggested, though I don't think I'll have enough time to implement it.

I think the two main use cases here are an external program consuming a service, and a service published through a load-balancer for general Internet usage.

  • The load-balancer case is straightforward because k8s does not do a lot here: the operator currently has to set up a CNAME or A record after they create a service, pointing to the load balancer. We could make this better.
  • For another cluster consuming a service, that cluster can call the k8s API to list the service to get the hostPorts, and then list the minions to get the list of IPs (sketched below). Not trivial, but it will always have to query the API anyway, so this is not too bad. The simplicity of the visibility-nesting model makes this tolerable, I think: anything that is not 'cluster' will have a hostPort, and anything that has a hostPort is reachable.
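A rough sketch of what that external consumer would do, expressed as two raw API reads. The paths, the service name, and the API server address are assumptions for illustration (and at the time nodes were still exposed as "minions"), not a statement of the v1 API:

```go
// Sketch: an external consumer discovering how to reach a service by
// reading the service (for its host ports) and the node list (for IPs).
package main

import (
	"fmt"
	"io"
	"net/http"
)

// get fetches one API path and returns the raw JSON; a real client would
// decode it and pull out the service's host ports and the node addresses.
func get(apiServer, path string) (string, error) {
	resp, err := http.Get(apiServer + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	api := "http://localhost:8080" // hypothetical API server address

	// 1. Read the service to learn which host ports it is published on.
	svc, _ := get(api, "/api/v1/namespaces/default/services/my-service")
	// 2. Read the node list to learn which IPs those ports are open on.
	nodes, _ := get(api, "/api/v1/nodes")

	fmt.Println(svc)
	fmt.Println(nodes)
}
```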

Definitely something to improve in V2.

What I suspect we'll end up doing in V1 or V1.1 is publishing some helpful records into skydns, and setting visibility==public on skydns.

@smarterclayton
Contributor

My primary concern is not underestimating this problem - I'm fine with small tactical changes that get us to a better place for 1.0, but I'd rather not make dramatic changes to services if we already know that larger work is coming.

The two things I think we have to do for 1.0 are:

  • have a method for external traffic to ingress (variants on service public ip)
  • move portalIP allocation to allow multi-master setups (clients have to tolerate portalIP being empty) because we can't make that change after v1 is set



@justinsb
Member Author

@quinton-hoole I don't think I understand "move portalIP allocation to allow multi-master setups"

I believe this could be done synchronously, by essentially storing the currently-in-memory allocation pool in etcd. We could pack the data into one key and use a compare-and-swap, or we could have one key per portalIP. In one of my PRs I created a pool helper class (pool_allocator.go) - I had in the back of my mind that we would likely change the mechanism and wanted to avoid copying & pasting:
https://github.com/GoogleCloudPlatform/kubernetes/pull/6684/files

Is this the problem you're talking about, and do you think we could use etcd in this way?
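A minimal sketch of the packed-single-key compare-and-swap variant described above, written against today's etcd clientv3 API purely for illustration (the etcd v2 client of that era exposed an equivalent compare-and-swap operation); the key name and the pick callback are hypothetical:

```go
// Sketch of a multi-master-safe portalIP pool: the whole allocation state
// is packed into a single etcd key and updated with compare-and-swap, so
// if two API servers race to allocate, one of them simply retries.
package allocator

import (
	"context"
	"errors"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

const poolKey = "/registry/portalip-pool" // hypothetical key

// Allocate picks an IP via pick() and commits the updated pool with CAS,
// retrying if another master modified the pool concurrently.
func Allocate(ctx context.Context, cli *clientv3.Client,
	pick func(pool string) (ip string, updated string, err error)) (string, error) {

	for attempt := 0; attempt < 10; attempt++ {
		// Read the current serialized pool (assumed to be pre-created).
		get, err := cli.Get(ctx, poolKey)
		if err != nil {
			return "", err
		}
		if len(get.Kvs) == 0 {
			return "", errors.New("portalIP pool key not initialized")
		}
		current := string(get.Kvs[0].Value)

		ip, updated, err := pick(current)
		if err != nil {
			return "", err
		}

		// Compare-and-swap: only write if nobody else changed the pool.
		txn, err := cli.Txn(ctx).
			If(clientv3.Compare(clientv3.Value(poolKey), "=", current)).
			Then(clientv3.OpPut(poolKey, updated)).
			Commit()
		if err != nil {
			return "", err
		}
		if txn.Succeeded {
			return ip, nil
		}
		// Lost the race; re-read the pool and try again.
	}
	return "", fmt.Errorf("too much contention allocating from %s", poolKey)
}
```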

@smarterclayton
Contributor

I'll assume you meant me.

Basically, yes. We've talked about it in 6 or 7 different issues - the current thinking is a CAS from a controller, because we don't want to waste time solving contention issues right now (we want to get it out of the critical path faster). If you want to tackle it (in whatever way you want), please do - it's on the short list of 1.0 issues on the Red Hat side, but we haven't gotten to it yet.


@justinsb
Member Author

Oops, sorry Clayton (& Quinton!). That'll teach me to read threads on my phone...

I don't think moving to etcd is necessary for this particular piece of work, but it sounds like if I allocate a public-port, I should use something like that utility class, so that we can more easily swap in an etcd-backed pool for multi-master later.


1. **namespace** The service is only accessible to pods belonging to the same namespace. This is similar
to _cluster_, but with additional firewall rules.
1. **dmz** The service is accessible to pods that are in the DMZ. This is similar to _cluster_,
Member

Could someone point me to prior discussion on dmz? I don't understand why this isn't namespace-scoped visibility, plus another mechanism to restrict access to services outside the namespace, which seems like something that belongs in SecurityContext.

@thockin
Member

thockin commented May 21, 2015

Given how much the implementation drifted from this, I think we should nix this doc and instead update services.md to detail the final implementation. Killing this PR.

@thockin closed this May 21, 2015