Decision: Standardize on Rudder/Flannel for k8s networking? #1307
Comments
Forgot to add -- does this sound like a good idea? I think this is compatible with where Rudder might go, but I'm not 100% sure. We want to make sure that, as we do things like IP-per-service and figure out access to external IPs, we can make it work with or be built into rudder. I also like that rudder has independent utility outside of Kubernetes. cc: @brendandburns |
@rajatchopra can you speak to how OVS setups and things like OpenDaylight would fit here? In cases where you have an external network controller or IaaS - does Rudder add anything? |
In cases where you have an external network controller or IaaS - does Rudder add anything? How does OVS fit here? I believe it is important that we keep an eye on flow-based switching. Multicasts will not survive at scale, and future needs of kubernetes may include strict tenant isolation. Will Rudder evolve to manage mac-addresses, ip-addresses, vlan tags, and vnid per tenant, and control the switching based on all of the above? |
Echoing @rajatchopra -- OVS/vxlan already solves the problems that rudder intends to solve eventually. A big advantage that OVS has is using flows to program the network to one's needs. Providing entire IP space to each tenant is already possible using flows. |
@mrunalp We need to support various environments. There are some environments (such as GCE) where you don't need to do any encap at all and it is about assigning subnets and configuring cloud control systems. Things I like about rudder:
|
Note that rudder is now called flannel. |
From various IRC conversations, I wanted to snapshot the different strategies that folks have taken to get networking for k8s up and running:
It would be great if this were layered well instead of being all ad-hoc. |
That is a pretty good list. It shows that different use cases will require different networking setups/topologies. So, if one of them is chosen as the default, it would be good if the APIs still allowed for using other advanced networking solutions. Use cases that VxLan with OVS/OpenFlow helps solve are:
|
Well layered is definitely important - I think default isn't an issue, it's just that the abstraction from Kube -> registration -> rudder / other -> node agent should be very clear. |
Here is how I think about it. For a network solution to work, we need the following:
As long as this is met, kubernetes will work just fine. If Flannel can encapsulate making that happen it is a good fit. /cc @kelseyhightower |
I don't have any objections at the current level of detail, would like to see an in depth proposal to have an opportunity to understand what it means. |
@smarterclayton What types of networking do you think are a "must have" before we can switch over? Would you be happy, for example, if the vagrant set up switched over to using the UDP encap instead of GRE/OVS? |
I'm actually less concerned about the types of networking - instead, it's mostly that real networks and configurations are some of the most heavily opinionated and predefined aspects of deployments. I can imagine an almost infinite amount of flexibility that people will want in real deployments, but I don't think that Kubernetes has to solve that out of the box, nor that Flannel has to solve that out of the box. Instead, the list of requirements you listed above is what I consider a good first step, and an excellent place for Flannel to be deployed and integrated with Kubernetes. My concern is solely that an integrator should be able to tie concepts and their own network topologies into Kubernetes (because they have vendor X network solution that can provide this). So I would want clean abstractions between "schedule this pod on this host" and "I have to schedule, then make a call to Flannel, etc". I don't think you're proposing the latter, but I did want to understand a bit more about what the integration with Flannel would look like. |
How about this -- we make Flannel the default mechanism for configuring the network for a Kubernetes cluster, but we don't make it a hard requirement. If there is some feature that requires us to reach out and configure the networking layer, we'd make it pluggable similar to the cloud provider stuff. But by working with Flannel hand-in-hand we'd guarantee we have something that works (perhaps not optimally) out of the box with a minimum of futzing. |
That works for me. EDIT: That being said, we'd like to be able to use the vagrant environment as a testbed for alternative networking even if the default ootb is a "just working" Flannel config |
I'm OK with that once Flannel demonstrates its ability to run natively on GCE.
|
@kelseyhightower is way ahead of you Tim: https://github.com/kelseyhightower/flannel-route-manager My gut is that we should merge the route manager stuff into core flannel. I think it does a great job of showing how flannel is a general network config framework instead of a specific implementation (encap over UDP). |
@smarterclayton One thing I'd love to do is to simplify/minimize the salt config. Right now it is a rat's nest where we conflate cloud, host os, and network strategy. Are you cool if the salt configs assume flannel (with the ability to parameterize the flannel config)? |
I think I am OK with this. Will this cause OVS people to get in a huff, or …
|
I'd see flannel as being a simple driver for OVS. We need to close the loop there though. |
There are two separate problems that flannel solves (at least from what I've seen/read of their readme): assigning an IP range (subnet) to each host, and building the overlay network that connects those ranges across hosts.
If Kubernetes/flannel were to allow plugins for both of these features, then the use cases that we outlined above could be solved. There could be a custom IP assignment plugin that leases out the same IPs to different tenants if necessary, and an OVS plugin with either GRE/VxLan could be used to create the overlay network. |
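To make that pluggability idea more concrete, here is a minimal sketch (entirely hypothetical -- these interfaces are not flannel's or Kubernetes' actual API) of how the two concerns could be split: one interface for leasing a subnet to each host, and one for programming the data path (UDP encap, VXLAN, GRE via OVS, or native cloud routes).

```go
package network

import "net"

// SubnetLease records the pod subnet assigned to a particular host.
// (Hypothetical type for illustration only.)
type SubnetLease struct {
	HostIP net.IP
	Subnet *net.IPNet
}

// IPAllocator hands out a pod subnet to each host, e.g. backed by etcd.
// A multi-tenant allocator could hand out overlapping ranges per tenant.
type IPAllocator interface {
	AcquireLease(hostIP net.IP) (SubnetLease, error)
	ListLeases() ([]SubnetLease, error)
}

// Backend makes leased subnets reachable from every host. Implementations
// could be UDP encap, VXLAN, GRE/OVS flows, or cloud routes with no encap.
type Backend interface {
	Init(local SubnetLease) error
	AddPeer(remote SubnetLease) error
	RemovePeer(remote SubnetLease) error
}
```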
I appreciate that the salt config is complex, but the primary purpose of the vagrant env at least was to be able to test and develop Kubernetes in a range of configs. If there's no non-rudder path, that complicates how folks integrate and test against other overlay or network configs. I really would like to preserve the ability for people to hack on the harder bits of a kube setup in a controlled way - if it's an attention / time thing, we can work harder to ensure that path stays working, or potentially we can limit it to the Ansible code path (assuming that was in tree and runnable against vagrant). I like opinionated choices that make the experience for people trying out kube better, and I also like supporting general use on a wide range of topologies (which flannel does well and the vagrant gre/ovs setup does not). I guess I'm not convinced yet that some important deployment modes for kube won't have to deal with other SDN solutions, and that we won't want test beds for them in tree (like the iaas providers).
|
@mrunalp I'll leave it to @kelseyhightower and @eyakubovich to speak to pluggability on the IP assignment side. @smarterclayton I hear you -- but right now the Vagrant setup is pretty hard-coded to the OVS/GRE strategy. The salt stuff is very brittle in general. I'm cool with writing the salt configs so that we can squeeze in other non-flannel strategies, but I'd like to keep it as isolated as possible. How about we argue about this some more as the PR comes up for review and we have specifics? |
|
@mrunalp I wasn't planning on making those strategies pluggable in the sense of having 3rd-party plugins. In the case of flannel doing UDP encap, it would require the plugin to communicate IP mappings in real time. We're working on VXLAN (without OVS) and I don't see much advantage in using GRE (except for slightly smaller space overhead). @jbeda I would like to merge flannel-route-manager into flannel as well. |
@eyakubovich The only reason that I think users might prefer GRE over UDP is that any networking gear that is monitoring what is happening will be able to classify and monitor the GRE traffic separately from UDP. I think generally UDP is seen as "application" traffic and GRE is seen as "network infrastructure" traffic. In addition, network capture tools and the like know how to crack GRE and reconstruct what is going on inside the encap'd stream. Personally I like UDP encap as it is supported pretty much everywhere, but I can understand why others would like GRE. |
As a note for people coming to this thread, there are other discussions in progress around how OVS/ODL could integrate into Kubernetes or Flannel. I'll make sure that there is an issue linked here to discuss that separately. |
It seems to me that using flannel (as it is today) is going to mean that etcd has to be exposed to the minions, but I thought a goal was to remove the etcd access requirement from the kubelet/minion. Was I mistaken that this was a desired architecture redesign, i.e. to make the apiserver the only thing that talks to etcd rather than each and every node? |
@eparis That is a good point. There is no reason that k8s and flannel have to hit the same etcd. It also doesn't break the conceptual model if flannel requires etcd but k8s hides it. But as a practical matter it is crazy to run 2 etcd instances. Perhaps this is an impetus for the CoreOS guys to either (a) support some level of ACLing in etcd or (b) abstract out the etcd API such that it can be implemented by a domain-specific proxy (and perhaps other stores). |
And at small scales, I don't think it's a huge issue to talk directly to etcd from the minions. The bigger your cluster gets, and the more diverse your workloads become, the more important it becomes to introduce a separating abstraction between minions and the central data store.
I think both of those are valuable. The second could just be for the read api (the flannel component could still write/read directly). Another option, (c), could be to invest in making the etcd client able to work with transparent proxies (so the apiserver could proxy /api/v1beta1/some/arbitrary/path directly to etcd as /some/prefix/some/arbitrary/path via path rewriting or something). We've actually been interested in that in general, as a way of offering transparent etcd-as-a-service as a resource to consumers of Kubernetes. |
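As a rough illustration of the path-rewriting idea (a sketch only, using Go's standard library; the listen address, etcd address, and key prefix below are invented, and a real deployment would also need auth, TLS, watch support, and rewriting of the key paths in responses):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Hypothetical backing etcd and the prefix callers are confined to.
	etcd, err := url.Parse("http://127.0.0.1:4001")
	if err != nil {
		log.Fatal(err)
	}
	const scope = "/v2/keys/some/prefix"

	proxy := httputil.NewSingleHostReverseProxy(etcd)
	base := proxy.Director
	proxy.Director = func(req *http.Request) {
		base(req)
		// Rewrite /v2/keys/<path> to /v2/keys/some/prefix/<path> so a
		// client only ever sees its own slice of the keyspace.
		if strings.HasPrefix(req.URL.Path, "/v2/keys/") {
			req.URL.Path = scope + strings.TrimPrefix(req.URL.Path, "/v2/keys")
		}
	}
	log.Fatal(http.ListenAndServe(":8001", proxy))
}
```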
cc: @philips |
/cc @pietern re. OVS |
@jbeda @mrunalp GUE is another option here for encap (flannel-io/flannel#64), and I think it is an interesting one because it keeps everything L3 and in-kernel. @smarterclayton @jbeda On the ACL topic: it is something we want to do, and it would be great if someone could help define an API and implement it as a proxy. Taking that work and bringing it into etcd itself should be straightforward after that. In etcd 0.5.0 (currently in alpha) we now have a proxy package for implementing simple etcd proxies. Putting an ACL thing in front should be straightforward. Here is a simple proxy example built on the package that filters out certain key prefixes and HTTP verbs. |
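The example referenced above isn't reproduced here, but the general shape of such a filter -- checking key prefixes and HTTP verbs before handing the request to an upstream proxy -- might look roughly like this (a sketch on plain net/http rather than the etcd 0.5.0 proxy package; the allowed prefix and read-only verb set are invented):

```go
package etcdacl

import (
	"net/http"
	"strings"
)

// Filter rejects requests that touch keys outside allowedPrefix or use
// verbs outside the read-only set, and passes everything else to upstream
// (e.g. a reverse proxy pointed at etcd).
func Filter(allowedPrefix string, upstream http.Handler) http.Handler {
	readOnly := map[string]bool{"GET": true, "HEAD": true}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.URL.Path, allowedPrefix) {
			http.Error(w, "key prefix not permitted", http.StatusForbidden)
			return
		}
		if !readOnly[r.Method] {
			http.Error(w, "verb not permitted", http.StatusForbidden)
			return
		}
		upstream.ServeHTTP(w, r)
	})
}
```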
@philips That looks interesting. I will check it out. Thanks. |
For reference, OpenStack's IP management API: http://docs.openstack.org/api/openstack-network/2.0/content/Overview-d1e71.html |
Description of IBM's SDN for containers: http://thoughtsoncloud.com/2014/12/can-enterprise-portable-network-docker-opportunity-sdn/ |
I believe that the original discussion here has mostly been overtaken by the plugins introduced with #5069. The only question that still remains is whether there needs to be a default plugin or not, but that's a separate discussion, I believe, as the context has changed with the introduction of plugins. |
Given @errordeveloper's comment above, should this issue be closed? Came across it and the current state is slightly ambiguous. |
Yes, thanks. |
Rudder could be built into a separable layer that takes various environments and has various techniques to make IP-per-pod a reality.
This is forked from the discussion in #1059.
cc: @thockin, @eyakubovich
Details on plumbing GCE advanced routing into Rudder:
From @eyakubovich:
I actually think that GCE would be a little more complicated. Configuring routes in the GCE API would require permission to call out to the API. I think that we need a "reconciler" that takes the IP ranges assigned to a node and mirrors them into routes in the GCE API.
A quick sketch of what this might look like:
I think that this model might be necessary for other places where network gear needs to be programmed based on dynamic allocations made by rudder.
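To illustrate the reconciler model (my own hypothetical sketch, with invented interfaces standing in for rudder's subnet store and the GCE routes API), the core is a loop that diffs the subnets leased to nodes against the routes currently programmed in the cloud:

```go
package routesync

import (
	"log"
	"time"
)

// Route says "send traffic for Subnet (a CIDR) to NextHop (a node)".
type Route struct {
	Subnet  string
	NextHop string
}

// LeaseSource lists the subnet-to-node assignments made by the allocator,
// e.g. read from rudder's etcd prefix. Hypothetical interface.
type LeaseSource interface {
	Leases() ([]Route, error)
}

// CloudRoutes is a thin wrapper over a cloud routing API such as GCE's
// advanced routes. Hypothetical interface.
type CloudRoutes interface {
	List() ([]Route, error)
	Create(Route) error
	Delete(Route) error
}

// reconcile makes the cloud route table match the allocator's leases.
func reconcile(src LeaseSource, cloud CloudRoutes) error {
	desired, err := src.Leases()
	if err != nil {
		return err
	}
	actual, err := cloud.List()
	if err != nil {
		return err
	}
	want := map[Route]bool{}
	for _, r := range desired {
		want[r] = true
	}
	have := map[Route]bool{}
	for _, r := range actual {
		have[r] = true
	}
	for r := range want {
		if !have[r] {
			if err := cloud.Create(r); err != nil {
				return err
			}
		}
	}
	for r := range have {
		if !want[r] {
			if err := cloud.Delete(r); err != nil {
				return err
			}
		}
	}
	return nil
}

// Run polls periodically; a real implementation would watch etcd instead.
func Run(src LeaseSource, cloud CloudRoutes) {
	for {
		if err := reconcile(src, cloud); err != nil {
			log.Printf("reconcile failed: %v", err)
		}
		time.Sleep(30 * time.Second)
	}
}
```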