
Proposal: Cluster Bootstrap with Gossip #30361

Conversation

@lukemarsden (Contributor) commented Aug 10, 2016

@k8s-github-robot added the kind/design, size/L, and release-note-label-needed labels on Aug 10, 2016
@k8s-bot commented Aug 10, 2016

GCE e2e build/test passed for commit 7e9fe3d.

@philips (Contributor) commented Aug 10, 2016

What problem does the gossip solve? I really struggle with this question when a Kubernetes cluster has a strong central API control plane.

@lukemarsden (Contributor, Author) commented Aug 10, 2016

> What problem does the gossip solve? I really struggle with this question when a Kubernetes cluster has a strong central API control plane.

Cross-posting from my reply on #30360:

The point of gossip is to provide a user-friendly option for generating and distributing the CA cert and a list of API server URLs: precisely the inputs required for TLS bootstrapping.

Note that gossip as described here would be optional; the "out of band" option that @alex-mohr suggested in the SIG yesterday remains available for advanced users who want more control over what's happening and want to specify the CA cert and API server URLs themselves.

We're assuming that's not the 90% of new users who just want to kick the tires, though; they'll want something that works with short strings they can copy and paste on the command line.

@philips (Contributor) commented Aug 10, 2016

@lukemarsden Right, but I still don't get how gossip helps. Let me run through my understanding:

  1. initial shared secret generated, and initial control node comes up. Initial control node (Init Node) IP is the gossip seed IP.
  2. I tell N other nodes the shared secret and IP of the initial control node for the gossip
  3. At this point the N other nodes could already talk to the API server on the Init Node and use the node API

So, what is gossip being used for?! Doesn't the shared secret (token) act as the bootstrap of trust? Why can't the initial control node generate the CA? Why is that happening over gossip?


### Gossip implementation

As soon as has Discover called on it, it attempts to form a secure mesh network using the token and the peers, using [a simple gossip protocol library](https://github.com/weaveworks/mesh). For more information on the library, see [this talk](http://infoq.com/presentations/weave-mesh) by the authors.

I think there is a word missing at the start of this sentence.


The linked docs say that the gossip implementation only scales to 100 peers. Since k8s currently supports 2k node clusters (with the goal to grow quite a bit by the end of the year), how do we expect to get this out of alpha and into production?

@lukemarsden (Contributor, Author)

I think the key point is that the node kubelets want the CA cert so that they trust the API server. And that we want to do this without making the user copy cert files around (breaks UX). Is that right @mikedanese?


@lukemarsden (Contributor, Author)

(I admit that I never questioned why the TLS bootstrap requires the CA cert as an input!)


@mikedanese (Member)

Ya, we need the ca.crt to trust the apiserver at the initial POST of the CSR.

@mikedanese (Member) commented Aug 10, 2016

Ok so we have two types of nodes:

  1. Nodes that know the IPs of the apiservers and know a ca public key
  2. Nodes that know the IPs of some other nodes and know a shared secret

Nodes of type 2 have to undergo a discovery process to become nodes of type 1. Why is it easier to create nodes of type 2 and do this process than to just create nodes of type 1 directly?


As soon as has Discover called on it, it attempts to form a secure mesh network using the token and the peers, using [a simple gossip protocol library](https://github.com/weaveworks/mesh). For more information on the library, see [this talk](http://infoq.com/presentations/weave-mesh) by the authors.

When that happens, kubelet uses [CRDTs](https://github.com/weaveworks/kubelet-mesh/blob/master/state.go) to support gossiping CA certs and lists of URLs of API servers. As soon as a kubelet learns of both of these pieces of information, it returns from the Discover method, and kubelet proceeds to attempt to perform TLS bootstrap against the API server running on the master.
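For concreteness, the discovery contract that paragraph implies might look like the following minimal Go sketch. The type and method names here are illustrative assumptions, not the actual kubelet-mesh API:

```go
package discovery

// ClusterInfo is the pair of facts a joining kubelet must learn
// before it can start TLS bootstrapping.
type ClusterInfo struct {
	CACert     []byte   // PEM-encoded CA certificate
	APIServers []string // URLs of the known API servers
}

// Discoverer joins the gossip mesh using the shared token and seed
// peers, and blocks until both pieces of information have been
// gossiped to it.
type Discoverer interface {
	Discover(token string, peers []string) (ClusterInfo, error)
}
```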

If I compromise a node that is used as the seed for other nodes, can I compromise all those other nodes?

Also dead link.

@lukemarsden (Contributor, Author) commented Aug 10, 2016

AIUI, if you compromise a node on the mesh and read the mesh key out of memory/disk, then you'd be able to read the CA cert public key and the API server URLs, but no more than that.

Fixed the dead link, sorry about that. The link goes to our trivial CRDT implementation for adding API server URLs to a set and deciding on CA certs deterministically based on their creation dates (falling back to comparing signatures if the dates are equal). This took us half a day to code, based on the sample code here and watching this explanation.
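A rough sketch of that merge rule in Go (the field names are assumptions for illustration; the real implementation lives in weaveworks/kubelet-mesh): the API server URLs form a grow-only set, and peers converge on the same CA cert because the ordering is deterministic.

```go
package discovery

import (
	"bytes"
	"crypto/x509"
)

// state is the gossiped CRDT: merge is commutative, associative, and
// idempotent, so peers converge regardless of message order.
type state struct {
	apiServers map[string]bool // grow-only set of API server URLs
	ca         *x509.Certificate
}

func (s *state) merge(other *state) {
	for url := range other.apiServers {
		s.apiServers[url] = true
	}
	switch {
	case other.ca == nil:
		// nothing new to learn
	case s.ca == nil,
		other.ca.NotBefore.Before(s.ca.NotBefore),
		other.ca.NotBefore.Equal(s.ca.NotBefore) &&
			bytes.Compare(other.ca.Signature, s.ca.Signature) < 0:
		// earliest creation date wins; signatures break ties
		s.ca = other.ca
	}
}
```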

@mikedanese (Member)

This doesn't block #30360 right?

@lukemarsden (Contributor, Author) commented Aug 10, 2016

> Why is it easier to create nodes of type 2 and do this process than to just create nodes of type 1 directly?

I think it boils down to not wanting to copy around cert files as part of the gating UX. As soon as you ask a busy person to manually scp a certificate, you lose.

@lukemarsden (Contributor, Author) commented Aug 10, 2016

> This doesn't block #30360 right?

I think it does – at least, finding a discovery mechanism we can use does (it could be a discovery service, but I got the impression that building and operating one would be tricky; plus we heard from @aronchick that users don't want to leave their firewall or operate auxiliary internal services).

Put another way, if we don't have a discovery mechanism, the best UX we can offer is the one described in "out-of-band", which is unacceptably unfriendly as the default (copying certs to machines) and therefore not competitive.

@jbeda (Contributor) commented Aug 10, 2016

Playing devil's advocate here -- do we need a full gossip implementation?

Could we do something simple like have the type 2 node create a JWT (HMAC-signed with the shared secret) and ask the type 1 node for the info it needs? That response could carry the certificate (along with other API servers in an HA world) encoded in another JWT (also HMAC-signed).

More fully, I'd think about having a "cluster parameters" bag of data that a client needs to talk to the cluster -- this includes a bunch of ways to reach the cluster (DNS, internal IPs, external IPs) along with a set of root certs to trust for that cluster. In this case a "client" includes other server components, node components (kubelet/kube-proxy) and kubectl/kubeadm. Ideally the client code would periodically ask the API server for an updated bag (version numbers?) and cache those results. The only way things would really get screwed up is if (a) the set of API servers turns over completely or (b) the root certs rotate out in between client pings.

Obviously gossip is one way to distribute this bag, but it may not be really necessary as we do already have a strongly consistent control plane.
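A minimal sketch of the signing half of that idea, using only Go's standard library. The payload shape and the token value are illustrative assumptions; a verification counterpart appears later in the thread.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signBag HMAC-signs the "cluster parameters" bag with the shared
// secret, producing a compact JWS-style string:
// base64url(payload) + "." + base64url(signature).
func signBag(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return base64.RawURLEncoding.EncodeToString(payload) + "." +
		base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("ABCDEFG") // the short bootstrap token
	bag := []byte(`{"ca":"<PEM>","apiServers":["https://10.0.0.1:6443"]}`)
	fmt.Println(signBag(secret, bag))
}
```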

@lukemarsden (Contributor, Author) commented Aug 10, 2016

> Obviously gossip is one way to distribute this bag, but it may not be really necessary as we do already have a strongly consistent control plane.

I think the point here is that we don't yet have a control plane set up, and we're trying to bootstrap that control plane in a simple and user-friendly way. Writing a tiny amount of code which uses an existing well-tested gossip library to securely distribute the CA cert and API server URLs to the peers seems sensible. Having the gossip network available will also dramatically simplify bootstrapping multi-master setups in such a way that the IP used to join the nodes to the masters isn't "special" (although that's not in scope for Phase I), I think.

@lukemarsden (Contributor, Author) commented Aug 10, 2016

> Ideally the client code would periodically ask the API server for an updated bag (version numbers?) and cache those results.

This sounds like what the gossip library is already good at: tracking a changing set of nodes without requiring quorum or careful operational management. We can get into this more in Phase II when we get to changing/adding/removing masters in multi-master, but IMO it's the "right tool for the job".


#### Masters

Initializing a new master shouldn't require TLS bootstrap, because the master already has privileged access to the API server. New masters have to add their own address into the API server URL list in the mesh case.

A new master needs TLS serving certs. Is this assuming the master will generate self-signed ones?

@lukemarsden (Contributor, Author)

Yes, in this scheme the first master would generate the CA cert and put its public key out on the gossip network so that the joining nodes can trust and know they're talking to the real API server.

Advanced users may want to specify this cert, and we shouldn't stop them. In fact, the "out of band" option will allow them to do exactly this.

We may also want to allow the user to provide a DNS name for the API server, and have that added to the initial cert as a subjectAltName, etc. The default "I know nothing" assumption would be to put the IP address of the first master into gossip as the initial API server address.
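As a sketch of what that default could look like: the first master generates a self-signed serving cert carrying its own IP, plus an optional user-supplied DNS name, as SANs. The CN, IP, DNS name, and validity period below are all assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"net"
	"os"
	"time"
)

func main() {
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "kube-apiserver"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(1, 0, 0),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		// Default "I know nothing" identity: the first master's IP.
		IPAddresses: []net.IP{net.ParseIP("10.0.0.1")},
		// An optional user-supplied DNS name goes in as a SAN too.
		DNSNames: []string{"k8s.example.com"},
	}
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
}
```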

@thockin (Member) commented Aug 11, 2016

The part I am missing (though some of the other questions here resonate for me) is how you end up with a "type 2" node at all? How do these gossip peers find each other?

@lukemarsden (Contributor, Author)

@thockin

Sorry that this wasn't clearer. From @mikedanese's definitions:

  1. Nodes that know the IPs of the apiservers and know a ca public key
  2. Nodes that know the IPs of some other nodes and know a shared secret

In my mind a "type 2 node" is one in "gossiping" state in the proposal – it helps to look at the rendered markdown view where you can see the diagrams. A type 1 node is one that's progressed to the "bootstrapping" state.

> The part I am missing (though some of the other questions here resonate for me) is how you end up with a "type 2" node at all? How do these gossip peers find each other?

A user ends up with a type 2 node by typing the commands proposed in #30360. Assuming the master is on 10.0.0.1, a user would do the following (simplified slightly for legibility).

Run on the master:

```
master# kubeadm init master
Token: ABCDEFG
```

And run on the node:

```
node# kubeadm join node --token=ABCDEFG 10.0.0.1
Joined!
```

(Under the hood, there is auto-signing of CSRs going on here as the node gets the CA cert and API server address from gossip, then performs TLS bootstrap. Let's assume that in future users would be able to turn off auto-signing, in which case accepting the node would involve the more-secure kubectl list-approvals and kubectl approve csr-abcd.)
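For reference, the node's side of that TLS bootstrap amounts to generating a private key, building a CSR, and POSTing it for signing. A minimal Go sketch, with an illustrative node identity:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"os"
)

func main() {
	// Each node keeps its own private key and sends only the CSR.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	csr, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		Subject: pkix.Name{CommonName: "node-1"}, // illustrative identity
	}, key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE REQUEST", Bytes: csr})
}
```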

To expand on this: when the gossip protocol is running on all the kubelets, you can join any server into the cluster just by specifying the (short) shared secret and the address of one other machine in the cluster. It doesn't have to be the same IP of some master that was originally set up months ago and which has since been destroyed: the gossip library handles keeping track of peers (and I'd propose that it continues to do that for the group of masters when we go multi-master so that nodes can always discover a working set of API servers on startup).

Immediately, in the single-master kick-the-tires case, gossip saves the user from scp'ing a CA cert around and manually figuring out the API server URL – which is important for UX! In the future it could also simplify having a changing set of masters over time in multi-master setups. Additionally, it will allow CA cert changes to be distributed "underneath" the control plane.

But what's critical in the next few days is that we push ahead with a discovery mechanism which gets us to kubeadm init and kubeadm join nirvana, so that Kubernetes can remain competitive and this lands as an alpha feature in time for 1.4!

@jbeda (Contributor) commented Aug 11, 2016

Correct me if I'm wrong here -- gossip really excels when you have a set of nodes with dynamic membership where there is no explicit leader. That isn't the case here. From the very start (after the first kubeadm init master) we have a leader. The requirements here are simpler than what you get from gossip -- (1) make sure the API servers know about each other and (2) communicate that full set to clients.

Since we have etcd behind the scenes, (1) is actually pretty easy: the API servers register with it. As for (2), we can have an explicit API call where clients ask for and cache the current set of API servers. That, along with some sort of ringdown (if one API server doesn't answer, try another), means the clients can find the cluster as long as one of the API servers they knew about last time is still around.
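A sketch of that ringdown in Go (the endpoint list and health-check path are assumptions): the client walks its cached list and uses the first API server that answers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// firstHealthy walks the cached API server list and returns the first
// endpoint that answers; callers would then refresh the cached list
// from that endpoint.
func firstHealthy(endpoints []string) (string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	for _, ep := range endpoints {
		resp, err := client.Get(ep + "/healthz")
		if err != nil {
			continue
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return ep, nil
		}
	}
	return "", errors.New("no cached API server reachable")
}

func main() {
	ep, err := firstHealthy([]string{"https://10.0.0.1:6443", "https://10.0.0.2:6443"})
	fmt.Println(ep, err)
}
```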

While we could have the clients of the API server (pretty much all other actors in the k8s world including kubectl) join the gossip network to get this stuff it seems a little wacky to have kubectl do so.

While it looks like gossip may provide some value here, I suspect it may be overkill. And it introduces new failure modes and dependencies.

Am I missing something big here?

@lukemarsden (Contributor, Author) commented Aug 11, 2016

@jbeda

Think of our proposal for adding gossip, at least for the single-master case we're trying to land in the next 8 days, just as a replacement for the discovery service concept in #28422 – critically, one which doesn't rely on an external discovery service. We've heard that users wouldn't like that: they don't want to leave the firewall, they don't want their cluster to rely on a third-party service which might go down, and they don't want to stand up an auxiliary service just to start another service (Kubernetes).

All that gossip is really doing in this case is turning a long CA cert in a file (and the API server URL) into a short gossip token on the command line, but this makes the UX much, much better. That in turn enables easy, secure distribution of that CA cert to the nodes so that they can kick off TLS bootstrapping.

The ringdown stuff makes sense for multi-master, sure. Given that the output of the gossip conversation would be a list of API servers, we'd need to do something like that even if we use gossip for discovery.

> While we could have the clients of the API server (pretty much all other actors in the k8s world including kubectl) join the gossip network to get this stuff it seems a little wacky to have kubectl do so.

We're not suggesting this :) Once gossip is used to distribute the CA cert for TLS bootstrapping, the rest will be normal kube APIs.

@xiang90 (Contributor) commented Aug 11, 2016

@lukemarsden

Run on the master:

```
master# kubeadm init master
Token: ABCDEFG
```

And run on the node:

```
node# kubeadm join node --token=ABCDEFG 10.0.0.1
Joined!
```

What benefit does gossip provide if we already need to give the other nodes a master IP? Why can't the master node distribute the information, either by pushing it to the contacted nodes or by having the nodes pull it directly from the given master? Why do we need gossip to distribute information?

@lukemarsden (Contributor, Author) commented Aug 11, 2016

@xiang90
Because we're assuming the network is untrusted, and we need to get the CA cert securely to the node without tampering. We could implement Diffie-Hellman ourselves ;) but the benefit of using the gossip library we're proposing is that it handles securing the link between the master and the bootstrapping node for us (using NaCl, via Go's crypto libraries), so that the node can learn the CA cert prior to doing TLS bootstrapping. The gossip network is only being proposed for this very narrow use-case; the rest of the cluster installation would be standard Kubernetes.
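To illustrate the kind of sealing involved, here is a sketch using NaCl secretbox for authenticated encryption of the payload. This is illustrative only: mesh does its own peer key exchange, and deriving the key straight from the token is an assumption made for the example.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"

	"golang.org/x/crypto/nacl/secretbox"
)

func main() {
	// Illustrative only: derive a 32-byte key from the shared token.
	key := sha256.Sum256([]byte("ABCDEFG"))

	var nonce [24]byte
	rand.Read(nonce[:])

	payload := []byte(`{"ca":"<PEM>","apiServers":["https://10.0.0.1:6443"]}`)
	sealed := secretbox.Seal(nonce[:], payload, &nonce, &key) // nonce prepended

	// Receiver: split off the nonce, then authenticate and decrypt.
	var n [24]byte
	copy(n[:], sealed[:24])
	opened, ok := secretbox.Open(nil, sealed[24:], &n, &key)
	fmt.Println(ok, string(opened))
}
```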

@xiang90 (Contributor) commented Aug 11, 2016

@lukemarsden So we use a gossip library but don't use its gossip functionality? What we actually want is the secure transport between the announcing node and its receiver? Can we make this clear in the proposal?

@lukemarsden (Contributor, Author)

@xiang90 true, the desired behavior could be achieved with secure point-to-point (request/response) functionality for the single-master case. The gossip functionality will become more useful in the multi-master case, I think, as I was discussing with @mikedanese earlier.

@derekwaynecarr (Member)

There is a lot of kubelet bootstrapping work in flight: #29459

@kubernetes/sig-node @mtaufen

@jbeda (Contributor) commented Aug 11, 2016

I wrote up an alternate/additional approach based on JWS -- please take a look and let me know what you think. If it resonates I'm happy to markdownify and submit a PR. https://docs.google.com/document/d/1GVMLTBrEH5kXGxo0fWTqiRxG4dSxQQzyUGLA7PxhzRY/edit#heading=h.i97w20td4jrk

@aronchick (Contributor)

Alternatively, what's the risk with gossip? Yes, it does way more than we need, but it's super well understood, there are lots of libraries out there, and it gives us optionality in the future (e.g. if the master goes away, we could use it to elect a new master). Is it just that it's more than we need now?

@aronchick (Contributor)

Had a discussion offline - I withdraw my previous comment.

@thockin (Member) commented Aug 12, 2016

Can you elucidate? I don't feel like I am well-equipped to derive it from first principles just now.


@thockin (Member) commented Aug 12, 2016

@lukemarsden if we assert that the short token is secure enough to talk to a random peer, then surely it is secure enough to talk to the apiserver. Given that, why do we need to make it gossip rather than "simpler point-to-point"?


@thockin (Member) commented Aug 12, 2016

I read over @jbeda's doc and, while I am the last person you want doing security, it seems net simpler to me. Can you rebut, @lukemarsden?


@lukemarsden (Contributor, Author)

@thockin @aronchick @jbeda @mikedanese @philips

Thanks everyone for the input on this!

@jbeda, we really like your JWT solution. It seems simpler and more consistent with the rest of Kubernetes, and it satisfies the requirements. I still think gossip + CRDTs can have value for multi-master and federated setups, but multi-master isn't what we're trying to get to in this first iteration of the new UX, which we all seem to agree on in #30360.

So, @errordeveloper and I are busy coding. What we're doing is:

  1. Implement kubeadm with the "out-of-band" option as an "advanced option", leaving the UX open for "init" and "join" top-level implementations.
  2. Plug the existing gossip prototype into kubeadm so we get to the desired e2e UX.
  3. Discuss/demo it in the SIG on Tuesday, then work with the community to swap it out for the JWT-based approach – if that is what the community consensus is.

In order to get this all working by the end of next week we will need some help!

Hope this is helpful. Happy weekend folks!

@jbeda (Contributor) commented Aug 12, 2016

Sounds like we have a plan. I'll convert my google doc to a markdown proposal some time over the weekend.

@philips (Contributor) commented Aug 13, 2016

It is very unlikely that we will get an API change of this magnitude merged into v1.4. And I know that prototyping quickly is an important goal of this effort. But I think this workflow can be accomplished without any changes to k8s:

```
kubeadm init master
```

Self-hosted:

  1. User installs and configures kubelet to attach to API server http://localhost:8080
  2. API server CA certs are generated by kubeadm (https://github.com/coreos/bootkube/blob/master/pkg/tlsutil/tlsutil.go#L67)
  3. kubeadm launches temporary API server (maybe vendor bootkube?)
  4. temporary API server asks kubelet to launch an API server and etcd
  5. temporary API server shuts down and self-hosted API server takes over (Retry when apiserver fails to listen on insecure port #28797)
  6. kubeadm pushes the replica set from step 3 into the self-hosted API server
  7. kubeadm pushes replica set for prototype jsw-server and the JWS into API server with host-networking so it is listening on the master node IP
  8. kubeadm prints out the IP:port of JWS server and JWS token

Static Manifest:

  1. User installs and configures kubelet to read manifests from /etc/kubernetes/manifests
  2. API server CA certs are generated by kubeadm (https://github.com/coreos/bootkube/blob/master/pkg/tlsutil/tlsutil.go#L67)
  3. kubeadm generates pod manifests to launch API server and etcd
  4. kubeadm pushes replica set for prototype jsw-server and the JWS into API server with host-networking so it is listening on the master node IP
  5. kubeadm prints out the IP:port of JWS server and JWS token

```
kubeadm join IP:port token
```

This next step requires #30090. cc @mtaufen

  1. User installs and configures kubelet to expect a ConfigMap at /etc/kubernetes/kubelet-config-map; until it exists, the kubelet crash-loops and is restarted by the host init system
  2. kubeadm talks to IP:port with the token, gets the API server IP:port, cert, etc., and generates a kubelet ConfigMap (see the verification sketch after this list)
  3. kubeadm places the ConfigMap into /etc/kubernetes/kubelet-config-map and waits for the kubelet to restart
  4. Mission accomplished, I think.
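Step 2's check could look like this Go sketch. The compact payload.signature layout mirrors the signing sketch earlier in the thread and is an assumption, not the final wire format:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// verifyBag is the counterpart of the earlier signing sketch: it
// authenticates the cluster parameters fetched from IP:port before
// anything is written to disk.
func verifyBag(blob string, token []byte) ([]byte, error) {
	parts := strings.Split(blob, ".")
	if len(parts) != 2 {
		return nil, errors.New("malformed blob")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return nil, err
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, err
	}
	mac := hmac.New(sha256.New, token)
	mac.Write(payload)
	if !hmac.Equal(mac.Sum(nil), sig) {
		return nil, errors.New("wrong token or tampered payload")
	}
	return payload, nil
}

func main() {
	blob := "..." // fetched from IP:port; placeholder here
	if _, err := verifyBag(blob, []byte("ABCDEFG")); err != nil {
		fmt.Println("rejecting bootstrap info:", err)
	}
}
```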

Thanks to @aaronlevy for whiteboarding this with me.

@lukemarsden (Contributor, Author) commented Aug 13, 2016

Thanks @philips and @aaronlevy! This looks great. @jbeda what do you think?

Our WIP efforts to implement kubeadm are currently here:
https://github.com/errordeveloper/kubernetes/pull/1/files

Can we collaborate early next week to converge these ideas and make this happen?

cc @errordeveloper


## Motivation

As part of the dramatically simplified cluster-creation UX described in the proposal linked above, it is desirable to have a discovery mechanism that enables the desired UX without depending on an external network service (such as a discovery service), so that we don't have to operate one and users don't have to leave their firewall to provision a cluster.

Maybe I missed the discussion around this, but I'm not convinced that it's valuable to avoid having a discovery meet-up point (by default) to simplify installation.

@roberthbailey (Contributor)

@lukemarsden:

> We've heard that users wouldn't like that: they don't want to leave the firewall, they don't want their cluster to rely on a third-party service which might go down, and they don't want to stand up an auxiliary service just to start another service (Kubernetes).

The discovery proposal was a way to simplify the UX for folks that didn't want to scp around a cert and type in multiple arguments. But the intent was not to preclude that case either. In a scenario where someone doesn't want to rely on an external service or to run their own bootstrap endpoint, it seems like they might be ok typing a few extra arguments on each node to set up a private cluster. For larger deployments, you'd just run your own bootstrap server (since it's just a docker container and it should be easy to run).

@lukemarsden (Contributor, Author) commented Aug 16, 2016

@philips @aaronlevy are you sure that #30090 is required for the crashloop plan to work? We think we can make this work already by having kubeadm write out a kubeconfig and have kubelet run with --kubeconfig=<file which kubeadm eventually writes>.

@jbeda (Contributor) commented Aug 16, 2016

@roberthbailey It seems like we are starting to gel on the JWS-based approach. I need to put that in a PR as a markdown file; I'll try to find time today to get that going. I'd like to make it compatible with what you've been thinking around a bootstrap API, but also allow for the API server itself to satisfy this for a lighter-weight experience.
