
Proposal for Deferred Creation of Addon Objects #3579

Closed
zmerlynn opened this issue Jan 16, 2015 · 25 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@zmerlynn
Member

Deferred Creation of "Addon Objects"

In PRs #2224 and #3292, we added a considerable amount of initialization of state in kube-up.sh after the actual kube-up call. This pattern really doesn't work very well for GKE: in GKE, kube-up is essentially a single API call that can be made from a variety of clients that aren't kube-up.sh.

This bug proposes a way to move pieces of the initialization in kube-up.sh onto the master, so that, aspirationally, kube-up.sh results in a single call to a provider-specific kube-up.

Phase 1: Cheesy init.d script on the master to create objects

In this phase, we move (some) initialization to an init.d script on the master. The actual YAML files for the created objects are distributed in the salt tarball. We add a tiny /etc/init.d script via salt to the master that essentially performs the same logic that kube-up.sh was doing, with the guiding environment variables plumbed through appropriately to salt on the master (e.g. ENABLE_CLUSTER_DNS, ENABLE_CLUSTER_MONITORING, etc.). The only complexity for this script is that it needs to wait for the apiserver to be ready before it starts creating objects.
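A minimal sketch of what such a script might do (the addon directory, apiserver address, and flags here are assumptions for illustration, not the actual implementation):

```sh
#!/bin/bash
# Illustrative sketch of the phase 1 init.d logic; paths and flags are assumed.
ADDON_DIR="/etc/kubernetes/addons"     # assumed: salt-rendered YAML for this cluster
APISERVER="http://127.0.0.1:8080"      # assumed: local apiserver address on the master

# Wait for the apiserver to be ready before creating any objects.
until kubectl --server="${APISERVER}" get pods >/dev/null 2>&1; do
  sleep 5
done

# Create every addon object salt rendered for this cluster (DNS, monitoring,
# logging, etc., already gated by ENABLE_CLUSTER_* at render time).
for obj in "${ADDON_DIR}"/*.yaml; do
  kubectl --server="${APISERVER}" create -f "${obj}"
done
```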

Phase 2: Move deferred addon objects to a container

Rather than having an init.d script handle this, load the deferred installation into a container. This has the added complexity of needing a kubectl that can auth to the master (so possibly cheat and use hostDir to steal the kubelet's creds), but it makes this piece more robust. This turns it from a silly script into more of a daemon that actually enforces that these "system containers" are present at any given time.

The main benefits of a container are:

  • Resilience - The script may be finicky in the face of apiserver instability.
  • Handling configuration change - The init.d script could handle this, but especially given the previous point, it's probably better as a real daemon.
  • Disconnect from salt - We can just pass environment variables into the pod. Initially these would come from salt, but they don't have to.

The complexity here is primarily around the build, hence phase 1.
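A rough sketch of what the phase 2 daemon loop might look like inside its container (the credential mount path is an assumption, and the "already exists" handling is simplified):

```sh
#!/bin/bash
# Illustrative sketch of a phase 2 addon enforcer running in a pod; not a real design.
# Assumes the kubelet's credentials were made available via a hostDir volume.
KUBECTL="kubectl --kubeconfig=/srv/kubernetes/kubeconfig"   # assumed mount path

while true; do
  for obj in /etc/kubernetes/addons/*.yaml; do
    # Re-create any addon object that has gone missing; tolerate "already exists".
    ${KUBECTL} create -f "${obj}" 2>/dev/null || true
  done
  sleep 60
done
```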


This proposal is meant to address #3305 and follow-on issues from PR #3292. I hope to have the phase 1 work complete soon, so comments are appreciated.

cc @thockin @satnam6502 @vishh @brendanburns

@satnam6502
Contributor

+1 to some equivalent to an init.d approach for letting users configure their clusters and for teasing apart the core cluster creation operations from "add ons".

@thockin
Member

thockin commented Jan 20, 2015

I'm not sure I quite get this. Sadly, DNS config involves passing new flags to kubelets. How does that happen from init.d?

@zmerlynn
Member Author

That part is still getting plumbed through salt. All this is doing is handling the apiserver calls on the master rather than in kube-up.sh. Essentially, the user has expressed a desired cluster configuration of "ENABLE_CLUSTER_DNS", and the master (and salt config) then enforces it rather than having the addon objects slapped on by the install client.


@erictune
Member

We want to start the scheduler and controller-manager in pods. We should make sure the same solution works for those.

@erictune
Member

Would the init.d script be run once? Or would it keep trying to ensure that the objects from the tarball exist on the master with the right values?

If the former, then users can later delete/modify the pods that implement DNS, monitoring, etc., and bork the system, resulting in more support overhead. We could potentially put those pods in a special namespace for system services which the admin account does not have permission to modify (this depends on getting per-namespace permissions code checked into OSS, but that may be ready in a few weeks).

@erictune
Member

I don't understand the "load deferred installation into a container" part. What starts that container? There has to be a list of containers to start initially, and there has to be something with permission to start that pod. The init.d script running on the master seems like a fine way to do that.

@erictune
Member

For the firewall rules: should these be automatically created by the master based on attributes of a service (and should there be a service for heapster, @vishh)?

@zmerlynn
Member Author

@erictune: It's currently run-once. The intent is to put them in separate namespaces to prevent deletion later, and also to make this into a loop that continually enforces that the objects are around with the right values. (The current kube-up.sh is run once, too.)

@zmerlynn
Member Author

@erictune: If we load the deferred install into a container, there's still a chicken-and-egg problem, yes. I'm actually not completely sold on the complexity of phase 2, either.

@zmerlynn
Member Author

@erictune: And yeah, I would hope we could create firewall rules based on the attributes of the service. We have the scopes for it. I'm still investigating this part, as is @cjcullen. That part is somewhat speculative at this point.

@vishh
Contributor

vishh commented Jan 20, 2015

@erictune: A service for 'heapster' itself will be added soon, which will most likely not need a firewall rule. There are two other services for monitoring that require firewall rules. We can get rid of these firewall rules once the proxy on the apiserver works for Grafana.
The firewall rules are required to enable access to the hostPorts of a Pod. If we were to infer them, we would have to do that from the Pod spec and not the Service spec.
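For context, the kind of GCE rule being discussed might look roughly like the following (the rule name, target tag, and port are illustrative assumptions, not the actual monitoring rules):

```sh
# Illustrative only: open a pod hostPort on the nodes so the monitoring UI is reachable.
gcloud compute firewall-rules create "${NODE_INSTANCE_PREFIX}-monitoring" \
  --network "${NETWORK}" \
  --target-tags "${NODE_INSTANCE_PREFIX}-node" \
  --allow tcp:8086   # assumed hostPort; inferred from the Pod spec, not the Service spec
```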

@erictune
Member

@zmerlynn is there an issue about services creating firewalls, and if not would you feel up to filing one?


@zmerlynn
Member Author

@erictune: created #3636

@dchen1107
Member

Just another brainstorming idea: instead of introducing an init.d script, how about the master works like the kubelet, also taking service/pod configuration from a file or a directory of files managed by the master?
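For reference, this is essentially the pattern the kubelet already uses for manifests on disk; roughly (the directory is an assumption, and other required flags are omitted):

```sh
# The kubelet can already watch a directory of pod manifest files (flag name as of early
# 2015); the idea above is to give the master an analogous file-driven config mechanism.
kubelet --config=/etc/kubernetes/manifests   # other flags omitted for brevity
```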

zmerlynn added a commit to zmerlynn/kubernetes that referenced this issue Jan 21, 2015
This implements phase 1 of the proposal in kubernetes#3579, moving the creation
of the pods, RCs, and services to the master after the apiserver is
available.

This is such a wide commit because our existing initial config story
is special:

* Add kube-addons service and associated salt configuration:
** We configure /etc/kubernetes/addons to be a directory of objects
that are appropriately configured for the current cluster.
** "/etc/init.d/kube-addons start" slurps up everything in that dir.
(Most of the difficulty is the business logic in salt around getting
that directory built at all.)
** We cheat and overlay cluster/addons into saltbase/salt/kube-addons
as config files for the kube-addons meta-service.
* Change .yaml.in files to salt templates
* Rename {setup,teardown}-{monitoring,logging} to
{setup,teardown}-{monitoring,logging}-firewall to properly reflect
their real purpose now (the purpose of these functions is now ONLY to
bring up the firewall rules, and possibly to relay the IP to the user).
* Rework GCE {setup,teardown}-{monitoring,logging}-firewall: Both
functions were improperly configuring global rules, yet used
lifecycles tied to the cluster. Use $NODE_INSTANCE_PREFIX with the
rule. The logging rule needed a $NETWORK specifier. The monitoring
rule tried gcloud describe first, but given the instancing, this feels
like a waste of time now.
* Plumb ENABLE_CLUSTER_MONITORING, ENABLE_CLUSTER_LOGGING,
ELASTICSEARCH_LOGGING_REPLICAS and DNS_REPLICAS down to the master,
since these are needed there now.

(Desperately want just a yaml or json file we can share between
providers that has all this crap. Maybe kubernetes#3525 is an answer?)

Huge caveats: I've done pretty firm testing on GCE, including
twiddling the env variables and making sure the objects I expect to
come up, come up. I've tested that it doesn't break GKE bringup
somehow. But I haven't had a chance to test the other providers.
zmerlynn added a commit to zmerlynn/kubernetes that referenced this issue Jan 23, 2015
@goltermann goltermann added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 28, 2015
@davidopp davidopp added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Feb 17, 2015
@roberthbailey roberthbailey added this to the v1.0 milestone Mar 4, 2015
@alex-mohr
Contributor

Zach, can you please update this with the remaining work? And also, how will upgrading work when the new version has a new addon?

Is this just another reconciler that the master runs for a well-defined set of cluster services, where if DESIRED_LOGGING_ENABLED=true, the controller will make sure the appropriate pod+firewalls+node capabilities+etc are created? And likewise, it supports disabling them?

@zmerlynn
Member Author

Current status:

We are still in Phase 1, with no immediate plans to move to Phase 2. I've revised the add-on plan to leave out all mention of firewall rules, though, because all add-ons are now going through the proxy instead, making this a little cleaner.

Upgrade status:

This is a mess. If you "upgrade", which today means kube-push, one of two things will happen: (a) you won't reboot, in which case kube-addons will stay paused (because its default behavior is to sit resolutely ignoring the world), or (b) you reboot, in which case kube-addons will re-run and something good might come of it. The goodness level depends on whether it was a variable addition or deletion, what objects were created or destroyed, etc. There is no reconciler right now.

What this needs is probably a defined add-ons namespace, so that we can say kube-addons owns this namespace and this exact set of objects belongs in it, plus a controller that enforces that. That probably isn't a bash script.
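A very rough sketch of that idea (the namespace name and file layout are assumptions; as noted, a real enforcer would likely not be a bash script):

```sh
#!/bin/bash
# Illustrative only: enforce the desired addon set inside a dedicated namespace.
NS="kube-addons"   # assumed namespace owned entirely by the addon machinery

# Ensure everything currently rendered into the addons directory exists.
for obj in /etc/kubernetes/addons/*.yaml; do
  kubectl --namespace="${NS}" create -f "${obj}" 2>/dev/null || true
done

# The missing half today: anything in ${NS} that is not in the desired set should
# be deleted, so removed or renamed addons don't linger across upgrades.
```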

@erictune
Member

Alternative view on Upgrades

  • A fresh kubernetes cluster will typically be created with certain initial resources. These are the things that are currently in addons. Kube-up needs to create these.
  • On a fully user-managed cluster, the admin might later reasonably modify or delete these items. For example, she might increase the memory and cpu limit for the Kibana pod.
  • Upgrading the apiserver binary (currently done with kube-push) should not imply reverting the user's changes. That is probably not what they want.
  • On a managed cluster, we should make it so that the admin user does not have permission to modify these objects.
  • In either case, we don't need a reconciler for these objects, because changes are either intentional or not possible.
  • In the case where an upgrade necessitates changes to addons on a user-managed cluster, there needs to be some kind of three-way merge (e.g. combining the admin's change to increase RAM limits with the upgrade's change to the Kibana image version). We may be able to ask the user to do this if they modified their addons.
  • In the case of a hosted cluster, the upgrade can probably just update the objects since they are in a known state.
  • Again, a reconciler doesn't help with the upgrades.

spothanis pushed a commit to spothanis/kubernetes that referenced this issue Mar 24, 2015
@alex-mohr alex-mohr modified the milestones: v1.0-bubble, v1.0, v1.0-candidate Apr 27, 2015
@goltermann goltermann modified the milestones: v1.0-post, v1.0-candidate Apr 28, 2015
@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@roberthbailey
Contributor

@marekbiskup

@davidopp
Member

@marekbiskup doesn't work on Kubernetes anymore.

@mikedanese might be interested.

I can't say I ever completely understood this issue, but it seems like it might be related to #7459.

@roberthbailey
Contributor

@zmerlynn Can you provide an update on where we are at and where we are likely to go with this? I think that with the Deployment API and kubectl apply we may have a satisfactory answer for addons at cluster initialization time and during upgrades.
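To illustrate the flow being suggested (the file name and addon are assumptions for the example):

```sh
# At cluster bringup: create the addon from a Deployment manifest.
kubectl apply -f /etc/kubernetes/addons/kube-dns-deployment.yaml

# On upgrade: re-apply the (possibly updated) manifest. apply does a three-way merge
# between the last-applied config, the live object, and the new manifest, so
# non-conflicting user modifications are preserved.
kubectl apply -f /etc/kubernetes/addons/kube-dns-deployment.yaml
```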

@zmerlynn
Member Author

  • Someone needs to do the work to transition all the addons to deployments
  • There still needs to be an addon reconciler of sorts (i.e. desired list for cluster config vs current list of deployments)

Otherwise, yes, it looks like things got drastically simpler here.

@marekbiskup
Contributor

Are Deployments ready yet? (Sorry, I haven't followed Kubernetes for some time.)
I may be able to find some time to work on it.
Do we still want to rewrite the add-on updater in Go?

@davidopp
Member

No, Deployment is not done yet.

@ixdy
Member

ixdy commented Dec 9, 2016

Any update in the last year+? I just ran across this in:

# TODO(#3579): This is a temporary hack. It gathers up the yaml,

@mikedanese
Member

Much of what is talked about in this PR is basically complete.
