
Proposal for Deferred Creation of Addon Objects #3579

Closed
zmerlynn opened this issue Jan 16, 2015 · 25 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@zmerlynn
Member

Deferred Creation of "Addon Objects"

In PRs #2224 and #3292, we added a considerable amount of initialization of state in kube-up.sh after the actual kube-up call. This pattern really doesn't work very well for GKE: in GKE, kube-up is essentially a single API call that can be made from a variety of clients that aren't kube-up.sh.

This bug proposes a way to move pieces of the initialization in kube-up.sh onto the master, so that, aspirationally, kube-up.sh results in a single call to a provider-specific kube-up.

Phase 1: Cheesy init.d script on the master to create objects

In this phase, we move (some) initialization to an init.d script on the master. The actual YAML files for the created objects are distributed in the salt tarball. We add a tiny /etc/init.d script via salt to the master that essentially performs the same logic that kube-up.sh was doing, with the guiding environment variables plumbed through appropriately to salt on the master (e.g. ENABLE_CLUSTER_DNS, ENABLE_CLUSTER_MONITORING, etc.). The only complexity for this script is that it needs to wait for the apiserver to be ready before it starts creating objects.
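A minimal sketch of what such a script might do (the addon directory, apiserver address, and flags here are assumptions for illustration, not the actual implementation):

```sh
#!/bin/bash
# Illustrative sketch of the phase 1 init.d logic; paths and flags are assumed.
ADDON_DIR="/etc/kubernetes/addons"     # assumed: salt-rendered YAML for this cluster
APISERVER="http://127.0.0.1:8080"      # assumed: local apiserver address on the master

# Wait for the apiserver to be ready before creating any objects.
until kubectl --server="${APISERVER}" get pods >/dev/null 2>&1; do
  sleep 5
done

# Create every addon object salt rendered for this cluster (DNS, monitoring,
# logging, etc., already gated by ENABLE_CLUSTER_* at render time).
for obj in "${ADDON_DIR}"/*.yaml; do
  kubectl --server="${APISERVER}" create -f "${obj}"
done
```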

Phase 2: Move deferred addon objects to a container

Rather than having an init.d script handle this, load the deferred installation into a container. This has the added complexity of needing a kubectl that can auth to the master (so possibly cheat and use hostDir to steal the kubelet's creds), but it makes this piece more robust. This turns it from a silly script into more of a daemon that actually enforces that these "system containers" are present at any given time.

The main benefits of a container are:

  • Resilience - The script may be finicky in the face of apiserver instability.
  • Handling configuration change - The init.d script could handle this, but especially given the previous point, it's probably better as a real daemon.
  • Disconnect from salt - We can just pass environment variables into the pod. Initially these would come from salt, but they don't have to.

The complexity here is primarily around the build, hence phase 1.
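A rough sketch of what the phase 2 daemon loop might look like inside its container (the credential mount path is an assumption, and the "already exists" handling is simplified):

```sh
#!/bin/bash
# Illustrative sketch of a phase 2 addon enforcer running in a pod; not a real design.
# Assumes the kubelet's credentials were made available via a hostDir volume.
KUBECTL="kubectl --kubeconfig=/srv/kubernetes/kubeconfig"   # assumed mount path

while true; do
  for obj in /etc/kubernetes/addons/*.yaml; do
    # Re-create any addon object that has gone missing; tolerate "already exists".
    ${KUBECTL} create -f "${obj}" 2>/dev/null || true
  done
  sleep 60
done
```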


This proposal is meant to address #3305 and follow-on issues from PR #3292. I hope to have the phase 1 work complete soon, so comments are appreciated.

cc @thockin @satnam6502 @vishh @brendanburns

@satnam6502
Contributor

+1 to some equivalent to an init.d approach for letting users configure their clusters and for teasing apart the core cluster creation operations from "add ons".

@thockin
Member

thockin commented Jan 20, 2015

I'm not sure I quite get this. Sadly, DNS config involves passing new flags to kubelets. How does that happen from init.d?

@zmerlynn
Member Author

That part is still getting plumbed through salt. All this is doing is handling the apiserver calls on the master rather than in kube-up.sh. Essentially, the user has expressed a desired cluster configuration of "ENABLE_CLUSTER_DNS", and the master (and salt config) then enforces it rather than having the addon objects slapped on by the install client.


@erictune
Member

We want to start the scheduler and controller-manager in pods. We should make sure the same solution works for those.

@erictune
Member

Would the init.d script be run once? Or would it keep trying to ensure that the objects from the tarball exist on the master with the right values?

If the former, then users can later delete/modify the pods that implement DNS, monitoring, etc., and bork the system, resulting in more support overhead. We could potentially put those pods in a special namespace for system services which the admin account does not have permission to modify (this depends on getting per-namespace permissions code checked into OSS, but that may be ready in a few weeks).

@erictune
Member

I don't understand the "load deferred installation into a container" part. What starts that container? There has to be a list of containers to start initially, and there has to be something with permission to start that pod. The init.d script running on the master seems like a fine way to do that.

@erictune
Member

For the firewall rules: should these be automatically created by the master based on attributes of a service (and should there be a service for heapster, @vishh)?

@zmerlynn
Member Author

@erictune: It's currently run-once. The intent is to put them in separate namespaces to prevent deletion later, and also to make this into a loop that continually enforces that the objects are around with the right values. (The current kube-up.sh is run once, too.)

@zmerlynn
Member Author

@erictune: If we load the deferred install into a container, there's still a chicken-and-egg problem, yes. I'm actually not completely sold on the complexity of phase 2, either.

@zmerlynn
Member Author

@erictune: And yeah, I would hope we could create firewall rules based on the attributes of the service. We have the scopes for it. I'm still investigating this part, as is @cjcullen. That part is somewhat speculative at this point.

@vishh
Contributor

vishh commented Jan 20, 2015

@erictune: A service for 'heapster' itself will be added soon, which will most likely not need a firewall rule. There are two other services for monitoring that require firewall rules. We can get rid of these firewall rules once the proxy on the apiserver works for Grafana.
The firewall rules are required to enable access to the hostPorts of a Pod. If we were to infer them, we would have to do that from the Pod spec and not the Service spec.
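For context, the kind of GCE rule being discussed might look roughly like the following (the rule name, target tag, and port are illustrative assumptions, not the actual monitoring rules):

```sh
# Illustrative only: open a pod hostPort on the nodes so the monitoring UI is reachable.
gcloud compute firewall-rules create "${NODE_INSTANCE_PREFIX}-monitoring" \
  --network "${NETWORK}" \
  --target-tags "${NODE_INSTANCE_PREFIX}-node" \
  --allow tcp:8086   # assumed hostPort; inferred from the Pod spec, not the Service spec
```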

@erictune
Member

@zmerlynn is there an issue about services creating firewalls, and if not would you feel up to filing one?


@zmerlynn
Member Author

@erictune: created #3636

@dchen1107
Member

Just another brainstorming idea: instead of introducing an init.d script, how about the master works like the kubelet, also taking service/pod configuration from a file or a directory of files managed by the master?
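For reference, this is essentially the pattern the kubelet already uses for manifests on disk; roughly (the directory is an assumption, and other required flags are omitted):

```sh
# The kubelet can already watch a directory of pod manifest files (flag name as of early
# 2015); the idea above is to give the master an analogous file-driven config mechanism.
kubelet --config=/etc/kubernetes/manifests   # other flags omitted for brevity
```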

zmerlynn added a commit to zmerlynn/kubernetes that referenced this issue Jan 21, 2015
This implements phase 1 of the proposal in kubernetes#3579, moving the creation
of the pods, RCs, and services to the master after the apiserver is
available.

This is such a wide commit because our existing initial config story
is special:

* Add kube-addons service and associated salt configuration:
** We configure /etc/kubernetes/addons to be a directory of objects
that are appropriately configured for the current cluster.
** "/etc/init.d/kube-addons start" slurps up everything in that dir.
(Most of the difficulty is the business logic in salt around getting
that directory built at all.)
** We cheat and overlay cluster/addons into saltbase/salt/kube-addons
as config files for the kube-addons meta-service.
* Change .yaml.in files to salt templates
* Rename {setup,teardown}-{monitoring,logging} to
{setup,teardown}-{monitoring,logging}-firewall to properly reflect
their real purpose now (the purpose of these functions is now ONLY to
bring up the firewall rules, and possibly to relay the IP to the user).
* Rework GCE {setup,teardown}-{monitoring,logging}-firewall: Both
functions were improperly configuring global rules, yet used
lifecycles tied to the cluster. Use $NODE_INSTANCE_PREFIX with the
rule. The logging rule needed a $NETWORK specifier. The monitoring
rule tried gcloud describe first, but given the instancing, this feels
like a waste of time now.
* Plumb ENABLE_CLUSTER_MONITORING, ENABLE_CLUSTER_LOGGING,
ELASTICSEARCH_LOGGING_REPLICAS and DNS_REPLICAS down to the master,
since these are needed there now.

(Desperately want just a yaml or json file we can share between
providers that has all this crap. Maybe kubernetes#3525 is an answer?)

Huge caveats: I've done pretty firm testing on GCE, including
twiddling the env variables and making sure the objects I expect to
come up, come up. I've tested that it doesn't break GKE bringup
somehow. But I haven't had a chance to test the other providers.
zmerlynn added a commit to zmerlynn/kubernetes that referenced this issue Jan 23, 2015
@goltermann goltermann added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 28, 2015
@davidopp davidopp added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Feb 17, 2015
@roberthbailey roberthbailey added this to the v1.0 milestone Mar 4, 2015
@alex-mohr
Contributor

Zach, can you please update this with the remaining work? And also, how will upgrading work when the new version has a new addon?

Is this just another reconciler that the master runs for a well-defined set of cluster services, where if DESIRED_LOGGING_ENABLED=true, the controller will make sure the appropriate pod+firewalls+node capabilities+etc are created? And likewise, it supports disabling them?

@zmerlynn
Member Author

Current status:

We are still in Phase 1, with no immediate plans to move to Phase 2. I've revised the add-on plan to leave out all mention of firewall rules, though, because all add-ons are now going through the proxy instead, making this a little cleaner.

Upgrade status:

This is a mess. If you "upgrade", which today means kube-push, one of two things will happen: (a) you won't reboot, in which case kube-addons will stay paused (because its default behavior is to sit resolutely ignoring the world), or (b) you reboot, in which case kube-addons will re-run and something good might come of it. The goodness level depends on whether it was a variable addition or deletion, what objects were created or destroyed, etc. There is no reconciler right now.

What this needs is probably a defined add-ons namespace, so that we can say kube-addons owns this namespace and this exact set of objects belongs in it, plus a controller that enforces that. That probably isn't a bash script.
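A very rough sketch of that idea (the namespace name and file layout are assumptions; as noted, a real enforcer would likely not be a bash script):

```sh
#!/bin/bash
# Illustrative only: enforce the desired addon set inside a dedicated namespace.
NS="kube-addons"   # assumed namespace owned entirely by the addon machinery

# Ensure everything currently rendered into the addons directory exists.
for obj in /etc/kubernetes/addons/*.yaml; do
  kubectl --namespace="${NS}" create -f "${obj}" 2>/dev/null || true
done

# The missing half today: anything in ${NS} that is not in the desired set should
# be deleted, so removed or renamed addons don't linger across upgrades.
```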

@erictune
Member

Alternative view on Upgrades

  • A fresh kubernetes cluster will typically be created with certain initial resources. These are the things that are currently in addons. Kube-up needs to create these.
  • On a fully user-managed cluster, the admin might later reasonably modify or delete these items. For example, she might increase the memory and cpu limit for the Kibana pod.
  • Upgrading the apiserver binary (currently done with kube-push) should not imply reverting the user's changes. That is probably not what they want.
  • On a managed cluster, we should make it so that the admin user does not have permission to modify these objects.
  • In either case, we don't need a reconciler for these objects, because changes are either intentional or not possible.
  • In the case where an upgrade necessitates changes to addons on a user-managed cluster, there needs to be some kind of three-way merge (e.g. combining the admin's change to increase RAM limits with the upgrade's change to the Kibana image version). We may be able to ask the user to do this if they modified their addons.
  • In the case of a hosted cluster, the upgrade can probably just update the objects since they are in a known state.
  • Again, a reconciler doesn't help with the upgrades.

spothanis pushed a commit to spothanis/kubernetes that referenced this issue Mar 24, 2015
@alex-mohr alex-mohr modified the milestones: v1.0-bubble, v1.0, v1.0-candidate Apr 27, 2015
@goltermann goltermann modified the milestones: v1.0-post, v1.0-candidate Apr 28, 2015
@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@roberthbailey
Contributor

@marekbiskup

@davidopp
Member

@marekbiskup doesn't work on Kubernetes anymore.

@mikedanese might be interested.

I can't say I ever completely understood this issue, but it seems like it might be related to #7459.

@roberthbailey
Contributor

@zmerlynn Can you provide an update on where we are at and where we are likely to go with this? I think that with the Deployment API and kubectl apply we may have a satisfactory answer for addons at cluster initialization time and during upgrades.
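To illustrate the flow being suggested (the file name and addon are assumptions for the example):

```sh
# At cluster bringup: create the addon from a Deployment manifest.
kubectl apply -f /etc/kubernetes/addons/kube-dns-deployment.yaml

# On upgrade: re-apply the (possibly updated) manifest. apply does a three-way merge
# between the last-applied config, the live object, and the new manifest, so
# non-conflicting user modifications are preserved.
kubectl apply -f /etc/kubernetes/addons/kube-dns-deployment.yaml
```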

@zmerlynn
Member Author

  • Someone needs to do the work to transition all the addons to deployments
  • There still needs to be an addon reconciler of sorts (i.e. desired list for cluster config vs current list of deployments)

Otherwise, yes, it looks like things got drastically simpler here.

@marekbiskup
Contributor

Are Deployments ready yet? (Sorry, I haven't followed Kubernetes for some time.)
I may be able to find some time to work on it.
Do we still want to rewrite the add-on updater in Go?

@davidopp
Member

No, Deployment is not done yet.

@ixdy
Member

ixdy commented Dec 9, 2016

Any update in the last year+? I just ran across this in:

# TODO(#3579): This is a temporary hack. It gathers up the yaml,

@mikedanese
Member

Much of what is talked about in this PR is basically complete.
