
Deferred creation of SkyDNS, monitoring and logging objects #3601

Merged: 1 commit into kubernetes:master on Jan 21, 2015

Conversation

zmerlynn (Member)

This implements phase 1 of the proposal in #3579, moving the creation of the pods, RCs, and services to the master after the apiserver is available. This is the open source side of fixing #3305.

This is such a wide commit because our existing initial config story is special:

  • Add kube-addons service and associated salt configuration:
    • We configure /etc/kubernetes/addons to be a directory of objects that are appropriately configured for the current cluster.
    • "/etc/init.d/kube-addons start" slurps up everything in that dir. (Most of the difficult is the business logic in salt around getting that directory built at all.)
    • We cheat and overlay cluster/addons into saltbase/salt/kube-addons as config files for the kube-addons meta-service.
  • Change .yaml.in files to salt templates
  • Rename {setup,teardown}-{monitoring,logging} to {setup,teardown}-{monitoring,logging}-firewall to properly reflect their real purpose now (the purpose of these functions is now ONLY to bring up the firewall rules, and possibly to relay the IP to the user).
  • Rework GCE {setup,teardown}-{monitoring,logging}-firewall: Both functions were improperly configuring global rules, yet used lifecycles tied to the cluster. Use $NODE_INSTANCE_PREFIX with the rule. The logging rule needed a $NETWORK specifier. The monitoring rule tried gcloud describe first, but given the instancing, this feels like a waste of time now. (See the firewall sketch below.)
  • Plumb ENABLE_CLUSTER_MONITORING, ENABLE_CLUSTER_LOGGING, ELASTICSEARCH_LOGGING_REPLICAS and DNS_REPLICAS down to the master, since these are needed there now.
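
For illustration, a minimal sketch of what the slurp step amounts to (the loop itself is hypothetical, not the actual init script; the directory and the master-local apiserver port follow the setup described above):

# Hypothetical sketch of "/etc/init.d/kube-addons start": submit every
# object file that salt rendered into the addons directory.
ADDON_DIR=/etc/kubernetes/addons
for obj in "${ADDON_DIR}"/*; do
  # The master talks to the apiserver over its local (insecure) port,
  # which sidesteps auth for now; see the discussion further down.
  kubectl -s http://127.0.0.1:8080 create -f "${obj}"
done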

(Desperately want just a yaml or json file we can share between providers that has all this crap. Maybe #3525 is an answer?)

Huge caveats: I've done pretty firm testing on GCE, including twiddling the env variables and making sure the objects I expect to come up, come up. I've tested that it doesn't break GKE bringup somehow. But I haven't had a chance to test the other providers.
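
As a hedged illustration of the reworked rule scoping (the rule name, port, and target tag here are illustrative, not the actual kube-up values):

# Keying the rule name and target tags on $NODE_INSTANCE_PREFIX ties the
# rule's lifecycle to this cluster; --network pins it to the cluster's
# network rather than a global default.
gcloud compute firewall-rules create \
  "${NODE_INSTANCE_PREFIX}-logging" \
  --network "${NETWORK}" \
  --allow tcp:5601 \
  --target-tags "${NODE_INSTANCE_PREFIX}-minion"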

@zmerlynn (Member Author)

cc @brendanburns (if you want this in 0.9.0, it might be good to tag a reviewer), @thockin (in case you care about the DNS stuff, though it's actually the least complicated of the variables in here), @vishh (monitoring), @satnam6502 (logging), @jbeda (gross common.sh violation).

In cluster/addons/dns/skydns-rc.yaml.in:

@@ -5,7 +5,7 @@ namespace: default
 labels:
   k8s-app: skydns
 desiredState:
-  replicas: {DNS_REPLICAS}
+  replicas: {{ pillar['dns_replicas'] }}
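
(For context, a minimal sketch of how this salt template gets its value; the pillar key is from the diff, while the surrounding pillar file is an assumption:)

# Pillar side, e.g. in a cluster-params pillar file (name assumed),
# where the value arrives as a quoted string:
dns_replicas: '3'

# Template side: salt's Jinja renderer substitutes the value at apply
# time, so the line below renders as "replicas: 3".
replicas: {{ pillar['dns_replicas'] }}
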
@thockin (Member)

There is a doc that explains to mortal users how DNS works. Needs updates?

@zmerlynn (Member Author)

Will check and update. It became more convoluted on our end, certainly - not sure about the end-user side, though, hopefully.

@zmerlynn (Member Author)

Hm. Are you talking about cluster/addons/dns/README.md? Those instructions don't look right ... they don't include any instructions on how you'd bonk salt to update the minions for the DNS_SERVER_IP. They look pre-ENABLE_CLUSTER_DNS, pre-salt config, at the very least. At the moment, the configuration seems pretty static once you create the cluster, unless you know the right incantations.

Given that, would you mind a separate pull for cleanup on the docs?


@thockin (Member)

Those assume that the salt stuff is done - it's more hand-holding for launching the skydns pods. I'd be happy to see that bit removed, with a reference to "and then we salt it and the pod gets scheduled".


@zmerlynn (Member Author)

Will fix.

@thockin (Member) commented Jan 20, 2015

I think I "get it" - thanks for tackling this mess. Please take a look at docs that reference the old way of doing things - I know DNS documents it, I assume the others do too.

@vishh and @satnam6502 for reviews of their parts

In cluster/saltbase/salt/kube-addons/init.sls:

@@ -0,0 +1,79 @@
+{% if pillar.get('enable_cluster_monitoring', '').lower() == 'true' %}
@jbeda (Contributor)

YAML has an explicit bool type that turns into a Python bool via salt. It is a little odd that we are doing a string compare here instead.

@zmerlynn (Member Author)

I'm going to try the "is defined and pillar['enable_cluster_monitoring']" approach again, but when I tried this last time, it was always true, even when enable_cluster_monitoring='false'.


@zmerlynn (Member Author)

I bet I know what's wrong. I bet the commit that changed the init.sls YAML to all be quoted actually broke this, because this now looks like

enable_cluster_monitoring: 'true'

which is probably coming in as a Python string, no? (I need to bring up a cluster to test this, jussec.)

@jbeda (Contributor)

Exactly -- the explicit quoting makes it a string in YAML. That may be okay, but we should be consistent and keep our eyes open here.

@zmerlynn (Member Author)

Yes. This is actually broken elsewhere already (in particular, hunt for 'enable_cluster_dns' and it's already broken). The problem, and I love saying sentences like this, is that 'false' is true. So I think my code is correct and/or we need to fix the escaping.
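
To make the "'false' is true" point concrete, a minimal Jinja illustration (the pillar value is the quoted form discussed above; this is standard salt/Jinja behavior, not code from this PR):

{# Assume the quoted pillar the current config generates: #}
{# enable_cluster_monitoring: 'false' #}

{% if pillar['enable_cluster_monitoring'] %}
# Wrong branch: 'false' is a non-empty string, hence truthy in Jinja,
# so this runs even when the operator disabled monitoring.
{% endif %}

{% if pillar.get('enable_cluster_monitoring', '').lower() == 'true' %}
# Correct: only the literal string 'true' (any case) enables the addon,
# and a missing pillar defaults to '' and stays disabled.
{% endif %}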


@erictune (Member)

Is kubectl going to talk to localhost:8080 to sidestep the authorization question?

@jbeda (Contributor) commented Jan 20, 2015

The approach looks good. Some comments around auth, reboots and kube-push.

@zmerlynn (Member Author)

@erictune: Yeah, it's plumbed through localhost right now (in that it's not specifying anything). Is there a reason we'd lock that down? I'm not sure of the path we're going to take there.

  echo
else
  echo -e "${color_red}Monitoring pods failed to be scheduled! Removing firewalls.${color_norm}"
  teardown-monitoring-firewall
Contributor

I think we should not remove the firewall rules here. "Failed to be scheduled" is different from "failing to start pods".

@zmerlynn (Member Author)

Will fix.
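
(A minimal sketch of one way to apply the review comment to the snippet above; the surrounding control flow is from the diff, the closing fi is assumed, and the comment marks the only behavioral change:)

  echo
else
  echo -e "${color_red}Monitoring pods failed to be scheduled!${color_norm}"
  # Scheduling failure is not the same as the pods failing to start:
  # leave the firewall rule in place rather than calling
  # teardown-monitoring-firewall here.
fi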

@zmerlynn (Member Author)

@jbeda: PTAL. My internal scoreboard says:

  • The auth question is "future, localhost is fine for now".
  • The kube-push question is pending above, but there's some other stuff I've found that just doesn't play nicely with push right now. Not sure what work to do there.
  • Lingering thread on service order. (I don't think there's anything to do here because the kube-up.sh script never actually verified anything was healthy.)
  • Other stuff addressed?

@jbeda (Contributor) commented Jan 21, 2015

LGTM -- please rebase, squash and merge.

This implements phase 1 of the proposal in kubernetes#3579, moving the creation
of the pods, RCs, and services to the master after the apiserver is
available.

This is such a wide commit because our existing initial config story
is special:

* Add kube-addons service and associated salt configuration:
** We configure /etc/kubernetes/addons to be a directory of objects
that are appropriately configured for the current cluster.
** "/etc/init.d/kube-addons start" slurps up everything in that dir.
(Most of the difficulty is the business logic in salt around getting
that directory built at all.)
** We cheat and overlay cluster/addons into saltbase/salt/kube-addons
as config files for the kube-addons meta-service.
* Change .yaml.in files to salt templates
* Rename {setup,teardown}-{monitoring,logging} to
{setup,teardown}-{monitoring,logging}-firewall to properly reflect
their real purpose now (the purpose of these functions is now ONLY to
bring up the firewall rules, and possibly to relay the IP to the user).
* Rework GCE {setup,teardown}-{monitoring,logging}-firewall: Both
functions were improperly configuring global rules, yet used
lifecycles tied to the cluster. Use $NODE_INSTANCE_PREFIX with the
rule. The logging rule needed a $NETWORK specifier. The monitoring
rule tried gcloud describe first, but given the instancing, this feels
like a waste of time now.
* Plumb ENABLE_CLUSTER_MONITORING, ENABLE_CLUSTER_LOGGING,
ELASTICSEARCH_LOGGING_REPLICAS and DNS_REPLICAS down to the master,
since these are needed there now.

(Desperately want just a yaml or json file we can share between
providers that has all this crap. Maybe kubernetes#3525 is an answer?)

Huge caveats: I've done pretty firm testing on GCE, including
twiddling the env variables and making sure the objects I expect to
come up, come up. I've tested that it doesn't break GKE bringup
somehow. But I haven't had a chance to test the other providers.
@zmerlynn force-pushed the deferred_addons_phase_1 branch from 5ea3dbb to a305269 on January 21, 2015 at 20:26
@zmerlynn (Member Author)

@jbeda: Done. PTAL!

jbeda added a commit that referenced this pull request Jan 21, 2015
Deferred creation of SkyDNS, monitoring and logging objects
@jbeda merged commit 621e703 into kubernetes:master on Jan 21, 2015