Deferred creation of SkyDNS, monitoring and logging objects #3601
Conversation
cc @brendanburns (if you want this in 0.9.0, it might be good to tag a reviewer), @thockin (in case you care about the DNS stuff, though it's actually the least complicated of the variables in here), @vishh (monitoring), @satnam6502 (logging), @jbeda (gross `common.sh` violation).
```diff
@@ -5,7 +5,7 @@ namespace: default
 labels:
   k8s-app: skydns
 desiredState:
-  replicas: {DNS_REPLICAS}
+  replicas: {{ pillar['dns_replicas'] }}
```
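(For context on this hunk: a minimal sketch of how the Salt/Jinja substitution renders, assuming a pillar key `dns_replicas` populated from the cluster config. The pillar file name is illustrative, not the actual one in the tree.)

```yaml
# Illustrative pillar entry (e.g. in a cluster-params.sls pillar file):
dns_replicas: "1"

# Template fragment from skydns-rc.yaml.in, before rendering:
desiredState:
  replicas: {{ pillar['dns_replicas'] }}

# After Salt renders the template, the object handed to the apiserver:
desiredState:
  replicas: 1
```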
There is a doc that explains to mortal users how DNS works. Needs updates?
Will check and update. It certainly became more convoluted on our end; not sure about the end-user side, though hopefully not.
Hm. Are you talking about `cluster/addons/dns/README.md`? Those instructions don't look right ... they don't include any instructions on how you'd bonk salt to update the minions for the `DNS_SERVER_IP`. They look pre-`ENABLE_CLUSTER_DNS`, pre-salt config, at the very least. At the moment, the configuration seems pretty static once you create the cluster, unless you know the right incantations.

Given that, would you mind a separate pull for cleanup on the docs?
Those assume that the salt stuff is done - it's more hand-holding for launching the skydns pods. I'd be happy to see that bit removed, with a reference to "and then we salt it and the pod gets scheduled".
Will fix.
I think I "get it" - thanks for tackling this mess. Please take a look at docs that reference the old way of doing things - I know DNS documents it, I assume the others do too. @vishh and @satnam6502 for reviews of their parts.
```diff
@@ -0,0 +1,79 @@
+{% if pillar.get('enable_cluster_monitoring', '').lower() == 'true' %}
```
YAML has an explicit bool type that turns into a Python bool via salt. It is a little odd that we are doing a string compare here instead.
I'm going to try the `is defined and pillar['enable_cluster_monitoring']` approach again, but when I tried this last time, it was always true, even when `enable_cluster_monitoring='false'`.
I bet I know what's wrong. I bet the commit that changed the init.sls YAML to all be quoted actually broke this, because this looks like `enable_cluster_monitoring: 'true'` now, which is probably coming in as a Python string, no? (I need to get up a cluster to test this, jussec)
Exactly -- the explicit quoting makes it a string in YAML. That may be okay, but we should be consistent and go in with eyes open here.
Yes. This is actually broken elsewhere already (in particular, hunt for `enable_cluster_dns` and it's already broken). The problem, and I love saying sentences like this, is that 'false' is true. So I think my code is correct and/or we need to fix the escaping.
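(To make the quoting problem concrete, a minimal hypothetical Jinja sketch; the pillar snippets here are illustrative, not the actual salt tree.)

```jinja
{# With the current explicit quoting, the pillar value arrives as a string: #}
{#   enable_cluster_monitoring: 'false'   ->  Python str 'false'            #}

{% if pillar['enable_cluster_monitoring'] %}
{#   taken even when the value is 'false': any non-empty string is truthy   #}
{% endif %}

{% if pillar.get('enable_cluster_monitoring', '').lower() == 'true' %}
{#   the string compare used in this PR: only taken for the string 'true'   #}
{% endif %}

{# If the pillar were an unquoted YAML bool (enable_cluster_monitoring: false), #}
{# the plain truthiness test above would behave as expected.                    #}
```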
Is kubectl going to talk to localhost:8080 to sidestep the authorization question?
The approach looks good. Some comments around auth, reboots and kube-push.
@erictune: Yeah, it's plumbed through localhost right now (in that it's not specifying anything). Is there a reason we'd lock that down? I'm not sure what path we're going to take there.
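(A rough sketch of what "plumbed through localhost" means on the master; the file name below is illustrative, not a claim about the exact addon layout.)

```sh
# On the master, kubectl can point at the local (unauthenticated) apiserver
# port, sidestepping external auth entirely:
kubectl --server=http://127.0.0.1:8080 create -f /etc/kubernetes/addons/skydns-rc.yaml
```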
```sh
      echo
    else
      echo -e "${color_red}Monitoring pods failed to be scheduled! Removing firewalls.${color_norm}"
      teardown-monitoring-firewall
```
I think we should not remove the firewall rules here. "Failed to be scheduled" is different from "failing to start pods".
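(A hedged sketch of the suggested alternative; `KUBECTL` and the label selector are assumptions for illustration, not the actual cluster/gce/util.sh code.)

```sh
# Report scheduling state but leave the firewall rules in place, since
# "not scheduled yet" is not the same as "pods failing to start".
if "${KUBECTL:-kubectl}" get pods -l "k8s-app=monitoring" 2>/dev/null | grep -q Running; then
  echo "Monitoring pods are running."
else
  echo "Monitoring pods not scheduled yet; leaving firewall rules in place."
fi
```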
Will fix.
@jbeda: PTAL. My internal scoreboard says:
LGTM -- please rebase, squash and merge.
Force-pushed from 5ea3dbb to a305269 (the amended commit message matches the description below).
@jbeda: Done. PTAL!
Deferred creation of SkyDNS, monitoring and logging objects
This implements phase 1 of the proposal in #3579, moving the creation of the pods, RCs, and services to the master after the apiserver is available. This is the open source side of fixing #3305.
This is such a wide commit because our existing initial config story is special:

* Add `kube-addons` service and associated salt configuration:
  * We configure `/etc/kubernetes/addons` to be a directory of objects that are appropriately configured for the current cluster.
  * `/etc/init.d/kube-addons start` slurps up everything in that dir. (Most of the difficulty is the business logic in salt around getting that directory built at all.) A rough sketch of this step appears right after this list.
  * We cheat and overlay `cluster/addons` into `saltbase/salt/kube-addons` as config files for the `kube-addons` meta-service.
* Change `.yaml.in` files to salt templates.
* Rename `{setup,teardown}-{monitoring,logging}` to `{setup,teardown}-{monitoring,logging}-firewall` to properly reflect their real purpose now (the purpose of these functions is now ONLY to bring up the firewall rules, and possibly to relay the IP to the user).
* Rework GCE `{setup,teardown}-{monitoring,logging}-firewall`: both functions were improperly configuring global rules, yet used lifecycles tied to the cluster. Use `$NODE_INSTANCE_PREFIX` with the rule. The logging rule needed a `$NETWORK` specifier. The monitoring rule tried `gcloud describe` first, but given the instancing, this feels like a waste of time now. (A sketch of the rule shape is at the end of this description.)
* Plumb `ENABLE_CLUSTER_MONITORING`, `ENABLE_CLUSTER_LOGGING`, `ELASTICSEARCH_LOGGING_REPLICAS` and `DNS_REPLICAS` down to the master, since these are needed there now. (Desperately want just a yaml or json file we can share between providers that has all this crap. Maybe #3525 is an answer?)
Huge caveats: I've done fairly thorough testing on GCE, including twiddling the env variables and making sure the objects I expect to come up, come up. I've tested that it doesn't break GKE bringup somehow. But I haven't had a chance to test the other providers.
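As referenced in the firewall bullet above, a hedged sketch of the instance-scoped rule shape described there; the rule name, port, and target tags are illustrative, not the exact values in cluster/gce/util.sh.

```sh
# Scoped to this cluster's instances rather than a global rule: the name and
# target tags derive from NODE_INSTANCE_PREFIX, and the rule lives in NETWORK.
gcloud compute firewall-rules create "${NODE_INSTANCE_PREFIX}-monitoring" \
  --project "${PROJECT}" \
  --network "${NETWORK}" \
  --target-tags "${NODE_INSTANCE_PREFIX}" \
  --allow tcp:80
```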