
Rackspace - Switch to CoreOS for standard cluster #1832

Merged
5 commits merged into kubernetes:master from the rackspace_switch_to_coreos branch on Oct 22, 2014

Conversation

doublerr
Contributor

The Rackspace setup for Kubernetes now leverages CoreOS instead of Ubuntu. We've dropped Salt completely for the cluster.

This doesn't include the updated release scripts so I've left those in the icebox for now.

@doublerr
Contributor Author

PR for issue #1733

CC @jbeda

@thockin
Member

thockin commented Oct 16, 2014

Please hold this until after breakage day. I'm happy to see less salt.


@doublerr
Contributor Author

@thockin I've been disconnected for a bit. When is breakage day and is it mostly around #1402?

@thockin
Member

thockin commented Oct 16, 2014

Breakage day is today :)

#1402, #1564, #1662 are the primary PRs


permissions: 0755
content: |
  #!/bin/sh
  m=$(echo $(etcdctl ls --recursive /corekube/minions | cut -d/ -f4 | sort) | tr ' ' ,)
Contributor

I'd take a closer look at some point at making the failure modes for these scripts a little more robust. As things stand, if, say, etcdctl fails, it'll be hard to figure out what is going on.
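A minimal sketch of what a more defensive version of that snippet could look like (not from the PR; it only assumes the etcdctl CLI and the /corekube/minions key layout shown above, and surfaces the failure instead of silently producing an empty list):

#!/bin/sh
# Sketch only: harden the minion-listing step so an etcd failure is visible.
set -eu

if ! out=$(etcdctl ls --recursive /corekube/minions 2>&1); then
  echo "etcdctl failed while listing /corekube/minions: ${out}" >&2
  exit 1
fi

m=$(echo "${out}" | cut -d/ -f4 | sort | tr '\n' ',')
m=${m%,}    # strip the trailing comma left by tr
echo "${m}"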

Contributor Author

Totally agree. We'll work on iterating these little scripts to make them more robust. I wonder if building smaller 3rd party Go binaries would be the best course of action. Thoughts?

Contributor

I'm not sure -- I could see going either way. No need to lock it down now though.

In any case, it might be worthwhile to create a mime-encoded cloud-init package vs. putting everything into a single yaml file. That is how we could start using the stuff from #1831.
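
As a rough sketch of that idea (not something in this PR): the write-mime-multipart helper from the cloud-utils package can stitch separate parts into a single user-data payload. The file names below are hypothetical.

# Sketch only: combine a cloud-config and a bootstrap script into one
# MIME multi-part user-data file; file names are made up for illustration.
write-mime-multipart \
  --output=master-user-data.mime \
  master-cloud-config.yaml:text/cloud-config \
  download-release.sh:text/x-shellscript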

@jbeda
Contributor

jbeda commented Oct 16, 2014

Thanks for getting things fixed up!

One thing to keep in mind is that as we diverge from everyone using Salt (not a bad thing!) we might miss out on features like the recent fluentd setup. If we (eventually) move the CoreOS setup (a combo of systemd + cloud-init) into a common shared space, it's more likely that this stuff will be kept up to date.

Something for later though :)


units:
#- name: nova-agent-watcher.service
# command: try-restart
Member

These two lines are noise left over from a previous workaround I needed - they should be deleted.

Contributor Author

Ya, I'll remove them.

@bketelsen
Contributor

General Comment: In RackConnect v3, the public_ip and private_ip are goofy because CoreOS uses eth0 for public_ip and eth1 for private_ip. RackConnect v3 assigns your "public" IP to eth1, so these scripts will cause networking to be backwards in an RCv3 environment. I had to change all my K8s/CoreOS scripts to account for this when I moved to RCv3 a few weeks ago. It's a small enough edge case that it's not worth trying to detect, but it might be worth mentioning in a doc/comment somewhere that's visible before install.

@doublerr
Contributor Author

@bketelsen thanks for the heads up. In general these are just examples of how to get k8s up and running on a specific provider for dev purposes. I would expect anyone running this for real to build their own deployment scripts. Hopefully that's not a crazy expectation.

Re: CoreOS and public_ip/private_ip. I personally don't even like using eth1 (servicenet) and would much rather use eth2. It would be nice to be able to use something like "eth2_ipv4" or "eth0_ipv6". I think this would get around RC issues.
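
One possible workaround, sketched here rather than taken from this PR, is to look up the address bound to an explicit NIC at boot instead of relying on the $public_ipv4/$private_ipv4 substitutions; the interface name below is an assumption and would differ per environment.

#!/bin/sh
# Sketch only: print the global IPv4 address bound to a given interface.
iface="${1:-eth1}"
addr=$(ip -4 -o addr show dev "${iface}" scope global | awk '{print $4}' | cut -d/ -f1 | head -n1)
if [ -z "${addr}" ]; then
  echo "no global IPv4 address found on ${iface}" >&2
  exit 1
fi
echo "${addr}"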

@jbeda
Contributor

jbeda commented Oct 17, 2014

The issue with NIC assignment seems really subtle and easy to get wrong. Perhaps that is worth a section in the docs? Also, if we expect users to customize this stuff for prod deployments, perhaps state that explicitly in the docs too?

@doublerr Feel free to ping me on IRC when you are ready for another round of reviews.

@doublerr doublerr force-pushed the rackspace_switch_to_coreos branch from bd34262 to ae68ab7 on October 17, 2014 20:36
@@ -40,4 +40,5 @@ fi
 kube::build::copy_output
 kube::build::run_image
 kube::release::package_tarballs
-kube::release::gcs::release
+
+kube::release::${KUBERNETES_PROVIDER-gce}::release
Contributor Author

@jbeda I'm not sure what to do here. You switch between the gce and gcs acronyms in cluster/ and build/ respectively, so this provider logic won't work.

Contributor

@doublerr You shouldn't need to touch anything in build/. We really have two separate places now where we upload stuff -- it keeps things cleaner and keeps the build/release stuff completely divorced from the "run a cluster" stuff.

  • build/release.sh -- Here we are uploading the binary release tarball for doing things like automated releases and nightlies. Unless you are going to be cutting and publishing your own releases (or unless we want to mirror releases into Rackspace), there is no reason to touch this. Most developers won't upload here as part of building dev releases.
  • cluster/*/util.sh -- Here we assume that there is a tarball with the built stuff we need locally. We need to get it to the machines we are bringing up and deploying. You may want to upload through some cloud storage as an easy way to get those tar files there -- that is what we do for GCE. But for something like Vagrant, we take advantage of the shared /vagrant file system. For vSphere, we scp the tars up over an ssh connection.

While this is a little less efficient than it could be (in the binary release case the tars will transit your local workstation and have to be downloaded and then re-uploaded), it really smooths over the dev situation and makes it super easy to distribute a single tarball that has everything you need.

We may short-circuit the re-upload in the future but I consider that an optimization over the pattern we have now.

Sorry this is so confusing!

Contributor Author

@jbeda thanks for the update. I was wondering why there were 2 functions to upload the tar files. We don't plan on cutting releases, so I'll move the "upload tar" code to the util.sh scripts.
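
As a rough illustration of where that could land (not the PR's actual implementation), a helper in cluster/rackspace/util.sh might wrap the OpenStack swift CLI; the function name, container name, and tarball path below are hypothetical.

# Sketch only: push the built release tarball up to a Cloud Files container.
kube::rackspace::upload_release() {
  local container="kubernetes-releases"
  local tarball="${KUBE_ROOT}/_output/release-tars/kubernetes-server-linux-amd64.tar.gz"

  if ! swift upload "${container}" "${tarball}"; then
    echo "Failed to upload ${tarball} to Cloud Files container ${container}" >&2
    return 1
  fi
}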

Contributor

Awesome -- I'll be on IRC a lot today if you have questions.

The Rackspace setup for Kubernetes now leverages CoreOS instead of Ubuntu. We've dropped Salt completely for our cluster.
Updates to the build/release.sh scripts to upload build binaries to Cloud Files.
The functions to upload built k8s tars to Cloud Files were incorrectly placed in build/common.sh. These have been migrated to cluster/rackspace/util.sh.
@doublerr doublerr force-pushed the rackspace_switch_to_coreos branch from 36f3b06 to a26aefa on October 20, 2014 17:15
@jbeda
Contributor

jbeda commented Oct 20, 2014

Looks good to me! Let me know when you are ready for me to merge it and I'll get it in.

@doublerr
Contributor Author

Sorry, I've been traveling. It's good to go but I can rebase one last time if needed.

@jbeda
Contributor

jbeda commented Oct 22, 2014

Awesome! Thanks for getting this done! Merging.

jbeda added a commit that referenced this pull request Oct 22, 2014
Rackspace - Switch to CoreOS for standard cluster
@jbeda jbeda merged commit 25b1eea into kubernetes:master Oct 22, 2014
@doublerr doublerr deleted the rackspace_switch_to_coreos branch October 22, 2014 16:49