Rackspace - Switch to CoreOS for standard cluster #1832
Conversation
Please hold this until after breakage day. I'm happy to see less salt.
Breakage day is today :) #1402, #1564, #1662 are the primary PRs.
permissions: 0755
content: |
  #!/bin/sh
  m=$(echo $(etcdctl ls --recursive /corekube/minions | cut -d/ -f4 | sort) | tr ' ' ,)
I'd take a closer look at some point at making the failure modes for these scripts a little more robust. As things stand, if, say, etcdctl fails, it'll be hard to figure out what is going on.
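As a rough illustration of what a more robust version could look like (a minimal sketch only: the retry count, the logger tag, and the helper name are assumptions, not part of this PR), something along these lines would at least surface etcdctl failures:

#!/bin/sh
# Sketch: retry the etcd lookup and log failures instead of silently producing an empty list.
set -eu

list_minions() {
  for attempt in 1 2 3; do
    if out=$(etcdctl ls --recursive /corekube/minions 2>&1); then
      # Same transformation as the original snippet: strip the key prefix,
      # sort, and join with commas.
      echo "$out" | cut -d/ -f4 | sort | tr '\n' , | sed 's/,$//'
      return 0
    fi
    logger -t corekube "etcdctl attempt $attempt failed: $out"
    sleep 2
  done
  logger -t corekube "giving up: could not list /corekube/minions"
  return 1
}

m=$(list_minions) || exit 1
echo "minions: $m"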
Totally agree. We'll work on iterating these little scripts to make them more robust. I wonder if building smaller 3rd party Go binaries would be the best course of action. Thoughts?
I'm not sure -- I could see going either way. No need to lock it down now though.
In any case, it might be worthwhile to create a mime-encoded cloud-init package vs. putting everything into a single yaml file. That is how we could start using the stuff from #1831.
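For illustration, a multipart payload could be assembled with the write-mime-multipart helper from the cloud-utils package (a minimal sketch; the helper's availability and the file names here are assumptions, not files from this PR):

# Sketch: combine a cloud-config document and a setup script into one
# MIME multipart user-data blob instead of a single large YAML file.
write-mime-multipart \
  --output=minion-userdata.txt \
  minion-cloud-config.yaml:text/cloud-config \
  download-release.sh:text/x-shellscript

cloud-init then processes each part according to its MIME type, so the cloud-config and the shell scripts no longer have to live in one file.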
Thanks for getting things fixed up! One thing to keep in mind is that as we diverge from everyone using salt (not a bad thing!) we might miss out on features like the recent fluentd setup. If we (eventually) move the CoreOS setup (combo of systemd + cloud-init) into a common shared space, it's more likely that this stuff gets kept up to date. Something for later though :)
units:
#- name: nova-agent-watcher.service
#  command: try-restart
These two lines are noise left over from a previous workaround I needed - they should be deleted.
Ya, I'll remove them.
General Comment: In RackConnect v3, the public_ip and private_ip are goofy because CoreOS uses eth0 for public_ip and eth1 for private_ip. RackConnect v3 assigns your "public" IP to eth1, so these scripts will cause networking to be backwards in an RCv3 environment. I had to change all my K8s/CoreOS scripts to account for this when I moved to RCv3 a few weeks ago. It's a small enough edge case that it's not worth trying to detect, but it might be worth mentioning in a doc/comment somewhere that's visible before install.
@bketelsen thanks for the heads up. In general these are just examples of how to get k8s up and running on a specific provider for dev purposes. I would expect anyone running this for real to build their own deployment scripts. Hopefully that's not a crazy expectation. Re: CoreOS and public_ip/private_ip: I personally don't even like using eth1 (servicenet) and would much rather use eth2. It would be nice to be able to use something like "eth2_ipv4" or "eth0_ipv6". I think this would get around RC issues.
The issue with NIC assignment seems really subtle and easy to get wrong. Perhaps that is worth a section in the docs? Also if we expect users to customize this stuff for prod deployments perhaps state that explicitly in the docs too? @doublerr Feel free to ping me on IRC when you are ready for another round of reviews.
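One hedged sketch of how the interface assignment could be made configurable rather than hard-coded to eth0/eth1 (PUBLIC_IFACE, PRIVATE_IFACE, and the helper below are hypothetical names for illustration, not code from this PR):

# Sketch: resolve addresses from configurable interface names so an
# RCv3 environment can simply swap the two values at provision time.
PUBLIC_IFACE=${PUBLIC_IFACE:-eth0}
PRIVATE_IFACE=${PRIVATE_IFACE:-eth1}

# Return the first IPv4 address bound to the given interface.
iface_ipv4() {
  ip -4 -o addr show dev "$1" | awk '{print $4}' | cut -d/ -f1 | head -n1
}

public_ip=$(iface_ipv4 "$PUBLIC_IFACE")
private_ip=$(iface_ipv4 "$PRIVATE_IFACE")
echo "public=${public_ip} private=${private_ip}"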
Force-pushed from bd34262 to ae68ab7
@@ -40,4 +40,5 @@ fi
 kube::build::copy_output
 kube::build::run_image
 kube::release::package_tarballs
-kube::release::gcs::release
+kube::release::${KUBERNETES_PROVIDER-gce}::release
@jbeda I'm not sure what to do here. You switch between gce and gcs acronyms in cluster and build respectively. This provider logic won't work.
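For context, the changed line relies on bash resolving a function whose name embeds the provider, so a function registered under the gcs name is never reached when KUBERNETES_PROVIDER is gce or rackspace. A minimal sketch of the dispatch pattern (the function bodies are placeholders, not code from this PR):

#!/bin/bash
# Provider-keyed dispatch: the function name must match KUBERNETES_PROVIDER exactly.
kube::release::gce::release() { echo "would upload tars to Google Cloud Storage"; }
kube::release::rackspace::release() { echo "would upload tars to Cloud Files"; }

# Defaults to gce when KUBERNETES_PROVIDER is unset, matching the diff above.
"kube::release::${KUBERNETES_PROVIDER-gce}::release"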
@doublerr You shouldn't need to touch anything in build/. We really have two separate places now where we upload stuff -- it keeps things cleaner and keeps the build/release stuff completely divorced from the "run a cluster" stuff.

- build/release.sh -- Here we are uploading the binary release tarball for doing things like automated releases and nightlies. Unless you are going to be cutting and publishing your own releases (or unless we want to mirror releases into Rackspace) there is no reason to touch this. Most developers won't upload here as part of building dev releases.
- cluster/*/util.sh -- Here we assume that there is a tarball with the built stuff we need locally. We need to get it to the machines we are bringing up and deploying. You may want to upload through some cloud storage as an easy way to get those tar files there -- that is what we do for GCE. But for something like Vagrant, we take advantage of the shared /vagrant file system. For vSphere, we scp the tars up over an ssh connection.

While this is a little less efficient than it could be (in the binary release case the tars will transit your local workstation and have to be downloaded and then re-uploaded), it really smooths over the dev situation and makes it super easy to distribute a single tarball that has everything you need.

We may short-circuit the re-upload in the future, but I consider that an optimization over the pattern we have now.

Sorry this is so confusing!
@jbeda thanks for the update. I was wondering why there were 2 functions to upload the tar files. We don't plan on cutting releases, so I'll move the "upload tar" code to the util.sh scripts.
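As a rough sketch of what such a helper in cluster/rackspace/util.sh might look like (the container name, the tarball path, and the use of the python-swiftclient CLI are illustrative assumptions, not code from this PR):

# Sketch: upload the locally built server tarball to Cloud Files
# (Swift-compatible) so the nodes can fetch it during cloud-init.
# Intended to be sourced from cluster/rackspace/util.sh.
copy_dev_tarballs() {
  local container="kubernetes-releases-dev"
  local tarball="server/kubernetes-server-linux-amd64.tar.gz"

  swift upload "${container}" "${tarball}" || {
    echo "Failed to upload ${tarball} to ${container}" >&2
    return 1
  }
  echo "Uploaded ${tarball} to ${container}"
}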
Awesome -- I'll be on IRC a lot today if you have questions.
The Rackspace setup for Kubernetes now leverages CoreOS instead of Ubuntu. We've dropped Salt completely for our cluster.
Updates to the build/release.sh scripts to upload build binaries to Cloud Files.
The functions to upload built k8s tars to Cloud Files were incorrectly placed in build/common.sh. These have been migrated to cluster/rackspace/util.sh.
Force-pushed from 36f3b06 to a26aefa
Looks good to me! Let me know when you are ready for me to merge it and I'll get it in.
Sorry, I've been traveling. It's good to go but I can rebase one last time if needed.
Awesome! Thanks for getting this done! Merging.
The Rackspace setup for Kubernetes now leverages CoreOS instead of Ubuntu. We've dropped Salt completely for the cluster.
This doesn't include the updated release scripts so I've left those in the icebox for now.