
Rackspace - Switch to CoreOS for standard cluster #1832

Merged
5 commits merged into kubernetes:master from the rackspace_switch_to_coreos branch on Oct 22, 2014

Conversation

doublerr
Contributor

The Rackspace setup for Kubernetes now leverages CoreOS instead of Ubuntu. We've dropped Salt completely for the cluster.

This doesn't include the updated release scripts so I've left those in the icebox for now.

@doublerr
Contributor Author

PR for issue #1733

CC @jbeda

@thockin
Member

thockin commented Oct 16, 2014

Please hold this until after breakage day. I'm happy to see less salt.


@doublerr
Contributor Author

@thockin I've been disconnected for a bit. When is breakage day and is it mostly around #1402?

@thockin
Member

thockin commented Oct 16, 2014

Breakage day is today :)

#1402, #1564, #1662 are the primary PRs


permissions: 0755
content: |
  #!/bin/sh
  m=$(echo $(etcdctl ls --recursive /corekube/minions | cut -d/ -f4 | sort) | tr ' ' ,)
Contributor

I'd take a closer look at some point at making the failure modes for these scripts a little more robust. As things stand, if, say, etcdctl fails, it'll be hard to figure out what is going on.
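A minimal sketch of what a more defensive version of that snippet could look like (not from the PR; it only assumes the etcdctl CLI and the /corekube/minions key layout shown above, and surfaces the failure instead of silently producing an empty list):

#!/bin/sh
# Sketch only: harden the minion-listing step so an etcd failure is visible.
set -eu

if ! out=$(etcdctl ls --recursive /corekube/minions 2>&1); then
  echo "etcdctl failed while listing /corekube/minions: ${out}" >&2
  exit 1
fi

m=$(echo "${out}" | cut -d/ -f4 | sort | tr '\n' ',')
m=${m%,}    # strip the trailing comma left by tr
echo "${m}"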

Contributor Author

Totally agree. We'll work on iterating these little scripts to make them more robust. I wonder if building smaller 3rd party Go binaries would be the best course of action. Thoughts?

Contributor

I'm not sure -- I could see going either way. No need to lock it down now though.

In any case, it might be worthwhile to create a mime-encoded cloud-init package vs. putting everything into a single yaml file. That is how we could start using the stuff from #1831.
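
As a rough sketch of that idea (not something in this PR): the write-mime-multipart helper from the cloud-utils package can stitch separate parts into a single user-data payload. The file names below are hypothetical.

# Sketch only: combine a cloud-config and a bootstrap script into one
# MIME multi-part user-data file; file names are made up for illustration.
write-mime-multipart \
  --output=master-user-data.mime \
  master-cloud-config.yaml:text/cloud-config \
  download-release.sh:text/x-shellscript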

@jbeda
Contributor

jbeda commented Oct 16, 2014

Thanks for getting things fixed up!

One thing to keep in mind is that as we diverge from everyone using Salt (not a bad thing!) we might miss out on features like the recent fluentd setup. If we (eventually) move the CoreOS setup (a combo of systemd + cloud-init) into a common shared space, it's more likely that this stuff will be kept up to date.

Something for later though :)


units:
#- name: nova-agent-watcher.service
# command: try-restart
Member

These two lines are noise left over from a previous workaround I needed - they should be deleted.

Contributor Author

Ya, I'll remove them.

@bketelsen
Contributor

General Comment: In RackConnect v3, the public_ip and private_ip are goofy because CoreOS uses eth0 for public_ip and eth1 for private_ip. RackConnect v3 assigns your "public" IP to eth1, so these scripts will cause networking to be backwards in an RCv3 environment. I had to change all my K8s/CoreOS scripts to account for this when I moved to RCv3 a few weeks ago. It's a small enough edge case that it's not worth trying to detect, but it might be worth mentioning in a doc/comment somewhere that's visible before install.

@doublerr
Contributor Author

@bketelsen thanks for the heads up. In general these are just examples of how to get k8s up and running on a specific provider for dev purposes. I would expect anyone running this for real to build their own deployment scripts. Hopefully that's not a crazy expectation.

Re: CoreOS and public_ip/private_ip. I personally don't even like using eth1 (servicenet) and would much rather use eth2. It would be nice to be able to use something like "eth2_ipv4" or "eth0_ipv6". I think this would get around RC issues.
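
One possible workaround, sketched here rather than taken from this PR, is to look up the address bound to an explicit NIC at boot instead of relying on the $public_ipv4/$private_ipv4 substitutions; the interface name below is an assumption and would differ per environment.

#!/bin/sh
# Sketch only: print the global IPv4 address bound to a given interface.
iface="${1:-eth1}"
addr=$(ip -4 -o addr show dev "${iface}" scope global | awk '{print $4}' | cut -d/ -f1 | head -n1)
if [ -z "${addr}" ]; then
  echo "no global IPv4 address found on ${iface}" >&2
  exit 1
fi
echo "${addr}"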

@jbeda
Contributor

jbeda commented Oct 17, 2014

The issue with NIC assignment seems really subtle and easy to get wrong. Perhaps that is worth a section in the docs? Also, if we expect users to customize this stuff for prod deployments, perhaps state that explicitly in the docs too?

@doublerr Feel free to ping me on IRC when you are ready for another round of reviews.

@doublerr doublerr force-pushed the rackspace_switch_to_coreos branch from bd34262 to ae68ab7 on October 17, 2014 20:36
@@ -40,4 +40,5 @@ fi
 kube::build::copy_output
 kube::build::run_image
 kube::release::package_tarballs
-kube::release::gcs::release
+
+kube::release::${KUBERNETES_PROVIDER-gce}::release
Contributor Author

@jbeda I'm not sure what to do here. You switch between the gce and gcs acronyms in cluster/ and build/ respectively, so this provider logic won't work.

Contributor

@doublerr You shouldn't need to touch anything in build/. We really have two separate places now where we upload stuff -- it keeps things cleaner and keeps the build/release stuff completely divorced from the "run a cluster" stuff.

  • build/release.sh -- Here we are uploading the binary release tarball for doing things like automated releases and nightlies. Unless you are going to be cutting and publishing your own releases (or unless we want to mirror releases into Rackspace), there is no reason to touch this. Most developers won't upload here as part of building dev releases.
  • cluster/*/util.sh -- Here we assume that there is a tarball with the built stuff we need locally. We need to get it to the machines we are bringing up and deploying. You may want to upload through some cloud storage as an easy way to get those tar files there -- that is what we do for GCE. But for something like Vagrant, we take advantage of the shared /vagrant file system. For vSphere, we scp the tars up over an ssh connection.

While this is a little less efficient than it could be (in the binary release case the tars will transit your local workstation and have to be downloaded and then re-uploaded), it really smooths over the dev situation and makes it super easy to distribute a single tarball that has everything you need.

We may short-circuit the re-upload in the future but I consider that an optimization over the pattern we have now.

Sorry this is so confusing!

Contributor Author

@jbeda thanks for the update. I was wondering why there were 2 functions to upload the tar files. We don't plan on cutting releases, so I'll move the "upload tar" code to the util.sh scripts.
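
As a rough illustration of where that could land (not the PR's actual implementation), a helper in cluster/rackspace/util.sh might wrap the OpenStack swift CLI; the function name, container name, and tarball path below are hypothetical.

# Sketch only: push the built release tarball up to a Cloud Files container.
kube::rackspace::upload_release() {
  local container="kubernetes-releases"
  local tarball="${KUBE_ROOT}/_output/release-tars/kubernetes-server-linux-amd64.tar.gz"

  if ! swift upload "${container}" "${tarball}"; then
    echo "Failed to upload ${tarball} to Cloud Files container ${container}" >&2
    return 1
  fi
}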

Contributor

Awesome -- I'll be on IRC a lot today if you have questions.

The Rackspace setup for Kubernetes now leverages CoreOS instead of Ubuntu. We've dropped Salt completely for our cluster.
Updates to the build/release.sh scripts to upload build binaries to Cloud Files.
The functions to upload built k8s tars to Cloud Files were incorrectly placed in build/common.sh. These have been migrated to cluster/rackspace/util.sh.
@doublerr doublerr force-pushed the rackspace_switch_to_coreos branch from 36f3b06 to a26aefa on October 20, 2014 17:15
@jbeda
Contributor

jbeda commented Oct 20, 2014

Looks good to me! Let me know when you are ready for me to merge it and I'll get it in.

@doublerr
Contributor Author

Sorry, I've been traveling. It's good to go but I can rebase one last time if needed.

@jbeda
Contributor

jbeda commented Oct 22, 2014

Awesome! Thanks for getting this done! Merging.

jbeda added a commit that referenced this pull request Oct 22, 2014
Rackspace - Switch to CoreOS for standard cluster
@jbeda jbeda merged commit 25b1eea into kubernetes:master Oct 22, 2014
@doublerr doublerr deleted the rackspace_switch_to_coreos branch October 22, 2014 16:49