
upgrade: Rolling system updates via delete/re-create nodes #5044

Closed · multilinear opened this issue Mar 4, 2015 · 11 comments

Labels: priority/backlog, sig/cluster-lifecycle
Milestone: v1.0

@multilinear (Author) commented Mar 4, 2015

I'm looking for a way to make Kubernetes do rolling system updates of node machines. I'm targeting the kernel and operating-system levels, though the kubelet would come along for the ride.

My thought is that to do an "update" of a node, you simply turn up a new node and then tear down the old one. This model is simple and easy to reason about, much easier than an in-place update. We want Kubernetes to know that a node is gone immediately, though, so downtime of the containers on that node is small. I think this is consistent with current plans? (e.g. #4855)

Here's my proposal:

  • Add kubectl (and API) functionality to add and remove nodes from the cluster dynamically, also ensuring the state gets exported to the "spec" for that node (I think what's needed is already in the status).
  • Within the kube shell scripts, refactor the startup and shutdown functions a little so they use lower-level functions that spin up a new machine and remove an old one using the kubectl calls. Add direct access to these new functions via something like kube-removenode and kube-addnode scripts. These scripts can poll the status and spec, and block until the change is complete (or error out if the spec changes).
  • The "rolling" part of the update can then be done with another small shell script in the Kubernetes codebase, or by a user script; a rough sketch follows this list.
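Here is what that rolling wrapper might look like, assuming the proposed kube-addnode and kube-removenode scripts exist and block until the cluster reflects the change (both script names are hypothetical, taken from the bullets above):

```sh
#!/usr/bin/env bash
# Hypothetical rolling node replacement built on the proposed
# kube-addnode / kube-removenode scripts (neither exists yet).
set -euo pipefail

# Replace every node currently registered with the master, one at a time.
for old_node in $(kubectl get nodes -o name); do
  # Bring up a fresh machine first so capacity never drops; assumed to
  # block until the new node is registered and ready.
  ./kube-addnode

  # Tear down the old machine; assumed to block until the node object is
  # gone and its pods have been rescheduled onto the new one.
  ./kube-removenode "${old_node#node/}"   # strip the "node/" prefix
done
```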

I'm with Meteor, the folks running Kubernetes on AWS, and my intention is to implement this myself within about two months if it looks like a good direction to go. I don't want to step on anyone's toes, and after I do this work I'd really like to upstream it, so I'm trying to ensure that it's compatible with the rest of the project.

@erictune (Member) commented Mar 4, 2015

Related issues:
#3333
#2524
#1573

@erictune (Member) commented Mar 4, 2015

@zmerlynn

@erictune (Member) commented Mar 4, 2015

Good to see a familiar face, @multilinear.

Contributions are always welcome. Things change rapidly, so the best approach is to break the work up and contribute incrementally.

We've been assuming that we have to respect an SLA for pods when doing node upgrades. If that isn't necessary for your cluster, you could script this yourself without any changes to the Kubernetes code.

I don't think you need to add API functionality to do this.
Hints:

  • Switch from master-based node management to "manual" node management; the bottom of docs/node.md talks about this (--sync_nodes=false).
  • If you kubectl delete node $nodename, the node won't be schedulable anymore, but its pods will keep running for a bit. You then have about 5 minutes to gracefully stop those pods before they are ungracefully killed; see the sketch below.
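A minimal sketch of that manual procedure; the field selector and grace period here are my own assumptions, not anything prescribed above:

```sh
#!/usr/bin/env bash
# Sketch of the manual node-replacement flow described above.
set -euo pipefail
NODE="$1"   # e.g. ./replace-node.sh node-1.example.com

# Remove the node from the master's view: nothing new gets scheduled
# there, but its existing pods keep running for a short window.
kubectl delete node "${NODE}"

# Gracefully stop the pods that were on that node before the ~5-minute
# window closes and they are killed ungracefully.
kubectl delete pods --all-namespaces \
  --field-selector "spec.nodeName=${NODE}" \
  --grace-period=60
```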

@erictune added the priority/backlog and sig/cluster-lifecycle labels Mar 4, 2015
@zmerlynn (Member) commented Mar 4, 2015

Let's definitely sync. @maxforbes on our side has already started looking at how to do rolling updates for GKE, and it's similar to what you're saying. We were thinking of starting with simple drain and re-provision as well.

I'm considering more grandiose plans: possibly rewriting kube-up.sh and friends to take a YAML specification so we can be a lot more precise about provisioning and updates. I expect to have a proposal out soon, assuming I get some time for it this week. (I think I can take all the existing scripts along for the ride and work toward rewriting advanced providers in Go as we want.) So that's something else in this puzzle to think about.

@zmerlynn (Member) commented Mar 4, 2015

Silly mobile client. I meant @mbforbes

@multilinear (Author)

I see, so you can do this via "kubectl create" and "kubectl delete". Manual management would indeed work for now. There's a bit more already there than I realized; I'll have to give it a try. Thanks!
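Concretely, the manual flow boils down to creating and deleting Node objects by hand; a minimal sketch, with the node name as a placeholder and today's v1 API as an assumption:

```sh
# Register a machine with the master by hand.
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Node
metadata:
  name: node-1.example.com
EOF

# ...and deregister it when the machine is retired.
kubectl delete node node-1.example.com
```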

It sounds like zmerlynn has more thoughts, though, and I'm still interested in solutions that build a bit more into the Kubernetes framework as well (be it shell scripts or otherwise), so I'll leave this open.

@erictune (Member) commented Mar 4, 2015

@multilinear can you report back and let us know if the manual strategy worked and/or if you hit any bumps?

@erictune (Member) commented Mar 4, 2015

Also, once #4585 merges, which should be soon, there will be an explicit unschedulable-but-not-deleted state for nodes. That will give you as much time as you want to drain the node.
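That unschedulable state is what later surfaced as kubectl cordon and kubectl drain; a sketch of the drain-at-your-own-pace flow it enables (the node name is a placeholder):

```sh
# Mark the node unschedulable; pods already running there are left alone.
kubectl cordon node-1.example.com

# Drain it whenever you're ready: pods are evicted gracefully.
kubectl drain node-1.example.com --ignore-daemonsets

# The emptied node can then be deleted and its machine torn down.
kubectl delete node node-1.example.com
```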

@alex-mohr alex-mohr changed the title Rolling system updates upgrade: Rolling system updates via delete/re-create nodes Mar 19, 2015
@alex-mohr alex-mohr added this to the v1.0 milestone Mar 19, 2015
@mbforbes mbforbes mentioned this issue Mar 27, 2015
@mbforbes (Contributor)

With the round-up for node upgrades (#6079), I'm trying to close other issues that target pieces of the process. Specifically, this should be implemented as part of the basic version (#6082).

Your ideas here were definitely helpful, though! I think they're going to be taking shape in the system.

Definitely re-open / ping if I missed something!

@multilinear (Author)

I see. I don't actually trust the methods in #6082 (basically, I don't think they will actually work), and would argue that just recreating the machine from scratch is much smarter. Truly idempotent machine setup is nigh impossible (Chef doesn't achieve it, and that's their entire product), and I'm not sure it has much value when you're running inside a virtualized cluster anyway... but I suppose that's an argument to have on that bug :).

So, with that sort-of exception, yes I think you hit all the points.

@mbforbes (Contributor)

Yeah, believe me, I'd rather start from a fresh machine (there's a huge discussion on this in #3333). However, as Zach mentioned in his reply to you in #6082, we need to be able to let new nodes into the cluster, a.k.a. dynamic clustering (#6087), before we can do that. And we want to make some progress before then, as we've been blocked on this for a long time.

As soon as dynamic clustering is in, we have #6088 to start with a fresh machine instead.
