
upgrade: Rolling system updates via delete/re-create nodes #5044

Closed · multilinear opened this issue Mar 4, 2015 · 11 comments

Labels: priority/backlog, sig/cluster-lifecycle
Milestone: v1.0

@multilinear (Author) commented Mar 4, 2015

I'm looking for a way to make Kubernetes do rolling system updates of node machines. I'm targeting the kernel and operating-system levels, though the kubelet would come along for the ride.

My thought is that to do an "update" of a node, you simply turn up a new node and then tear down the old one. This model is simple and easy to reason about, much easier than an in-place update. We want Kubernetes to know that a node is gone immediately, though, so downtime of the containers on that node is small. I think this is consistent with current plans? (e.g. #4855)

Here's my proposal:

  • Add kubectl (and API) functionality to add and remove nodes from the cluster dynamically, also ensuring the state gets exported to the "spec" for that node (I think what's needed is already in the status).
  • Within the kube shell scripts, refactor the startup and shutdown functions a little so they use lower-level functions that spin up a new machine and remove an old one using the kubectl calls. Add direct access to these new functions via something like kube-removenode and kube-addnode scripts. These scripts can poll the status and spec, and block until the change is complete (or error out if the spec changes).
  • The "rolling" part of the update can then be done with another small shell script in the Kubernetes codebase, or by a user script; a rough sketch follows this list.
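Here is what that rolling wrapper might look like, assuming the proposed kube-addnode and kube-removenode scripts exist and block until the cluster reflects the change (both script names are hypothetical, taken from the bullets above):

```sh
#!/usr/bin/env bash
# Hypothetical rolling node replacement built on the proposed
# kube-addnode / kube-removenode scripts (neither exists yet).
set -euo pipefail

# Replace every node currently registered with the master, one at a time.
for old_node in $(kubectl get nodes -o name); do
  # Bring up a fresh machine first so capacity never drops; assumed to
  # block until the new node is registered and ready.
  ./kube-addnode

  # Tear down the old machine; assumed to block until the node object is
  # gone and its pods have been rescheduled onto the new one.
  ./kube-removenode "${old_node#node/}"   # strip the "node/" prefix
done
```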

I'm with Meteor, the folks running Kubernetes on AWS, and my intention is to implement this myself within about two months if it looks like a good direction to go. I don't want to step on anyone's toes, and after I do this work I'd really like to upstream it, so I'm trying to ensure that it's compatible with the rest of the project.

@erictune (Member) commented Mar 4, 2015

Related issues:
#3333
#2524
#1573

@erictune (Member) commented Mar 4, 2015

@zmerlynn

@erictune (Member) commented Mar 4, 2015

Good to see a familiar face, @multilinear.

Contributions are always welcome. Things change rapidly, so the best approach is to break the work up and contribute incrementally.

We've been assuming that we have to respect an SLA for pods when doing node upgrades. If that isn't necessary for your cluster, you could script this yourself without any changes to the Kubernetes code.

I don't think you need to add API functionality to do this.
Hints:

  • Switch from master-based node management to "manual" node management; the bottom of docs/node.md talks about this (--sync_nodes=false).
  • If you kubectl delete node $nodename, the node won't be schedulable anymore, but its pods will keep running for a bit. You then have about 5 minutes to gracefully stop those pods before they are ungracefully killed; see the sketch below.
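A minimal sketch of that manual procedure; the field selector and grace period here are my own assumptions, not anything prescribed above:

```sh
#!/usr/bin/env bash
# Sketch of the manual node-replacement flow described above.
set -euo pipefail
NODE="$1"   # e.g. ./replace-node.sh node-1.example.com

# Remove the node from the master's view: nothing new gets scheduled
# there, but its existing pods keep running for a short window.
kubectl delete node "${NODE}"

# Gracefully stop the pods that were on that node before the ~5-minute
# window closes and they are killed ungracefully.
kubectl delete pods --all-namespaces \
  --field-selector "spec.nodeName=${NODE}" \
  --grace-period=60
```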

@erictune added the priority/backlog and sig/cluster-lifecycle labels Mar 4, 2015
@zmerlynn (Member) commented Mar 4, 2015

Let's definitely sync. @maxforbes on our side has already started looking at how to do rolling updates for GKE, and it's similar to what you're saying. We were thinking of starting with simple drain and re-provision as well.

I'm considering more grandiose plans: possibly rewriting kube-up.sh and friends to take a YAML specification so we can be a lot more precise about provisioning and updates. I expect to have a proposal out soon, assuming I get some time for it this week. (I think I can take all the existing scripts along for the ride and work toward rewriting advanced providers in Go as we want.) So that's something else in this puzzle to think about.

@zmerlynn (Member) commented Mar 4, 2015

Silly mobile client. I meant @mbforbes

@multilinear (Author)

I see, so you can do this via "kubectl create" and "kubectl delete". Manual management would indeed work for now. There's a bit more already there than I realized; I'll have to give it a try. Thanks!
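Concretely, the manual flow boils down to creating and deleting Node objects by hand; a minimal sketch, with the node name as a placeholder and today's v1 API as an assumption:

```sh
# Register a machine with the master by hand.
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Node
metadata:
  name: node-1.example.com
EOF

# ...and deregister it when the machine is retired.
kubectl delete node node-1.example.com
```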

It sounds like zmerlynn has more thoughts, though, and I'm still interested in solutions that build a bit more into the Kubernetes framework as well (be it shell scripts or otherwise), so I'll leave this open.

@erictune (Member) commented Mar 4, 2015

@multilinear can you report back and let us know if the manual strategy worked and/or if you hit any bumps?

@erictune (Member) commented Mar 4, 2015

Also, once #4585 merges, which should be soon, there will be an explicit unschedulable-but-not-deleted state for nodes. That will give you as much time as you want to drain the node.
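That unschedulable state is what later surfaced as kubectl cordon and kubectl drain; a sketch of the drain-at-your-own-pace flow it enables (the node name is a placeholder):

```sh
# Mark the node unschedulable; pods already running there are left alone.
kubectl cordon node-1.example.com

# Drain it whenever you're ready: pods are evicted gracefully.
kubectl drain node-1.example.com --ignore-daemonsets

# The emptied node can then be deleted and its machine torn down.
kubectl delete node node-1.example.com
```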

@alex-mohr alex-mohr changed the title Rolling system updates upgrade: Rolling system updates via delete/re-create nodes Mar 19, 2015
@alex-mohr alex-mohr added this to the v1.0 milestone Mar 19, 2015
@mbforbes mbforbes mentioned this issue Mar 27, 2015
@mbforbes (Contributor)

With the round-up for node upgrades (#6079), I'm trying to close other issues that target pieces of the process. Specifically, this should be implemented as part of the basic version (#6082).

Your ideas here were definitely helpful, though! I think they're going to be taking shape in the system.

Definitely re-open / ping if I missed something!

@multilinear (Author)

I see. I don't actually trust the methods in #6082 (basically, I don't think they will actually work), and would argue that just recreating the machine from scratch is much smarter. Truly idempotent machine setup is nigh impossible (Chef doesn't achieve it, and that's their entire product), and I'm not sure it has much value when you're running inside a virtualized cluster anyway... but I suppose that's an argument to have on that bug :).

So, with that sort-of exception, yes I think you hit all the points.

@mbforbes (Contributor)

Yeah, believe me, I'd rather start from a fresh machine (there's a huge discussion on this in #3333). However, as Zach mentioned in his reply to you in #6082, we need to be able to let new nodes into the cluster, a.k.a. dynamic clustering (#6087), before we can do that. And we want to make some progress before then, as we've been blocked on this for a long time.

As soon as dynamic clustering is in, we have #6088 to start with a fresh machine instead.
