Rolling restart of pods #13488
Comments
What's the use case for restarting the pods without any changes to the spec? Note that there won't be any way to roll back the change if pods start failing when they are restarted.
Whenever services get into some wedged or undesirable state (maxed out connections and now stalled, bad internal state, etc.). It's usually one of the first troubleshooting steps if a service is seriously misbehaving. If the first pod fails as it is restarted, I would expect the rollout to either stop or keep retrying to start the pod.
Also, a rolling restart with no spec change reallocates pods across the cluster. However, I would also like the ability to do this without rescheduling the pods.

Clayton Coleman | Lead Engineer, OpenShift
@smarterclayton Is that like my option 2 listed above? Though why would labels be changed?

I suppose this would be more for a situation where the pod is alive and responding to checks but still needs to be restarted. One example is a service with an in-memory cache or internal state that gets corrupted and needs to be cleared. I feel like asking for an application to be restarted is a fairly common use case, but maybe I'm incorrect.

Corruption would just be one pod, which could just be killed and replaced by the RC. The other case mentioned offline was to re-read configuration. That's dangerous to do implicitly, because restarts for any reason would cause containers to load the new configuration. It would be better to do a rolling update to push a new versioned config reference (e.g. in an env var) to the pods. This is similar to what motivated #1353.
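That versioned-config-reference idea could look something like this sketch (it uses a modern Deployment pod template for illustration; CONFIG_VERSION and all the names are made up, not from this thread):

```yaml
# Sketch: a versioned config reference in the pod template.
# Bumping CONFIG_VERSION changes the template, so a rolling
# update replaces the pods and they re-read the new config.
# All names here are illustrative.
spec:
  template:
    spec:
      containers:
      - name: my-app
        image: example.com/my-app:1.4.2
        env:
        - name: CONFIG_VERSION
          value: "v42"
```

Because the config version lives in the spec, rolling back the Deployment also rolls back to the old config reference.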
@bgrant0607 have we decided that we don't want to do this?

@gmarek Nothing, for now. Too many things are underway already.
Can we have a
I would be a fan of this feature as well; you don't want to be forced to switch tags for every minor update you want to roll out.
I'm a fan of this feature. Use case: easily upgrade all the pods to use a newly-pushed docker image (with
Another use case: updating secrets.
I'd really like to see this feature. We run node apps on kubernetes and currently have certain use cases where we restart pods to clear in app pseudo caching. Here's what I'm doing for now:
This deletes pods 10 at a time and works well in a replication controller setup. It does not address concerns like pod allocation or new pods failing to start. It's a quick solution when needed.
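A batch-delete script of that kind might be sketched like this (assuming a label selector identifies the pods; the function name, selector, batch size, and pause are all illustrative):

```shell
# Sketch: delete pods matching a selector in batches, letting the
# RC/Deployment recreate them. Names and sizes are illustrative.
rolling_restart() {
  selector="$1"; batch="${2:-10}"; pause="${3:-30}"
  count=0
  for pod in $(kubectl get pods -l "$selector" -o name); do
    kubectl delete "$pod"
    count=$((count + 1))
    # Pause after each batch so replacements have time to become ready.
    if [ $((count % batch)) -eq 0 ]; then
      sleep "$pause"
    fi
  done
}
# Usage (requires a cluster): rolling_restart app=myservice 10
```

As noted above, this gives no rollback and no readiness gating between batches, so it is a blunt instrument compared to a real rolling restart.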
I would really like to be able to do a rolling restart.
Yes, there are a lot of cases when you really want to restart a pod/container without changes inside...
Small work around (I use deployments and I want to change configs without having real changes in image/pod):
k8s will see that the definition of the deployment has been changed and will start the process of replacing the pods.
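One common shape for that workaround is patching a throwaway timestamp into the pod template; any template change triggers a rolling replacement. This is a sketch (the deployment name is a placeholder, and the annotation key mirrors the one newer kubectl versions write for rollout restart):

```shell
# Build a patch that only touches a pod-template annotation; changing
# the template is what triggers the rolling replacement of pods.
patch="$(printf '{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"%s"}}}}}' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)")"
echo "$patch"
# Requires a cluster; "my-app" is a placeholder deployment name:
# kubectl patch deployment my-app -p "$patch"
```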
Thank you @paunin |
@paunin That's exactly the case where we need it currently: we have to change ConfigMap values that are very important to the services and need to be rolled out to the containers within minutes up to some hours. If no deployment happens in the meantime, the containers will all fail at the same time and we will have partial downtime of at least some seconds.
Our GKE cluster on "rapid" release channel has upgraded itself to Kubernetes 1.16 and now
@nikhiljindal asked a while ago about the use case for updating the deployments without any changes to the specs. Maybe we're doing it in a non-optimal way, but here it is: our pre-trained ML models are loaded into memory from Google Cloud Storage. When model files get updated on GCS, we want to rollout restart our K8S deployment, which pulls the models from GCS. I appreciate we aren't able to roll back the deployment with previous model files easily, but that's the trade-off we adopted to bring models as close as possible to the app and avoid a network call (as some might suggest).
hey @dimileeh Do you happen to know what version of kubectl you're using now, and what version you used before? I'd love to know if there was a regression, but at the same time I'd be surprised if the feature had entirely disappeared. With regard to the GCS thing, knowing very little about your use case (so sorry if it makes no sense): I would suggest that the GCS models get a different name every time they are modified (maybe suffixed with their hash), and that the name be included in the deployment. Updating the deployment to use the new files would automatically trigger a rollout. This gives you the ability to roll back to a previous deployment/model, have a better understanding of the changes happening to the models, etc.
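The hash-suffix naming suggested above might be sketched like this at model-publish time (the function, file names, and extension are illustrative, not from this thread):

```shell
# Sketch: derive a content-addressed object name for a model file,
# e.g. "model-2cf24dba.pkl". Uploading under this name and writing it
# into the Deployment spec makes every model change a spec change,
# which is itself the rollout (and rollback) trigger.
versioned_name() {
  file="$1"; base="${2:-model}"
  echo "${base}-$(sha256sum "$file" | cut -c1-8).pkl"
}
```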
hi @apelisse, thank you for your response! When I run
When I tried to upgrade kubectl via

The Kubernetes documentation for 1.17, 1.16 and 1.15 has nothing about

Thank you for your suggestion on model versioning, it makes perfect sense. We thought about that, but since we retrain our models every day, we thought we'd start accumulating too many models (and they are quite heavy). Of course, we could use a script to clean up old versions after some time, but so far we've decided to keep it simple, relying on
I can see the docs here: |
Ah, thank you, I was looking here:
https://v1-16.docs.kubernetes.io/docs/reference/kubectl/cheatsheet/

Thank you very much for that link, I'll make sure it gets updated!
@dimileeh PTAL kubernetes/website#18224 (I'll cherry-pick in relevant branches once this gets merged). |
@dimileeh I think I figured out what's wrong with your kubectl version, we'll be working on it. |
Yes, we also have a use case of restarting pods without a code change, after updating the ConfigMap. This is to update an ML model without redeploying the service.
@anuragtr With the latest versions you can run kubectl rollout restart deployment <name>.
I was using a custom command for that [1], glad it is now in standard kubectl! Thanks
As I understand it, if I have a newer docker image tagged :latest and a deployment using the image tagged :latest, is there a way to ask Kubernetes to check whether the image has really changed and restart pods ONLY if the image differs from the one used in the pods? I am migrating services that are managed by docker-compose, and currently I run
@shoce I believe you may misunderstand how Kubernetes and this :latest tag concept work. Simply put, if you use a dynamic tag (like :latest) that can point to multiple different images at any given time, you can never guarantee you are always using the same version of that tag (aka the same image). Kubernetes doesn't do a "lookup", like you seem to assume, to check what sha sum the current deployment needs to pull from your container registry. This is all true whether or not you have an image pull policy set.

If an example helps: let's say you have 3 nodes and a deployment of your "widgets" service with 3 replicas specified, and your image pull policy is Always. Let's say you trigger an update to your service (although I don't know how, since your image tag didn't change; so let's say you do something silly, which I've seen before, like setting the current date into an annotation). The second this triggers on the first node, it will try to bring up a new pod with your latest :latest, but before that gets healthy, let's say your CI system or a dev pushes a new :latest. Then your first pod gets healthy and a new pod on the second node tries to come up. This one would now be using the newer :latest.

TL;DR: Do not use the :latest tag for anything, ever, for the most part. It's really bad practice. All registries I know of have a feature you can enable which disallows pushing over an existing tag, exactly for this reason. It's bad. There are simple use cases for :latest (e.g. internal CI images and tooling, and it can be useful in Dockerfiles), but you should understand when to use it and when not to. Deploying something into Kubernetes with a :latest tag is generally viewed as "doing something wrong/funny" in my experience.
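One way to get latest-like convenience without the ambiguity described above is to pin images by digest rather than by tag; this fragment is a sketch (the repository name and digest are placeholders, not real values):

```yaml
# Sketch: pinning by digest. The pod always runs exactly this image,
# and changing the digest in the spec is itself the rollout trigger.
# Repository and digest below are placeholders.
containers:
- name: widgets
  image: registry.example.com/widgets@sha256:0000000000000000000000000000000000000000000000000000000000000000
```

A CI pipeline would typically resolve the tag it just pushed to its digest and write that digest into the deployment, so every deploy is both reproducible and rollback-able.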
@AndrewFarley thanks, your explanation helped me a lot, and the example is something I could not see before. Actually, what I do is use :develop and :master tagging of docker images built against the corresponding git branches, and deploy them to dev and staging environments as soon as possible. As my dev and staging environments are hosted on a single host, I did not worry much. After reading many discussions on :latest tags and Kubernetes, I finally agree to drop :latest tagging with Kubernetes even for dev and staging environments. It was simple to use with docker-compose but is not okay with Kubernetes. I have two related questions now and would appreciate any leads. It might seem off topic, but I believe these are the issues blocking people from dropping :latest tags.
@shoce You should google this problem for your registry provider (e.g. "Automatically delete old images on REGISTRY_PROVIDER" or perhaps "Delete untagged images on REGISTRY_PROVIDER"). Each registry tends to have its own API and/or tools to handle this. AWS, for example, has a built-in Lifecycle Policy which you can configure to automatically delete images without requiring much effort on your part. In the past, on "simpler" registries, I've been known to write simple scripts to query the old images and delete them. Good luck!
kubectl rolling-update is useful for incrementally deploying a new replication controller. But if you have an existing replication controller and want to do a rolling restart of all the pods that it manages, you are forced to do a no-op update to an RC with a new name and the same spec. It would be useful to be able to do a rolling restart without needing to change the RC or to provide the RC spec, so anyone with access to kubectl could easily initiate a restart without worrying about having the spec locally, making sure it's the same/up to date, etc. This could work in a few different ways:

1. kubectl rolling-restart that takes an RC name and incrementally deletes all the pods controlled by the RC and allows the RC to recreate them.
2. kubectl rolling-update with a flag that lets you specify an old RC only, and it follows the logic of either 1 or 2.
3. kubectl rolling-update with a flag that lets you specify an old RC only, and it auto-generates a new RC based on the old one and proceeds with normal rolling update logic.

All of the above options would need the MaxSurge and MaxUnavailable options recently introduced (see #11942), along with readiness checks along the way, to make sure that the restarting is done without taking down all the pods.
@nikhiljindal @kubernetes/kubectl