kubectl run --restart=Never restarts (creates a Job) #24533

Closed · thockin opened this issue Apr 20, 2016 · 50 comments
Labels: area/kubectl, priority/backlog (Higher priority than priority/awaiting-more-evidence.)
Milestone: v1.3

@thockin (Member) commented Apr 20, 2016

What do we think should happen here:

kubectl run should-run-once --image=busybox --restart=Never -- exit 1

I would expect this to run once and then never retry. What happens is that we create a Job with a PodTemplate that says restartPolicy: Never and then the Job itself happily restarts the Pod (well, recreates it). So it appears we no longer have a way to express "really, just run once" through kubectl run? That doesn't seem right - I feel like I am missing something...

@erictune @janetkuo

@pditommaso (Contributor)

I can include the full command line and the reported pods:

$ kubectl run nxf-126 --image=busybox --restart=Never -- exit 1
$ kubectl get pods -a
NAME                   READY     STATUS               RESTARTS   AGE
k8s-etcd-127.0.0.1     1/1       Running              0          1h
k8s-master-127.0.0.1   4/4       Running              1          1h
k8s-proxy-127.0.0.1    1/1       Running              0          1h
nxf-126-0xfra          0/1       ContainerCannotRun   0          1m
nxf-126-i2v90          0/1       ContainerCreating    0          11s
nxf-126-prp5h          0/1       ContainerCannotRun   0          58s
nxf-126-vgcd6          0/1       ContainerCannotRun   1          44s
nxf-126-yveze          0/1       ContainerCannotRun   0          1m

@bprashanth (Contributor)

I use this often:

$ kubectl run test -i --rm --image=busybox --restart=Never /bin/sh
# exit 1
Enter
E0420 09:21:13.526533    3346 v2.go:116] EOF
job "test" deleted

I'd asked on the brainstorming doc that we handle this better; a bug probably makes more sense, but there is a version that does what I (both of us?) want.

@thockin (Member, Author) commented Apr 20, 2016

To clarify, there IS actually an error in the command line: "exit" is not an executable, but a shell builtin.

kubectl run should-run-once --image=busybox --restart=Never -- sh -c "exit 1"

Now instead of ContainerCannotRun you get Error, but the overall behavior is the same.

bgrant0607 added the priority/backlog label Apr 20, 2016
@pditommaso (Contributor) commented Apr 21, 2016

Note: the same happens when creating a job with the command kubectl create -f job.yaml and the YAML descriptor contains restartPolicy: Never.
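
For reference, a minimal sketch of such a descriptor (the name and image are illustrative, not from the original report); even with restartPolicy: Never on the pod template, the Job controller keeps creating replacement pods until it records one successful completion:

$ cat <<EOF | kubectl create -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: run-once-example         # illustrative name
spec:
  completions: 1                 # the controller retries until it gets this one success
  template:
    spec:
      restartPolicy: Never       # only stops the kubelet from restarting the container in place
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "exit 1"]
EOF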

@janetkuo (Member) commented Apr 21, 2016

cc @mikedanese @erictune on job behavior

@soltysh (Contributor) commented Apr 25, 2016

@thockin the job controller's role is to execute your assignment successfully with the given parameters. The kubectl run you've mentioned creates a default job with completions=1, which means the controller will try as hard as possible to actually have that one successful execution. If, on the other hand, you're still interested in actually running a pod, --generator=run-pod/v1 is what you're looking for. I hope that answers your question.
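
For instance (a sketch of that workaround; the pod name here is illustrative), forcing the pod generator yields a single bare pod that nothing recreates after it fails:

$ kubectl run run-once --generator=run-pod/v1 --image=busybox --restart=Never -- sh -c "exit 1"
$ kubectl get pod run-once    # ends up in Error and stays there; no controller recreates it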

@pditommaso (Contributor)

@thockin Does this mean that this behaviour is expected? If so, how do you create a job that is not restarted when it returns a non-zero exit status? The documentation seems to suggest specifying restartPolicy = "Never".

@thockin (Member, Author) commented Apr 25, 2016

Maciej, yeah, I know that, but I'm arguing it's wrong. We already use --restart as a switch between Deployment and Job. I think --restart=Never (to kubectl run, not in a Job yaml) should switch to run-pod/v1 automatically. The user's intentions are pretty clear, and a Job is NOT what they would expect.

@erictune (Member) commented Apr 25, 2016

So, I think Tim would be happy with the following:

kubectl run --restart=Always #  creates rc
kubectl run --restart=Never #  creates bare pod with restart=never
kubectl run --restart=OnFailure # creates a Job.

I would approve of that.

@thockin (Member, Author) commented Apr 25, 2016

I think that is what I arrived at, too.

@janetkuo (Member) commented Apr 25, 2016

kubectl run --restart=Always #  creates rc
kubectl run --restart=Never #  creates bare pod with restart=never
kubectl run --restart=OnFailure # creates a Job.

kubectl run --restart=Always should create a Deployment instead of RC.

@thockin (Member, Author) commented Apr 25, 2016

details details :)

@soltysh (Contributor) commented Apr 26, 2016

kubectl run --restart=Always should create a Deployment instead of RC

Actually that depends on which cluster version you have; prior to 1.2 it creates an RC 😉

I'm totally OK with the proposed solution, but I'd like to hear from @bgrant0607 as well. I think (though my memory might be failing me) that he wanted kubectl run not to create the k8s primitives, such as a bare pod. Additionally, this introduces yet another compatibility change vs. prior kubectl versions; are we OK with that?

@bgrant0607 (Member)

@soltysh @erictune What does restartPolicy=Never mean, then, in the context of a Job? Just don't restart locally under Kubelet? If so, why is that useful?

@bgrant0607 (Member)

Note that --replicas=N should be supported, so it would be N pods if we were to generate pods.

@bgrant0607 (Member)

I would rather not change kubectl run behavior again. I have to agree the current run behavior is unintuitive, but apparently so is the current Job behavior.

@soltysh (Contributor) commented Apr 26, 2016

@soltysh @erictune What does restartPolicy=Never mean, then, in the context of a Job? Just don't restart locally under Kubelet? If so, why is that useful?

The initial implementation of a job worked that way, but then it was pointed out to me that we should only pass that to the kubelet and not look at it in the controller, which, by design, will always try to reach the specified completions. The only noticeable difference between Never and OnFailure in a job is that the former shows the number of failures the job had, whereas the latter does not.
Having said that, I don't mind changing the controller back to the initial idea, if that's approved.
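
A concrete way to observe that difference (the job names are illustrative): with restartPolicy: Never each failed attempt leaves behind its own pod, while with OnFailure the kubelet restarts the container inside a single pod and that pod's RESTARTS counter climbs:

$ kubectl get pods -a -l job-name=never-job       # several terminated pods, each with RESTARTS 0
$ kubectl get pods -a -l job-name=onfailure-job   # one pod whose RESTARTS keeps increasing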

@bgrant0607 (Member)

Well, Job is v1, so I don't think we can change the behavior without adding another knob.

@bgrant0607 (Member)

Given that restart=Never behavior for kubectl run is currently useless and confusing, I guess the least-bad option would be to produce N pods for now.

@bgrant0607 (Member)

cc @kubernetes/kubectl

bgrant0607 added this to the v1.3 milestone Apr 27, 2016
@0xmichalis (Contributor)

Agreed with @erictune / @janetkuo

kubectl run --restart=Always # creates a Deployment
kubectl run --restart=Never # creates bare pod
kubectl run --restart=OnFailure # creates a Job.

@adohe-zz

Agreed with @erictune / @janetkuo

kubectl run --restart=Always # creates a Deployment
kubectl run --restart=Never # creates bare pod
kubectl run --restart=OnFailure # creates a Job.

also need to consider --replicas=N

@soltysh (Contributor) commented Apr 27, 2016

Well, Job is v1, so I don't think we can change the behavior without adding another knob.

@bgrant0607 the other option is we could do it conditionally, only for newer jobs (i.e. from batch/v2alpha1), and the old API would support that with a knob, probably.

@erictune (Member)

@bgrant0607 restart=Never could be useful if you are running an image that for some reason expects a clean EmptyDir every time. Maybe it writes temporary files, and would be confused if it crashes and then finds the temporary files in place? Maybe? This is a stretch.

@erictune (Member)

I have had people ask for Pods that run at most once, rather than at-least-once.

@0xmichalis (Contributor)

PDS will start counting only upon getting some errors

PDS, once specified, is always counting, and starts from the last time there was any progress (pods scaled up or down during a rollout).

@bgrant0607 (Member)

Infant mortality detection: #18568

@bgrant0607 (Member)

activeDeadlineSeconds was originally intended to eventually terminate failing Jobs. Unlike with Deployment, a Job seems less likely to start failing after making some progress.

Ok, let's have kubectl run --restart=Never create N pods.

Let's suggest job-level activeDeadlineSeconds to users to ensure jobs don't crashloop forever.

I propose we add infant-mortality detection to Job (eventually), with no knob, and backoff in the event of failures in the middle of Job execution (#2529). Unlike controllers for continuously running applications (Deployment, ReplicaSet, ReplicationController, DaemonSet, PetSet), Jobs don't have the expectation of existing forever. If a Job doesn't complete, fail-fast with notification may be better.
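
To illustrate the activeDeadlineSeconds suggestion (a minimal sketch; the name, image, and 100-second cap are made up for the example): once the deadline expires, the Job controller stops creating replacement pods and marks the Job failed rather than crashlooping forever.

$ cat <<EOF | kubectl create -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-retries          # illustrative name
spec:
  activeDeadlineSeconds: 100     # hard cap on how long the Job may keep retrying
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "exit 1"]
EOF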

@soltysh (Contributor) commented May 6, 2016

Ok, let's have kubectl run --restart=Never create N pods.

Fix in #25253, although our current generator only allows 1 pod, so additional changes might be required. I'd propose to do that afterwards.

Infant mortality detection for Job moved to: #25254

@thockin (Member, Author) commented May 6, 2016

Having just one pod seems like a non-starter, unless we demand that --replicas == 1 when --restart=Never.

@soltysh (Contributor) commented May 6, 2016

@thockin yes it's required for everything other than RestartPolicyAlways.

@erictune (Member)

We should, by default, retry a job if the node was rebooted or crashed, or the node OOM'ed, or the Job was OOM-killed while still under its memory limit, or it was otherwise evicted by the system.

If the job has a rare race condition and crashes with a segfault, it is worth restarting.

If the job has an external dependency, like some remote service that it needs to talk to, and it crashes when it can't talk to that service, but that remote service will be repaired by someone else pretty soon, then it is worth retrying the job with exponential backoff maybe.

If the job hits a peak of resource usage after some time, and hits its specified memory limit, then it is debatable whether we should restart it.

  • if the Pod behavior is deterministic and it is going to fail again in the future, then there is no point in restarting it.
  • if, in the future, we have a vertical autosizer, then that autosizer could learn the new pod requirements from the failure and adjust the limits (hypothetical at this point, but I could see us doing this).
  • if the Pod behavior is non-deterministic (like when it takes work from a work queue, and might not get the same work units next time), then it is worth retrying.

Most of the use cases I can think of suggest retrying, at least without further information.

@bgrant0607 (Member)

Flakes aside, CI (build/test) failures are typically going to be deterministic. Interactive pods should also only run once, in general.

@sttts (Contributor) commented May 31, 2016

While #25253 fixes the issue @thockin brought up here, it still feels strange that the pod of kubectl run --restart=Never -it <podname> is not deleted after termination. What one really wants IMO is behavior like docker run -it <imagename>, i.e. pods which are more ephemeral and go away when the kubectl terminal connection is closed.

k8s-github-robot pushed a commit that referenced this issue May 31, 2016
Automatic merge from submit-queue

kubectl run --restart=Never creates pods

Fixes #24533.

@bgrant0607 @janetkuo ptal
/fyi @thockin

```release-note
* kubectl run --restart=Never creates pods
```
@0xmichalis (Contributor)

While #25253 fixes the issue @thockin brought up here, it still feels strange that the pod of kubectl run --restart=Never -it <podname> is not deleted after termination. What one really wants IMO is behavior like docker run -it <imagename>, i.e. pods which are more ephemeral and go away when the kubectl terminal connection is closed.

I think I would want the pod around in case of failure. Agreed we should remove it in case of a clean exit.

@soltysh (Contributor) commented May 31, 2016

@sttts but after doing docker run ... you still have the docker container there; do docker ps -a to see it. You're talking about docker run --rm ... here, which is handled by the GC in k8s.

@sttts (Contributor) commented Jun 1, 2016

@soltysh it would be nice to have a way to use random pod names plus the --rm behavior you mention. When does the GC kick in, and in which component does it run? In my tests yesterday I saw pods 20 minutes old.

@sttts (Contributor) commented Jun 1, 2016

Answering my own question: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/garbage-collection.md is the midterm answer (currently alpha). It will only collect pods once 12500 terminated pods exist, a pretty high number, and it is not deterministic for the user when their pods are deleted. Hence, something like this without an explicit name would be great:

$ kubectl run -it --image=busybox --restart=Never -- /bin/true

k8s-github-robot pushed a commit that referenced this issue Aug 21, 2016
Automatic merge from submit-queue

Return container command exit codes in kubectl run/exec

Fixes #26424
Based on #25273.

TODO:
- [x] add e2e tests
- [x] investigate `kubectl run` exit code for `--restart=Never` (compare issue #24533 and PR #25253)
- [x] document exit codes
@nisargam

I have started a pod with

kubectl run busybox --image=busybox --restart=Never --tty -i generator=run-pod/v1

I tried to delete this pod, but it never gets deleted. How can I delete this pod ?

kubectl delete pods busybox-na3tm

pod "busybox-na3tm" deleted

kubectl get pods

NAME            READY     STATUS              RESTARTS   AGE
busybox-vlzh3   0/1       ContainerCreating   0          14s

kubectl delete pod busybox-vlzh3 --grace-period=0

kubectl get all -o name | wc -l

2599

kubectl delete pods --all

pod "busybox-131cq" deleted
pod "busybox-136x9" deleted
pod "busybox-13f8a" deleted
pod "busybox-13svg" deleted
pod "busybox-1465m" deleted
pod "busybox-14uz1" deleted
pod "busybox-15raj" deleted
pod "busybox-160to" deleted
pod "busybox-16191" deleted

kubectl get pods

NAME            READY     STATUS              RESTARTS   AGE
busybox-09ush   0/1       ContainerCreating   0          1s

kubectl get pods --all-namespaces

NAMESPACE   NAME            READY     STATUS              RESTARTS   AGE
default     busybox-c9rnx   0/1       RunContainerError   0          23s

tail /var/log/messegae

Nov 19 00:18:10 masterserver1 kube-controller-manager: E1119 00:18:10.013599 741 controller.go:409] Failed to update job busybox, requeuing. Error: jobs.extensions "busybox" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

kubectl version

Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"ec7364b6e3b155e78086018aa644057edbe196e5", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"ec7364b6e3b155e78086018aa644057edbe196e5", GitTreeState:"clean"}

@thockin (Member, Author) commented Nov 19, 2016

We also have a --rm flag ...

$ kubectl run foobar$RANDOM --restart=Never --rm -i --image=busybox date
Waiting for pod default/foobar23887 to be running, status is Pending, pod
ready: false
Sat Nov 19 01:23:22 UTC 2016
pod "foobar23887" deleted

@soltysh (Contributor) commented Nov 21, 2016

From the name busybox-xxxx it looks like your command did not create a pod but rather a ReplicationController or a Deployment; otherwise the pod name would be just busybox. Looking further at your command, you're missing -- before the generator flag:

kubectl run busybox --image=busybox --restart=Never --tty -i --generator=run-pod/v1

Now, when executing it, pay attention to what it reports as created; if it says replication controller, replica set, or deployment, you need to delete that object, and that will remove the pods.
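
For example (resource name illustrative), list the possible owners and delete the controller object rather than the individual pods; that is what stops the recreation:

$ kubectl get deployments,jobs,rc
$ kubectl delete deployment busybox   # or: kubectl delete job busybox / kubectl delete rc busybox
$ kubectl get pods                    # the busybox-xxxxx pods are gone for good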
