Spinning up a large number of erroneous pods #64

Closed
swatisehgal opened this issue Feb 2, 2017 · 4 comments

Comments

@swatisehgal
Contributor

In case of certain configuration errors, e.g. SSL misconfiguration, the Kubernetes job fails to complete successfully. It keeps creating new pods and eventually ends up with ~5000 pods in the Error state.
It should stop or time out. A simple fix could be setting the RestartPolicy to OnFailure or Never. Are there any other recommendations for gracefully handling this error?
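
For illustration, the workaround suggested above would look roughly like the following in the Job's pod template. This is a minimal sketch assuming a generic Job manifest; the name and image are placeholders, not the actual NFD deployment files. Note that a Job's pod template only accepts Never or OnFailure as restartPolicy.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: node-feature-discovery        # placeholder name
spec:
  template:
    spec:
      containers:
      - name: node-feature-discovery
        image: node-feature-discovery:latest   # placeholder image reference
      # Jobs only allow Never or OnFailure here; Always is not permitted.
      restartPolicy: Never
```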

@balajismaniam
Contributor

@swatisehgal Sorry for the delay in response. I don't think setting restartPolicy=Never will resolve this issue. We set job.spec.completions to a positive value. As a result, if a pod spawned by the job fails, the pod will be restarted even if restartPolicy = Never. This is my expectation but I haven't tested it yet. Also, see https://kubernetes.io/docs/user-guide/jobs/#handling-pod-and-container-failures.
Did you get a chance to test if setting restartPolicy=Never resolves this issue? I will check if there is any good way to handle this.
Also, we expect the Kubernetes cluster to be configured properly before using NFD. This is only an issue if containers or pods fail due to misconfiguration.
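
A minimal sketch of the interaction described above (names and values are placeholders): restartPolicy: Never only stops the kubelet from restarting containers inside an existing pod, while spec.completions makes the Job controller keep creating replacement pods until enough pods have succeeded.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: node-feature-discovery        # placeholder name
spec:
  # The Job only finishes once this many pods have succeeded, so the
  # controller keeps creating replacement pods for every failure...
  completions: 1
  template:
    spec:
      containers:
      - name: node-feature-discovery
        image: node-feature-discovery:latest   # placeholder image reference
      # ...even with restartPolicy: Never, which only prevents the kubelet
      # from restarting containers inside an already-created pod.
      restartPolicy: Never
```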

@okartau
Contributor

okartau commented Feb 13, 2018

Does the backoff-policy functionality, which was added in Sep 2017, cover this issue?
With backoffLimit set to a low value, there should not be a large number of repeatedly failing pods.
As the default value of backoffLimit is 6, the number of restarts should be limited to 6 in the default config.
Can that be verified with the original reporter's case, using some recent k8s version?
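
A sketch of the setting being referred to, assuming the same kind of generic Job manifest as above (name, image, and the chosen limit are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: node-feature-discovery        # placeholder name
spec:
  # Number of retries before the Job is marked as failed; defaults to 6,
  # so failed pods no longer pile up into the thousands.
  backoffLimit: 2
  template:
    spec:
      containers:
      - name: node-feature-discovery
        image: node-feature-discovery:latest   # placeholder image reference
      restartPolicy: Never
```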

@marquiz
Contributor

marquiz commented Jul 11, 2018

@swatisehgal: any comments on this? I think the job BackoffLimit introduced in Kubernetes v1.8 should mitigate this as @okartau described.

I would be inclined to close this issue.

@marquiz
Contributor

marquiz commented Aug 17, 2018

Closing this for now. Please re-open if you still see this issue

marquiz closed this as completed Aug 17, 2018