Job: Failure Threshold #41451
Conversation
Hi @nickschuch. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.
[APPROVALNOTIFIER] This PR is NOT APPROVED
The following people have approved this PR: nickschuch
Needs approval from an approver in each of these OWNERS files:
We suggest the following people:
@nickschuch PR needs rebase
@kubernetes/sig-apps-feature-requests @kubernetes/sig-apps-pr-reviews
@erictune in case there was an issue that discussed this previously (I couldn't find one from our original job threads, but my memory isn't what it used to be)
@smarterclayton you're asking about this #30243
Thank you. I agree with Eric's concern that "failures=1" is deceptive and we do not guarantee that. That's my primary concern with this: it needs to be obvious that these are best effort.
I haven't looked into this PR yet. I'm reserving tomorrow morning for a thorough review.
// to finish once the threshold is met.
// More info: http://kubernetes.io/docs/user-guide/jobs
// +optional
FailureThreshold *int32 `json:"failureThreshold,omitempty" protobuf:"varint,2,opt,name=failureThreshold"`
Per this and this I'm imagining this allowing you to set different policies, to start with:
- count failures from the beginning, in other words I don't care when the failures started, but as soon as they hit X we can fail the job
- count failures from a certain point (probably configurable, e.g. after X successes), and Y failures after that fail the job
- be able to specify whether to clear the counter if a successful run happens in the meantime (see the sketch below)
@erictune we've never actually specified how this could be implemented, do you have something in mind for this, or would what I've mentioned be a good place to start?
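To make the options above concrete, here is a minimal sketch of what such a policy could look like on a Job spec. The failurePolicy block and its sub-fields (threshold, countAfterSuccesses, resetOnSuccess) are hypothetical names used for illustration only; nothing here is an agreed API.
apiVersion: batch/v1
kind: Job
metadata:
  name: policy-sketch
spec:
  # Hypothetical policy block illustrating the options above; field names are not agreed on.
  failurePolicy:
    threshold: 5            # fail the Job once 5 pod failures have been observed
    countAfterSuccesses: 3  # only start counting failures after 3 pods have succeeded
    resetOnSuccess: true    # clear the failure counter whenever a pod succeeds
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "exit 1"]  # always fails, to exercise the policy
Whether this lives as a nested failurePolicy object or as flat fields on JobSpec is exactly the kind of decision that should come out of the discussion in #30243.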
I need a bit more detail around this. I'm happy to discuss this in any venue; I thought we were waiting on some input from @erictune.
Let's have a discussion about possible solutions in the original issue. This is an API change and it requires a bit more thought before we proceed.
@nickschuch let's discuss possible approaches to this in #30243, where others have already spoken up. Once we have general agreement we can proceed with the actual implementation. The current solution with just a threshold is definitely not sufficient.
The proper solution is being discussed in kubernetes/community#583, so I'm closing this one for now. Please open a new PR with whatever we agree on in the proposal.
What this PR does / why we need it:
The Job controller's current implementation keeps recreating a pod as it fails, to ensure that the job runs to completion.
This isn't always ideal; sometimes we want our build to fail and be recreated at a much later date (or not at all) by a higher-level API, e.g. CronJobs.
My solution: implement a "Failure Threshold".
This means that a Job will be allowed X number of failures before it is stopped entirely. This becomes really handy when using the CronJob API, where you know that if the job fails you will get a new one at a later date.
Which issue this PR fixes:
kubectl run --restart=Never restarts (creates a Job) #24533
Special notes for your reviewer:
A Job with a Failure Threshold can be created with the following:
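A minimal sketch of such a manifest, assuming the failureThreshold field proposed in this PR (the name, image, and threshold value are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-task
spec:
  # failureThreshold is the field added by this PR: allow up to 3 pod failures
  # before the Job is stopped entirely.
  failureThreshold: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "exit 1"]  # always fails, to exercise the threshold
Paired with a CronJob, the Job above would give up after three failed pods and simply wait for the next scheduled run.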
Release note: