-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Job: Support for the SuccessPolicy #123412
Job: Support for the SuccessPolicy #123412
Conversation
78c556b
to
d6e5e67
Compare
/triage accepted |
/cc |
454be08
to
b6f1b25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job controller changes generally lgtm, left mostly readability comments.
One place I'm not yet understanding the change is in this comment: https://github.com/kubernetes/kubernetes/pull/123412/files#r1516065458. A comment would do, or drop the check if not needed.
@atiratree @mimowo I addressed all your comments. PTAL, thanks! |
I'm still working on addressing Aldo's comments. |
/test pull-kubernetes-node-e2e-containerd |
803aa3f
to
33155da
Compare
The changes in |
@alculquicondor I addressed all comments! PTAL, thanks! |
@@ -875,6 +878,11 @@ func (jm *Controller) syncJob(ctx context.Context, key string) (rErr error) { | |||
} | |||
jobCtx.podsWithDelayedDeletionPerIndex = getPodsWithDelayedDeletionPerIndex(logger, jobCtx) | |||
} | |||
if jobCtx.finishedCondition == nil && hasSuccessCriteriaMetCondition(jobCtx.job) == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is too different from my last review or I just missed it.
We had agreed in the KEP that if, in a given reconcile loop, a Job qualifies for both SuccessCriteriaMet and FailureTarget (or Backoff or any other error), the success would win. It doesn't look like it, because we are checking matchSuccessPolicy
last.
This is somewhat important because the job controller might be busy at the time when a Pod finishes successfully, so by the time it has a chance to run, multiple other Pods might have failed. But again, the success policy should win.
But I'll consider the above as part of #117303 and we can fix it before the test freeze.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is too different from my last review or I just missed it.
We had agreed in the KEP that if, in a given reconcile loop, a Job qualifies for both SuccessCriteriaMet and FailureTarget (or Backoff or any other error), the success would win. It doesn't look like it, because we are checking matchSuccessPolicy last.
I changed this based on #123412 (comment)
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
aa0ef26
to
e216742
Compare
/lgtm |
LGTM label has been added. Git tree hash: fb204ff04b33a16575ea7f27ca616e95d1942860
|
Sure. |
This is an irrelevant error. |
As per Exception approval /milestone v1.30 |
This is an irrelevant error. |
/test pull-kubernetes-conformance-kind-ipv6-parallel |
Let me check it. |
This already has been tracked by #123799 /test pull-kubernetes-e2e-gce |
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
I implemented the following items to support the
JobSuccessPolicy
:Please see more details in the KEP.
Which issue(s) this PR fixes:
Tracking issue kubernetes/enhancements#3998
Special notes for your reviewer:
New Condition Name
ref: #123412 (comment)
Votes:
SuccessCriteriaMet
: @atiratree @deads2kSuccessPolicyMet
: @atiratree @deads2kSuccessTarget
:Neutral: @mimowo
Difference from KEP
Most of the implementation completely follows specifications decided in KEP.
There are 3 differences from the KEP.
.spec.successPolicy.criteria[*].succeededCount
. I replaced theint
withint32
because we used to select theint32
for the number of indexes like the.spec.completions
.RejectSuccessCriteriaMetWithFailedCondition
: Reject transitions of conditionsFailed
<->SuccessCriteriaMet
RejectSuccessCriteriaMetWithFailureTargetCondition
: Reject transitions of conditionsFailureTarget
<->SuccessCriteriaMet
RejectSuccessCriteriaMetForAlreadyCompleteJob
: Reject transitions of conditionsComplete
->SuccessCriteria
RejectSuccessCriteriaMetForNonIndexedJob
: Reject the addition ofSuccessCriteria
for NonIndexed Job.SuccessPolicyCriteria (JSON=criteria)
withSuccessPolicyRule (JSON=rules)
based on Job: Support for the SuccessPolicy #123412 (comment).Additionally, this PR depends on #123273 in the
validateIndexesFormat
:kubernetes/pkg/apis/batch/validation/validation.go
Lines 445 to 447 in 134a16f
,
IsJobComplete
,IsJobFailed
,IsConditionTrue
:kubernetes/pkg/apis/batch/validation/validation.go
Lines 790 to 805 in 1bff570
, and
JobStatusValidationOptions
:kubernetes/pkg/apis/batch/validation/validation.go
Line 855 in 1bff570
So, we need to merge #123273, first.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: