-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: requeue when expire time is not up yet #1614
Conversation
Can you add test as well? |
Pull Request Test Coverage Report for Build 2506345128
💛 - Coveralls |
Thanks for reminding, I have added a unit test. BTW, I tested in my own minikube cluster with a TTL second and it works as expected. |
/assign @zw0610 |
@@ -162,6 +162,15 @@ func (r *PyTorchJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) | |||
return ctrl.Result{}, err | |||
} | |||
|
|||
t, err := util.DurationUntilExpireTime(&pytorchjob.Spec.RunPolicy, pytorchjob.Status) | |||
if err != nil { | |||
logrus.Warnf("Reconcile PyTorchJob error %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using mix of logger and logrus in code. Need to cleanup in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, exactly.
/lgtm |
Can you add same tests with reconcile loops as well? Ref: https://github.com/kubeflow/training-operator/blob/master/pkg/controller.v1/tensorflow/job_test.go We have test functions defined in https://github.com/kubeflow/training-operator/blob/master/pkg/common/util/v1/testutil/tfjob.go#L38 which can be used |
Signed-off-by: Garrybest <garrybest@foxmail.com>
Signed-off-by: Garrybest <garrybest@foxmail.com>
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Garrybest, johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@Garrybest @zw0610 @johnugeorge Don't you think this implementation is so strange? |
It is just like a "patch" training-operator/pkg/controller.v1/pytorch/pytorchjob_controller.go Lines 349 to 352 in afb652f
|
Signed-off-by: Garrybest garrybest@foxmail.com
What this PR does / why we need it:
When reconciling jobs, if expire time is not up yet, the
common.JobController
would requeue the object into aFakeWorkQueue
, which does not work in Reconcilers.I would like to check TTL after finished and requeue the object until expire time is up if necessary.
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...
format, will close the issue(s) when PR gets merged):Fixes #1533
Checklist: