Improve RayJob controller quality to alpha #398

Merged: 1 commit, Jul 26, 2022


@Jeffwan Jeffwan commented Jul 21, 2022

Why are these changes needed?

Fix RayJob controller issues and add a few enhancements.

  1. Implement `spec.shutdownAfterJobFinishes`: if this field is set, delete the cluster once the job finishes
  2. Add a new status, `JobDeploymentStatusComplete`, to indicate that the RayJob is complete
  3. Add status fields `Message`, `StartTime`, and `EndTime` to expose more job status
  4. Improve log usage in the operator and make the logs consistent
  5. Tune `requeueAfter` for time-consuming operations, such as waiting for the dashboard to become ready (the container takes time to start)

Related issue number

#393

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@Jeffwan Jeffwan force-pushed the jiaxin/rayjob_improvement branch 4 times, most recently from 7869665 to 1e4a9b6 Compare July 23, 2022 22:06
@Jeffwan Jeffwan changed the title WIP: Improve RayJob controller logics Improve RayJob controller quality to alpha Jul 23, 2022
@Jeffwan Jeffwan force-pushed the jiaxin/rayjob_improvement branch 4 times, most recently from 5741479 to 86b1d53 Compare July 24, 2022 01:07
@Jeffwan Jeffwan requested a review from brucez-anyscale July 24, 2022 04:03

Jeffwan commented Jul 24, 2022

/cc @harryge00 Please take a look. I made some improvements to the RayJob controller:

1. Implement `spec.shutdownAfterJobFinishes`: if this field is set, delete the cluster once the job finishes
2. Add a new status, `JobDeploymentStatusComplete`, to indicate that the RayJob is complete
3. Add status fields `Message`, `StartTime`, and `EndTime` to expose more job status
4. Improve log usage in the operator and make the logs consistent
5. Tune `requeueAfter` for time-consuming operations, such as waiting for the dashboard to become ready (the container takes time to start)
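
To make item 5 concrete, here is a minimal sketch of choosing a requeue delay based on what the reconciler is waiting for. The `waitReason` type, the constant names, and the durations are all made up for illustration and are not the values used in this PR; in a controller-runtime reconciler the returned duration would go into the `RequeueAfter` field of the reconcile result.

```go
package main

import "time"

// waitReason is a hypothetical label for what the reconciler is waiting on.
type waitReason int

const (
	waitNothing   waitReason = iota // no slow dependency; use the default delay
	waitDashboard                   // dashboard container is still starting
	waitJobFinish                   // job submitted; polling for a terminal status
)

// Illustrative durations only; the real values are a tuning decision.
const (
	defaultRequeue   = 3 * time.Second
	dashboardRequeue = 10 * time.Second
	jobPollRequeue   = 5 * time.Second
)

// requeueDelay returns how long to wait before the next reconcile.
// Slow operations get a longer delay so the controller does not spin
// on a dependency that cannot be ready yet.
func requeueDelay(r waitReason) time.Duration {
	switch r {
	case waitDashboard:
		return dashboardRequeue
	case waitJobFinish:
		return jobPollRequeue
	default:
		return defaultRequeue
	}
}
```
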
@Jeffwan Jeffwan force-pushed the jiaxin/rayjob_improvement branch from 86b1d53 to f93158d Compare July 25, 2022 04:55
@Jeffwan Jeffwan merged commit 0c98cf2 into ray-project:master Jul 26, 2022

Jeffwan commented Jul 26, 2022

Let me merge the PR. I tested it in our environment and it is currently working fine. If you have additional comments, let me know and I will address them in a separate PR.

@Jeffwan Jeffwan deleted the jiaxin/rayjob_improvement branch July 26, 2022 05:02
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023