-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UX] Add in-progress spot jobs to sky status
#1691
Conversation
Thanks @Michaelvll for this UX improvement! I will try to get to this today or tomorrow. Quick question: many users will have tens of jobs - mix of in progress and finished - in the spot queue. What would the output looks like (will it put normal clusters out of the current terminal page)? |
Good question! It currently shows all the in-progress spot jobs without the finished ones. I can make it only show the first 5 in-progress spot jobs, and show all the spot jobs when |
Another idea is to not show any jobs, but show the total number of
in-progress jobs + a hint asking users to check "spot queue". Wdyt?
…On Mon, Feb 13, 2023 at 10:51 AM Zhanghao Wu ***@***.***> wrote:
Thanks @Michaelvll <https://github.com/Michaelvll> for this UX
improvement! I will try to get to this today or tomorrow. Quick question:
many users will have tens of jobs - mix of in progress and finished - in
the spot queue. What would the output looks like (will it put normal
clusters out of the current terminal page)?
Good question! It currently shows all the in-progress spot jobs without
the finished ones. I can make it only show the first 5 in-progress spot
jobs, and show all the spot jobs when -a is specified. Wdyt?
—
Reply to this email directly, view it on GitHub
<#1691 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEQWHXYWXLNJXFPJZKKUFLWXJ7DZANCNFSM6AAAAAAUYFTVGI>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
I think it might be better to show the actual spot jobs for the user's convenience. We can still show the |
According to the feedback, I reordered hints for |
…nto spot-queue-in-status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Michaelvll, let's go with this design thanks to user feedback. Left some UX comments.
» sst
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
sky-spot-controller-8a3968f2 16 hrs ago 1x GCP(n2-standard-4, disk_size=50) UP 10m sky spot queue -r
Managed Spot Jobs
* 1 cluster has auto{stop,down} scheduled. Refresh statuses with: sky status --refresh
or with a refresh
» sst -r
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
sky-spot-controller-8a3968f2 16 hrs ago 1x GCP(n2-standard-4, disk_size=50) STOPPED 10m sky spot queue -r
Managed Spot Jobs
* 1 cluster has auto{stop,down} scheduled. Refresh statuses with: sky status --refresh
Showing the header alone "Managed Spot Jobs" seems surprising (leaving it hanging). Shall we print something like "No in-progress jobs found." immediately under the header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried it out, works well! Some comments, mostly UX nits.
|
…nto spot-queue-in-status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Michaelvll for this. Huge long-awaited UX improvement. LGTM.
Running a spot launch
with a stopped controller, there seems to be a blank line:
» sst
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
sky-spot-controller-8a3968f2 a few secs ago 1x GCP(n2-standard-4, disk_size=50) INIT - sky spot launch -dy --cloud...
Managed spot jobs
Controller's latest status is detected to be INIT; jobs will not be shown until it becomes UP.
f'* {job_info}To see all jobs: {colorama.Style.BRIGHT}' | ||
f'sky spot queue{colorama.Style.RESET_ALL}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, not feeling strongly.
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
…-experiments into spot-queue-in-status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
sky/backends/backend_utils.py
Outdated
@@ -1958,11 +1958,10 @@ def _refresh_cluster_record( | |||
|
|||
Args: | |||
cluster_name: The name of the cluster. | |||
force_refresh: refresh the cluster status as long as the cluster exists. | |||
If False, the cluster status will only be refreshed if the cluster: | |||
force_refresh: if True, refresh the cluster status even if it may be skipped. Otherwise (the default), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format (line too long)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good catch! Weirdly, our format script and the GitHub CI failed to catch it. Fixed. Thanks!
# controller is INIT/UP. This optimization avoids unnecessary costly | ||
# refresh when the controller is already stopped. This optimization is | ||
# based on the assumption that the user will not start the controller | ||
# manually from the cloud console. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems safe to me after reading all usage in core
and cli
. Even if users manually start the controller in the console, this indicates it was previously autostopped, which means job have finished, so those usage seem safe.
* Show spot queue in sky status * format * Handle missed error * Do not force refresh for spot controller * use multiprocesssing * fix deprecation error * fix * fix * Fix spot queue problem * Add --no-show-spot-queue * format * Only show first 5 in-progress jobs * fix * reorder * Do not show the spot queue if the controller is stopped * format * fix * Rearange * partially address comments * address comments * fix * longer wait for spot job * fix * Fix message * format * fix message * revert changes for catalog fetcher * revert * partially address * Use the new fetchers * address comments * force_refresh * Adopt threading instead * no need to wait * address comments * Update sky/backends/backend_utils.py Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> * Update sky/cli.py Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> * Update sky/cli.py Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> * rename option * fix argument * format * add comment --------- Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Fixes #1665.
In this PR, we also show in-progress spot jobs in the
sky status
output.We also changed the implementation of
fetch_aws.py
/fetch_azure.py
to get rid of ray, because:TODO:
sky status
aftersky launch
, and the current change will add overheads to a normalsky launch
. We may need to add an additional option tosky status
to avoid querying the spot jobs.Overhead comparison:
python -c 'import time; from sky.clouds.service_catalog.data_fetchers import fetch_aws; start=time.time(); fetch_aws.fetch_availability_zone_mappings(); end = time.time(); print(end-start)'
(used in aws_catalog.py)new: 2.44 seconds, master: 6.50 seconds
time python -m sky.clouds.service_catalog.data_fetchers.fetch_aws --no-az-mappings --check-all-regions-enabled-for-account
new: 43s, master: 59s
time python -m sky.clouds.service_catalog.data_fetchers.fetch_azure
new: 1m9s, master: 1m28s
Tested (run the relevant ones):
sky status
when the controller is stopped. (6.4s)sky status -r
when the controller is stopped. (37s and spot job query does not introduce more cost)sky status
when the controller is INIT.sky status
when the controller is UP.python -m sky.clouds.service_catalog.data_fetchers.fetch_aws --no-az-mappings --check-all-regions-enabled-for-account
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure