[Bug] Allow zero replica for workers for Helm #968
Thank you for the contribution! I am wondering whether there is any difference, for your use case, between `disabled: true` and `replicas: 0`.
as I understand,
I think setting
@kevin85421 Could you review this PR?
Test this PR manually using this gist.

```shell
# Step 0: Replace values.yaml with the gist
# (path: helm-chart/ray-cluster)
helm install ray-cluster .

# Step 1: Try to scale up the cluster
export HEAD_POD=$(kubectl get pods -o custom-columns=POD:metadata.name | grep raycluster-autoscaler-head)
kubectl exec "$HEAD_POD" -it -c ray-head -- python -c "import ray; from ray.autoscaler.sdk import request_resources; ray.init(); request_resources(num_cpus=4)"

# Step 2: The RayCluster will scale from 0 workers to 3 workers.
```
@kevin85421 Is this available in 0.5.2?
I see it's only in 0.6.0. Is it stable or still a work in progress?
Allow zero replica for workers for Helm
Why are these changes needed?
We are currently using Ray for compute-heavy tasks on GKE. On initialization, the cluster spawns one worker for each worker group, which triggers a GKE node scale-up. That costs money even when no work has been submitted.
This happens because of the fallback (ternary) logic in the template file, which turns an explicit replica count of 0 into 1:
`{{ 0 | 1 }} = 1`
See `kuberay/helm-chart/ray-cluster/templates/raycluster-cluster.yaml`, lines 91 and 157 (at commit 87dde22).
This PR works around the issue by setting the default replica count to zero.
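The root cause can be reproduced outside Helm with a small Go sketch. It assumes the chart's replica lines use a Sprig-style `default` fallback, where numeric 0 counts as "empty" (the description calls this a ternary), and it swaps Sprig for a simplified, hand-written `defaultFn` so that only the standard library is needed. The template strings and the `defaultFn`/`render` names are hypothetical and not copied from the chart.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
	"text/template"
)

// Two hypothetical versions of a worker "replicas" line; the real chart
// template may differ.
const (
	buggyTpl = "replicas: {{ .replicas | default 1 }}"
	fixedTpl = "replicas: {{ .replicas | default 0 }}"
)

// defaultFn is a simplified stand-in for Sprig's `default` (not the real
// implementation): it returns the fallback when the piped-in value is nil or
// a Go zero value (0, "", false, ...). Because 0 counts as "empty", an
// explicit replicas: 0 is silently replaced by the fallback.
func defaultFn(fallback, given interface{}) interface{} {
	if given == nil || reflect.ValueOf(given).IsZero() {
		return fallback
	}
	return given
}

// render executes tpl against values, with defaultFn registered as "default".
func render(tpl string, values map[string]interface{}) string {
	t := template.Must(template.New("replicas").
		Funcs(template.FuncMap{"default": defaultFn}).
		Parse(tpl))
	var sb strings.Builder
	if err := t.Execute(&sb, values); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	zero := map[string]interface{}{"replicas": 0}
	three := map[string]interface{}{"replicas": 3}
	fmt.Println(render(buggyTpl, zero))  // replicas: 1 (the requested 0 is lost)
	fmt.Println(render(fixedTpl, zero))  // replicas: 0
	fmt.Println(render(buggyTpl, three)) // replicas: 3
}
```

Changing the fallback to 0 keeps an explicit zero intact while non-zero values pass through unchanged, which matches the workaround described above.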
Related issue number
#965
Checks