fix operator: correctly set head pod service account #276
Conversation
// If no service account is specified in the RayCluster,
// the head pod's service account should default to the RayCluster name.
@Jeffwan lmk if this is the correct behavior
Do you know why the compat tests are failing?
I deployed this change to my K8s cluster and see this in the operator logs. I think the operator's inability to update the RayCluster instance's status field is unrelated to my change. I get the same error with kuberay/operator:e0de2bd.
From the CI logs, it looks like the operator failed to create the resource, so the assert command failed (failed to find resources). The logs you shared look good to me. It seems the workers have been created, but I'm not sure whether they are in the expected status. If you can't figure it out, I will dump it to my local env and give it a try.
Yes, that would be great! I retriggered the tests to see if it was a fluke or if they still fail with the same error.
The failure is caused by my change. Here's a no-op change that passes. I'm not sure why this line breaks tests, though.
Sorry for the late reply. I didn't get a chance to help check it in the past few days. @davidxia could you help rebase the changes so I can check it again?
I manually applied your changes to master and noticed the following error. I think that's the reason it cannot launch the cluster and pass the test. We should limit the scope to autoscaler-enabled cases only.
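The scope control suggested above could look like the guard below. This is a sketch, not the operator's actual code, and `EnableInTreeAutoscaling` is an assumption about the RayCluster spec's field name for the autoscaler toggle:

```go
package main

import "fmt"

// RayClusterSpec is a minimal local stand-in for the RayCluster CRD spec.
// The field name EnableInTreeAutoscaling is an assumption for illustration.
type RayClusterSpec struct {
	EnableInTreeAutoscaling *bool
}

// autoscalerEnabled reports whether the service-account override should
// apply, limiting it to clusters that explicitly enable the autoscaler.
func autoscalerEnabled(spec RayClusterSpec) bool {
	return spec.EnableInTreeAutoscaling != nil && *spec.EnableInTreeAutoscaling
}

func main() {
	enabled := true
	fmt.Println(autoscalerEnabled(RayClusterSpec{EnableInTreeAutoscaling: &enabled}))
	fmt.Println(autoscalerEnabled(RayClusterSpec{}))
}
```

The pointer-to-bool pattern distinguishes "unset" from an explicit false, which is the usual convention for optional booleans in Kubernetes CRDs.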
To test your changes end to end locally, you can start Docker for Desktop or kind, then follow these steps:
I will close #277 then.
Currently the patch in ray-project#259 isn't working because a later line overwrites it with the original bad value. This change sets the head pod's service account to the one configured in the RayCluster.
@Jeffwan thanks! All tests pass now.
instead of always using the RayCluster name. Default to the RayCluster name only if the ServiceAccount doesn't exist. This is a follow-up fix after ray-project#259 and ray-project#276. The use case is the RayCluster autoscaler: the autoscaler uses the head group's K8s ServiceAccount to make K8s API requests, so always using the RayCluster name as the RoleBinding's subject name is incorrect and results in the autoscaler sidecar container not being authorized by the K8s API.
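The RoleBinding subject rule described in that follow-up could be sketched as below. The function and parameter names are illustrative assumptions, not the operator's actual API; in the real operator the existence check would be a Kubernetes API lookup rather than a callback:

```go
package main

import "fmt"

// roleBindingSubjectName picks the subject for the autoscaler's RoleBinding:
// the head group's configured ServiceAccount when one is set and exists,
// otherwise the RayCluster name as a fallback. The saExists callback stands
// in for a real ServiceAccount lookup against the Kubernetes API.
func roleBindingSubjectName(headGroupSA, rayClusterName string, saExists func(string) bool) string {
	if headGroupSA != "" && saExists(headGroupSA) {
		return headGroupSA
	}
	return rayClusterName
}

func main() {
	exists := func(name string) bool { return name == "autoscaler-sa" }
	fmt.Println(roleBindingSubjectName("autoscaler-sa", "my-raycluster", exists))
	fmt.Println(roleBindingSubjectName("", "my-raycluster", exists))
}
```

Binding to the configured ServiceAccount matters because RBAC subjects are matched by exact name: a RoleBinding pointing at the RayCluster name grants nothing to a pod running under a differently named ServiceAccount.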
Why are these changes needed?
Currently the patch in #259 isn't working because a later line overwrites it with
the original bad value. This change sets the head pod's service account
to the one configured in the RayCluster.
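The fixed ordering can be sketched as follows. The types are minimal local stand-ins (the real operator builds a corev1 pod template), and the field names are illustrative assumptions:

```go
package main

import "fmt"

// PodSpec and RayCluster are simplified stand-ins for the Kubernetes and
// KubeRay types, kept local so the sketch is self-contained.
type PodSpec struct{ ServiceAccountName string }

type RayCluster struct {
	Name   string
	HeadSA string // service account configured in the head group's template
}

// buildHeadPodSpec shows the corrected logic: apply the configured service
// account first, and only default to the RayCluster name when none is set,
// instead of unconditionally overwriting it later (the bug described above).
func buildHeadPodSpec(rc RayCluster) PodSpec {
	spec := PodSpec{ServiceAccountName: rc.HeadSA}
	if spec.ServiceAccountName == "" {
		spec.ServiceAccountName = rc.Name
	}
	return spec
}

func main() {
	fmt.Println(buildHeadPodSpec(RayCluster{Name: "raycluster-a", HeadSA: "my-sa"}).ServiceAccountName)
	fmt.Println(buildHeadPodSpec(RayCluster{Name: "raycluster-a"}).ServiceAccountName)
}
```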
Checks