[Doc] Customize KubeRay container commands (#41651)
Update doc for ray-project/kuberay#1704.

---------

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
kevin85421 and architkulkarni authored Dec 8, 2023
1 parent abf6fd2 commit 25d16d3
1 changed file with 70 additions and 103 deletions: `doc/source/cluster/kubernetes/user-guides/pod-command.md`
(kuberay-pod-command)=

# Specify container commands for Ray head/worker Pods
KubeRay generates a `ray start` command for each Ray Pod.
Sometimes, you may want to execute certain commands before or after the `ray start` command, or you may wish to define the container's command yourself.
For example, you can set up environment variables that `ray start` uses before it runs, or launch a Ray Serve deployment after the RayCluster is ready.
This document shows you how to do that.

## Part 1: Specify a custom container command, optionally including the generated `ray start` command

Starting with KubeRay v1.1.0, if you add the annotation `ray.io/overwrite-container-cmd: "true"` to a RayCluster, KubeRay respects the container `command` and `args` exactly as you provide them and doesn't add any generated command, including the `ulimit` and `ray start` commands. KubeRay stores the generated `ray start` command in the environment variable `KUBERAY_GEN_RAY_START_CMD` so that you can still run it from your own command.

The container command behavior isn't finalized and may change in future KubeRay releases. See the [KubeRay operator code](https://github.com/ray-project/kuberay/blob/47148921c7d14813aea26a7974abda7cf22bbc52/ray-operator/controllers/ray/common/pod.go#L301-L326) for more details.

To do this, set the annotation together with the container's `command` and `args` in the RayCluster specification, as in the following example:
```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    # If this annotation is set to "true", KubeRay will respect the container `command` and `args`.
    ray.io/overwrite-container-cmd: "true"
  ...
spec:
  headGroupSpec:
    rayStartParams: {}
    # Pod template
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.8.0
          # Because the annotation "ray.io/overwrite-container-cmd" is set to "true",
          # KubeRay will overwrite the generated container command with `command` and
          # `args` in the following. Hence, you need to specify the `ulimit` command
          # by yourself to avoid Ray scalability issues.
          command: ["/bin/bash", "-lc", "--"]
          # Starting from v1.1.0, KubeRay injects the environment variable `KUBERAY_GEN_RAY_START_CMD`
          # into the Ray container. This variable can be used to retrieve the generated Ray start command.
          # Note that this environment variable does not include the `ulimit` command.
          args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD"]
          ...
```

The preceding example YAML is a part of [ray-cluster.overwrite-command.yaml](https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray-cluster.overwrite-command.yaml).

* `metadata.annotations.ray.io/overwrite-container-cmd: "true"`: This annotation tells KubeRay to respect the container `command` and `args` as provided by the users, without including any generated command.
Refer to Part 2 for the default behavior if you set the annotation to "false" or don't set it at all.

* `ulimit -n 65536`: This command is necessary to avoid Ray scalability issues caused by running out of file descriptors.
If you don't set the annotation, KubeRay automatically injects the `ulimit` command into the container.

* `$KUBERAY_GEN_RAY_START_CMD`: Starting from KubeRay v1.1.0, KubeRay injects the environment variable `KUBERAY_GEN_RAY_START_CMD` into the Ray container for both head and worker Pods to store the `ray start` command generated by KubeRay.
Note that this environment variable doesn't include the `ulimit` command.
```sh
# Example of the environment variable `KUBERAY_GEN_RAY_START_CMD` in the head Pod.
ray start --head --dashboard-host=0.0.0.0 --num-cpus=1 --block --metrics-export-port=8080 --memory=2147483648
```
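
To confirm that `ulimit -n 65536` took effect, you can inspect the open-file limit with Python's standard `resource` module. The following is a minimal sketch; the helper names are hypothetical and the 65536 threshold simply mirrors the preceding example. Resource limits are inherited by child processes, so a process you start with `kubectl exec` isn't a child of the container's entrypoint shell and doesn't necessarily see that limit. Running the check inside a Ray task, whose worker process descends from `ray start`, should reflect the limit Ray actually uses.

```python
import resource

# The value used by the `ulimit -n 65536` command in the preceding example.
EXPECTED_NOFILE = 65536


def open_files_limit() -> tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def nofile_is_sufficient(soft: int, expected: int = EXPECTED_NOFILE) -> bool:
    """Treat an unlimited soft limit, or one of at least `expected`, as sufficient."""
    return soft == resource.RLIM_INFINITY or soft >= expected


if __name__ == "__main__":
    soft, hard = open_files_limit()
    print(f"open files: soft={soft} hard={hard}")
    if not nofile_is_sufficient(soft):
        print(f"warning: soft limit is below {EXPECTED_NOFILE}")
```

For example, you could wrap `open_files_limit` in a `@ray.remote` function and fetch its result with `ray.get` to see the limit from a Ray worker process.
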

The head Pod's `command`/`args` looks like the following:

```yaml
Command:
  /bin/bash
  -lc
  --
Args:
  ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD
```

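
The container runtime runs `command` followed by `args`, so the head container effectively executes `/bin/bash -lc -- "<args>"`, and Bash expands `$KUBERAY_GEN_RAY_START_CMD` when it runs the string. The following minimal sketch reproduces that locally with Python's `subprocess`. It makes a few assumptions to stay harmless and deterministic: it sets `KUBERAY_GEN_RAY_START_CMD` to an `echo` stand-in instead of a real `ray start` command, uses `-c` instead of `-lc` so local login profiles can't add output, and drops `ulimit`.

```python
import os
import subprocess

# Mirrors the container spec: command ["/bin/bash", "-lc", "--"] plus one args string.
# "-c" replaces "-lc" here so local login-shell profiles can't add output.
COMMAND = ["/bin/bash", "-c", "--"]
ARGS = ["echo head; $KUBERAY_GEN_RAY_START_CMD"]


def run_like_container(fake_ray_start_cmd: str) -> str:
    """Run COMMAND + ARGS with a stand-in value for KUBERAY_GEN_RAY_START_CMD."""
    env = {**os.environ, "KUBERAY_GEN_RAY_START_CMD": fake_ray_start_cmd}
    result = subprocess.run(
        COMMAND + ARGS, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Bash expands the unquoted variable and runs the resulting words as a command.
    print(run_like_container("echo ray start --head --block"), end="")
```

Because Bash word-splits the unquoted variable, the generated command runs as a normal command line. Quoting it as `"$KUBERAY_GEN_RAY_START_CMD"` would instead make Bash look for a single program whose name is the whole string.
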
## Part 2: Execute commands before the generated `ray start` command

If you only want to execute commands before the generated command, you don't need to set the annotation `ray.io/overwrite-container-cmd: "true"`.
Some users employ this method to set up environment variables used by `ray start`.
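
Setting environment variables this way works because, without the annotation, KubeRay joins your command and the generated command with `&&` in a single Bash invocation, so a variable you `export` first is visible to the processes the generated command starts. The following sketch checks that locally; `MY_SETTING` is a hypothetical variable, and a child `sh` process stands in for `ray start`.

```python
import subprocess

# Stand-ins: MY_SETTING is a hypothetical variable, and the child `sh` process
# plays the role of `ray start`, which also runs as a child of the container's Bash.
user_command = "export MY_SETTING=enabled"
generated_command = "sh -c 'echo MY_SETTING=$MY_SETTING'"

# Same shape as the default head-Pod args: $(user-specified command) && $(ray start command)
args = f"{user_command} && {generated_command}"

result = subprocess.run(
    ["/bin/bash", "-c", "--", args], capture_output=True, text=True, check=True
)
print(result.stdout, end="")  # the child process sees the exported value
```

Without `export`, a plain assignment such as `MY_SETTING=enabled` would stay local to the Bash process, and the child would print an empty value.
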

```yaml
# https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray-cluster.head-command.yaml
...
spec:
  containers:
  - name: ray-head
    image: rayproject/ray:2.8.0
    resources:
      ...
    ports:
    ...
    args: ["456"]
```
* Ray head Pod
  * `spec.containers.0.command`: KubeRay hardcodes `["/bin/bash", "-lc", "--"]` as the container's command.
  * `spec.containers.0.args` contains two parts:
    * **user-specified command**: A string that concatenates `headGroupSpec.template.spec.containers.0.command` and `headGroupSpec.template.spec.containers.0.args` from the RayCluster.
    * **ray start command**: KubeRay creates the command based on `rayStartParams` specified in the RayCluster. The command looks like `ulimit -n 65536; ray start ...`.
  * To summarize, `spec.containers.0.args` is `$(user-specified command) && $(ray start command)`.

* Example
```sh
# ...
# Args:
# echo 123 456 && ulimit -n 65536; ray start --head --dashboard-host=0.0.0.0 --num-cpus=1 --block --metrics-export-port=8080 --memory=2147483648
```
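
The composition described above can be modeled in a few lines. The following is a hypothetical Python sketch, not KubeRay's Go implementation: the function names are made up, and `build_ray_start_cmd` only turns `rayStartParams` into `--key=value` flags, whereas KubeRay also adds flags such as `--metrics-export-port` and `--memory`, as the example output shows.

```python
# Hard-coded container command for the head Pod when the annotation isn't set.
HEAD_COMMAND = ["/bin/bash", "-lc", "--"]
ULIMIT = "ulimit -n 65536"


def build_ray_start_cmd(ray_start_params: dict[str, str]) -> str:
    """Hypothetical: turn rayStartParams into a simplified head-node `ray start` command."""
    flags = [f"--{key}={value}" for key, value in ray_start_params.items()]
    return " ".join(["ray start --head", *flags, "--block"])


def head_container_spec(user_command: list[str], user_args: list[str],
                        ray_start_cmd: str) -> tuple[list[str], list[str]]:
    """Model of the default (no-annotation) head container `command` and `args`."""
    # Part 1: the user-specified command is `command` and `args` joined into one string.
    user_part = " ".join(user_command + user_args)
    # Part 2: the generated part prefixes `ray start` with the `ulimit` command.
    generated_part = f"{ULIMIT}; {ray_start_cmd}"
    return list(HEAD_COMMAND), [f"{user_part} && {generated_part}"]


if __name__ == "__main__":
    cmd = build_ray_start_cmd({"dashboard-host": "0.0.0.0", "num-cpus": "1"})
    command, args = head_container_spec(["echo 123"], ["456"], cmd)
    print(command)
    print(args[0])
```

Because Bash gives `&&` higher precedence than `;`, a string of this shape runs as `(user command && ulimit ...); ray start ...`, so a failing user-specified command skips the `ulimit` command but not `ray start`.
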


## Part 3: Execute commands after the RayCluster is ready
There are two ways to execute commands after the RayCluster is ready. The main difference between them is that Solution 1 lets you check the command's output with `kubectl logs`.

### Solution 1: Container command (Recommended)
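As described in Part 2, KubeRay executes the user-specified command before the generated `ray start` command. Hence, you can run `ray_cluster_resources.sh` in the background by updating `headGroupSpec.template.spec.containers.0.command` in `ray-cluster.head-command.yaml`. Because the script waits for the cluster to become healthy, running it in the foreground would block `ray start` indefinitely; running it in the background lets `ray start` proceed while the script waits.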

```yaml
# https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray-cluster.head-command.yaml
# The parentheses around the command are required.
command: ["(/home/ray/samples/ray_cluster_resources.sh&)"]
---
# ray_cluster_resources.sh
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-example
data:
  ray_cluster_resources.sh: |
    #!/bin/bash
    # Wait for the Ray cluster to finish initialization.
    while true; do
      ray health-check 2>/dev/null
      if [ "$?" = "0" ]; then
        break
      else
        echo "INFO: waiting for ray head to start"
        sleep 1
      fi
    done
    # Print the resources in the Ray cluster after the cluster is ready.
    python -c "import ray; ray.init(); print(ray.cluster_resources())"
    echo "INFO: Print Ray cluster resources"
```
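
The parentheses matter because KubeRay appends `&& <ray start command>` to the user-specified command. A bare trailing `&` followed by `&&` is a Bash syntax error, whereas `(script&)` starts the script in the background from a subshell that exits immediately with status 0, so the generated command still runs. The following sketch checks both forms locally; the `sleep`/`echo` strings are stand-ins for `ray_cluster_resources.sh` and the generated `ray start` command, and `-c` replaces `-lc` to avoid login-profile output.

```python
import subprocess


def run_head_args(args: str) -> subprocess.CompletedProcess:
    """Run an args string the way the head container does: /bin/bash -c -- "<args>"."""
    return subprocess.run(["/bin/bash", "-c", "--", args], capture_output=True, text=True)


# Stand-ins for ray_cluster_resources.sh and the generated `ray start` command.
background_script = "sleep 0.2 && echo background"
generated_command = "echo foreground"

# Without parentheses: "... & && ..." fails to parse, so nothing runs.
bad = run_head_args(f"{background_script}& && {generated_command}")

# With parentheses: the subshell backgrounds the script and exits with status 0.
good = run_head_args(f"({background_script}&) && {generated_command}")

print("without parentheses:", bad.returncode, bad.stderr.strip())
print("with parentheses:", good.returncode, good.stdout.split())
```

In the container, the backgrounded script keeps running alongside `ray start --block`, and because it writes to the container's stdout, its output appears in `kubectl logs`.
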

* Example
```sh
# (1) Update `command` to ["(/home/ray/samples/ray_cluster_resources.sh&)"]
# (2) Comment out `postStart` and `args`.
kubectl apply -f ray-cluster.head-command.yaml

# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head

# Check the logs
kubectl logs ${RAYCLUSTER_HEAD_POD}

# INFO: waiting for ray head to start
# .
# . => Cluster initialization
# .
# 2023-02-16 18:44:43,724 INFO worker.py:1231 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
# 2023-02-16 18:44:43,724 INFO worker.py:1352 -- Connecting to existing Ray cluster at address: 10.244.0.26:6379...
# 2023-02-16 18:44:43,735 INFO worker.py:1535 -- Connected to Ray cluster. View the dashboard at http://10.244.0.26:8265
# {'object_store_memory': 539679129.0, 'node:10.244.0.26': 1.0, 'CPU': 1.0, 'memory': 2147483648.0}
# INFO: Print Ray cluster resources
```

### Solution 2: postStart hook
```yaml
# https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray-cluster.head-command.yaml
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh","-c","/home/ray/samples/ray_cluster_resources.sh"]
```

* This example executes `ray_cluster_resources.sh` through the `postStart` hook. According to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks), there's no guarantee that the hook executes before the container ENTRYPOINT. Hence, `ray_cluster_resources.sh` must wait for the RayCluster to finish initialization before it prints the cluster resources.

* Example
```sh
kubectl apply -f ray-cluster.head-command.yaml
# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head
# Forward the port of Dashboard
kubectl port-forward --address 0.0.0.0 ${RAYCLUSTER_HEAD_POD} 8265:8265
# Open the browser and check the Dashboard (${YOUR_IP}:8265/#/job).
# You should see a SUCCEEDED job with the following Entrypoint:
#
# `python -c "import ray; ray.init(); print(ray.cluster_resources())"`
```
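
Both solutions rely on the same wait-then-run pattern: poll `ray health-check` until it succeeds, then run the command. The shell loop in `ray_cluster_resources.sh` polls forever; the following Python sketch adds a timeout so that a cluster that never becomes healthy doesn't block indefinitely. The `wait_until` helper and its defaults are hypothetical; in practice, `check` could wrap `subprocess.run(["ray", "health-check"]).returncode == 0`.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 300.0,
               interval_s: float = 1.0) -> bool:
    """Call `check` every `interval_s` seconds until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    attempts = []

    def fake_health_check() -> bool:
        # Hypothetical stand-in for `ray health-check`: "healthy" on the third attempt.
        attempts.append(1)
        return len(attempts) >= 3

    print(wait_until(fake_health_check, timeout_s=5, interval_s=0.01), len(attempts))
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while the script waits.
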
