add cluster nodes info test #299

Merged (2 commits) on Jun 28, 2022

Changes from 1 commit:
add cluster nodes info test
wilsonwang371 committed Jun 13, 2022
commit b2f4791b22dae7727c0d7698c84fa800819eb8e0
4 changes: 2 additions & 2 deletions ray-operator/controllers/ray/raycluster_controller_test.go
@@ -39,7 +39,7 @@ import (
)

const (
-DefaultAttempts = 8
+DefaultAttempts = 16
DefaultSleepDurationInSeconds = 3
)

@@ -206,7 +206,7 @@ var _ = Context("Inside the default namespace", func() {
// adding a scale down
Eventually(
getResourceFunc(ctx, client.ObjectKey{Name: myRayCluster.Name, Namespace: "default"}, myRayCluster),
-time.Second*3, time.Millisecond*500).Should(BeNil(), "My raycluster = %v", myRayCluster)
+time.Second*5, time.Millisecond*500).Should(BeNil(), "My raycluster = %v", myRayCluster)
rep := new(int32)
*rep = 2
myRayCluster.Spec.WorkerGroupSpecs[0].Replicas = rep
55 changes: 50 additions & 5 deletions tests/compatibility-test.py
@@ -1,13 +1,13 @@
#!/usr/bin/env python
import logging
import os
+import sys
import tempfile
+import time
import unittest
from string import Template

import docker
-import sys
-import time

ray_version = '1.9.0'
ray_image = "rayproject/ray:1.9.0"
@@ -21,6 +21,7 @@

kuberay_sha = 'nightly'


def shell_run(cmd):
logger.info(cmd)
return os.system(cmd)
@@ -61,6 +62,8 @@ def create_kuberay_cluster():
f.write(raycluster_spec_buf)
raycluster_spec_file = f.name

time.sleep(30)

shell_assert_success('kubectl wait --for=condition=ready pod -n ray-system --all --timeout=1600s')
assert raycluster_spec_file is not None
shell_assert_success('kubectl apply -f {}'.format(raycluster_spec_file))
@@ -85,21 +88,30 @@ def download_images():


class BasicRayTestCase(unittest.TestCase):
-def setUp(self):

+@classmethod
+def setUpClass(cls):
# Ray cluster is running inside a local Kind environment.
# We use port mapping to connect to the Kind environment
# from another local ray container. The local ray container
# outside the Kind environment has the same ray version as the
# ray cluster running inside the Kind environment.
create_cluster()
apply_kuberay_resources()
download_images()
create_kuberay_cluster()

def test_simple_code(self):
# connect from a ray container client to the ray cluster
# inside a local Kind environment and run a simple test
client = docker.from_env()
container = client.containers.run(ray_image,
remove=True,
detach=True,
tty=True,
network_mode='host')
rtn_code, output = container.exec_run(['python',
-'-c', '''
+'-c', '''
import ray
ray.init(address='ray://127.0.0.1:10001')

@@ -125,7 +137,40 @@ def f(x):

client.close()

-def tearDown(self):
+def test_cluster_info(self):
# connect from a ray container client to the ray cluster
# inside a local Kind environment and run a test that
# gets the number of nodes in the ray cluster.
client = docker.from_env()
container = client.containers.run(ray_image,

Member:

containers.run is equivalent to docker run or docker exec? It seems it just creates a new container based on ray_image and executes the following script locally, but does not test against the kuberay cluster. @wilsonwang371

Collaborator Author:

Correct. This container uses the host network and will connect to the ray cluster through the host port exposed by kind.
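
For reference, a minimal, hedged sketch of the pattern described in this reply: it assumes the docker Python SDK is installed locally and that kind forwards the Ray head's client port 10001 to the host (the image tag and address mirror the values used in this test).

```python
# Sketch only: start a throwaway container on the host network and run a
# Ray Client check against the port that kind exposes on the host, which
# forwards to the Ray head pod inside the Kind cluster.
import docker

client = docker.from_env()
container = client.containers.run(
    "rayproject/ray:1.9.0",   # same Ray version as the cluster under test
    remove=True,
    detach=True,
    tty=True,
    network_mode="host",      # host networking reaches kind's exposed port
)
exit_code, output = container.exec_run(
    ["python", "-c",
     "import ray; ray.init(address='ray://127.0.0.1:10001'); print(len(ray.nodes()))"],
    demux=True,
)
stdout, _ = output
print(exit_code, stdout)
container.stop()
client.close()
```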

Collaborator:

Could you add a couple of comments in the code explaining the use of the docker client?

Member:

@wilsonwang371 Let's add a comment here? Otherwise, others may run into the same confusion I did. This code snippet assumes the following prerequisites:

https://github.com/ray-project/kuberay/blob/7f16c1205ee7f484ee08e68d38d563947e31a5fa/tests/config/raycluster-service.yaml#L18-L21

https://github.com/ray-project/kuberay/blob/7f16c1205ee7f484ee08e68d38d563947e31a5fa/tests/config/cluster-config.yaml#L20-L23

Using ray.util.connect probably makes more sense because it won't init the cluster in any scenario.
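
As an illustration of that prerequisite (not part of this PR): the container-based test can only work if the kind node actually exposes the Ray Client port on the host, which a quick local check can confirm.

```python
# Illustrative check, assuming the NodePort / extraPortMappings config linked
# above forwards the Ray Client port 10001 to 127.0.0.1 on the host.
import socket

with socket.create_connection(("127.0.0.1", 10001), timeout=5):
    print("Ray Client port 10001 is reachable on the host")
```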

Collaborator Author:

The reason for not calling ray.util.connect is that we want exactly the same version of Ray for both the ray cluster and the ray client.

Collaborator:

For KubeRay tests in the Ray CI, we install Ray in the host environment to test Ray Client and Job Submission.

Using docker directly works as well. kubectl exec would probably work too.
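
A rough sketch of the kubectl exec alternative mentioned here; the namespace and the rayNodeType=head label selector are assumptions for illustration (based on the labels used in the templates in this repo), not something this PR implements.

```python
# Hypothetical alternative: run the same node-count check inside the Ray
# head pod via kubectl exec instead of a separate docker container.
import subprocess

head_pod = subprocess.check_output(
    ["kubectl", "get", "pods", "-n", "default", "-l", "rayNodeType=head",
     "-o", "jsonpath={.items[0].metadata.name}"],
    text=True,
).strip()

check = "import ray; ray.init(address='auto'); print(len(ray.nodes()))"
out = subprocess.check_output(
    ["kubectl", "exec", "-n", "default", head_pod, "--", "python", "-c", check],
    text=True,
)
print(out.strip())  # expect "2": head plus one worker, as in the test above
```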

Collaborator Author:

Maybe later we can switch to installing a particular version of Ray in the host environment too.

Member:

> The reason for not calling ray.util.connect is that we want exactly the same version of Ray for both the ray cluster and the ray client.

Hmm, is this related? We can still use the exact same version by using ray.util.connect instead of ray.init. Am I missing something?

Collaborator:

ray.init("ray://...") is the preferred API for Ray Client these days.

If I understand right, the issue is that we don't currently have the same Ray version installed in the host CI environment as in the Ray pods in KinD, so Ray Client in the host will fail to connect to the server in the Ray head pod.
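
For context, a hedged side-by-side of the two client APIs discussed in this thread; the address matches the one used in the test, and either way the client-side and server-side Ray versions must match.

```python
# Both forms talk to the Ray Client server on the head node and never start
# a local Ray cluster inside the test process.
import ray

# Preferred form: ray.init with a ray:// address goes through Ray Client.
ray.init(address="ray://127.0.0.1:10001")
print(len(ray.nodes()))
ray.shutdown()

# Older Ray Client API mentioned in this thread as an alternative.
import ray.util
ray.util.connect("127.0.0.1:10001")
print(len(ray.nodes()))
ray.util.disconnect()
```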

remove=True,
detach=True,
tty=True,
network_mode='host')
rtn_code, output = container.exec_run(['python',
'-c', '''
import ray
ray.init(address='ray://127.0.0.1:10001')

print(len(ray.nodes()))
'''],
demux=True)
stdout_str, _ = output

container.stop()

if stdout_str != b'2\n':
print(output, file=sys.stderr)
raise Exception('invalid result.')
if rtn_code != 0:
msg = 'invalid return code {}'.format(rtn_code)
print(msg, file=sys.stderr)
raise Exception(msg)

client.close()

@classmethod
def tearDownClass(cls):
delete_cluster()


61 changes: 61 additions & 0 deletions tests/config/ray-cluster.mini.yaml.template
@@ -55,3 +55,64 @@ spec:
name: dashboard
- containerPort: 10001
name: client
workerGroupSpecs:
- replicas: 1
minReplicas: 1
maxReplicas: 1
groupName: small-group
# the following params are used to complete the ray start command: ray start --block --node-ip-address= ...
rayStartParams:
redis-password: 'LetMeInRay' # Deprecated since Ray 1.11 due to GCS bootstrapping enabled
node-ip-address: $$MY_POD_IP
block: 'true'
#pod template
template:
metadata:
labels:
rayCluster: raycluster-compatibility-test
rayNodeType: worker # will be injected if missing
groupName: small-group # will be injected if missing
# annotations for pod
annotations:
key: value
spec:
initContainers: # to avoid worker crashing before head service is created
- name: init-myservice
image: busybox:1.28
# Change the cluster postfix if you don't have a default setting
command: ['sh', '-c', "until nslookup $$RAY_IP.$$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
containers:
- name: machine-learning # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc')
image: $ray_image
# environment variables to set in the container. Optional.
# Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
env:
- name: TYPE
value: "worker"
- name: RAY_DISABLE_DOCKER_CPU_WARNING
value: "1"
- name: CPU_REQUEST
valueFrom:
resourceFieldRef:
containerName: machine-learning
resource: requests.cpu
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
ports:
- containerPort: 80
# use volumeMounts. Optional.
# Refer to https://kubernetes.io/docs/concepts/storage/volumes/
volumeMounts:
- mountPath: /var/log
name: log-volume
# use volumes
# Refer to https://kubernetes.io/docs/concepts/storage/volumes/
volumes:
- name: log-volume
emptyDir: {}