Commit

[Templates] Reintroduce requirements.txt + temporary patch fixes (ray-project#34903)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
justinvyu authored May 2, 2023
1 parent 9f60a09 commit f30f2ed
Showing 13 changed files with 284 additions and 169 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -11,6 +11,7 @@
# NOTE: Add @ray-project/ray-docs to all following docs subdirs.
/doc/ @ray-project/ray-docs
/doc/source/use-cases.rst @ericl @pcmoritz
/doc/source/templates @justinvyu @sofianhnaide

# ==== Ray core ====

13 changes: 9 additions & 4 deletions doc/BUILD
@@ -236,7 +236,10 @@ py_test_run_all_subdirectory(

filegroup(
name = "workspace_templates",
srcs = glob(["source/templates/tests/*.ipynb"]),
srcs = glob([
"source/templates/tests/**/*.ipynb",
"source/templates/tests/**/requirements.txt"
]),
visibility = ["//doc:__subpackages__"]
)

@@ -255,7 +258,8 @@ py_test(

py_test_run_all_notebooks(
size = "large",
include = ["source/templates/tests/many_model_training.ipynb"],
# TODO(justinvyu): Merge tests/ with the regular versions of the templates.
include = ["source/templates/tests/02_many_model_training/many_model_training.ipynb"],
exclude = [],
data = ["//doc:workspace_templates"],
tags = ["exclusive", "team:ml", "ray_air"],
@@ -267,8 +271,9 @@ py_test_run_all_notebooks(
py_test_run_all_notebooks(
size = "large",
include = [
"source/templates/tests/batch_inference.ipynb",
"source/templates/tests/serving_stable_diffusion.ipynb"
# TODO(justinvyu): Merge tests/ with the regular versions of the templates.
"source/templates/tests/01_batch_inference/batch_inference.ipynb",
"source/templates/tests/03_serving_stable_diffusion/serving_stable_diffusion.ipynb"
],
exclude = [],
data = ["//doc:workspace_templates"],
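The `doc/BUILD` change above swaps a flat `tests/*.ipynb` glob for recursive `tests/**/*.ipynb` patterns because the notebooks now live in per-template subfolders. A minimal sketch of that difference, using Python's `pathlib` as a stand-in for Bazel's `glob` (an assumption: the two agree for this simple case, but they are different implementations); the temporary directory layout is hypothetical:

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical layout mirroring the nested template folders in this diff.
NOTEBOOKS = [
    "tests/01_batch_inference/batch_inference.ipynb",
    "tests/02_many_model_training/many_model_training.ipynb",
]


def match(root: Path, pattern: str) -> List[str]:
    """Return sorted POSIX-style paths (relative to root) matching pattern."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in NOTEBOOKS:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    # `*` does not cross directory boundaries, so nested notebooks are missed.
    flat = match(root, "tests/*.ipynb")
    # `**` also descends into subdirectories, so nested notebooks are found.
    recursive = match(root, "tests/**/*.ipynb")

print("flat:", flat)
print("recursive:", recursive)
```

With the notebooks moved one level down, the flat pattern would likely pick up nothing, which is presumably why the `filegroup` needed the recursive form.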
62 changes: 39 additions & 23 deletions doc/source/templates/01_batch_inference/batch_inference.ipynb
@@ -8,14 +8,14 @@
"source": [
"# Scaling Batch Inference with Ray Data\n",
"\n",
"This template is a quickstart to using [Ray Data](https://docs.ray.io/en/latest/data/data.html) for batch inference. Ray Data is one of many libraries under the [Ray AI Runtime](https://docs.ray.io/en/latest/ray-air/getting-started.html). See [this blog post](https://www.anyscale.com/blog/model-batch-inference-in-ray-actors-actorpool-and-datasets) for more information on why and how you should perform batch inference with Ray!\n",
"This template is a quickstart to using [Ray Data](https://docs.ray.io/en/latest/data/dataset.html) for batch inference. Ray Data is one of many libraries under the [Ray AI Runtime](https://docs.ray.io/en/latest/ray-air/getting-started.html). See [this blog post](https://www.anyscale.com/blog/model-batch-inference-in-ray-actors-actorpool-and-datasets) for more information on why and how you should perform batch inference with Ray!\n",
"\n",
"This template walks through GPU batch prediction on an image dataset using a PyTorch model, but the framework and data format are there just to help you build your own application!\n",
"\n",
"At a high level, this template will:\n",
"1. [Load your dataset using Ray Data.](https://docs.ray.io/en/latest/data/creating-datastreams.html)\n",
"2. [Preprocess your dataset before feeding it to your model.](https://docs.ray.io/en/latest/data/transforming-datastreams.html)\n",
"3. [Initialize your model and perform inference on a shard of your dataset with a remote actor.](https://docs.ray.io/en/latest/data/transforming-datastreams.html#callable-class-udfs)\n",
"1. [Load your dataset using Ray Data.](https://docs.ray.io/en/latest/data/creating-datasets.html)\n",
"2. [Preprocess your dataset before feeding it to your model.](https://docs.ray.io/en/latest/data/transforming-datasets.html)\n",
"3. [Initialize your model and perform inference on a shard of your dataset with a remote actor.](https://docs.ray.io/en/latest/data/transforming-datasets.html#writing-user-defined-functions-udfs)\n",
"4. [Save your prediction results.](https://docs.ray.io/en/latest/data/api/input_output.html)\n",
"\n",
"> Slot in your code below wherever you see the ✂️ icon to build a batch inference Ray application off of this template!"
@@ -52,42 +52,46 @@
{
"cell_type": "code",
"execution_count": null,
"id": "770bbdc7",
"metadata": {},
"id": "9d49681f-baf0-4ed8-9740-5c4e38744311",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!ray status"
"NUM_WORKERS: int = 4\n",
"NUM_GPUS_PER_WORKER: float = 1\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d49681f-baf0-4ed8-9740-5c4e38744311",
"metadata": {
"tags": []
},
"id": "770bbdc7",
"metadata": {},
"outputs": [],
"source": [
"NUM_WORKERS: int = 4\n",
"NUM_GPUS_PER_WORKER: float = 1\n"
"!ray status"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "23321ba8",
"metadata": {},
"source": [
"```{tip}\n",
"Try setting `NUM_GPUS_PER_WORKER` to a fractional amount! This will leverage Ray's fractional resource allocation, which means you can schedule multiple batch inference workers to happen on the same GPU.\n",
"Try setting `NUM_GPUS_PER_WORKER` to a fractional amount! This will leverage Ray's fractional resource allocation, which means you can schedule multiple batch inference workers to use the same GPU.\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3b6f2352",
"metadata": {},
"source": [
"> ✂️ Replace this function with logic to load your own data with Ray Data."
"> ✂️ Replace this function with logic to load your own data with Ray Data.\n",
">\n",
"> See [the Ray Data guide on creating datasets](https://docs.ray.io/en/latest/data/creating-datasets.html) to learn how to create a dataset based on your data type and file storage format."
]
},
{
@@ -97,7 +101,7 @@
"metadata": {},
"outputs": [],
"source": [
"def load_ray_dataset() -> ray.data.Datastream:\n",
"def load_ray_dataset():\n",
" from ray.data.datasource.partitioning import Partitioning\n",
"\n",
" s3_uri = \"s3://anonymous@air-example-data-2/imagenette2/val/\"\n",
@@ -163,7 +167,9 @@
"outputs": [],
"source": [
"ds = ds.map_batches(preprocess, batch_format=\"numpy\")\n",
"ds.schema()\n"
"\n",
"print(\"Dataset schema:\\n\", ds.schema())\n",
"print(\"Number of images:\", ds.count())\n"
]
},
{
@@ -194,9 +200,9 @@
" def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:\n",
" # <Replace this with your own model inference logic>\n",
" input_data = torch.as_tensor(batch[\"image\"], device=self.device)\n",
" with torch.no_grad():\n",
" result = self.model(input_data)\n",
" return {\"predictions\": result.cpu().numpy()}\n"
" with torch.inference_mode():\n",
" pred = self.model(input_data)\n",
" return {\"predicted_class_index\": pred.argmax(dim=1).detach().cpu().numpy()}\n"
]
},
{
@@ -218,8 +224,9 @@
" PredictCallable,\n",
" batch_size=128,\n",
" compute=ray.data.ActorPoolStrategy(\n",
" # Fix the number of batch inference workers to a specified value.\n",
" size=NUM_WORKERS,\n",
" # Fix the number of batch inference workers to `NUM_WORKERS`.\n",
" min_size=NUM_WORKERS,\n",
" max_size=NUM_WORKERS,\n",
" ),\n",
" num_gpus=NUM_GPUS_PER_WORKER,\n",
" batch_format=\"numpy\",\n",
@@ -237,14 +244,23 @@
"preds.schema()\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2565ba08",
"metadata": {},
"source": [
"Show the first few predictions!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d606556",
"metadata": {},
"outputs": [],
"source": [
"preds.take(1)\n"
"preds.take(5)\n"
]
},
{
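The tip in the batch inference notebook above suggests setting `NUM_GPUS_PER_WORKER` to a fractional value so that several workers use the same GPU. A rough arithmetic sketch of what that implies for cluster sizing; the helper `gpus_needed` is hypothetical, and it assumes workers pack perfectly onto devices, which is a simplification and not a description of Ray's scheduler:

```python
import math


def gpus_needed(num_workers: int, num_gpus_per_worker: float) -> int:
    """Estimate whole GPUs required, assuming ideal packing (an assumption)."""
    if num_gpus_per_worker <= 0:
        raise ValueError("num_gpus_per_worker must be positive")
    if num_gpus_per_worker >= 1:
        # Each worker holds at least one full GPU.
        return math.ceil(num_workers * num_gpus_per_worker)
    # Fractional request: several workers can share one device.
    workers_per_gpu = math.floor(1 / num_gpus_per_worker)
    return math.ceil(num_workers / workers_per_gpu)


# The notebook defaults: 4 workers with 1 GPU each.
print(gpus_needed(4, 1))
# Halving the per-worker request lets two workers share each GPU.
print(gpus_needed(4, 0.5))
```

Whether sharing is a good idea depends on whether each worker's model and batch fit in its slice of GPU memory, which this arithmetic does not check.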
@@ -37,8 +37,7 @@
"\n",
"This template requires certain Python packages to be available to every node in the cluster.\n",
"\n",
"> ✂️ Add your own package dependencies! You can specify bounds for package versions\n",
"> in the same format as a `requirements.txt` file.\n"
"> ✂️ Add your own package dependencies in the `requirements.txt` file!\n"
]
},
{
@@ -50,9 +49,21 @@
},
"outputs": [],
"source": [
"requirements = [\n",
" \"statsforecast==1.5.0\",\n",
"]\n"
"requirements_path = \"./requirements.txt\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92161434",
"metadata": {},
"outputs": [],
"source": [
"with open(requirements_path, \"r\") as f:\n",
" requirements = f.read().strip().splitlines()\n",
"\n",
"print(\"Requirements:\")\n",
"print(\"\\n\".join(requirements))\n"
]
},
{
@@ -64,7 +75,9 @@
"First, we may want to use these modules right here in our script, which is running on the head node.\n",
"Install the Python packages on the head node using `pip install`.\n",
"\n",
"You may need to restart this notebook kernel to access the installed packages.\n"
"```{note}\n",
"You may need to restart this notebook kernel to access the installed packages.\n",
"```\n"
]
},
{
@@ -74,9 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"all_requirements = \" \".join(requirements)\n",
"\n",
"%pip install {all_requirements}\n"
"%pip install -r {requirements_path} --upgrade"
]
},
{
@@ -118,11 +129,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b8fc83d0",
"metadata": {},
"source": [
"> ✂️ Replace this value to change the number of data partitions you will use. This will be total the number of Tune trials you will run!\n",
"> ✂️ Replace this value to change the number of data partitions you will use (<= 5000 for this dataset). This will be the total number of Tune trials you will run!\n",
">\n",
"> Note that this template fits two models per data partition and reports the best performing one."
]
@@ -136,7 +148,7 @@
},
"outputs": [],
"source": [
"NUM_DATA_PARTITIONS: int = 1000\n"
"NUM_DATA_PARTITIONS: int = 500\n"
]
},
{
@@ -0,0 +1 @@
statsforecast==1.5.0
10 changes: 10 additions & 0 deletions doc/source/templates/03_serving_stable_diffusion/requirements.txt
@@ -0,0 +1,10 @@
accelerate==0.14.0
diffusers==0.15.1
matplotlib>=3.5.3,<=3.7.1
numpy>=1.21.6,<=1.23.5
Pillow==9.3.0
scipy>=1.7.3,<=1.9.3
tensorboard>=2.11.2,<=2.12.0
torch==1.13.0
torchvision==0.14.0
transformers==4.28.1
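
The notebook change in this commit reads `requirements.txt` with `f.read().strip().splitlines()`. A small sketch of that pattern follows; the helper name `read_requirements`, the temporary file, and its sample contents are hypothetical, and the filtering of blank lines and `#` comments is an extra step that is not part of the notebook cell shown in the diff:

```python
import tempfile
from pathlib import Path
from typing import List


def read_requirements(path: str) -> List[str]:
    """Read a requirements file into a list of requirement specifiers."""
    with open(path, "r") as f:
        lines = f.read().strip().splitlines()
    # Extra step (not in the notebook): skip blank lines and comment lines.
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    ]


with tempfile.TemporaryDirectory() as tmp:
    req_path = Path(tmp) / "requirements.txt"
    # Hypothetical file contents, reusing two pins that appear in this commit.
    req_path.write_text(
        "# pinned for the template\n"
        "statsforecast==1.5.0\n"
        "\n"
        "numpy>=1.21.6,<=1.23.5\n"
    )
    requirements = read_requirements(str(req_path))

print("Requirements:")
print("\n".join(requirements))
```

Keeping the pins in one file means the same list can feed both `%pip install -r` on the head node and any code that needs the specifiers as Python strings.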