
Commit

add visuals
jacobmarks committed Apr 12, 2024
1 parent aed312a commit cb657cf
Showing 9 changed files with 119 additions and 8 deletions.
Binary file added docs/source/tutorials/images/sahi_base_model.gif
Binary file added docs/source/tutorials/images/sahi_dataset.jpg
Binary file added docs/source/tutorials/images/sahi_slices.gif
4 changes: 2 additions & 2 deletions docs/source/tutorials/index.rst
@@ -170,7 +170,7 @@ your datasets and turn your good models into *great models*.
:header: Small Object Detection with SAHI
:description: Detect small objects in your images with Slicing-Aided Hyper-Inference (SAHI) and FiftyOne.
:link: small_object_detection.html
:image: ../_static/images/tutorials/small_object_detection.png
:image: ../_static/images/tutorials/small_object_detection.jpg
:tags: Model-Evaluation,Model-Zoo

.. End of tutorial cards
@@ -216,4 +216,4 @@ your datasets and turn your good models into *great models*.
Zero-shot classification <zero_shot_classification.ipynb>
Data augmentation <data_augmentation.ipynb>
Clustering images <clustering.ipynb>
Small object detection with SAHI<small_object_detection.ipynb>
Detecting small objects<small_object_detection.ipynb>
123 changes: 117 additions & 6 deletions docs/source/tutorials/small_object_detection.ipynb
@@ -7,13 +7,50 @@
"# Detecting Small Objects with SAHI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Teaser](../_static/images/tutorials/small_object_detection.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Object detection is one of the fundamental tasks in computer vision, but detecting small objects can be particularly challenging.\n",
"\n",
"In this walkthrough, you'll learn how to use a technique called SAHI (Slicing Aided Hyper Inference) in conjunction with state-of-the-art object detection models to improve the detection of small objects. We'll apply SAHI with Ultralytics' YOLOv8 model to detect small objects in the VisDrone dataset, and then evaluate these predictions to better understand how slicing impacts detection performance.\n",
"\n",
"It covers the following:\n",
"\n",
"- Loading the VisDrone dataset from the Hugging Face Hub\n",
"- Applying Ultralytics' YOLOv8 model to the dataset\n",
"- Using SAHI to run inference on slices of the images\n",
"- Evaluating model performance with and without SAHI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup and Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this walkthrough, we'll be using the following libraries:\n",
"\n",
"- `fiftyone` for dataset exploration and manipulation\n",
"- `huggingface_hub` for loading the VisDrone dataset\n",
"- `ultralytics` for running object detection with YOLOv8\n",
"- `sahi` for slicing aided hyper inference\n",
"\n",
"If you haven't already, install the latest versions of these libraries:"
]
},
{
"cell_type": "code",
"execution_count": 62,
@@ -31,6 +68,15 @@
"pip install -U fiftyone sahi ultralytics huggingface_hub --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's get started! 🚀\n",
"\n",
"First, import the necessary modules from FiftyOne:"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -39,7 +85,6 @@
"source": [
"import fiftyone as fo\n",
"import fiftyone.zoo as foz\n",
"import fiftyone.brain as fob\n",
"import fiftyone.utils.huggingface as fouh\n",
"from fiftyone import ViewField as F"
]
@@ -48,7 +93,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll be taking advantage of FiftyOne's [Hugging Face Hub integration](https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub) to load a subset of the [VisDrone dataset](https://github.com/VisDrone/VisDrone-Dataset) directly from the Hugging Face Hub:"
"Now, let's download some data. We'll be taking advantage of FiftyOne's [Hugging Face Hub integration](https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub) to load a subset of the [VisDrone dataset](https://github.com/VisDrone/VisDrone-Dataset) directly from the [Hugging Face Hub](https://huggingface.co/docs/hub/en/index):"
]
},
{
@@ -71,6 +116,13 @@
"dataset = fouh.load_from_hub(\"jamarks/VisDrone2019-DET\", name=\"sahi-test\", max_samples=100, overwrite=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before adding any predictions, let's take a look at the dataset:"
]
},
{
"cell_type": "code",
"execution_count": 22,
@@ -88,6 +140,13 @@
"session = fo.launch_app(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![VisDrone](./images/sahi_dataset.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -99,7 +158,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To start off, let's run our standard inference pipeline with a YOLOv8 (large-variant) model. We can load the model from Ultralytics and then apply this directly to our FiftyOne dataset using `apply_model()`, thanks to [FiftyOne's Ultralytics integration](https://docs.voxel51.com/integrations/ultralytics.html):"
"Now that we know what our data looks like, let's run our standard inference pipeline with a YOLOv8 (large-variant) model. We can load the model from Ultralytics and then apply this directly to our FiftyOne dataset using `apply_model()`, thanks to [FiftyOne's Ultralytics integration](https://docs.voxel51.com/integrations/ultralytics.html):"
]
},
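The model-loading and inference cells are collapsed in this diff. A minimal sketch of that step, assuming the large YOLOv8 checkpoint `yolov8l.pt` and a hypothetical `base_model` label field (the `dataset` object comes from the cells above):

```python
from ultralytics import YOLO

# Load the large YOLOv8 variant; FiftyOne's Ultralytics integration
# accepts the model object directly in apply_model()
model = YOLO("yolov8l.pt")
dataset.apply_model(model, label_field="base_model")
```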
{
@@ -160,6 +219,13 @@
"session = fo.launch_app(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Base Model Predictions](./images/sahi_base_model.gif)"
]
},
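The cells that build `filtered_view` are collapsed in this diff. A hedged reconstruction based on the description below (aligning VisDrone labels with the model's COCO vocabulary and hiding low-confidence boxes); the mapping and threshold are illustrative, not the tutorial's exact values:

```python
from fiftyone import ViewField as F

# Illustrative mapping from VisDrone classes onto COCO-style names so
# ground truth and predictions can be compared class-for-class
mapping = {"pedestrian": "person", "people": "person", "van": "car"}
mapped_view = dataset.map_labels("ground_truth", mapping)

# Hide low-confidence predictions to reduce crowding in the App
filtered_view = mapped_view.filter_labels(
    "base_model", F("confidence") > 0.3, only_matches=False
)
```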
{
"cell_type": "markdown",
"metadata": {},
@@ -234,13 +300,27 @@
"session.view = filtered_view.view()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Filtered View](./images/sahi_base_model_predictions_filtered.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that the classes are aligned and we've reduced the crowding in our images, we can see that while the model does a pretty good job of detecting objects, it struggles with the small objects, especially people in the distance. This can happen with large images, as most detection models are trained on fixed-size images. As an example, YOLOv8 is trained on images with maximum side length $640$. When we feed it an image of size $1920$ x $1080$, the model will downsample the image to $640$ x $360$ before making predictions. This downsampling can cause small objects to be missed, as the model may not have enough information to detect them."
]
},
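The downsampling arithmetic above is easy to check:

```python
# A 1920 x 1080 image fed to a detector with max side length 640 is
# rescaled by 640 / 1920 = 1/3, so a 15-pixel-tall person in the
# original image occupies only ~5 pixels at inference time
scale = 640 / 1920
print(1920 * scale, 1080 * scale)  # 640.0 360.0
print(15 * scale)  # 5.0
```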
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detecting Small Objects with SAHI"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -252,7 +332,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detecting Small Objects with SAHI"
"<figure>\n",
" <img src=\"https://raw.githubusercontent.com/obss/sahi/main/resources/sliced_inference.gif\" alt=\"Alt text\" style=\"width:100%\">\n",
" <figcaption style=\"text-align:center; color:gray;\">Illustration of Slicing Aided Hyper Inference. Image courtesy of SAHI Github Repo.</figcaption>\n",
"</figure>"
]
},
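The SAHI inference cells are collapsed in this diff. A minimal sketch using SAHI's public API, assuming the same `yolov8l.pt` checkpoint and a hypothetical `small_slices` label field; the slice sizes and overlap ratios are illustrative:

```python
import fiftyone as fo
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap the YOLOv8 checkpoint in SAHI's detection-model interface
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8l.pt",
    confidence_threshold=0.25,
)

for sample in dataset.iter_samples(progress=True, autosave=True):
    # Run the detector on overlapping slices and merge the results
    result = get_sliced_prediction(
        sample.filepath,
        detection_model,
        slice_height=320,
        slice_width=320,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
    )
    # SAHI can export its merged predictions as FiftyOne detections
    sample["small_slices"] = fo.Detections(
        detections=result.to_fiftyone_detections()
    )
```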
{
@@ -742,6 +825,13 @@
"session = fo.launch_app(filtered_view, auto=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Sliced Model Predictions](./images/sahi_slices.gif)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -760,7 +850,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### FiftyOne's Evaluation API"
"### Using FiftyOne's Evaluation API"
]
},
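The evaluation cells are collapsed here. A sketch of how the comparison might run, with hypothetical prediction-field and eval-key names:

```python
# COCO-style evaluation: matches predictions to ground truth and
# stores per-detection TP/FP/FN tags under the given eval_key
base_results = filtered_view.evaluate_detections(
    "base_model", gt_field="ground_truth", eval_key="eval_base", compute_mAP=True
)
sahi_results = filtered_view.evaluate_detections(
    "small_slices", gt_field="ground_truth", eval_key="eval_sahi", compute_mAP=True
)
print(base_results.mAP(), sahi_results.mAP())
```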
{
@@ -859,7 +949,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation Performance on Small Objects"
"### Evaluating Performance on Small Objects"
]
},
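The construction of `small_boxes_view`, used below, is collapsed in this diff. One way to express the COCO definition of a small object (absolute area under $32^2$ pixels) in FiftyOne, assuming image metadata has been computed:

```python
from fiftyone import ViewField as F

dataset.compute_metadata()

# Bounding boxes are stored as relative [x, y, width, height], so
# convert to absolute pixel area using each image's dimensions
box_area = (
    F("bounding_box")[2] * F("$metadata.width")
    * F("bounding_box")[3] * F("$metadata.height")
)

small_boxes_view = filtered_view.filter_labels(
    "ground_truth", box_area < 32**2, only_matches=False
)
```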
{
@@ -897,6 +987,13 @@
"session.view = small_boxes_view.view()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Small Box View](./images/sahi_small_boxes_view.gif)"
]
},
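`high_conf_fp_view`, used in the next cell, is also built in collapsed cells. A hedged sketch, reusing the `eval_sahi` key from the evaluation sketch above with an illustrative confidence cutoff:

```python
from fiftyone import ViewField as F

# High-confidence false positives are often the most instructive
# failures to inspect by hand
high_conf_fp_view = filtered_view.filter_labels(
    "small_slices", (F("eval_sahi") == "fp") & (F("confidence") > 0.85)
)
```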
{
"cell_type": "code",
"execution_count": 112,
@@ -1092,6 +1189,13 @@
"session.view = high_conf_fp_view.view()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![False Positives View](./images/sahi_high_conf_fp_view.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1123,6 +1227,13 @@
"You will also want to determine which evaluation metrics make the most sense for your use case!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Resources"
]
},
{
"cell_type": "markdown",
"metadata": {},
