Commit

adds visuals
jacobmarks committed Apr 8, 2024
1 parent 40a17fd commit 06af687
Showing 13 changed files with 21 additions and 19 deletions.
38 changes: 20 additions & 18 deletions docs/source/tutorials/clustering.ipynb
@@ -7,6 +7,13 @@
"# Clustering Images with Embeddings"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Clustering](./images/clustering_preview.jpg)"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -94,8 +101,8 @@
"\n",
"**Hierarchical clustering**: These techniques seek to either:\n",
"\n",
- "1. Construct clusters by starting with individual points and iteratively combining clusters into larger composites or \n",
- "2. Deconstruct clusters, starting with all objects in one cluster and iteratively dividing clusters into smaller components.\n",
+ "1. *Construct* clusters by starting with individual points and iteratively combining clusters into larger composites or \n",
+ "2. *Deconstruct* clusters, starting with all objects in one cluster and iteratively dividing clusters into smaller components.\n",
"\n",
"Constructive techniques like [Agglomerative Clustering](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering) become computationally expensive as the dataset grows, but performance can be quite impressive for small-to-medium datasets and low-dimensional features."
]
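
To make the constructive (bottom-up) strategy described in this hunk concrete, here is a minimal pure-Python sketch of single-linkage agglomerative clustering. It is a toy illustration, not scikit-learn's `AgglomerativeClustering`; the function name `agglomerate`, the single-linkage merge rule, and the sample points are all assumptions made for the example.

```python
from math import dist


def agglomerate(points, n_clusters):
    """Toy single-linkage agglomerative clustering (illustrative only)."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair into one larger composite cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2D points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
result = agglomerate(points, 3)
```

Every merge step rescans all pairs of clusters, which hints at why this family of methods becomes expensive as the dataset grows.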
@@ -253,7 +260,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![FiftyOne App](./images/clustering_dataset_in_app.jpg)"
]
},
{
@@ -307,7 +314,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![Embeddings Panel](./images/clustering_open_embeddings_panel.gif)"
]
},
{
@@ -328,7 +335,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![Compute Clusters](./images/clustering_compute_clusters_operator.gif)"
]
},
{
@@ -346,7 +353,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![Filtering Clusters](./images/clustering_filter_by_cluster_number.gif)"
]
},
{
@@ -360,7 +367,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![Coloring by Clusters](./images/clustering_color_by_cluster.gif)"
]
},
{
@@ -381,14 +388,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Returning to the initial set of clusters, let’s dig into one final area in the embeddings plot. Notice how a few images of people playing soccer got lumped into a cluster of primarily tennis images. This is because we passed 2D dimensionality-reduced vectors into our clustering routine rather than the embedding vectors themselves. While 2D projections are helpful for visualization, and techniques like UMAP are fairly good at retaining structure, relative distances are not exactly preserved, and some information is lost. Suppose we instead pass our CLIP embeddings directly into our clustering computation with the same hyperparameters. In that case, these soccer images are assigned to the same cluster as the rest of the soccer images, along with other field sports like frisbee and baseball"
+ "Returning to the initial set of clusters, let’s dig into one final area in the embeddings plot. Notice how a few images of people playing soccer got lumped into a cluster of primarily tennis images. This is because we passed 2D dimensionality-reduced vectors into our clustering routine rather than the embedding vectors themselves. While 2D projections are helpful for visualization, and techniques like UMAP are fairly good at retaining structure, relative distances are not exactly preserved, and some information is lost. Suppose we instead pass our CLIP embeddings directly into our clustering computation with the same hyperparameters. In that case, these soccer images are assigned to the same cluster as the rest of the soccer images, along with other field sports like frisbee and baseball:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![UMAP Limitations](./images/clustering_umap_limitation.gif)"
]
},
{
@@ -411,14 +418,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![HDBSCAN Clusters](./images/clustering_hdbscan.gif)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Note that for HDBSCAN, label ”-1” is given to all background images. These images are not merged into any of the final clusters."
+ "Note that for HDBSCAN, label `-1` is given to all background images. These images are not merged into any of the final clusters."
]
},
{
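
As a small illustration of the `-1` convention mentioned in this hunk, the sketch below separates background samples from clustered ones. The `labels` list is hypothetical stand-in data, and the names `NOISE`, `clustered`, `background`, and `cluster_sizes` are assumptions for the example; in practice the labels would come from the clustering run itself.

```python
from collections import Counter

# Hypothetical per-sample cluster labels in the convention HDBSCAN uses:
# non-negative integers are cluster ids, -1 marks background (noise) samples.
labels = [0, 0, 1, -1, 1, 2, -1, 0]

NOISE = -1

# Indices of samples that were assigned to a real cluster.
clustered = [i for i, lab in enumerate(labels) if lab != NOISE]

# Indices of background samples that were not merged into any cluster.
background = [i for i, lab in enumerate(labels) if lab == NOISE]

# Cluster sizes, ignoring the background label.
cluster_sizes = Counter(lab for lab in labels if lab != NOISE)
```

Treating `-1` as "unassigned" rather than as one more cluster keeps per-cluster statistics from being skewed by unrelated background images.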
@@ -439,7 +446,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![Clustering Run Info](./images/clustering_get_clustering_info.jpg)"
]
},
{
@@ -498,7 +505,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**INSERT IMAGE**"
+ "![Labeling Clusters with GPT-4V](./images/clustering_gpt4v_labeling.gif)"
]
},
{
@@ -533,11 +540,6 @@
"- **Clustering Hyperparameters**: We barely touched the number of clusters in this walkthrough. Your results may vary as you increase or decrease this number. For some techniques, like k-means clustering, there are heuristics you can use to [estimate the optimal number of clusters](https://www.analyticsvidhya.com/blog/2021/05/k-mean-getting-the-optimal-number-of-clusters/). Don’t stop there; experiment with other hyperparameters as well!\n",
"- **Concept Modeling Techniques**: the built-in concept modeling technique in this walkthrough uses GPT-4V and some light prompting to identify each cluster's core concept. This is but one way to approach an open-ended problem. Try using [image captioning](https://github.com/jacobmarks/image-captioning) and [topic modeling](https://en.wikipedia.org/wiki/Topic_model), or create your own technique!"
]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": []
}
],
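
One common heuristic for estimating the number of clusters in k-means is the elbow method: compute the within-cluster sum of squares (inertia) for several values of `k` and look for the point where further increases stop paying off. The sketch below is a toy 1-D version under stated assumptions. The helper `kmeans_inertia`, its deterministic initialization, and the sample `values` are all hypothetical, and it is not necessarily the approach taken in the linked article or in the tutorial's clustering plugin.

```python
def kmeans_inertia(values, k, iters=20):
    """Run a tiny 1-D k-means and return the within-cluster sum of squares."""
    data = sorted(values)
    # Deterministic init: spread the starting centers evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sum(min((v - c) ** 2 for c in centers) for v in data)


# Hypothetical data with three well-separated groups around 1, 5, and 9.
values = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
inertias = {k: kmeans_inertia(values, k) for k in range(1, 6)}

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.3f}")
```

On this data the inertia falls steeply up to `k=3` and only marginally afterwards, which is the "elbow" the heuristic looks for.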
"metadata": {
2 changes: 1 addition & 1 deletion docs/source/tutorials/index.rst
@@ -163,7 +163,7 @@ your datasets and turn your good models into *great models*.
:header: Clustering Images with Embeddings
:description: Use embeddings to cluster images in your dataset and visualize the results in FiftyOne.
:link: clustering.html
- :image: ../_static/images/tutorials/clustering.png
+ :image: ../_static/images/tutorials/clustering.jpg
:tags: App,Brain,Dataset-Curation,Embeddings,Visualization

.. End of tutorial cards