Remove unsqueeze call in image tutorial #1108

elisno · 2024-04-22T18:24:36Z

With the latest release of the datasets library (2.19.0), the torch formatter now handles PIL objects of 2d arrays (e.g. grayscale images) differently.

Calling Dataset.with_format("torch") now returns grayscale image tensors with shape (1, H, W). Previously, the channel dimension was added by the unsqueeze(0) call in the next step of the preprocessing pipeline, so that call is now redundant for 2d image datasets (stored as PIL objects).

You can verify this by running:

from datasets import load_dataset

dataset = load_dataset("fashion_mnist", split="train")
transformed_dataset = dataset.with_format("torch")
transformed_dataset[:2]["image"].shape
#torch.Size([2, 1, 28, 28])

Previously, this would have returned

#torch.Size([2, 28, 28])

which required the call to unsqueeze.
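If a preprocessing step has to work on both sides of this change, one option is to add the channel dimension only when it is missing. The following is a minimal sketch, not the code from the tutorial: `ensure_channel_dim` is a hypothetical helper name, and the zero tensors stand in for real Fashion-MNIST images.

```python
import torch


def ensure_channel_dim(image: torch.Tensor) -> torch.Tensor:
    """Return a single grayscale image shaped (1, H, W).

    Hypothetical helper (not part of datasets or the tutorial).
    Accepts (H, W), the per-image shape described above for older
    datasets releases, or (1, H, W), the shape reported for 2.19.0.
    """
    if image.ndim == 2:
        # Older behaviour: no channel axis yet, so add one in front.
        return image.unsqueeze(0)
    if image.ndim == 3:
        # Newer behaviour: channel axis already present, leave as is.
        return image
    raise ValueError(f"expected a 2D or 3D image tensor, got {image.ndim}D")


# Placeholder images standing in for the two formatter behaviours.
old_style_image = torch.zeros(28, 28)
new_style_image = torch.zeros(1, 28, 28)

print(ensure_channel_dim(old_style_image).shape)  # torch.Size([1, 28, 28])
print(ensure_channel_dim(new_style_image).shape)  # torch.Size([1, 28, 28])

# An unconditional unsqueeze(0) on new-style images adds a second
# singleton axis, which becomes a 5D batch once 64 images are stacked.
bad_batch = torch.stack([new_style_image.unsqueeze(0) for _ in range(64)])
print(bad_batch.shape)  # torch.Size([64, 1, 1, 28, 28])
```

The last shape matches the input size reported in the conv2d error from #1103, which is consistent with the unsqueeze call being applied on top of the new formatter output.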

docs/source/tutorials/datalab/image.ipynb

elisno requested a review from sanjanag April 22, 2024 18:24

sanjanag approved these changes Apr 22, 2024

jwmueller reviewed Apr 22, 2024

docs/source/tutorials/datalab/image.ipynb (outdated)

put lower-bound of datasets version in image tutorial

84b5e12

jwmueller reviewed Apr 22, 2024

docs/source/tutorials/datalab/image.ipynb (outdated)

explain why we're using 255 to normalize

040d20d

jwmueller merged commit 2e9c4a9 into cleanlab:master Apr 22, 2024
11 of 19 checks passed

elisno mentioned this pull request Apr 23, 2024

image datalab tutorial broken: Getting build error RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [64, 1, 1, 28, 28] #1103

Closed
