Remove unsqueeze call in image tutorial #1108

Merged
merged 3 commits into from
Apr 22, 2024

@elisno elisno commented Apr 22, 2024

With the latest release of the datasets library (2.19.0), the torch formatter now handles PIL objects of 2d arrays (e.g. grayscale images) differently.

When calling `Dataset.with_format("torch")`, each grayscale image tensor now has shape (1, H, W). Previously, the channel dimension was added by the `unsqueeze(0)` call in the next step of the preprocessing pipeline, so that call is now redundant for 2D image datasets (stored as PIL objects).

You can verify this by running:

```python
from datasets import load_dataset

dataset = load_dataset("fashion_mnist", split="train")
transformed_dataset = dataset.with_format("torch")
transformed_dataset[:2]["image"].shape
# torch.Size([2, 1, 28, 28])
```

Previously, this would have returned

```python
# torch.Size([2, 28, 28])
```

which required the call to unsqueeze.
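
For pipelines that need to run against both older and newer `datasets` releases, one option is to add the channel dimension only when it is missing. The sketch below is not part of this PR. `ensure_channel_dim` is a hypothetical helper name, and it assumes the input is always a batch of grayscale images, so a 3-D tensor means (N, H, W) and a 4-D tensor means (N, 1, H, W).

```python
import torch


def ensure_channel_dim(images: torch.Tensor) -> torch.Tensor:
    """Return a batch shaped (N, 1, H, W).

    Hypothetical helper: assumes `images` is a batch of grayscale images,
    either (N, H, W) (no channel dim) or (N, 1, H, W) (channel dim present).
    """
    if images.ndim == 3:
        # Older formatter output: insert the channel dimension after the batch dim.
        return images.unsqueeze(1)
    # Newer formatter output already carries the channel dimension.
    return images


# Stand-ins for the two formatter outputs described above.
old_style = torch.zeros(2, 28, 28)
new_style = torch.zeros(2, 1, 28, 28)

print(ensure_channel_dim(old_style).shape)
print(ensure_channel_dim(new_style).shape)
```

Both calls should yield a batch with an explicit channel dimension. The check is ambiguous for a single RGB image of shape (C, H, W), so it is only suitable where the grayscale-batch assumption holds.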

@elisno elisno requested a review from sanjanag April 22, 2024 18:24