Description
To support our colleagues' work on the FLAVA paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets but some also support other tasks like segmentation.
- Food 101 @jdsgomes Food 101 dataset #5119
- Stanford Cars @abhi-glitchhg Stanford cars #5166
- FGVC Aircraft @sallysyw Adding fvgc_aircraft dataset #5178
- DTD. A good starting point is this PR from @pmeier Add DTD dataset #5115
- Oxford Pets. This one also comes with ROIs and segmentation masks, which would be nice to support. We could do something similar to CelebA with a `target_type` parameter. @pmeier OxfordIIITPet dataset #5116
- Flowers-102. @zhiqwang Add Flowers102 dataset #5177
- EuroSAT @frgfm Adds EuroSAT to the list of supported datasets #5114
- GTSRB. The homepage is timing out for me, but download links can be found here @sumukhaithal6 Add GTSRB dataset to the list of supporting datasets #5117
- PCAM @NicolasHug Add support for PCAM dataset #5203
- Clevr Counts. See also here for what exactly we need @pmeier add CLEVR dataset #5130
- FER2013. This is a Kaggle dataset, so I'm not sure we'll be able to support download (but maybe). @pmeier FER2013 dataset #5120
- Sun397 @saswatpp Add SUN397 Dataset #5132
- Country211. Apparently the download link is here @puhuk Country dataset #5138
- Rendered SST2 @jdsgomes Add Rendered sst2 dataset #5220
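As a rough illustration of the `target_type` idea mentioned for Oxford Pets, here is a minimal sketch of how a dataset could pick one or several annotations per sample, loosely modelled on how CelebA accepts either a single string or a list. The helper `select_targets` and the names `"category"` and `"segmentation"` are hypothetical and only serve the example; the actual implementation may choose differently.

```python
from typing import Any, Dict, Sequence, Union

# Hypothetical target names, chosen only for this illustration.
_VALID_TARGET_TYPES = ("category", "segmentation")


def select_targets(
    annotations: Dict[str, Any],
    target_type: Union[str, Sequence[str]] = "category",
) -> Any:
    """Return the requested annotation(s) for one sample.

    A single target type yields the bare annotation; several target
    types yield a tuple in the order they were requested.
    """
    types = [target_type] if isinstance(target_type, str) else list(target_type)
    for t in types:
        if t not in _VALID_TARGET_TYPES:
            raise ValueError(
                f"Unknown target_type {t!r}, expected one of {_VALID_TARGET_TYPES}"
            )
    targets = tuple(annotations[t] for t in types)
    return targets[0] if len(targets) == 1 else targets
```

With this shape, classification-only users keep the default and get a plain label, while users who also want masks pass `target_type=["category", "segmentation"]` and get a tuple.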
CC-ing @pmeier and @jdsgomes as previously discussed. We're on a fairly short timeline for this work, and ideally we would get all these in by end of January 2022.
I'm also wondering whether this is something that our open source contributors @oke-aditya @frgfm @zhiqwang could be interested in 🚀 ?
Implementing a new dataset
Implementing a dataset consists of 2 main things:
- The dataset class with `root`, `split`, `transform` and `target_transform` parameters. When available we should also support a `download` parameter (from what I checked, most of these are downloadable, apart from maybe FER2013). See e.g. the MNIST class.
- A test class which will generate automatic tests, e.g. this one for MNIST.
If there's some ambiguity in the choices to make, the reference to follow is VISSL, where most of these datasets are already supported.
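To make the expected interface concrete, here is a minimal sketch of a dataset class with the parameters listed above. It is deliberately dependency-free so it can be run as-is: `ToyDataset`, its `toy` folder, and the placeholder samples are all hypothetical, and a real torchvision dataset would instead subclass `torchvision.datasets.VisionDataset`, load actual images, and download a real archive.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


class ToyDataset:
    """Hypothetical skeleton showing the root/split/transform/
    target_transform/download interface; not a real dataset."""

    _SPLITS = ("train", "test")

    def __init__(
        self,
        root: str,
        split: str = "train",
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        download: bool = False,
    ) -> None:
        if split not in self._SPLITS:
            raise ValueError(
                f"Unknown split {split!r}, expected one of {self._SPLITS}"
            )
        self.root = Path(root)
        self.split = split
        self.transform = transform
        self.target_transform = target_transform

        if download:
            self._download()
        if not self._check_exists():
            raise RuntimeError("Dataset not found. Pass download=True to fetch it.")

        # Placeholder (file name, label) pairs; real code would index
        # the files found under self.root for the requested split.
        self._samples = [(f"{split}_{i}.jpg", i % 2) for i in range(4)]

    def _check_exists(self) -> bool:
        return (self.root / "toy").is_dir()

    def _download(self) -> None:
        # Stand-in for fetching and extracting an archive.
        if self._check_exists():
            return
        (self.root / "toy").mkdir(parents=True, exist_ok=True)

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        image, target = self._samples[idx]
        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return image, target
```

The sketch validates `split` early, skips the download when the data is already present, and applies `transform` and `target_transform` independently in `__getitem__`.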
For contributors
If you're interested in taking one of the datasets above, please comment below with "I'm working on dataset X" so that others don't pick the same! :)
cc @pmeier