Add HF ImageNet dataset authentication to README (#140)
Summary:
Running FLAVA pre-training in examples led to an authentication failure when trying to download the ImageNet dataset. As a short-term solution, we've added details to the README describing how users can generate their own HuggingFace access tokens for successful authentication.

Pull Request resolved: #140

Test Plan: Ran the train script from the README, `python -m flava.train config=flava/configs/pretraining/debug.yaml`, to confirm that the datasets download successfully and the script runs.

Reviewed By: ankitade

Differential Revision: D37736312

Pulled By: RdoubleA

fbshipit-source-id: 45b9b133c1cfb46f763870056e6a0700b2988cab
RdoubleA authored and facebook-github-bot committed Jul 9, 2022
1 parent 5d8e997 commit 2352bc5
Showing 2 changed files with 3 additions and 5 deletions.
6 changes: 3 additions & 3 deletions examples/flava/README.md
@@ -25,13 +25,13 @@ cd examples
pip install -r flava/requirements.txt
```

-### Get ImageNet
+### Access ImageNet

-Get the ImageNet dataset's tar file by following the steps [here](https://huggingface.co/datasets/aps/imagenet2012#dataset-summary) and set the path to ImageNet tar file by `export IMAGENET_TAR=/path/to/imagenet_object_localization_patched2019.tar.gz`. Rest of the datasets required for a debug run should be automatically downloaded on first launch.
+To access the ImageNet dataset, you must first create an account at [HuggingFace](https://huggingface.co/join). Once your account is created and your email is confirmed, log in, click on your profile, and go to Settings -> Access Tokens. Create a new token with READ access and copy it to clipboard. Then run `huggingface-cli login` in your terminal and paste the access token there. It should create an auth token at `~/.huggingface/token` that will be used to authenticate the dataset download request. Finally, visit the [dataset page](https://huggingface.co/datasets/imagenet-1k) and accept the terms and conditions of the dataset while logged into your account.

### Launching and test pretraining

-Launch your FLAVA debug pretraining job after making sure `IMAGENET_TAR` variable has been exported by running the following command:
+After making sure your access token was saved to `~/.huggingface/token`, launch your FLAVA debug pretraining job by running the following command:

```
python -m flava.train config=flava/configs/pretraining/debug.yaml
2 changes: 0 additions & 2 deletions examples/flava/configs/pretraining/debug.yaml
@@ -36,8 +36,6 @@ datasets:
- _target_: flava.definitions.HFDatasetInfo
key: imagenet-1k
subset: default
-extra_kwargs:
-data_dir: ${oc.env:IMAGENET_TAR}
text:
_target_: flava.definitions.TrainingSingleDatasetInfo
train:
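
The updated README asks users to make sure their access token was saved to `~/.huggingface/token` before launching the debug run. A minimal sketch of that check follows, assuming the token is stored as plain text at the path the README names. The helper `has_hf_token` and the constant `DEFAULT_TOKEN_PATH` are hypothetical names used for illustration and are not part of this commit; other versions of the HuggingFace tooling may store the token in a different location.

```python
from pathlib import Path

# Path taken from the README text in this commit. This is an assumption:
# other versions of the HuggingFace tooling may keep the token elsewhere.
DEFAULT_TOKEN_PATH = Path.home() / ".huggingface" / "token"


def has_hf_token(token_path: Path = DEFAULT_TOKEN_PATH) -> bool:
    """Return True if token_path is a file with non-whitespace content.

    Hypothetical helper for illustration; it only checks that something
    was saved, not that the token is valid or has READ access.
    """
    if not token_path.is_file():
        return False
    return bool(token_path.read_text(encoding="utf-8").strip())


if __name__ == "__main__":
    if has_hf_token():
        print(f"Found a token at {DEFAULT_TOKEN_PATH}")
    else:
        print("No token found; run `huggingface-cli login` first")
```

A passing check does not guarantee the download will succeed: the terms on the dataset page still have to be accepted from the same account.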
