Updating the getting started and docs to be more verbose (facebookresearch#167)

Summary: Pull Request resolved: facebookresearch#167

Reviewed By: min-xu-ai

Differential Revision: D26225658

Pulled By: prigoyal

fbshipit-source-id: 0f2c0ee4b3ea277d4360e6021c3eee79fcc246e6
prigoyal authored and facebook-github-bot committed Feb 3, 2021
1 parent 5c1aa40 commit c1664c9
Showing 4 changed files with 115 additions and 7 deletions.
56 changes: 53 additions & 3 deletions GETTING_STARTED.md
@@ -37,18 +37,68 @@ imagenet_full_size
| | |_...
```

-## Running SimCLR Pre-training on 1-gpu
+## Running SimCLR Pre-training on 1-gpu on ImageNet1K

### If VISSL is built from source
We provide a config to train model using the pretext SimCLR task on the ResNet50 model.
-Change the `DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.
+Change the `DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.

```bash
cd $HOME/vissl
python3 tools/run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=test/integration_test/quick_simclr_imagefolder \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
```

### If using pre-built conda/pip VISSL packages

Users need to set the dataset and obtain the builtin tool for training. Follow the steps:

#### Step1: Setup ImageNet1K dataset
If you installed pre-built VISSL packages, we will set the ImageNet1K dataset following our [data documentation](https://vissl.readthedocs.io/en/latest/vissl_modules/data.html) and [tutorial](https://colab.research.google.com/drive/1CCuZ50BN99JcOB6VEPytVi_i2tSMd7A3#scrollTo=KPGCiTsXZeW3). NOTE that we need to register
the dataset with VISSL.

In your Python interpreter:
```python
>>> json_data = {
        "imagenet1k_folder": {
            "train": ["<img_path>", "<lbl_path>"],
            "val": ["<img_path>", "<lbl_path>"]
        }
    }
>>> from vissl.utils.io import save_file
>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
>>> print(VisslDatasetCatalog.list())
['imagenet1k_folder']
>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
{'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}
```
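
The catalog file written in Step 1 is plain JSON, so its shape can be checked without VISSL installed. The sketch below is an illustration only: it uses the standard `json` module in place of `vissl.utils.io.save_file`, writes to a temporary directory rather than `/tmp/configs/config`, and the helper name `write_dataset_catalog` is hypothetical. Whether VISSL picks up a file produced this way is an assumption, not something this commit confirms.

```python
import json
import tempfile
from pathlib import Path


def write_dataset_catalog(config_dir, train_paths, val_paths, name="imagenet1k_folder"):
    """Write a dataset_catalog.json holding one dataset entry (hypothetical helper)."""
    catalog = {name: {"train": list(train_paths), "val": list(val_paths)}}
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    catalog_path = config_dir / "dataset_catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path


# Placeholder paths, as in the snippet above; replace them with real folders.
demo_dir = Path(tempfile.mkdtemp()) / "configs" / "config"
catalog_file = write_dataset_catalog(
    demo_dir,
    train_paths=["<img_path>", "<lbl_path>"],
    val_paths=["<img_path>", "<lbl_path>"],
)

# Read the file back to confirm it round-trips as the same dictionary.
loaded = json.loads(catalog_file.read_text())
print(sorted(loaded))
```

Reading the file back is a cheap way to catch a malformed catalog before starting a training run.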

#### Step2: Get the builtin tool and yaml config file
We will use the pre-built VISSL tool for training [run_distributed_engines.py](https://github.com/facebookresearch/vissl/blob/stable/tools/run_distributed_engines.py) and the config file. Run

```bash
cd /tmp/ && mkdir -p /tmp/configs/config
wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
```

#### Step3: Train
```bash
cd /tmp/
python3 run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
-config=test/integration_test/quick_simclr \
+config=quick_1gpu_resnet50_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
```
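
When the training command is launched from a script rather than typed by hand, it can help to assemble the overrides as a list. The following is a rough sketch, not part of VISSL: `build_train_command` is a hypothetical helper, and the overrides are written the way the shell would pass them after removing the double quotes shown in the command above.

```python
import shlex


def build_train_command(data_path, config="quick_1gpu_resnet50_simclr",
                        checkpoint_dir="./checkpoints", use_tensorboard=True):
    """Return the argv list for the Step3 command (hypothetical helper)."""
    overrides = [
        "hydra.verbose=true",
        "config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder]",
        "config.DATA.TRAIN.DATA_SOURCES=[disk_folder]",
        # The shell strips the double quotes, so none are added here.
        f"config.DATA.TRAIN.DATA_PATHS=[{data_path}]",
        f"config={config}",
        f"config.CHECKPOINT.DIR={checkpoint_dir}",
        f"config.TENSORBOARD_SETUP.USE_TENSORBOARD={str(use_tensorboard).lower()}",
    ]
    return ["python3", "run_distributed_engines.py", *overrides]


command = build_train_command("/path/to/my/imagenet/folder/train")

# Print a shell-safe rendering; pass `command` to subprocess.run to execute it.
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the arguments in a list avoids shell quoting problems with the bracketed values if the command is later handed to `subprocess.run(command, cwd="/tmp")`.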
4 changes: 2 additions & 2 deletions INSTALL.md
@@ -134,7 +134,7 @@ python -c 'import vissl, apex, cv2'
If you don't have anaconda, [run this bash script to install conda](https://github.com/facebookresearch/vissl/blob/master/docker/common/install_conda.sh).

```bash
-conda create -n vissl python=3.7
+conda create -n vissl_env python=3.7
source activate vissl_env
```

@@ -147,7 +147,7 @@ conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
#### Step 3: Install APEX (conda)

```bash
-conda install -c vissl apex vissl
+conda install -c vissl apex
```

#### Step 4: Install VISSL
58 changes: 58 additions & 0 deletions docs/source/getting_started.rst
@@ -49,6 +49,9 @@ We will use ImageNet-1K dataset and assume the downloaded data to look like:
Running SimCLR Pre-training on 1-gpu
------------------------------------------

If VISSL is built from source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We provide a config to train model using the pretext SimCLR task on the ResNet50 model.
Change the :code:`DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.

@@ -62,3 +65,58 @@ Change the :code:`DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset fold
config=test/integration_test/quick_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
If using pre-built conda/pip VISSL packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users need to set the dataset and obtain the builtin tool for training. Follow the steps:

- **Step1: Setup ImageNet1K dataset**
If you installed pre-built VISSL packages, we will set the ImageNet1K dataset following our `data documentation <https://vissl.readthedocs.io/en/latest/vissl_modules/data.html>`_ and `tutorial <https://colab.research.google.com/drive/1CCuZ50BN99JcOB6VEPytVi_i2tSMd7A3#scrollTo=KPGCiTsXZeW3>`_. NOTE that we need to register
the dataset with VISSL.

In your Python interpreter:

.. code-block:: python

    >>> json_data = {
            "imagenet1k_folder": {
                "train": ["<img_path>", "<lbl_path>"],
                "val": ["<img_path>", "<lbl_path>"]
            }
        }
    >>> from vissl.utils.io import save_file
    >>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
    >>> from vissl.data.dataset_catalog import VisslDatasetCatalog
    >>> print(VisslDatasetCatalog.list())
    ['imagenet1k_folder']
    >>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
    {'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}

- **Step2: Get the builtin tool and yaml config file**
We will use the pre-built VISSL tool for training `run_distributed_engines.py <https://github.com/facebookresearch/vissl/blob/stable/tools/run_distributed_engines.py>`_ and the config file. Run

.. code-block:: bash

    cd /tmp/ && mkdir -p /tmp/configs/config
    wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
    wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
    wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py

- **Step3: Train**

.. code-block:: bash

    cd /tmp/
    python3 run_distributed_engines.py \
        hydra.verbose=true \
        config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
        config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
        config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
        config=quick_1gpu_resnet50_simclr \
        config.CHECKPOINT.DIR="./checkpoints" \
        config.TENSORBOARD_SETUP.USE_TENSORBOARD=true

4 changes: 2 additions & 2 deletions docs/source/installation.rst
@@ -140,7 +140,7 @@ If you don't have anaconda, `run this bash script to install conda <https://githu

.. code-block:: bash
-conda create -n vissl python=3.7
+conda create -n vissl_env python=3.7
source activate vissl_env
@@ -155,7 +155,7 @@ If you don't have anaconda, `run this bash script to install conda <https://githu

.. code-block:: bash
-conda install -c vissl apex vissl
+conda install -c vissl apex
- **Step 4: Install VISSL**