Updating the getting started and docs to be more verbose (facebookresearch#167)

Summary: Pull Request resolved: facebookresearch#167
Reviewed By: min-xu-ai
Differential Revision: D26225658
Pulled By: prigoyal
fbshipit-source-id: 0f2c0ee4b3ea277d4360e6021c3eee79fcc246e6
johnnynunez · Feb 3, 2021 · c1664c9
1 parent 5c1aa40
commit c1664c9

Showing 4 changed files with 115 additions and 7 deletions.
diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md
@@ -37,18 +37,68 @@ imagenet_full_size
 |  |  |_...
 ```
 
-## Running SimCLR Pre-training on 1-gpu
+## Running SimCLR Pre-training on 1-gpu on ImageNet1K
 
+### If VISSL is built from source
 We provide a config to train model using the pretext SimCLR task on the ResNet50 model.
-Change the `DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.
+Change the `DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path. 
 
 ```bash
+cd $HOME/vissl
+python3 tools/run_distributed_engines.py \
+    hydra.verbose=true \
+    config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
+    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
+    config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
+    config=test/integration_test/quick_simclr_imagefolder \
+    config.CHECKPOINT.DIR="./checkpoints" \
+    config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
+```
+
+### If using pre-built conda/pip VISSL packages
+
+Users need to set up the dataset and obtain the built-in training tool. Follow these steps:
+
+#### Step 1: Set up the ImageNet1K dataset
+If you installed the pre-built VISSL packages, set up the ImageNet1K dataset by following our [data documentation](https://vissl.readthedocs.io/en/latest/vissl_modules/data.html) and [tutorial](https://colab.research.google.com/drive/1CCuZ50BN99JcOB6VEPytVi_i2tSMd7A3#scrollTo=KPGCiTsXZeW3). Note that the dataset
+must be registered with VISSL.
+
+In your Python interpreter:
+```python
+>>> json_data = {
+        "imagenet1k_folder": {
+            "train": ["<img_path>", "<lbl_path>"],
+            "val": ["<img_path>", "<lbl_path>"]
+        }
+    }
+>>> from vissl.utils.io import save_file
+>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
+>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
+>>> print(VisslDatasetCatalog.list())
+['imagenet1k_folder']
+>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
+{'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}
+```
+
+#### Step 2: Get the built-in tool and YAML config file
+We will use the pre-built VISSL training tool [run_distributed_engines.py](https://github.com/facebookresearch/vissl/blob/stable/tools/run_distributed_engines.py) and the config file. Run:
+
+```bash
+cd /tmp/ && mkdir -p /tmp/configs/config
+wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
+wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
+wget -q  https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
+```
+
+#### Step 3: Train
+```bash
+cd /tmp/
 python3 run_distributed_engines.py \
     hydra.verbose=true \
     config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
     config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
     config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
-    config=test/integration_test/quick_simclr \
+    config=quick_1gpu_resnet50_simclr \
     config.CHECKPOINT.DIR="./checkpoints" \
     config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
 ```
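
A note on Step 1 above: the catalog registered there is a mapping from dataset name to train/val path pairs. The following is a minimal, stdlib-only sketch of writing such a file, assuming the catalog is stored as plain JSON (suggested by the `.json` filename, but not confirmed by this diff). `write_dataset_catalog` is a hypothetical helper, not part of VISSL, and the `<img_path>`/`<lbl_path>` entries are placeholders to replace with real paths.

```python
import json
import os
import tempfile


def write_dataset_catalog(catalog, path):
    """Write a dataset catalog dict to `path` as JSON, creating parent dirs.

    Hypothetical helper for illustration; VISSL's own `save_file` may differ.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(catalog, f, indent=2)
    return path


# Same structure as the json_data dict in Step 1; paths are placeholders.
catalog = {
    "imagenet1k_folder": {
        "train": ["<img_path>", "<lbl_path>"],
        "val": ["<img_path>", "<lbl_path>"],
    }
}

# Written under a temporary directory here; the docs use /tmp/configs/config/.
catalog_path = write_dataset_catalog(
    catalog,
    os.path.join(tempfile.mkdtemp(), "configs", "config", "dataset_catalog.json"),
)

# Read the file back to confirm it round-trips.
with open(catalog_path) as f:
    loaded = json.load(f)
print(sorted(loaded))
```

Reading the file back before launching training is a cheap way to catch a malformed catalog early.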
diff --git a/INSTALL.md b/INSTALL.md
@@ -134,7 +134,7 @@ python -c 'import vissl, apex, cv2'
 If you don't have anaconda, [run this bash script to install conda](https://github.com/facebookresearch/vissl/blob/master/docker/common/install_conda.sh).
 
 ```bash
-conda create -n vissl python=3.7
+conda create -n vissl_env python=3.7
 source activate vissl_env
 ```
 
@@ -147,7 +147,7 @@ conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
 #### Step 3: Install APEX (conda)
 
 ```bash
-conda install -c vissl apex vissl
+conda install -c vissl apex
 ```
 
 #### Step 4: Install VISSL

diff --git a/docs/source/getting_started.rst b/docs/source/getting_started.rst
@@ -49,6 +49,9 @@ We will use ImageNet-1K dataset and assume the downloaded data to look like:
 Running SimCLR Pre-training on 1-gpu
 ------------------------------------------
 
+If VISSL is built from source
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 We provide a config to train model using the pretext SimCLR task on the ResNet50 model.
 Change the :code:`DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.
 
@@ -62,3 +65,58 @@ Change the :code:`DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset fold
     	config=test/integration_test/quick_simclr \
     	config.CHECKPOINT.DIR="./checkpoints" \
     	config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
+
+
+If using pre-built conda/pip VISSL packages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Users need to set up the dataset and obtain the built-in training tool. Follow these steps:
+
+- **Step 1: Set up the ImageNet1K dataset**
+If you installed the pre-built VISSL packages, set up the ImageNet1K dataset by following our `data documentation <https://vissl.readthedocs.io/en/latest/vissl_modules/data.html>`_ and `tutorial <https://colab.research.google.com/drive/1CCuZ50BN99JcOB6VEPytVi_i2tSMd7A3#scrollTo=KPGCiTsXZeW3>`_. Note that the dataset
+must be registered with VISSL.
+
+In your Python interpreter:
+
+.. code-block:: python
+
+	>>> json_data = {
+		"imagenet1k_folder": {
+		    "train": ["<img_path>", "<lbl_path>"],
+		    "val": ["<img_path>", "<lbl_path>"]
+		}
+	    }
+	>>> from vissl.utils.io import save_file
+	>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
+	>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
+	>>> print(VisslDatasetCatalog.list())
+	['imagenet1k_folder']
+	>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
+	{'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}
+
+
+- **Step 2: Get the built-in tool and YAML config file**
+We will use the pre-built VISSL training tool `run_distributed_engines.py <https://github.com/facebookresearch/vissl/blob/stable/tools/run_distributed_engines.py>`_ and the config file. Run:
+
+.. code-block:: bash
+
+	cd /tmp/ && mkdir -p /tmp/configs/config
+	wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
+	wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
+	wget -q  https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
+
+
+- **Step 3: Train**
+
+.. code-block:: bash
+
+	cd /tmp/
+	python3 run_distributed_engines.py \
+	    hydra.verbose=true \
+	    config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
+	    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
+	    config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
+	    config=quick_1gpu_resnet50_simclr \
+	    config.CHECKPOINT.DIR="./checkpoints" \
+	    config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
+
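
The Step 3 command above passes every setting as a `key=value` override on the command line. When launching runs from a script, those overrides can be assembled programmatically. The following is a rough sketch only: `build_overrides` is a hypothetical helper, not part of VISSL or Hydra, and the rendering rules (lowercase booleans, bracketed lists) simply mirror the command shown in this diff.

```python
import shlex


def build_overrides(config_name, overrides):
    """Render a config name and a dict of settings as key=value arguments.

    Hypothetical helper for illustration; it only mirrors the override
    syntax used in the command from the docs.
    """
    args = ["config=" + config_name]
    for key, value in overrides.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            rendered = "[" + ",".join(str(v) for v in value) + "]"
        else:
            rendered = str(value)
        args.append("{}={}".format(key, rendered))
    return args


# Settings taken from the Step 3 command; the data path is a placeholder.
settings = {
    "hydra.verbose": True,
    "config.DATA.TRAIN.DATASET_NAMES": ["imagenet1k_folder"],
    "config.DATA.TRAIN.DATA_SOURCES": ["disk_folder"],
    "config.DATA.TRAIN.DATA_PATHS": ["/path/to/my/imagenet/folder/train"],
    "config.CHECKPOINT.DIR": "./checkpoints",
    "config.TENSORBOARD_SETUP.USE_TENSORBOARD": True,
}

command = ["python3", "run_distributed_engines.py"] + build_overrides(
    "quick_1gpu_resnet50_simclr", settings
)

# shlex.quote keeps the bracketed list values intact when pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing `command` as a list to `subprocess.run` would avoid shell quoting entirely; the printed form is only for copying into a terminal.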
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -140,7 +140,7 @@ If you don't have anaconda, `run this bash scrip to install conda <https://githu
 
 .. code-block:: bash
 
-    conda create -n vissl python=3.7
+    conda create -n vissl_env python=3.7
     source activate vissl_env
 
 
@@ -155,7 +155,7 @@ If you don't have anaconda, `run this bash scrip to install conda <https://githu
 
 .. code-block:: bash
 
-    conda install -c vissl apex vissl
+    conda install -c vissl apex
 
 
 - **Step 4: Install VISSL**