diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md
index 71ad4b8b1..a007b6c16 100644
--- a/GETTING_STARTED.md
+++ b/GETTING_STARTED.md
@@ -37,18 +37,68 @@ imagenet_full_size
 | | |_...
 ```
-## Running SimCLR Pre-training on 1-gpu
+## Running SimCLR Pre-training on 1-gpu on ImageNet1K
+### If VISSL is built from source
 We provide a config to train model using the pretext SimCLR task on the ResNet50 model.
-Change the `DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.
+Change the `DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.
 ```bash
+cd $HOME/vissl
+python3 tools/run_distributed_engines.py \
+    hydra.verbose=true \
+    config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
+    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
+    config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
+    config=test/integration_test/quick_simclr_imagefolder \
+    config.CHECKPOINT.DIR="./checkpoints" \
+    config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
+```
+
+### If using pre-built conda/pip VISSL packages
+
+Users need to set the dataset and obtain the builtin tool for training. Follow the steps:
+
+#### Step1: Setup ImageNet1K dataset
+If you installed pre-built VISSL packages, we will set the ImageNet1K dataset following our [data documentation](https://vissl.readthedocs.io/en/latest/vissl_modules/data.html) and [tutorial](https://colab.research.google.com/drive/1CCuZ50BN99JcOB6VEPytVi_i2tSMd7A3#scrollTo=KPGCiTsXZeW3). NOTE that we need to register
+the dataset with VISSL.
+
+In your python interpreter:
+```python
+>>> json_data = {
+        "imagenet1k_folder": {
+            "train": ["", ""],
+            "val": ["", ""]
+        }
+    }
+>>> from vissl.utils.io import save_file
+>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
+>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
+>>> print(VisslDatasetCatalog.list())
+['imagenet1k_folder']
+>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
+{'train': ['', ''], 'val': ['', '']}
+```
+
+#### Step2: Get the builtin tool and yaml config file
+We will use the pre-built VISSL tool for training [run_distributed_engines.py](https://github.com/facebookresearch/vissl/blob/stable/tools/run_distributed_engines.py) and the config file. Run
+
+```bash
+cd /tmp/ && mkdir -p /tmp/configs/config
+wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
+wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
+wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
+```
+
+#### Step3: Train
+```bash
+cd /tmp/
 python3 run_distributed_engines.py \
     hydra.verbose=true \
     config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
     config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
     config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
-    config=test/integration_test/quick_simclr \
+    config=quick_1gpu_resnet50_simclr \
     config.CHECKPOINT.DIR="./checkpoints" \
     config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
 ```
diff --git a/INSTALL.md b/INSTALL.md
index da39c50a3..505a69659 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -134,7 +134,7 @@ python -c 'import vissl, apex, cv2'
 If you don't have anaconda, [run this bash scrip to install conda](https://github.com/facebookresearch/vissl/blob/master/docker/common/install_conda.sh).
 ```bash
-conda create -n vissl python=3.7
+conda create -n vissl_env python=3.7
 source activate vissl_env
 ```
@@ -147,7 +147,7 @@ conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
 #### Step 3: Install APEX (conda)
 ```bash
-conda install -c vissl apex vissl
+conda install -c vissl apex
 ```
 #### Step 4: Install VISSL
diff --git a/docs/source/getting_started.rst b/docs/source/getting_started.rst
index 137a9036a..3a09d88dd 100644
--- a/docs/source/getting_started.rst
+++ b/docs/source/getting_started.rst
@@ -49,6 +49,9 @@ We will use ImageNet-1K dataset and assume the downloaded data to look like:
 Running SimCLR Pre-training on 1-gpu
 ------------------------------------------
+If VISSL is built from source
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 We provide a config to train model using the pretext SimCLR task on the ResNet50 model.
 Change the :code:`DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset folder path.
@@ -62,3 +65,58 @@ Change the :code:`DATA.TRAIN.DATA_PATHS` path to the ImageNet train dataset fold
     config=test/integration_test/quick_simclr \
     config.CHECKPOINT.DIR="./checkpoints" \
     config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
+
+
+If using pre-built conda/pip VISSL packages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Users need to set the dataset and obtain the builtin tool for training. Follow the steps:
+
+- **Step1: Setup ImageNet1K dataset**
+If you installed pre-built VISSL packages, we will set the ImageNet1K dataset following our `data documentation `_ and `tutorial `_. NOTE that we need to register
+the dataset with VISSL.
+
+In your python interpreter:
+
+.. code-block:: bash
+
+    >>> json_data = {
+            "imagenet1k_folder": {
+                "train": ["", ""],
+                "val": ["", ""]
+            }
+        }
+    >>> from vissl.utils.io import save_file
+    >>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
+    >>> from vissl.data.dataset_catalog import VisslDatasetCatalog
+    >>> print(VisslDatasetCatalog.list())
+    ['imagenet1k_folder']
+    >>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
+    {'train': ['', ''], 'val': ['', '']}
+
+
+- **Step2: Get the builtin tool and yaml config file**
+We will use the pre-built VISSL tool for training `run_distributed_engines.py `_ and the config file. Run
+
+.. code-block:: bash
+
+    cd /tmp/ && mkdir -p /tmp/configs/config
+    wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
+    wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
+    wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
+
+
+- **Step3: Train**
+
+.. code-block:: bash
+
+    cd /tmp/
+    python3 run_distributed_engines.py \
+        hydra.verbose=true \
+        config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
+        config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
+        config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
+        config=quick_1gpu_resnet50_simclr \
+        config.CHECKPOINT.DIR="./checkpoints" \
+        config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
+
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 955d64a63..60e20e29d 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -140,7 +140,7 @@ If you don't have anaconda, `run this bash scrip to install conda