By Wuyang Li
You are also welcome to check out our previous work SCAN (AAAI'22 Oral), which is the foundation of this work.
Check INSTALL.md for installation instructions.
If you run into any installation problems, feel free to open an issue with a screenshot. Thanks.
Step 1: Format the three benchmark datasets (BDD100k is also available).
We follow EPM to construct the training and testing sets under the following three settings. Annotation files are available on OneDrive.
Cityscapes -> Foggy Cityscapes
- Download the Cityscapes and Foggy Cityscapes datasets from the link. In particular, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
- Download and extract the converted annotations from the following links: Cityscapes and Foggy Cityscapes (COCO format).
- Extract the training set from `leftImg8bit_trainvaltest.zip`, then move the folder `leftImg8bit/train/` to the `Cityscapes/leftImg8bit/` directory.
- Extract the training and validation sets from `leftImg8bit_trainvaltest_foggy.zip`, then move the folders `leftImg8bit_foggy/train/` and `leftImg8bit_foggy/val/` to the `Cityscapes/leftImg8bit_foggy/` directory (a shell sketch follows).
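The two bullets above can be scripted roughly as follows. This is only a sketch: it assumes the zip files sit in your working directory and that `[DATASET_PATH]` is replaced by your actual dataset root; the archive layouts may differ slightly.

```bash
# Sketch only: extract the Cityscapes archives and move the image folders into the expected layout.
unzip leftImg8bit_trainvaltest.zip        # contains leftImg8bit/{train,val,test}
unzip leftImg8bit_trainvaltest_foggy.zip  # contains leftImg8bit_foggy/{train,val}
mkdir -p [DATASET_PATH]/Cityscapes/leftImg8bit [DATASET_PATH]/Cityscapes/leftImg8bit_foggy
mv leftImg8bit/train [DATASET_PATH]/Cityscapes/leftImg8bit/
mv leftImg8bit_foggy/train leftImg8bit_foggy/val [DATASET_PATH]/Cityscapes/leftImg8bit_foggy/
```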
Sim10k -> Cityscapes (class car only)
- Download the Sim10k and Cityscapes datasets from the following links: Sim10k and Cityscapes. In particular, we use repro_10k_images.tgz and repro_10k_annotations.tgz for Sim10k and leftImg8bit_trainvaltest.zip for Cityscapes.
- Download and extract the converted annotations from the following links: Sim10k (VOC format) and Cityscapes (COCO format).
- Extract the training set from `repro_10k_images.tgz` and `repro_10k_annotations.tgz`, then move all images under `VOC2012/JPEGImages/` to the `Sim10k/JPEGImages/` directory and all annotations under `VOC2012/Annotations/` to the `Sim10k/Annotations/` directory.
- Extract the training and validation sets from `leftImg8bit_trainvaltest.zip`, then move the folders `leftImg8bit/train/` and `leftImg8bit/val/` to the `Cityscapes/leftImg8bit/` directory (a shell sketch follows).
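A minimal sketch for the Sim10k side, with the same placeholder caveats as above; `VOC2012/` is the folder name inside the converted archives described above, and the Cityscapes side is prepared as in the previous setting.

```bash
# Sketch only: extract the Sim10k archives and move images/annotations into place.
tar xzf repro_10k_images.tgz
tar xzf repro_10k_annotations.tgz
mkdir -p [DATASET_PATH]/Sim10k/JPEGImages [DATASET_PATH]/Sim10k/Annotations
mv VOC2012/JPEGImages/*  [DATASET_PATH]/Sim10k/JPEGImages/
mv VOC2012/Annotations/* [DATASET_PATH]/Sim10k/Annotations/
```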
KITTI -> Cityscapes (class car only)
- Download the KITTI and Cityscapes datasets from the following links: KITTI and Cityscapes. In particular, we use data_object_image_2.zip for KITTI and leftImg8bit_trainvaltest.zip for Cityscapes.
- Download and extract the converted annotations from the following links: KITTI (VOC format) and Cityscapes (COCO format).
- Extract the training set from `data_object_image_2.zip`, then move all images under `training/image_2/` to the `KITTI/JPEGImages/` directory.
- Extract the training and validation sets from `leftImg8bit_trainvaltest.zip`, then move the folders `leftImg8bit/train/` and `leftImg8bit/val/` to the `Cityscapes/leftImg8bit/` directory (a shell sketch follows).
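The analogous sketch for the KITTI images (the Cityscapes side is prepared exactly as above; same placeholder caveats):

```bash
# Sketch only: extract KITTI images and move them into the expected folder.
unzip data_object_image_2.zip             # contains training/image_2/
mkdir -p [DATASET_PATH]/KITTI/JPEGImages
mv training/image_2/* [DATASET_PATH]/KITTI/JPEGImages/
```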
```
[DATASET_PATH]
└─ Cityscapes
   └─ cocoAnnotations
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ KITTI
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ Sim10k
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
```
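A quick way to sanity-check the layout against the tree above (replace `[DATASET_PATH]` with your own root):

```bash
# List directories up to three levels deep and compare with the expected structure.
find [DATASET_PATH] -maxdepth 3 -type d | sort
```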
Step 2: Change the data root for your dataset in paths_catalog.py:
```
DATA_DIR = [$Your dataset root]
```
- We provide very detailed code comments in sigma_vgg16_cityscapace_to_foggy.yaml.
- We modify the trainer to meet the requirements of SIGMA.
- GM is integrated in the "middle layer": graph_matching_head.
- Node sampling is conducted together with the FCOS loss: loss.
- We preserve many APIs for different implementation choices in defaults.
- We hope this work can inspire more good ideas.
The ImageNet pretrained VGG-16 backbone (w/o BN) is available at link. You can use it if you cannot download the model through the link in the config file.
The well-trained models are available at: (onedrive).
- We can get higher results than the reported ones with tailor-tuned hyperparameters.
- E2E indicates end-to-end training for better reproducibility. Our config files are used for end-to-end training.
- Two-stage / longer training and tuning the learning rate will make the results more stable and yield higher mAP/AP75.
- After correcting a default hyper-parameter (as explained in the config file), Sim10k to City achieves better results than the reported ones.
- You can set MODEL.MIDDLE_HEAD.GM.WITH_CLUSTER_UPDATE to False to accelerate training greatly with a negligible performance drop. We also recommend this change for bs=2, since we found it friendlier to small batch-size training (see the sketch after this list).
- Results become stable after the learning-rate decay (in the training schedule).
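For instance, assuming the same yacs-style `KEY VALUE` command-line overrides that are used for `MODEL.WEIGHT` in the testing command below (the config path is a placeholder), the flag can be switched off without editing the config file:

```bash
# Sketch: disable the graph-guided memory-bank (cluster) update via a command-line override.
python tools/train_net_da.py \
    --config-file configs/SIGMA/xxx.yaml \
    MODEL.MIDDLE_HEAD.GM.WITH_CLUSTER_UPDATE False
```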
| Source | Target | E2E | Metric | Backbone | mAP | AP@50 | AP@75 | file |
|---|---|---|---|---|---|---|---|---|
| City | Foggy | | COCO | V-16 | 24.0 | 43.6 | 23.8 | city_to_foggy_vgg16_43.58_mAP.pth |
| City | Foggy | | COCO | V-16 | 24.3 | 43.9 | 22.6 | city_to_foggy_vgg16_43.90_mAP.pth |
| City | Foggy | | COCO | V-16 | 22.0 | 43.5 | 21.8 | reproduced |
| City | Foggy | | COCO | R-50 | 22.7 | 44.3 | 21.2 | city_to_foggy_res50_44.26_mAP.pth |
| City | BDD100k | | COCO | V-16 | - | 32.7 | - | city_to_bdd100k_vgg16_32.65_mAP.pth |
| Sim10k | City | | COCO | V-16 | 33.4 | 57.1 | 33.8 | sim10k_to_city_vgg16_53.73_mAP.pth |
| Sim10k | City | | COCO | V-16 | 32.1 | 55.2 | 32.1 | reproduced |
| KITTI | City | | COCO | V-16 | 22.6 | 46.6 | 20.0 | kitti_to_city_vgg16_46.45_mAP.pth |
- More results and models will be released.
- E2E training achieves satisfactory results with better reproducibility.
- A Faster R-CNN based implementation will be released.
| Source | Target | E2E | Metric | Backbone | mAP | AP@50 | AP@75 | link |
|---|---|---|---|---|---|---|---|---|
| City | Foggy | | COCO | V-16 | 22.6 | 44.5 | 20.0 | coming soon |
| City | Foggy | | COCO | V-16 | 24.6 | 45.7 | 23.2 | coming soon |
| City | BDD100k | | COCO | V-16 | 17.0 | 34.0 | 15.1 | coming soon |
| Sim10k | City | | COCO | V-16 | 33.1 | 57.8 | 32.8 | coming soon |
| KITTI | City | | COCO | V-16 | 24.9 | 49.1 | 22.5 | coming soon |
| City | KITTI | | VOC | V-16 | - | 76.9 | - | coming soon |
| Pascal | Clipart | | VOC | R-101 | - | 46.7 | - | coming soon |
| Pascal | Watercolor | | VOC | R-101 | - | 57.2 | - | coming soon |
| Pascal | Comic | | VOC | R-101 | - | 37.1 | - | coming soon |
Train the model from scratch with the default setting (batch size = 4):
```bash
python tools/train_net_da.py \
    --config-file configs/SIGMA/xxx.yaml \
```
Test the well-trained model:
```bash
python tools/test_net.py \
    --config-file configs/SIGMA/xxx.yaml \
    MODEL.WEIGHT well_trained_models/xxx.pth
```
For example, test Cityscapes to Foggy Cityscapes with the ResNet-50 backbone:
```bash
python tools/test_net.py \
    --config-file configs/SIGMA/sigma_res50_cityscapace_to_foggy.yaml \
    MODEL.WEIGHT well_trained_models/city_to_foggy_res50_44.26_mAP.pth
```
What will we provide in the extended journal version?
- More effective graph-related operations.
- Unifying the popular DA-FasterRCNN benchmark in this project.
- Faster-RCNN based implementation (baseline: 38.3 mAP; ours: 43.5 mAP)
- More benchmark configs, models, and results, e.g., Pascal2Clipart (46.5 mAP)
bs=2 works well on a 12GB GPU and bs=4 works well on a 32GB GPU. If you hit a CUDA out-of-memory error, you can try one or more of the following (a sketch combining them follows this list):
- reduce the batch size to 2 (1 is not recommended) and double the training iterations
- disable the one-to-one (o2o) matching by setting MODEL.MIDDLE_HEAD.GM.MATCHING_CFG to 'none'
- reduce the number of sampled nodes via MODEL.MIDDLE_HEAD.GM.NUM_NODES_PER_LVL_SR and MODEL.MIDDLE_HEAD.GM.NUM_NODES_PER_LVL_TG, e.g., from 100 to 50
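As a sketch only (again assuming yacs-style `KEY VALUE` command-line overrides as used for `MODEL.WEIGHT` above; `SOLVER.IMS_PER_BATCH` as the batch-size key is an assumption and may differ in your config), the memory-saving options could be combined like this:

```bash
# Sketch: memory-saving overrides appended to the default training command.
# SOLVER.IMS_PER_BATCH is assumed to be the batch-size key; check your config.
python tools/train_net_da.py \
    --config-file configs/SIGMA/xxx.yaml \
    SOLVER.IMS_PER_BATCH 2 \
    MODEL.MIDDLE_HEAD.GM.MATCHING_CFG 'none' \
    MODEL.MIDDLE_HEAD.GM.NUM_NODES_PER_LVL_SR 50 \
    MODEL.MIDDLE_HEAD.GM.NUM_NODES_PER_LVL_TG 50
```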
If you find this work helpful for your project, please give it a star and a citation:
```bibtex
@inproceedings{li2022sigma,
  title={SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection},
  author={Li, Wuyang and Liu, Xinyu and Yuan, Yixuan},
  booktitle={CVPR},
  year={2022}
}
```
Relevant project:
```bibtex
@inproceedings{li2022scan,
  title={SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation},
  author={Li, Wuyang and Liu, Xinyu and Yao, Xiwen and Yuan, Yixuan},
  booktitle={AAAI},
  year={2022}
}
```
E-mail: wuyangli2-c@my.cityu.edu.hk
This work is based on SCAN (AAAI'22) and EPM (ECCV20).
The implementation of our anchor-free detector is from FCOS.
Domain Adaptive Object Detection (DAOD) leverages a labeled source domain to learn an object detector that generalizes to a novel target domain free of annotations. Recent advances align class-conditional distributions by narrowing down cross-domain prototypes (class centers). Despite their great success, these works ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. Specifically, we design a Graph-embedded Semantic Completion module (GSC) that completes mismatched semantics by generating hallucination graph nodes in missing categories. Then, we establish cross-image graphs to model class-conditional distributions and learn a graph-guided memory bank for better semantic completion in turn. After representing the source and target data as graphs, we reformulate the adaptation as a graph matching problem, i.e., finding well-matched node pairs across graphs to reduce the domain gap, which is solved with a novel Bipartite Graph Matching adaptor (BGM). In a nutshell, we utilize graph nodes to establish semantic-aware node affinity and leverage graph edges as quadratic constraints in a structure-aware matching loss, achieving fine-grained adaptation via node-to-node graph matching. Extensive experiments demonstrate that our method outperforms existing works significantly.
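For intuition only, the node-to-node matching described above can be written as a generic quadratic graph-matching objective; the notation below is illustrative and is not the paper's exact formulation.

```latex
% Illustrative objective (not the paper's exact loss): \Pi is a relaxed, doubly-stochastic
% assignment between source nodes (i,k) and target nodes (j,l); A_{ij} is the semantic-aware
% node affinity; E^s, E^t are source/target graph edge weights acting as quadratic constraints.
\max_{\Pi \in \mathcal{D}} \;
\sum_{i,j} \Pi_{ij} A_{ij}
\;+\; \lambda \sum_{i,k}\sum_{j,l} \Pi_{ij}\,\Pi_{kl}\, E^{s}_{ik}\, E^{t}_{jl},
\qquad
\mathcal{D} = \{\Pi \ge 0 : \Pi \mathbf{1} = \mathbf{1},\; \Pi^{\top}\mathbf{1} = \mathbf{1}\}.
```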