By Wuyang Li
Domain Adaptive Object Detection (DAOD) strongly assumes a shared class space between the source and target domains.
This work relaxes that assumption and formulates Adaptive Open-set Object Detection (AOOD), which allows the target domain to contain novel-class objects.
The object detector is trained with base-class labels from the source domain, and aims to detect base-class objects and identify novel-class objects as unknown in the target domain.
If you have any ideas or problems you would like to discuss, feel free to reach me via e-mail.
I sincerely apologize for a significant mistake introduced while cleaning the code for open-source release, and I offer my sincerest apologies to readers who ran our code before and were unable to achieve similar results.
Since the Cityscapes validation set has only 500 images and is insufficient to evaluate open-set performance (e.g., AOSE), we follow the p2c setting and use all unlabeled target data for evaluation. Please check the corrected target-domain dataset settings (line 156 in commit 1b6868e). I am very sorry that I overlooked this when cleaning the code. Thank you for raising issues #5 (comment) and #4 (comment), which brought this mistake to my attention.
(a) Clone the repository
git clone https://github.com/CityU-AIM-Group/SOMA.git
(b) Install the project following Deformable DETR
Note that the following is in line with our experimental environment, which differs slightly from the official one.
# Linux, CUDA>=9.2, GCC>=5.4
# (ours) CUDA=10.2, GCC=8.4, NVIDIA V100
# Establish the conda environment
conda create -n aood python=3.7 pip
conda activate aood
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
# Compile the project
cd ./models/ops
sh ./make.sh
# Unit test (all checks should print True)
python test.py
# NOTE: if you encounter a permission-denied error when starting training
cd ../../
chmod -R 777 ./
| | (Foggy) Cityscapes | Pascal VOC | Clipart | BDD100K (Daytime) |
|---|---|---|---|---|
| Official Links | Imgs | Imgs+Labels | - | Imgs |
| Our Links | Labels | - | Imgs+Labels | Labels |
(b) Download the DINO-pretrained ResNet-50 from this link
[DATASET_PATH]
├─ Cityscapes
│  ├─ AOOD_Annotations
│  ├─ AOOD_Main
│  │  ├─ train_source.txt
│  │  ├─ train_target.txt
│  │  ├─ val_source.txt
│  │  └─ val_target.txt
│  ├─ leftImg8bit
│  │  ├─ train
│  │  └─ val
│  └─ leftImg8bit_foggy
│     ├─ train
│     └─ val
├─ bdd_daytime
│  ├─ Annotations
│  ├─ ImageSets
│  └─ JPEGImages
├─ clipart
│  ├─ Annotations
│  ├─ ImageSets
│  └─ JPEGImages
└─ VOCdevkit
   ├─ VOC2007
   └─ VOC2012
For BDD100K (daytime), put all images into bdd_daytime/JPEGImages/ as *.jpg files.
The image settings for the other benchmarks are consistent with SIGMA.
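To sanity-check your layout, a minimal sketch like the following can help (the root path is a placeholder, and the checked entries are a representative subset of the tree above):

```python
import os

DATASET_PATH = "/path/to/your/data"  # set to your $DATASET_PATH

# A few representative entries from the expected layout above
required = [
    "Cityscapes/AOOD_Annotations",
    "Cityscapes/AOOD_Main/train_source.txt",
    "Cityscapes/leftImg8bit/train",
    "Cityscapes/leftImg8bit_foggy/val",
    "bdd_daytime/JPEGImages",
    "clipart/Annotations",
    "VOCdevkit/VOC2007",
]

for rel in required:
    path = os.path.join(DATASET_PATH, rel)
    status = "OK" if os.path.exists(path) else "MISSING"
    print(f"[{status}] {path}")
```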
Replace DATASET.COCO_PATH in all yaml files under config with your data root $DATASET_PATH.
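For example, a small helper along these lines could do the substitution in one pass (a hypothetical script; it assumes each config stores the root under a COCO_PATH: key):

```python
import glob
import re

DATASET_PATH = "/path/to/your/data"  # your data root

# Rewrite the COCO_PATH entry in every yaml config under ./config
for cfg in glob.glob("./config/**/*.yaml", recursive=True):
    with open(cfg) as f:
        text = f.read()
    new_text = re.sub(r"(COCO_PATH:).*", r"\g<1> " + DATASET_PATH, text)
    with open(cfg, "w") as f:
        f.write(new_text)
    print("updated", cfg)
```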
Replace the backbone loading path with the location of the downloaded DINO-pretrained ResNet-50 (line 107 in commit 41c11cb).
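For reference, DINO's released ResNet-50 checkpoint is a plain torchvision-compatible state dict, so loading it looks roughly like this (the file name below is illustrative):

```python
import torch
import torchvision

CKPT = "dino_resnet50_pretrain.pth"  # hypothetical name; use the file you downloaded

backbone = torchvision.models.resnet50()
state = torch.load(CKPT, map_location="cpu")
# strict=False: the classifier head (fc) is not part of the DINO checkpoint
missing, unexpected = backbone.load_state_dict(state, strict=False)
print("missing keys:", missing)        # expect only fc.weight / fc.bias
print("unexpected keys:", unexpected)  # expect none
```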
We use two GPUs for training, with 2 source images and 2 target images per batch. Please check the generated eval_results.txt file in OUTPUT_DIR, which saves the per-epoch evaluation results as a LaTeX-formatted table.
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main_multi_eval.py --config_file {CONFIG_FILE} --opts DATASET.AOOD_SETTING 1
We provide the scripts used in our experiments in run.sh. The settings after "--opts" overwrite the default config file, following the maskrcnn-benchmark framework.
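Conceptually, the "--opts" override works like yacs config merging in maskrcnn-benchmark; a minimal sketch (the key name is taken from the command above):

```python
from yacs.config import CfgNode as CN

# Minimal stand-in for the default config
cfg = CN()
cfg.DATASET = CN()
cfg.DATASET.AOOD_SETTING = 0

# Equivalent to passing: --opts DATASET.AOOD_SETTING 1
cfg.merge_from_list(["DATASET.AOOD_SETTING", "1"])
print(cfg.DATASET.AOOD_SETTING)  # -> 1
```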
Will be provided later
- The core idea is to select informative motifs (which can be viewed as a mix-up of object queries) for self-training.
- You can try the DA version of OW-DETR in this repository by setting:
--opts AOOD.OW_DETR_ON True
- Adopting SAM to address AOOD may be a good direction.
- To visualize unknown boxes, post-processing is needed in PostProcess; see the sketch after this list.
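A minimal sketch of such post-processing, assuming a Deformable-DETR-style PostProcess where the unknown class occupies one logit index (the index and threshold below are illustrative, not the repository's values):

```python
import torch

UNKNOWN_CLASS_ID = 80   # hypothetical index of the "unknown" logit
SCORE_THRESH = 0.5      # illustrative confidence threshold

@torch.no_grad()
def keep_unknown_boxes(out_logits, out_boxes):
    """Keep boxes whose top-scoring class is 'unknown'.

    out_logits: (num_queries, num_classes) raw classification logits
    out_boxes:  (num_queries, 4) predicted boxes
    """
    probs = out_logits.sigmoid()        # Deformable DETR uses sigmoid scores
    scores, labels = probs.max(dim=-1)
    keep = (labels == UNKNOWN_CLASS_ID) & (scores > SCORE_THRESH)
    return out_boxes[keep], scores[keep]
```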
If this work is helpful for your project, please give it a star and a citation. We sincerely appreciate your acknowledgment.
@InProceedings{Li_2023_ICCV,
author = {Li, Wuyang and Guo, Xiaoqing and Yuan, Yixuan},
title = {Novel Scenes \& Classes: Towards Adaptive Open-set Object Detection},
booktitle = {ICCV},
year = {2023},
}
Relevant project:
Exploring a similar task for image classification. [link]
@InProceedings{Li_2023_CVPR,
author = {Li, Wuyang and Liu, Jie and Han, Bo and Yuan, Yixuan},
title = {Adjustment and Alignment for Unbiased Open Set Domain Adaptation},
booktitle = {CVPR},
year = {2023},
}
We greatly appreciate the tremendous effort behind the following works.
- This work is based on the DAOD framework AQT.
- Our work is highly inspired by OW-DETR and OpenDet.
- The implementation of the basic detector is based on Deformable DETR.
Domain Adaptive Object Detection (DAOD) transfers an object detector to a novel domain free of labels. However, in the real world, besides encountering novel scenes, novel domains always contain novel-class objects de facto, which are ignored in existing research. Thus, we formulate and study a more practical setting, Adaptive Open-set Object Detection (AOOD), considering both novel scenes and classes. Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal, since their low-order dependence, such as the confidence score, is insufficient for AOOD with its two dimensions of novel information. To address this, we propose a novel Structured Motif Matching (SOMA) framework for AOOD, which models high-order relations with motifs, i.e., statistically significant subgraphs, and formulates the AOOD solution as motif matching to learn with high-order patterns. In a nutshell, SOMA consists of Structure-aware Novel-class Learning (SNL) and Structure-aware Transfer Learning (STL). For SNL, we establish an instance-oriented graph to capture the class-independent object features hidden in different base classes. Then, a high-order metric is proposed to match the most significant motif as the high-order pattern, enabling motif-guided novel-class learning. In STL, we set up a semantic-oriented graph to model the class-dependent relation across domains, and match unlabelled objects with high-order motifs to align the cross-domain distribution with structural awareness. Extensive experiments demonstrate that the proposed SOMA achieves state-of-the-art performance.