
[ICCV'23 ORAL] Novel Scenes & Classes: Towards Adaptive Open-set Object Detection


[Paper Link] [Poster Link]

By Wuyang Li

Domain Adaptive Object Detection (DAOD) strongly assumes a shared class space between the two domains.

This work breaks that assumption and formulates Adaptive Open-set Object Detection (AOOD) by allowing the target domain to contain novel-class objects.

The object detector uses the base-class labels in the source domain for training, and aims to detect base-class objects and identify novel-class objects as unknown in the target domain.

If you have any ideas or problems you would like to discuss, feel free to reach me via e-mail.

2024/02/29:

I sincerely apologize for a big mistake made while cleaning the code for its open-source release. My sincerest apologies to readers who ran our code before and were unable to achieve similar results.

Since the Cityscapes validation split only has 500 images, which is insufficient to evaluate open-set performance (e.g., AOSE), we follow the p2c setting and use all unlabeled target data for evaluation. Please check our corrected target-domain dataset settings; I am very sorry that this was lost when cleaning the code. Thank you for raising issues #5 and #4, which brought this mistake to my attention.

img_folder=paths[target_domain]['train_img'],
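
For context, here is a minimal sketch of the intent behind that line, with illustrative names (a hypothetical paths dictionary, not the repository's exact code): the target evaluation split is built from the unlabeled target-domain training images rather than the 500-image validation split.

# Hedged sketch (illustrative names only): the corrected setting evaluates on
# all unlabeled target *train* images instead of the small val split.
paths = {
    'foggy_cityscapes': {
        'train_img': 'Cityscapes/leftImg8bit_foggy/train',  # all unlabeled target images
        'val_img': 'Cityscapes/leftImg8bit_foggy/val',       # only 500 images
    },
}

target_domain = 'foggy_cityscapes'
eval_img_folder = paths[target_domain]['train_img']  # corrected target evaluation folder
print(eval_img_folder)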

💡 Preparation

Step 1: Clone and Install the Project

(a) Clone the repository

git clone https://github.com/CityU-AIM-Group/SOMA.git

(b) Install the project following Deformable DETR

Note that the following reflects our experimental environment, which differs slightly from the official one.

# Linux, CUDA>=9.2, GCC>=5.4
# (ours) CUDA=10.2, GCC=8.4, NVIDIA V100 
# Establish the conda environment

conda create -n aood python=3.7 pip
conda activate aood
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

# Compile the project
cd ./models/ops
sh ./make.sh

# Unit test (all checks should report True)
python test.py

# NOTE: if you encounter a permission-denied error when starting training
cd ../../ 
chmod -R 777 ./

Step 2: Download Necessary Resources

(a) Download pre-processed datasets (VOC format) from the following links

|                | (Foggy) Cityscapes | Pascal VOC  | Clipart     | BDD100K (Daytime) |
|----------------|--------------------|-------------|-------------|-------------------|
| Official Links | Imgs               | Imgs+Labels | -           | Imgs              |
| Our Links      | Labels             | -           | Imgs+Labels | Labels            |

(b) Download DINO-pretrained ResNet-50 from this link

Step 3: Change the Path

(a) Change the data path as follows.

[DATASET_PATH]
└─ Cityscapes
   └─ AOOD_Annotations
   └─ AOOD_Main
      └─ train_source.txt
      └─ train_target.txt
      └─ val_source.txt
      └─ val_target.txt
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ bdd_daytime
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ clipart
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ VOCdevkit
   └─ VOC2007
   └─ VOC2012

For BDD100K (daytime), put all images into bdd_daytime/JPEGImages/ as *.jpg files.

The image settings for other benchmarks are consistent with SIGMA.

(b) Change the data root in the config files

Replace DATASET.COCO_PATH in all yaml files under config with your data root $DATASET_PATH, e.g.,

COCO_PATH: /home/wuyangli2/data/
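
If you prefer to script the substitution, the following is a minimal sketch that rewrites the COCO_PATH entry in every yaml file under ./config/ (the directory name and line format are assumptions; adjust them to match your checkout):

# Hedged sketch: replace "COCO_PATH: <old>" with your data root in all yaml files.
from pathlib import Path

new_root = '/path/to/your/data/'
for yaml_file in Path('./config').rglob('*.yaml'):
    lines = []
    for line in yaml_file.read_text().splitlines():
        if line.strip().startswith('COCO_PATH:'):
            indent = line[:len(line) - len(line.lstrip())]
            line = f'{indent}COCO_PATH: {new_root}'
        lines.append(line)
    yaml_file.write_text('\n'.join(lines) + '\n')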

(c) Change the path of DINO-pretrained backbone

Replace the backbone loading path:

state_dict = torch.load('./dino_resnet50_pretrain.pth')
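
As a quick sanity check that the checkpoint path is correct, the following is a minimal sketch of loading the DINO weights into a torchvision ResNet-50 (assuming the file is saved as ./dino_resnet50_pretrain.pth as above; key names can differ between checkpoints, hence strict=False):

# Hedged sketch: verify the DINO-pretrained backbone loads before training.
import torch
import torchvision

backbone = torchvision.models.resnet50(pretrained=False)
state_dict = torch.load('./dino_resnet50_pretrain.pth', map_location='cpu')
# DINO checkpoints typically hold backbone weights without the fc head,
# so strict=False tolerates missing/unexpected keys.
missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)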

🔥 Start Training

We use two GPUs for training, with 2 source images and 2 target images as input. Please check the generated eval_results.txt file in OUTPUT_DIR, which saves the per-epoch evaluation results in LaTeX table format.

GPUS_PER_NODE=2 
./tools/run_dist_launch.sh 2 python main_multi_eval.py --config_file {CONFIG_FILE} --opts DATASET.AOOD_SETTING 1

We provide the scripts used in our experiments in run.sh. Settings passed after "--opts" override the default config file, following the maskrcnn-benchmark convention.
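
For illustration, here is a minimal sketch of how such key-value pairs are typically merged into a yacs-style config (the cfg layout below is illustrative, not the repository's exact schema; the option names come from the commands above):

# Hedged sketch: yacs-style config override, as in maskrcnn-benchmark.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.DATASET = CN()
cfg.DATASET.AOOD_SETTING = 0
cfg.AOOD = CN()
cfg.AOOD.OW_DETR_ON = False

# Everything after --opts is a flat list of KEY VALUE pairs.
opts = ['DATASET.AOOD_SETTING', '1', 'AOOD.OW_DETR_ON', 'True']
cfg.merge_from_list(opts)
print(cfg.DATASET.AOOD_SETTING, cfg.AOOD.OW_DETR_ON)  # 1 True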

📦 Well-trained models

Will be provided later

💬 Notification

  • The core idea is to select informative motifs (which can be treated as the mix-up of object queries) for self-training.
  • You can try the DA version of OW-DETR in this repository by setting:
    --opts AOOD.OW_DETR_ON True
  • Adopting SAM to address AOOD may be a good direction.
  • To visualize unknown boxes, post-processing is needed in PostProcess (see the sketch after this list).
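
As an illustration of that last point, here is a minimal sketch of filtering unknown-class predictions from DETR-style outputs, assuming the unknown category occupies a dedicated logit index (the index, threshold, and softmax scoring are illustrative assumptions, not the repository's exact PostProcess logic):

# Hedged sketch: keep only boxes whose most likely class is "unknown".
import torch

def select_unknown_boxes(pred_logits, pred_boxes, unknown_class_id, score_thresh=0.5):
    # pred_logits: [num_queries, num_classes]; pred_boxes: [num_queries, 4] (cx, cy, w, h)
    probs = pred_logits.softmax(-1)
    scores, labels = probs.max(-1)
    keep = (labels == unknown_class_id) & (scores > score_thresh)
    return pred_boxes[keep], scores[keep]

# Example with random tensors (11 classes; index 10 plays the unknown class).
logits = torch.randn(100, 11)
boxes = torch.rand(100, 4)
unknown_boxes, unknown_scores = select_unknown_boxes(logits, boxes, unknown_class_id=10)
print(unknown_boxes.shape, unknown_scores.shape)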

📝 Citation

If you find this work helpful for your project, please consider giving it a star and a citation. We sincerely appreciate your acknowledgment.

@InProceedings{Li_2023_ICCV,
    author    = {Li, Wuyang and Guo, Xiaoqing and Yuan, Yixuan},
    title     = {Novel Scenes \& Classes: Towards Adaptive Open-set Object Detection},
    booktitle = {ICCV},
    year      = {2023},
}

Relevant project:

Exploring a similar task for image classification. [link]

@InProceedings{Li_2023_CVPR,
    author    = {Li, Wuyang and Liu, Jie and Han, Bo and Yuan, Yixuan},
    title     = {Adjustment and Alignment for Unbiased Open Set Domain Adaptation},
    booktitle = {CVPR},
    year      = {2023},
}

🤞 Acknowledgements

We greatly appreciate the tremendous effort behind the following works.

  • This work is based on the DAOD framework AQT.
  • Our work is highly inspired by OW-DETR and OpenDet.
  • The implementation of the basic detector is based on Deformable DETR.

📢 Abstract

Domain Adaptive Object Detection (DAOD) transfers an object detector to a novel domain free of labels. However, in the real world, besides encountering novel scenes, novel domains always contain novel-class objects de facto, which are ignored in existing research. Thus, we formulate and study a more practical setting, Adaptive Open-set Object Detection (AOOD), considering both novel scenes and classes. Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal, since their low-order dependence, such as the confidence score, is insufficient for AOOD with its two dimensions of novel information. To address this, we propose a novel Structured Motif Matching (SOMA) framework for AOOD, which models high-order relations with motifs, i.e., statistically significant subgraphs, and formulates the AOOD solution as motif matching to learn with high-order patterns. In a nutshell, SOMA consists of Structure-aware Novel-class Learning (SNL) and Structure-aware Transfer Learning (STL). In SNL, we establish an instance-oriented graph to capture the class-independent object features hidden in different base classes. Then, a high-order metric is proposed to match the most significant motif as high-order patterns, serving motif-guided novel-class learning. In STL, we set up a semantic-oriented graph to model the class-dependent relation across domains, and match unlabelled objects with high-order motifs to align the cross-domain distribution with structural awareness. Extensive experiments demonstrate that the proposed SOMA achieves state-of-the-art performance.

