- Python 3.11.9, torch 2.3.1, CUDA 12.2
- Install YOLO-World
- Requires: mmcv, mmcv-lite, mmdet, mmengine, mmyolo, numpy, opencv-python, openmim, supervision, tokenizers, torch, torchvision, transformers, wheel
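A quick way to confirm the environment matches the requirements above is to probe each package for importability. This is a minimal sketch, not part of the repo; note that some pip names differ from import names (e.g. `opencv-python` imports as `cv2`), which is why the list below uses import names.

```python
import importlib.util

# Import names for the requirements listed above (assumption:
# opencv-python -> cv2; build-only tools like wheel/openmim omitted).
REQUIRED = ["mmcv", "mmdet", "mmengine", "mmyolo", "numpy", "cv2",
            "supervision", "tokenizers", "torch", "torchvision",
            "transformers"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```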
- Prepare datasets:
- M-OWODB and S-OWODB
- Download COCO and PASCAL VOC.
- Convert annotation format using `coco_to_voc.py`.
- Move all images to `datasets/JPEGImages` and annotations to `datasets/Annotations`.
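The move step above can be sketched as follows. The function name and the assumption that images and XML annotations sit in flat source directories are mine, not from the repo:

```python
import shutil
from pathlib import Path

def collect_dataset(image_dirs, annotation_dirs, root="datasets"):
    """Move images into <root>/JPEGImages and VOC XML annotations
    into <root>/Annotations, as the steps above describe.
    Hypothetical helper; assumes flat source directories."""
    jpeg_dir = Path(root) / "JPEGImages"
    ann_dir = Path(root) / "Annotations"
    jpeg_dir.mkdir(parents=True, exist_ok=True)
    ann_dir.mkdir(parents=True, exist_ok=True)
    for d in image_dirs:
        for img in Path(d).glob("*.jpg"):
            shutil.move(str(img), str(jpeg_dir / img.name))
    for d in annotation_dirs:
        for xml in Path(d).glob("*.xml"):
            shutil.move(str(xml), str(ann_dir / xml.name))
```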
- nu-OWODB
- For nu-OWODB, first download the nuImages dataset from here.
- Convert annotation format using `nuimages_to_voc.py`.
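Both conversion scripts above target the Pascal VOC annotation format. As a sketch of what one such XML file contains, here is a minimal writer; the helper itself is hypothetical, but the element names follow the VOC layout:

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, boxes):
    """Build a Pascal VOC style XML annotation string.
    `boxes` is a list of (class_name, xmin, ymin, xmax, ymax) tuples.
    Hypothetical helper illustrating the target format."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bnd = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")
```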
- M-OWODB and S-OWODB
- Train the open-world object detector: `sh train.sh`
- Training starts from a pretrained YOLO-World checkpoint.
- To evaluate the model: `sh test_owod.sh`
- To reproduce our results, please download our checkpoints here.
If you find this code useful, please consider citing:
@misc{li2024openvocabularyopenworld,
title={From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects},
author={Zizhao Li and Zhengkang Xiang and Joseph West and Kourosh Khoshelham},
year={2024},
eprint={2411.18207},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.18207},
}