From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Environment

  • Python 3.11.9, PyTorch 2.3.1, CUDA 12.2
  • Install YOLO-World (see the installation sketch after this list).
    • Requires: mmcv, mmcv-lite, mmdet, mmengine, mmyolo, numpy, opencv-python, openmim, supervision, tokenizers, torch, torchvision, transformers, wheel
  • Prepare datasets (see the preparation sketch after this list):
    • M-OWODB and S-OWODB
      • Download COCO and PASCAL VOC.
      • Convert annotation format using coco_to_voc.py.
      • Move all images to datasets/JPEGImages and annotations to datasets/Annotations.
    • nu-OWODB
      • Download the nuImages dataset (distributed through the nuScenes website).
      • Convert annotation format using nuimages_to_voc.py.
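
A minimal sketch of the environment setup referenced above, assuming a fresh conda environment; the pinned torchvision version, the cu121 wheel index, and the environment name are assumptions based on the stated PyTorch 2.3.1 / CUDA 12.2 combination, not values from the repository:

    # Create and activate an environment matching the versions listed above
    conda create -n ovow python=3.11.9 -y
    conda activate ovow

    # PyTorch 2.3.1 CUDA wheels are built for cu121, which runs on a CUDA 12.2 driver
    pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121

    # OpenMMLab stack via openmim (mmcv-lite is the CPU-only alternative to mmcv;
    # install one or the other), plus the remaining dependencies
    pip install openmim wheel
    mim install mmcv mmdet mmengine mmyolo
    pip install numpy opencv-python supervision tokenizers transformers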

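A sketch of the dataset preparation described above; the source paths are placeholders, and the conversion scripts may expect input paths as arguments, so check each script before running:

    # M-OWODB / S-OWODB: convert COCO annotations to VOC-style XML
    python coco_to_voc.py

    # Gather all images and annotations under datasets/
    mkdir -p datasets/JPEGImages datasets/Annotations
    cp path/to/coco/images/* path/to/voc/JPEGImages/* datasets/JPEGImages/
    cp path/to/converted_annotations/* datasets/Annotations/

    # nu-OWODB: convert nuImages annotations to the same VOC-style format
    python nuimages_to_voc.py
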
Getting Started

  • Training the open-world object detector (a sketch of a typical underlying command appears after this list):

    sh train.sh
    
  • To evaluate the model (see the sketch after this list):

    sh test_owod.sh
    
    • To reproduce our results, please download our checkpoints here.
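
A hypothetical sketch of the kind of command train.sh typically wraps in an mmyolo-based project; the config path and work directory are illustrative, not the repository's actual values:

    # Hypothetical training invocation (mmengine-style entry point);
    # the config file name is illustrative
    python tools/train.py configs/ovow_owod.py --work-dir work_dirs/ovow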

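Similarly, a hypothetical sketch of what test_owod.sh might invoke; the config and checkpoint paths are placeholders:

    # Hypothetical evaluation invocation with a downloaded checkpoint;
    # both paths are illustrative
    python tools/test.py configs/ovow_owod.py checkpoints/ovow.pth
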
Citation

If you find this code useful, please consider citing:

@misc{li2024openvocabularyopenworld,
      title={From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects}, 
      author={Zizhao Li and Zhengkang Xiang and Joseph West and Kourosh Khoshelham},
      year={2024},
      eprint={2411.18207},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.18207}, 
}
