Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

Official implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention" (Paper)

Setup

For simplicity, you can directly run bash install.sh, which includes the following steps:

Install PyTorch 1.9.1 and other dependencies, e.g.,

pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html # adjust the versions to match your CUDA driver

pip install -r requirements.txt

Install GroundingDINO and download the pre-trained weights (run from the repository root)

cd GroundingDINO && python3 setup.py install && cd ..

mkdir -p $PWD/GroundingDINO/weights/

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -O $PWD/GroundingDINO/weights/groundingdino_swint_ogc.pth

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O $PWD/GroundingDINO/weights/groundingdino_swinb_cogcoor.pth
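Before training, it can help to confirm that both weight files landed where the training commands expect them. The following is a minimal sketch, not part of the repository: the helper name missing_weights is hypothetical, and the directory GroundingDINO/weights is assumed to be relative to the repository root, as in the wget commands above.

```python
from pathlib import Path

# File names taken from the wget commands in the setup steps.
EXPECTED_WEIGHTS = (
    "groundingdino_swint_ogc.pth",
    "groundingdino_swinb_cogcoor.pth",
)


def missing_weights(weights_dir):
    """Return the expected weight files that are absent from weights_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that the downloads are complete or uncorrupted.
    """
    weights_dir = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (weights_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_weights("GroundingDINO/weights")
    print("missing:", missing if missing else "none")
```

If the list is non-empty, re-run the corresponding wget command.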

Dataset

- VG150
- COCO

Prepare the datasets under the data folder by following the instruction.

Closed-set SGG

To train OvSGTR (with Swin-T) on VG150, run:

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinT_OGC_full.py  ./data  ./logs/ovsgtr_vg_swint_full ./GroundingDINO/weights/groundingdino_swint_ogc.pth

or

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinB_full.py  ./data  ./logs/ovsgtr_vg_swinb_full ./GroundingDINO/weights/groundingdino_swinb_cogcoor.pth

to use the Swin-B backbone. You might need to change the default CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 in the script. Note that the actual batch size = batch size (default 4 in the config files) * number of GPUs.

For inference, run:

bash scripts/DINO_eval.sh vg [config file] [data path] [output path] [checkpoint]

or

bash scripts/DINO_eval_dist.sh vg [config file] [data path] [output path] [checkpoint]

with multiple GPUs (the results of DINO_eval.sh and DINO_eval_dist.sh differ slightly because of how the data is divided and gathered).
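The batch-size note in the training instructions above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function effective_batch_size is hypothetical (it is not part of the repository), and it assumes one training process per device listed in CUDA_VISIBLE_DEVICES.

```python
def effective_batch_size(per_gpu_batch_size: int, visible_devices: str) -> int:
    """Per-GPU batch size multiplied by the number of listed devices.

    visible_devices is a comma-separated string in the same form as
    CUDA_VISIBLE_DEVICES, e.g. "0,1,2,3".
    """
    gpu_ids = [d for d in visible_devices.split(",") if d.strip()]
    return per_gpu_batch_size * len(gpu_ids)


# Defaults mentioned in this README: batch size 4 in the config, 8 GPUs in the script.
print(effective_batch_size(4, "0,1,2,3,4,5,6,7"))  # 32
# Restricting the script to two GPUs gives a smaller actual batch size.
print(effective_batch_size(4, "0,1"))  # 8
```

When training on fewer GPUs than the default, the actual batch size shrinks accordingly unless the per-GPU batch size in the config file is raised.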

Checkpoints

backbone	R@20/50/100	Checkpoint	Config
Swin-T	26.97 / 35.82 / 41.38	link	config/GroundingDINO_SwinT_OGC_full.py
Swin-B	27.75 / 36.44 / 42.35	link	config/GroundingDINO_SwinB_full.py
Swin-B (w.o. frequency bias, focal loss)	27.53 / 36.18 / 41.79	link	config/GroundingDINO_SwinB_full_open.py

OvD-SGG

For OvD-SGG mode, set sg_ovd_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovd.py). Following "Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning" and VS3, we split the VG150 object categories into two parts: base objects (VG150_BASE_OBJ_CATEGORIES) and novel objects (in VG150_NOVEL2BASE). For PREDCLS, set use_gt_box=True when calling the inference scripts.

Checkpoints

backbone	R@20/50/100 (Base+Novel)	R@20/50/100 (Novel)	Checkpoint	Config
Swin-T	12.34 / 18.14 / 23.20	6.90 / 12.06 / 16.49	link	config/GroundingDINO_SwinT_OGC_ovd.py
Swin-B	15.43 / 21.35 / 26.22	10.21 / 15.58 / 19.96	link	config/GroundingDINO_SwinB_ovd.py

OvR-SGG

For OvR-SGG mode, set sg_ovr_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovr.py). The base predicate categories VG150_BASE_PREDICATE and novel predicate categories VG150_NOVEL_PREDICATE can be found in datasets/vg.py.

Checkpoints

backbone	R@20/50/100 (Base+Novel)	R@20/50/100 (Novel)	Checkpoint	Config	Pre-trained checkpoint	Pre-trained config
Swin-T	15.85 / 20.50 / 23.90	10.17 / 13.47 / 16.20	link	config/GroundingDINO_SwinT_OGC_ovr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B	17.63 / 22.90 / 26.68	12.09 / 16.37 / 19.73	link	config/GroundingDINO_SwinB_ovr.py	link	config/GroundingDINO_SwinB_pretrain.py

OvD+R-SGG

For OvD+R-SGG mode, set both sg_ovd_mode = True and sg_ovr_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovdr.py).
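The relationship between the two config flags and the four settings in this README can be summarized as a small lookup. This is an illustrative sketch only: the function sgg_setting is hypothetical, and treating both flags as False for closed-set SGG is an assumption that the README does not state explicitly.

```python
def sgg_setting(sg_ovd_mode: bool, sg_ovr_mode: bool) -> str:
    """Name the setting selected by the two config flags."""
    if sg_ovd_mode and sg_ovr_mode:
        return "OvD+R-SGG"
    if sg_ovd_mode:
        return "OvD-SGG"
    if sg_ovr_mode:
        return "OvR-SGG"
    # Assumption: with both flags off, the model is trained in the closed-set setting.
    return "Closed-set SGG"


# The two flags as they would appear in a config file for OvD+R-SGG.
sg_ovd_mode = True
sg_ovr_mode = True
print(sgg_setting(sg_ovd_mode, sg_ovr_mode))  # OvD+R-SGG
```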

Checkpoints

backbone	R@20/50/100 (Joint)	R@20/50/100 (Novel Object)	R@20/50/100 (Novel Relation)	Checkpoint	Config	Pre-trained checkpoint	Pre-trained config
Swin-T	10.02 / 13.50 / 16.37	10.56 / 14.32 / 17.48	7.09 / 9.19 / 11.18	link	config/GroundingDINO_SwinT_OGC_ovdr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B	12.37 / 17.14 / 21.03	12.63 / 17.58 / 21.70	10.56 / 14.62 / 18.22	link	config/GroundingDINO_SwinB_ovdr.py	link	config/GroundingDINO_SwinB_pretrain.py

Acknowledgement

We thank Scene-Graph-Benchmark.pytorch and GroundingDINO for their awesome code and models.

Citation

Please cite OvSGTR in your publications if it helps your research:

@inproceedings{chen2024expanding,
  title={Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention},
  author={Chen, Zuyao and Wu, Jinlin and Lei, Zhen and Zhang, Zhaoxiang and Chen, Changwen},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
