PyTorch implementation of Cross-modal Active Complementary Learning with Self-refining Correspondence (CRCL, NeurIPS 2023), a solution to the noisy correspondence problem in image-text matching.
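To make the idea of complementary learning concrete, here is a minimal NumPy sketch of a generic complementary-label loss over an in-batch image-text similarity matrix. It is an illustration under our own assumptions, not the Active Complementary Loss from the paper. The function name `complementary_loss` is hypothetical. The `tau` argument is a softmax temperature chosen for illustration; we assume, without having verified it, that it plays a role similar to `tau` in `train.sh`.

```python
import numpy as np


def complementary_loss(sims: np.ndarray, tau: float = 0.05) -> float:
    """Hypothetical sketch of a complementary-label loss (not CRCL's exact loss).

    sims[i, j] is the similarity between image i and text j; the diagonal holds
    the given, possibly noisy, pairs. Rather than directly pushing p_ii up, every
    off-diagonal pair is treated as a complementary label ("image i does NOT
    match text j") and the loss pushes 1 - p_ij up.
    """
    logits = sims / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                     # row-wise softmax
    n = sims.shape[0]
    off_diag = ~np.eye(n, dtype=bool)                     # complementary pairs
    eps = 1e-8
    # Clip so log() never sees 0; 1 - p <= 1, so the loss is non-negative.
    return float(-np.log(np.clip(1.0 - p[off_diag], eps, None)).sum() / n)
```

Because a wrong complementary label ("these two do not match") is far less likely than a wrong positive label when pairs are noisy, this style of loss is less sensitive to mismatched pairs than plain cross-entropy on the diagonal.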
Our data directory is structured as follows:
```
.
data
├── f30k_precomp      # pre-computed BUTD region features for Flickr30K, provided by SCAN
│   ├── train_ids.txt
│   ├── train_caps.txt
│   ├── ......
│
├── coco_precomp      # pre-computed BUTD region features for COCO, provided by SCAN
│   ├── train_ids.txt
│   ├── train_caps.txt
│   ├── ......
│
├── cc152k_precomp    # pre-computed BUTD region features for cc152k, provided by NCR
│   ├── train_ids.txt
│   ├── train_caps.tsv
│   ├── ......
│
└── vocab             # vocab files provided by SCAN and NCR
    ├── f30k_precomp_vocab.json
    ├── coco_precomp_vocab.json
    └── cc152k_precomp_vocab.json
```
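Before launching training, a quick check that the data is laid out as in the tree can save a failed run. The helper below is hypothetical (it is not part of the repository) and only checks the files that the tree names explicitly; the elided `......` entries are not checked.

```python
import sys
from pathlib import Path

# Only the files named explicitly in the directory tree; "......" entries are unknown.
EXPECTED = {
    "f30k_precomp": ["train_ids.txt", "train_caps.txt"],
    "coco_precomp": ["train_ids.txt", "train_caps.txt"],
    "cc152k_precomp": ["train_ids.txt", "train_caps.tsv"],
    "vocab": [
        "f30k_precomp_vocab.json",
        "coco_precomp_vocab.json",
        "cc152k_precomp_vocab.json",
    ],
}


def missing_files(data_root: str) -> list:
    """Return the expected files (relative to data_root) that do not exist."""
    root = Path(data_root)
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                missing.append(str(path.relative_to(root)))
    return missing


if __name__ == "__main__":
    # Usage: python check_data.py /path/to/data   (script name is hypothetical)
    root = sys.argv[1] if len(sys.argv) > 1 else "data"
    for path in missing_files(root):
        print("missing:", path)
```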
We follow SCAN to obtain image features and vocabularies.
Following NCR, we use a subset of Conceptual Captions (CC) named CC152K. CC152K contains 150,000 training samples drawn from the CC training split, plus 1,000 validation samples and 1,000 test samples drawn from the CC validation split.
To train, edit the settings at the top of `train.sh` and run:

```bash
sh train.sh
```

The default `train.sh` (Flickr30K, SGR, 80% noise):

```bash
#!/bin/bash
# More recommended hyperparameter settings can be found in Table 1 of the supplementary material at https://openreview.net/attachment?id=UBBeUjTja8&name=supplementary_material
filename=f30k
module_name=SGR
# options: VSEinfty, SAF, SGR
gpus=3
# schedules=30
# schedules='2,2,2,20'
# lr_update=10
# schedules='5,5,5,40'
schedules='5,5,5,30'
lr_update=15
noise_rate=0.8
warm_epoch=2
tau=0.05
alpha=0.8
folder_name=./NCR_logs/${filename}_${module_name}_${noise_rate}
# resolves to ./noise_index/f30k_precomp_0.8.npy with the defaults above
noise_file=./noise_index/${filename}_precomp_${noise_rate}.npy
# paths on the authors' machine; point these at your local data and vocab directories
data_path='/home_bak/hupeng/data/data'
vocab_path='/home_bak/hupeng/data/vocab'
CUDA_VISIBLE_DEVICES=$gpus python train.py --val_step 1000 --gpu $gpus --alpha $alpha --data_name ${filename}_precomp \
    --tau $tau --data_path $data_path --vocab_path $vocab_path --warm_epoch $warm_epoch \
    --schedules $schedules --lr_update $lr_update --noise_file $noise_file --module_name $module_name --folder_name $folder_name --noise_ratio $noise_rate
```
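`noise_rate` and `noise_file` control the synthetic noisy correspondence used for training. The format of the `.npy` files in `./noise_index/` is not documented here, so the sketch below is a hypothetical generator, not the repository's own script. It assumes an index array `idx` in which training sample `i` is paired with caption `idx[i]`, and it shuffles the captions of a randomly chosen `noise_ratio` fraction of samples among themselves, in the spirit of NCR's caption-shuffling protocol.

```python
import numpy as np


def make_noise_index(num_samples: int, noise_ratio: float, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: sample i is paired with caption idx[i].

    A noise_ratio fraction of positions is chosen at random and their captions
    are permuted among themselves; every other position keeps idx[i] == i.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(num_samples)
    num_noisy = int(round(noise_ratio * num_samples))
    noisy = rng.choice(num_samples, size=num_noisy, replace=False)
    idx[noisy] = rng.permutation(idx[noisy])
    return idx


if __name__ == "__main__":
    idx = make_noise_index(1000, 0.8)
    print(f"mismatched pairs: {(idx != np.arange(1000)).mean():.3f}")
    # np.save("./noise_index/my_noise.npy", idx)  # hypothetical output path
```

A random permutation can leave a few positions in place, so the realized mismatch rate is slightly below `noise_ratio`; fixing `seed` keeps the noisy pairs identical across runs, which matters when comparing methods at the same noise level.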
To evaluate a trained model, run:

```bash
python eval.py
```
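Image-text matching is usually reported as Recall@K in both retrieval directions. The function below is a minimal sketch of that metric under stated assumptions, not a reproduction of `eval.py`: the name `recall_at_k` is hypothetical, and it assumes captions are stored grouped by image (captions `5*i` to `5*i+4` belong to image `i`; set `captions_per_image=1` for one caption per image). Protocol details handled by `eval.py`, such as test-fold splitting, are not reproduced.

```python
import numpy as np


def recall_at_k(sims: np.ndarray, captions_per_image: int = 5, ks=(1, 5, 10)):
    """sims[i, j] = similarity of image i and caption j, shape (n_images, n_captions).

    Returns (image-to-text, text-to-image) dicts mapping K -> Recall@K in percent.
    """
    n_images, n_caps = sims.shape
    # Image -> text: rank of the best-ranked ground-truth caption of each image.
    i2t_ranks = np.empty(n_images, dtype=int)
    for i in range(n_images):
        order = np.argsort(-sims[i])  # captions by descending similarity
        gt = np.arange(captions_per_image * i, captions_per_image * (i + 1))
        i2t_ranks[i] = np.flatnonzero(np.isin(order, gt)).min()
    # Text -> image: rank of the single ground-truth image of each caption.
    t2i_ranks = np.empty(n_caps, dtype=int)
    for j in range(n_caps):
        order = np.argsort(-sims[:, j])  # images by descending similarity
        t2i_ranks[j] = np.flatnonzero(order == j // captions_per_image)[0]
    i2t = {k: 100.0 * np.mean(i2t_ranks < k) for k in ks}
    t2i = {k: 100.0 * np.mean(t2i_ranks < k) for k in ks}
    return i2t, t2i
```

The commonly reported rSum is the sum of all six values: `sum(i2t.values()) + sum(t2i.values())`.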
If CRCL is useful for your research, please cite the following paper:
```bibtex
@article{qin2024cross,
  title={Cross-modal Active Complementary Learning with Self-refining Correspondence},
  author={Qin, Yang and Sun, Yuan and Peng, Dezhong and Zhou, Joey Tianyi and Peng, Xi and Hu, Peng},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```