Official repository for the CVPR 2022 L3D-IVU Workshop paper "Self-supervised Video Representation Learning with Cascade Positive Retrieval".
conda env create -f environment.yml
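Then activate the environment; its name is whatever environment.yml defines (cpr below is only a placeholder):

conda activate cpr  # replace 'cpr' with the name set in environment.yml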
Download both the RGB and TVL1 optical flow LMDB files from CoCLR's repo and extract them to the data folder. Then update the root path to these files in dataset/lmdb_dataset.py.
The expected directory structure:
--data
----UCF101
------ucf101_frame.lmdb
------...
------...
------ucf101_tvl1_frame.lmdb
------...
------...
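With that layout, the root-path edit in dataset/lmdb_dataset.py amounts to pointing the loaders at your extracted files; a hypothetical sketch (the variable names below are illustrative, not the file's actual ones):

import os

# Hypothetical sketch; the real names in dataset/lmdb_dataset.py differ.
lmdb_root = '/path/to/data'
ucf101_rgb_lmdb = os.path.join(lmdb_root, 'UCF101', 'ucf101_frame.lmdb')
ucf101_flow_lmdb = os.path.join(lmdb_root, 'UCF101', 'ucf101_tvl1_frame.lmdb')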
Follow this repo to extract RGB and optical flow frames.
Next, run process_data/src/convert_video_to_lmdb.py to create the LMDB files.
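Under the hood, the conversion packs the extracted frames into a key-value LMDB store; a minimal illustration with the lmdb package (not the script's actual code, and the paths are placeholders):

import lmdb
import os

# Minimal illustration of frame-to-LMDB packing (not the repo's actual script).
frame_dir = '/path/to/frames/UCF101'          # extracted JPEG frames
env = lmdb.open('/path/to/data/UCF101/ucf101_frame.lmdb', map_size=1 << 40)
with env.begin(write=True) as txn:
    for root, _, files in os.walk(frame_dir):
        for name in sorted(files):
            if name.endswith('.jpg'):
                key = os.path.join(os.path.relpath(root, frame_dir), name)
                with open(os.path.join(root, name), 'rb') as f:
                    txn.put(key.encode(), f.read())  # raw JPEG bytes as value
env.close()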
Both pretrained InfoNCE RGB and Flow models on UCF101 are provided by CoCLR.
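Before pretraining, you can sanity-check a downloaded checkpoint; the 'state_dict' key below is an assumption about how the CoCLR checkpoints are laid out, so inspect the printed keys to confirm:

import torch

# Quick sanity check of a downloaded checkpoint; 'state_dict' is an assumed key.
ckpt = torch.load('rgb_infoNCE_checkpoint.pth.tar', map_location='cpu')
print(ckpt.keys())
print(len(ckpt.get('state_dict', {})), 'tensors in state_dict')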
CPR-specific flags:
--cascade: enable CPR.
--stage 7: number of stages for CPR.
--ratio 0.5: selection ratio at each stage.
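Conceptually, CPR keeps shrinking a candidate pool: each stage retains the top ratio fraction, re-ranked by a fresh similarity, and the final top-k survivors become the mined positives. A simplified sketch of that selection loop (not the authors' implementation):

import torch

def cascade_select(stage_sims, ratio=0.5, topk=5):
    # Simplified sketch of cascade selection (not the paper's exact algorithm):
    # stage_sims is a list of 1-D similarity tensors over the same candidate
    # pool, one per stage (e.g., computed from different query views).
    idx = torch.arange(stage_sims[0].numel())
    for sim in stage_sims:
        k = max(topk, int(idx.numel() * ratio))  # keep top `ratio` fraction
        order = sim[idx].topk(k).indices
        idx = idx[order]                         # surviving candidates
    return idx[:topk]                            # final mined positives

# e.g., 7 stages over a queue of 2048 candidates:
sims = [torch.randn(2048) for _ in range(7)]
positives = cascade_select(sims, ratio=0.5, topk=5)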
For RGB Model (UCF101):
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 16 \
--epochs 100 --schedule 80 --name_prefix Cycle1-FlowMining_Cascade_flow_debug -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar} \
--cascade --stage 7 --ratio 0.5
For Flow Model (UCF101):
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --reverse \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 16 \
--epochs 100 --schedule 80 --name_prefix Cycle1-RGBMining_Cascade_Sg_5_Rt_0.50_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar} \
--cascade --stage 7 --ratio 0.50
For RGB Model (Kinetics-400):
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 16 \
--epochs 50 --schedule 40 --name_prefix Cycle1-FlowMining_Cascade_Sg_7_Rt_0.50_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar} \
--cascade --stage 7 --ratio 0.5
For Flow Model (Kinetics-400):
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 --reverse \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 16 \
--epochs 50 --schedule 40 --name_prefix Cycle1-RGBMining_Cascade_7_Rt_0.50_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar} \
--cascade --stage 7 --ratio 0.5
Evaluation flags:
--dataset ucf101: run with RGB (ucf101) or Flow (ucf101-f) input.
--ten_crop: test with 10-crop, which gives better performance than center-crop.
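For reference, 10-crop evaluation averages predictions over the four corner crops, the center crop, and the horizontal flips of each; a minimal illustration with torchvision (the 224 crop size is an assumption):

import torch
from PIL import Image
from torchvision import transforms

# Illustration of 10-crop evaluation: 4 corners + center, plus horizontal flips.
img = Image.new('RGB', (256, 256))          # stand-in for a video frame
crops = transforms.TenCrop(224)(img)        # tuple of 10 PIL images
batch = torch.stack([transforms.ToTensor()(c) for c in crops])  # (10, 3, 224, 224)
# scores = model(batch).softmax(dim=1).mean(dim=0)  # average over the 10 crops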
To finetune and test the selected checkpoint on UCF101:
cd eval/
CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what ft --epochs 500 --schedule 400 450 \
--test {selected_rgb_finetuned_checkpoint.pth.tar} --ten_crop
For nearest-neighbor retrieval with a pretrained checkpoint:
cd eval/
CUDA_VISIBLE_DEVICES=0 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --test {selected_rgb_pretrained_checkpoint.pth.tar} --retrieval
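Nearest-neighbor retrieval is typically scored as recall@k, where a test clip counts as a hit if any of its k nearest training clips shares its class label; a minimal sketch (not eval/main_classifier.py itself):

import torch
import torch.nn.functional as F

def retrieval_recall_at_k(test_feats, train_feats, test_labels, train_labels, k=1):
    # Minimal recall@k sketch: a test clip is a hit if any of its k nearest
    # training clips (by cosine similarity) shares its class label.
    sim = F.normalize(test_feats, dim=1) @ F.normalize(train_feats, dim=1).t()
    nn_idx = sim.topk(k, dim=1).indices                      # (num_test, k)
    hits = (train_labels[nn_idx] == test_labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()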
To pretrain an InfoNCE model from scratch on UCF101:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model ubernce --moco-k 2048 \
--dataset ucf101-2clip --seq_len 32 --ds 1 --batch_size 16 \
--epochs 300 --schedule 240 -j 8 --name_prefix scratch_
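For context, this trains a MoCo-style InfoNCE objective, contrasting each query clip against its paired positive and a queue of negatives; a standard sketch (not the repo's exact implementation):

import torch
import torch.nn.functional as F

def info_nce(q, k_pos, queue, tau=0.07):
    # Standard MoCo-style InfoNCE sketch: the positive is the paired clip
    # embedding, negatives come from the memory queue.
    q = F.normalize(q, dim=1)
    l_pos = (q * F.normalize(k_pos, dim=1)).sum(dim=1, keepdim=True)  # (N, 1)
    l_neg = q @ F.normalize(queue, dim=1).t()                          # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)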
Our implementation is adapted from CoCLR. We thank the authors for their contribution.
If you find our work helpful, please consider citing:
@article{wu2022-cpr,
title={Self-supervised Video Representation Learning with Cascade Positive Retrieval},
author={Wu, Cheng-En and Lai, Farley and Hu, Yu Hen and Kadav, Asim},
journal={arXiv preprint arXiv:2201.07989},
year={2022}
}