Kinetics is a collection of large-scale, high-quality datasets of URL links to up to 650,000 video clips covering 400/600/700 human action classes, depending on the dataset version. The videos include human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging. Each action class has at least 400/600/700 video clips. Each clip is human annotated with a single action class and lasts around 10 seconds.
The Kinetics project publications can be found here: https://deepmind.com/research/open-source/kinetics.
The Kinetics dataset can be downloaded here: https://github.com/cvdfoundation/kinetics-dataset.
Follow the steps below to generate the Similar Frame Dataset (SFD) from Kinetics-700.
git clone https://github.com/cvdfoundation/kinetics-dataset.git
cd make_dataset
Randomly sample 2 frames from each video in the test set of Kinetics-700 to get distraction images for SFD.
python sample_frame_from_testset.py
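For reference, a minimal sketch of what this random-sampling step could look like (the directory names and output layout here are assumptions, not the script's actual interface):

```python
# Hypothetical sketch of random frame sampling; not the repo's actual script.
import os
import random
import cv2

TEST_DIR = "kinetics700/test"       # assumed location of test videos
OUT_DIR = "sfd/distraction_images"  # assumed output location
os.makedirs(OUT_DIR, exist_ok=True)

for name in os.listdir(TEST_DIR):
    cap = cv2.VideoCapture(os.path.join(TEST_DIR, name))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total < 2:
        cap.release()
        continue
    # draw 2 distinct random frame indices from the clip
    for i, idx in enumerate(random.sample(range(total), 2)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
        ok, frame = cap.read()
        if ok:
            stem = os.path.splitext(name)[0]
            cv2.imwrite(os.path.join(OUT_DIR, f"{stem}_{i}.jpg"), frame)
    cap.release()
```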
Uniformly sample 8 frames from each video in the validation set of Kinetics-700 to get query-target candidates for SFD.
python sample_frames_from_valset.py
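The uniform-sampling step mirrors the sketch above; only the index selection changes. A hypothetical helper for picking the 8 indices:

```python
# Hypothetical sketch: evenly spaced frame indices instead of random ones.
import numpy as np

def uniform_indices(total_frames, num_frames=8):
    # num_frames evenly spaced indices across the clip, endpoints included
    return np.linspace(0, total_frames - 1, num_frames, dtype=int).tolist()

print(uniform_indices(300))  # [0, 42, 85, 128, 170, 213, 256, 299]
```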
Generate annotation file (.csv format) of query-target candidates.
python get_img_path_and_class.py
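A minimal sketch of such an annotation step, assuming the sampled frames are stored in one subdirectory per action class (the layout and file names are assumptions):

```python
# Hypothetical sketch of annotation generation; not the repo's actual script.
import csv
import os

FRAME_DIR = "sfd/query_target_frames"  # assumed layout: <class>/<frame>.jpg
with open("sfd_candidates.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["img_path", "class"])
    for cls in sorted(os.listdir(FRAME_DIR)):
        cls_dir = os.path.join(FRAME_DIR, cls)
        for img in sorted(os.listdir(cls_dir)):
            writer.writerow([os.path.join(cls_dir, img), cls])
```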
Check whether a human is present in each image using Faster R-CNN.
python get_img_path_and_class.py
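A hedged sketch of one plausible human check built on torchvision's pretrained Faster R-CNN (not necessarily what the repo's script does; in the COCO label map used by torchvision, label 1 is "person"):

```python
# Hypothetical sketch of the human-presence filter.
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def contains_person(img_path, score_thresh=0.8):
    img = to_tensor(Image.open(img_path).convert("RGB"))
    with torch.no_grad():
        pred = model([img])[0]
    # keep the frame only if a confident "person" box is detected
    return any(label == 1 and score >= score_thresh
               for label, score in zip(pred["labels"].tolist(),
                                       pred["scores"].tolist()))
```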
Ask BLIP-2 whether key information is included in the frame.
python blip2_answer_for_sfd.py
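A sketch of such a BLIP-2 yes/no filter via Hugging Face transformers (the checkpoint name and prompt wording below are assumptions):

```python
# Hypothetical sketch of the BLIP-2 key-information check.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

def frame_has_key_info(img_path, action_class):
    image = Image.open(img_path).convert("RGB")
    # assumed prompt; the repo's actual question may differ
    prompt = f"Question: Does this image show a person {action_class}? Answer:"
    inputs = processor(images=image, text=prompt,
                       return_tensors="pt").to("cuda", torch.float16)
    out = model.generate(**inputs, max_new_tokens=5)
    answer = processor.batch_decode(out, skip_special_tokens=True)[0]
    return "yes" in answer.lower()
```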
Generate the final annotation file for SFD.
python generate_final_val_file.py
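A sketch of how the final annotation file could be assembled, reusing the hypothetical contains_person and frame_has_key_info helpers from the sketches above (all file names are assumptions):

```python
# Hypothetical sketch: keep only candidates passing both filters.
import csv

with open("sfd_candidates.csv") as f_in, \
     open("sfd_val_final.csv", "w", newline="") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.writer(f_out)
    writer.writerow(["img_path", "class"])
    for row in reader:
        if contains_person(row["img_path"]) and \
           frame_has_key_info(row["img_path"], row["class"]):
            writer.writerow([row["img_path"], row["class"]])
```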
Change your configuration here:
afcl_config.py
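The real fields are defined in afcl_config.py; purely for illustration, a config of this kind might contain entries like the following (every name and value below is hypothetical):

```python
# Illustrative only; afcl_config.py's actual fields are defined by the repo.
DATA_ROOT = "sfd"                       # assumed root of the generated SFD files
ANNOTATION_FILE = "sfd_val_final.csv"   # assumed final annotation file
BATCH_SIZE = 64
LR = 1e-4
```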
Start training:
bash train_afcl.sh