Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition (accepted by IEEE Transactions on Image Processing)
Homepage: https://yangliu9208.github.io/home/
Figure 1: Configuration of the deep neural network for image modality. "f" denotes the number of convolutional filters and their receptive field size, "st" denotes the convolutional stride, "pad" denotes the number of pixels added to each side of the input, "LRN" indicates whether Local Response Normalization (LRN) is applied, and "pool" denotes the downsampling factor.
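As a reading aid, the sketch below shows how one such convolutional block could be expressed in PyTorch using the caption's notation. The filter count, receptive field size, stride, padding, and pooling factor are placeholders for illustration only, not the values from the figure.

```python
# Minimal sketch (PyTorch) mapping the caption's notation to one conv block.
# All layer hyperparameters below are placeholders, not the paper's values.
import torch
import torch.nn as nn

conv_block = nn.Sequential(
    # "f": 96 filters with a 7x7 receptive field; "st": stride 2; "pad": 1 pixel per side
    nn.Conv2d(in_channels=3, out_channels=96, kernel_size=7, stride=2, padding=1),
    nn.ReLU(inplace=True),
    # "LRN": Local Response Normalization applied after the activation
    nn.LocalResponseNorm(size=5),
    # "pool": downsampling via max pooling
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 224, 224)  # a single RGB image
print(conv_block(x).shape)
```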
Figure 2: Configuration of the deep neural network for keyframe modality.
Figure 3: Configuration of the deep neural network for video modality.
The Stanford40->UCF101 dataset can be downloaded here: Stanford40
The ASD->UCF101 dataset can be downloaded here: ASD
The EAD->HMDB51 dataset can be downloaded here: EAD, HMDB51
The BU101->UCF101 dataset can be downloaded here: BU101, UCF101
Will be available soon.