META: Motion Excitation with Temporal Attention for Compressed Video Action Recognition

Shaojie Li, Jiaqi Li, Jinxin Guo, Ming Ma
DOI: https://doi.org/10.1109/icpads60453.2023.00043
2023-01-01
Abstract: Compressed video action recognition has recently gained significant attention because it replaces the raw video with I-frames and compressed motion cues, such as motion vectors and residuals, which substantially reduces storage and computation costs. However, the task suffers from coarse motion cues and a lack of structures that can capture long-range spatiotemporal dependencies. To address these issues, this paper proposes a novel module called Motion Excitation with Temporal Attention (META) and adopts network structures that can capture long-range dependencies. The META module stimulates motion information between I-frames and enhances the motion representation of the motion vectors. It first assigns different weights to feature-level frames, then calculates feature-level temporal differences from the spatiotemporal features, and finally uses these differences to excite the motion-sensitive channels of the features. In addition, for compressed video action recognition, we introduce a new network structure that combines CNN and Transformer, integrating the merits of convolution and self-attention in a concise transformer format. The proposed method is evaluated on the challenging HMDB-51 and UCF-101 datasets. Extensive comparisons and ablation studies demonstrate the effectiveness of the proposed method.
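The three META steps named in the abstract (frame weighting, feature-level temporal differencing, channel excitation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `meta_sketch`, the parameters `w_t` and `w_c`, the softmax frame scoring, the zero padding of the last time step, and the residual sigmoid gate are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def meta_sketch(feats, w_t, w_c):
    """Hypothetical sketch of the three steps described in the abstract.

    feats: (T, C, H, W) feature maps for T feature-level frames.
    w_t:   (C,) assumed weights that score each frame for temporal attention.
    w_c:   (C, C) assumed projection that turns differences into channel gates.
    """
    T = feats.shape[0]

    # Step 1: temporal attention, one weight per feature-level frame.
    pooled = feats.mean(axis=(2, 3))               # (T, C) spatially pooled
    attn = softmax(pooled @ w_t, axis=0)           # (T,), sums to 1
    weighted = feats * (attn * T)[:, None, None, None]  # mean weight is 1

    # Step 2: feature-level temporal differences between adjacent frames.
    diff = weighted[1:] - weighted[:-1]            # (T-1, C, H, W)
    # Assumption: pad the last step with zeros to keep T frames.
    diff = np.concatenate([diff, np.zeros_like(feats[:1])], axis=0)

    # Step 3: excite motion-sensitive channels with a gate in (0, 1).
    gate = sigmoid(diff.mean(axis=(2, 3)) @ w_c)   # (T, C)
    # Assumption: residual form, so static information is preserved.
    return feats + feats * gate[:, :, None, None]

# Usage with random features: 4 frames, 8 channels, 5x5 spatial size.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))
y = meta_sketch(x, rng.standard_normal(8), rng.standard_normal((8, 8)))
```

The output keeps the input shape, so a block like this could in principle sit between backbone stages; where the module is actually placed in the network is not stated in the abstract.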