A Progressive Difference Method for Capturing Visual Tempos on Action Recognition

Xiaoxiao Sheng, Kunchang Li, Zhiqiang Shen, Gang Xiao
DOI: https://doi.org/10.1109/tcsvt.2022.3207518
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Visual tempos reflect the dynamics of action instances and characterize the diversity of actions, such as walking slowly versus running quickly. Capturing visual tempos is therefore essential for action recognition. To this end, previous methods sample raw videos at multiple frame rates or integrate multi-scale temporal features. These methods inevitably introduce two-stream networks or feature-level pyramid structures, which make computation expensive. In this work, we propose a progressive difference method that captures visual tempos for efficient action recognition by computing coarse-to-fine motion information within a small temporal neighborhood around each frame. Specifically, each video is first sampled uniformly, and first-order temporal differences are then computed around each sampled frame to describe local motion. Building on these differences, computing their variations, namely second-order differences, progressively captures finer-grained spatiotemporal features and highlights the regions where motion cues are most prominent. On the one hand, the multi-order motion differences can be combined with the raw input to describe the diversity of actions. On the other hand, the variations of the first-order differences can be used to activate the salient first-order motion regions, which helps discriminate finer-grained actions. Our method can be combined with existing backbones in a plug-and-play manner. Extensive experiments are conducted on several video benchmarks, including Kinetics400, HMDB51, UCF101, UAV-Human, and Something-Something V1 and V2. We also provide detailed analyses and qualitative experiments to demonstrate the effectiveness of our method.
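The abstract describes the pipeline only in words. The NumPy sketch below is one plausible, non-authoritative reading of it on raw pixels: uniform sampling, first-order differences in a small window around each sampled frame, second-order differences as the change of those differences, a gate derived from the second-order magnitude that activates the first-order motion, and fusion with the raw frame. Several details are assumptions rather than the paper's specification: TSN-style centre-frame sampling, a window radius of one frame, averaging the differences within the window, a sigmoid gate on the per-frame standardized second-order magnitude, and channel concatenation as the fusion step. The function names are hypothetical. The actual method operates inside a learned backbone and is not reproduced here.

```python
import numpy as np


def uniform_sample_indices(num_frames: int, num_segments: int) -> np.ndarray:
    """Assumed TSN-style sampling: the centre frame of each equal-length chunk."""
    seg_len = num_frames / num_segments
    return (seg_len * np.arange(num_segments) + seg_len / 2).astype(int)


def local_differences(video: np.ndarray, center: int, radius: int = 1):
    """First- and second-order temporal differences around frame `center`.

    `video` has shape (T, H, W, C). The window [center - radius, center + radius]
    is clamped at the clip boundaries, so edge windows repeat frames.
    """
    idx = np.clip(np.arange(center - radius, center + radius + 1), 0, video.shape[0] - 1)
    window = video[idx].astype(np.float32)  # (2r+1, H, W, C)
    d1 = np.diff(window, n=1, axis=0)       # (2r, H, W, C): local motion
    d2 = np.diff(d1, n=1, axis=0)           # (2r-1, H, W, C): change of motion
    return d1, d2


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def progressive_difference_features(video, num_segments=8, radius=1, eps=1e-6):
    """Return (num_segments, H, W, 3C): [raw frame, gated 1st-order, 2nd-order].

    Hypothetical illustration only; the gate and the fusion are assumptions.
    """
    if radius < 1:
        raise ValueError("radius must be >= 1 to form second-order differences")
    feats = []
    for c in uniform_sample_indices(video.shape[0], num_segments):
        d1, d2 = local_differences(video, int(c), radius)
        m1 = d1.mean(axis=0)  # aggregated first-order motion, (H, W, C)
        m2 = d2.mean(axis=0)  # aggregated second-order motion, (H, W, C)
        # Spatial saliency: |second-order| magnitude averaged over channels,
        # standardized within the frame, then squashed to (0, 1).
        mag = np.abs(m2).mean(axis=-1, keepdims=True)  # (H, W, 1)
        gate = sigmoid((mag - mag.mean()) / (mag.std() + eps))
        raw = video[c].astype(np.float32)
        # Fuse raw appearance with multi-order motion along the channel axis.
        feats.append(np.concatenate([raw, m1 * gate, m2], axis=-1))
    return np.stack(feats)


rng = np.random.default_rng(0)
clip = rng.random((80, 4, 4, 3))          # toy clip: 80 frames of 4x4 RGB
features = progressive_difference_features(clip)
print(features.shape)                     # (8, 4, 4, 9)
```

With a clip whose brightness rises linearly over time, the first-order differences are constant and the second-order differences vanish, so the gate is uniform and no region is singled out; regions whose motion changes within the window would instead receive larger gate values. This reflects the intuition in the abstract that second-order differences mark where motion cues are more prominent.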