Differential motion attention network for efficient action recognition

Caifeng Liu, Fangjie Gu
DOI: https://doi.org/10.1007/s00371-024-03478-0
IF: 2.835
2024-06-15
The Visual Computer
Abstract: Despite the great progress achieved by commonly used 3D CNNs and two-stream methods in action recognition, they impose a heavy computational burden that makes them inefficient and even infeasible in real-world scenarios. In this paper, we propose the differential motion attention network (DMANet) to specifically highlight human dynamics for efficient action recognition. First, we argue that consecutive frames contain redundant static features, and we construct a low-computation unit for discriminative motion extraction that highlights human action trajectories across consecutive frames. Second, because not all spatial regions in an image play an equal role in depicting human actions, we propose an adaptive protocol to dynamically emphasize informative spatial regions. As an end-to-end lightweight framework, DMANet outperforms costly 3D CNNs and two-stream methods by 2.3% with only 0.23× the computation, and other efficient methods by 1.6%, on the Something–Something v1 dataset. Experimental results on two temporal-related datasets and the large-scale scene-related Kinetics-400 dataset demonstrate the efficacy of DMANet. In-depth ablations further provide both quantitative and qualitative support for its effectiveness.
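The two ideas in the abstract, cancelling redundant static content by differencing consecutive frames and then emphasizing the spatial regions where motion occurs, can be illustrated with a minimal NumPy sketch. This is not the DMANet implementation: the function name `differential_motion_attention`, the channel-averaged motion energy, the mean-centred sigmoid, and the residual re-weighting are all assumptions chosen for illustration, and the paper's actual unit is presumably learned rather than fixed.

```python
import numpy as np


def differential_motion_attention(features):
    """Re-weight per-frame features by a motion-derived spatial map.

    features: array of shape (T, C, H, W) with T >= 2.
    Hypothetical illustration only; not the architecture from the paper.
    """
    # Difference of consecutive frames: static content cancels out.
    diff = features[1:] - features[:-1]                 # (T-1, C, H, W)
    # Repeat the last difference so every frame has a motion map.
    diff = np.concatenate([diff, diff[-1:]], axis=0)    # (T, C, H, W)
    # Collapse channels into one motion-energy map per frame.
    energy = np.abs(diff).mean(axis=1, keepdims=True)   # (T, 1, H, W)
    # Sigmoid centred on each frame's mean energy, giving values in (0, 1).
    centred = energy - energy.mean(axis=(2, 3), keepdims=True)
    attention = 1.0 / (1.0 + np.exp(-centred))
    # Residual re-weighting: keep the input, amplify moving regions.
    return features + features * attention, attention


# Toy clip: constant background with one bright pixel moving diagonally.
clip = np.ones((4, 2, 4, 4))
for t in range(4):
    clip[t, :, t, t] = 5.0

weighted, attention = differential_motion_attention(clip)
```

In this toy clip the attention map for the first frame is larger at the positions the bright pixel leaves and enters than at untouched background positions, which is the behaviour the abstract describes qualitatively. The cost of the differencing step is a single subtraction per feature element, which is the sense in which such a unit can be cheap relative to 3D convolutions or optical flow.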
computer science, software engineering