Multi-stream network with key frame sampling for human action recognition

Limin Xia, Xin Wen
DOI: https://doi.org/10.1007/s11227-024-05893-5
IF: 3.3
2024-02-06
The Journal of Supercomputing
Abstract: Human action recognition is a challenging task in computer vision, and deep learning-based methods have made significant progress on it. Existing methods often use uniform or random frame sampling, which admits frames that are irrelevant to the action. These frames carry redundant features that lead to misclassification and high computational cost. To reduce both the misclassification caused by unsuitable sampling and the computational cost, we propose a novel framework named Multi-Stream network based on Key Frame Sampling for human action recognition (MS-KFS). Specifically, we first introduce self-attention to associate deep information across different regions. On this basis, a key frame sampling module is trained with rewards and pseudo-labels to extract the moments at which key actions are performed. Finally, a novel difference feature is developed alongside the appearance and motion features to enhance the feature representation in both depth and time, and the resulting features are fed into a classifier to complete the recognition task. A series of experimental results validates that MS-KFS outperforms state-of-the-art methods.
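To make the contrast between uniform sampling and key frame sampling concrete, here is a minimal sketch in plain Python. It is not the MS-KFS implementation: the per-frame importance `scores` are assumed to be given (in the paper they would come from the module trained with rewards and pseudo-labels, which is not reproduced here), the helper names `uniform_sample`, `key_frame_sample` and `difference_features` are hypothetical, and the "difference feature" is approximated by simple element-wise differencing of consecutive sampled frames, which may differ from the paper's definition.

```python
from typing import List, Sequence

Vector = List[float]


def uniform_sample(num_frames: int, k: int) -> List[int]:
    """Baseline: pick k frame indices spread evenly over the clip."""
    if k <= 0 or num_frames <= 0:
        return []
    k = min(k, num_frames)
    step = num_frames / k
    # Take the middle frame of each of k equal-length segments.
    return [int(step * i + step / 2) for i in range(k)]


def key_frame_sample(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames and return them in temporal order.

    `scores` is a stand-in for the output of a learned key-frame scorer
    (hypothetical here; it is assumed to be computed elsewhere).
    """
    k = max(0, min(k, len(scores)))
    # Rank frame indices by score, highest first; the stable sort keeps
    # the earlier frame when two scores tie.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Restore temporal order so downstream streams see frames in sequence.
    return sorted(ranked[:k])


def difference_features(frames: Sequence[Vector], indices: Sequence[int]) -> List[Vector]:
    """Element-wise difference between consecutive sampled frames.

    An assumed, simplified stand-in for a temporal difference stream.
    """
    return [
        [b - a for a, b in zip(frames[i], frames[j])]
        for i, j in zip(indices, indices[1:])
    ]


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]           # hypothetical frame scores
    frames = [[float(t), 2.0 * t] for t in range(6)]   # toy 2-D frame features
    picked = key_frame_sample(scores, 3)
    print(uniform_sample(10, 5))
    print(picked)
    print(difference_features(frames, picked))
```

Uniform sampling depends only on the clip length, whereas score-based selection can skip frames that carry little action-relevant content, which is the motivation the abstract gives for key frame sampling. The difference features are computed only on the selected frames, so their cost scales with `k` rather than with the full clip length.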
Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Hardware & Architecture
What problem does this paper attempt to address?