Action recognition method based on a novel keyframe extraction method and enhanced 3D convolutional neural network

Li, Saiwei; Zhang, Yuankui; Lu, Hongyi; Pan, Hao
DOI: https://doi.org/10.1007/s13042-024-02235-y
2024-06-12
International Journal of Machine Learning and Cybernetics
Abstract: Action recognition remains a challenging task in computer vision. Traditional action recognition methods cannot fully extract the spatiotemporal features of actions in video. To address this problem, an action recognition method based on keyframe extraction and DAMR_3DNet (D3DNet + 3D Attention Mechanism module + 3D Residual module) is proposed. First, we explore a keyframe extraction method based on image information entropy and the hog_ssim similarity algorithm, which selects keyframes from the input video to represent the video content. The selected keyframes are used as the model input, which reduces the computational complexity of the network. Next, we design the DAMR_3DNet model to recognize actions while reducing the number of network parameters. The D3DNet module improves the C3D network by replacing 3D convolution with 3D decoupled convolution and by introducing a feature fusion layer. A 3D attention mechanism is designed to strengthen action features and reduce the influence of background features. Finally, a 3D residual structure is explored to avoid vanishing gradients while fusing high-level and low-level spatiotemporal features. Experiments on the UCF101, Chinese Sign Language (CSL), and HMDB51 datasets consistently show the superiority of the proposed method. The results demonstrate that the proposed method is effective: it improves action recognition performance and outperforms most state-of-the-art methods.
computer science, artificial intelligence
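
The keyframe selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact selection rule, so the greedy rule, the function names, and both thresholds below are assumptions made for illustration. The paper's hog_ssim measure combines HOG features with SSIM; the sketch substitutes a simplified single-window SSIM on raw gray levels and omits HOG entirely. Frames are assumed to be 2D lists of 8-bit gray values.

```python
import math
from collections import Counter


def image_entropy(frame):
    """Shannon entropy (in bits) of the gray-level histogram of a 2D frame."""
    pixels = [p for row in frame for p in row]
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM over the whole frame (no sliding window, no HOG).

    A simplified stand-in for the paper's hog_ssim similarity.
    """
    x = [p for row in a for p in row]
    y = [p for row in b for p in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((q - my) ** 2 for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def select_keyframes(frames, sim_threshold=0.9, min_entropy=0.0):
    """Greedy selection (assumed rule): keep a frame if it carries enough
    information and is not too similar to the last kept keyframe.

    Returns the indices of the selected frames.
    """
    keyframes = []
    for i, frame in enumerate(frames):
        if image_entropy(frame) < min_entropy:
            continue  # low-information frame, e.g. a flat background
        if keyframes and global_ssim(frames[keyframes[-1]], frame) >= sim_threshold:
            continue  # near-duplicate of the previous keyframe
        keyframes.append(i)
    return keyframes
```

Calling `select_keyframes(frames, sim_threshold=0.9, min_entropy=0.5)` on a list of grayscale frames returns the indices of frames that pass both filters. Feeding only those frames to a 3D network is what lets the approach reduce the amount of input the model has to process.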