Multi-scale residual network model combined with Global Average Pooling for action recognition
Jianjun Li,Yu Han,Ming Zhang,Gang Li,Baohua Zhang
DOI: https://doi.org/10.1007/s11042-021-11435-5
IF: 2.577
2021-10-01
Multimedia Tools and Applications
Abstract:Human Action Recognition is a research hotspot in the field of computer vision. However, due to the complexity of the environment and the diversity of actions, Human Action Recognition still faces many challenges. At the same time, traditional CNN has problems such as single feature scale, decreased accuracy of deep network, and excessive network parameters. Aiming at the above research problems, this paper proposes a novel residual network model based on Multi-scale Feature Fusion and Global Average Pooling. The model uses a Multi-scale Feature Fusion module to extract feature information of different scales, enriches spatial-time information. At the end of the network, Global Average Pooling is used to instead of a Fully Connected layer. Compared with a Fully Connected layer, Global Average Pooling will dilute the combination of the relative positions of different features. Therefore, the features trained by convolution are more effective. In addition, Global Average Pooling can realize direct mapping between output channels and feature categories to reduce excessive model parameters. The model in this paper is verified on the UT-interaction dataset, UCF11 (YouTube Action dataset), UCF101 dataset and CAVIAR dataset. The results show that compared with the state-of-the-art approaches, this approach has high recognition accuracy and excellent robustness, and has excellent performance on datasets with complex backgrounds and diverse action categories.
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering