BDNet: a Method Based on Forward and Backward Convolutional Networks for Action Recognition in Videos

Chuanjiang Leng, Qichuan Ding, Chengdong Wu, Ange Chen, Huan Wang, Hao Wu
DOI: https://doi.org/10.1007/s00371-023-03073-9
2024-01-01
Abstract: Human action recognition analyses the behaviour in a scene from the spatial-temporal features carried in an image sequence. The critical challenge is to extract informative spatial-temporal features from a limited-length video, which frequently constrains the receptive field of a 3D Convolutional Neural Network (CNN). However, present methods mainly model an action's spatial-temporal features along a single temporal direction and ignore the information in the opposite direction. Moreover, fixed-weight fusion of spatial and temporal features does not distinguish their importance for each action sequence. To address these problems, we propose a bi-directional network (BDNet) that combines features from both directions of an action for recognition. Two CNNs are set up to extract spatial-temporal features along the forward and backward directions of the action, respectively. Then, a dynamic fusion strategy is adopted to measure the importance of spatial and temporal features for each action. We conducted extensive experiments on the commonly used action recognition dataset UCF101. Compared with other work, the proposed method achieves promising performance in accuracy and efficiency.
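The two ideas in the abstract, feeding a clip in both temporal orders and weighting features per sample rather than with fixed weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the network architecture or the exact fusion rule, so `reverse_clip`, `dynamic_fusion`, the gating matrix `gate_w`, and the random stand-in features are all hypothetical names and choices made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reverse_clip(clip):
    """Return the frames of a (T, H, W, C) clip in reverse temporal order.

    Assumption: the backward stream sees the same clip played backwards.
    """
    return clip[::-1]


def dynamic_fusion(feat_a, feat_b, gate_w):
    """Fuse two (N, D) feature matrices with per-sample weights.

    Assumption: weights come from a softmax over a linear gate applied to
    the concatenated features; gate_w has shape (2 * D, 2). A fixed-weight
    fusion would instead use the same two scalars for every sample.
    """
    logits = np.concatenate([feat_a, feat_b], axis=1) @ gate_w  # (N, 2)
    weights = softmax(logits, axis=1)                           # rows sum to 1
    fused = weights[:, :1] * feat_a + weights[:, 1:] * feat_b   # (N, D)
    return fused, weights


rng = np.random.default_rng(0)

# A toy clip: 16 frames of 8x8 RGB.
clip = rng.normal(size=(16, 8, 8, 3))
backward_clip = reverse_clip(clip)

# Stand-in features for 4 samples with 32 dimensions each; in a real model
# these would be produced by the two CNN streams.
feat_a = rng.normal(size=(4, 32))
feat_b = rng.normal(size=(4, 32))
gate_w = rng.normal(size=(64, 2)) * 0.1

fused, weights = dynamic_fusion(feat_a, feat_b, gate_w)
```

In this sketch each row of `weights` differs from sample to sample, which is the sense in which the fusion is dynamic; in a trained model the gate parameters would be learned jointly with the feature extractors.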