DC3D: A Video Action Recognition Network Based on Dense Connection

Xiaofang Mu, Zhenyu Liu, Jiaji Liu, Hao Li, Yue Li, Yikun Li
DOI: https://doi.org/10.1109/CBD58033.2022.00032
2022-01-01
Abstract: Efficiently extracting the temporal and spatial information of motion in video, and obtaining highly discriminative spatiotemporal features, are the key issues in improving the accuracy of action recognition. In this paper, a dense 3D convolutional block is designed as the basic unit of a dense convolutional 3D network. The network extracts the spatiotemporal features in the video jointly and strengthens the propagation and reuse of features, effectively fusing its shallow and deep spatiotemporal features. To make the extracted features sufficiently discriminative, this paper also proposes a joint loss function based on a Fisher discriminant regularization term, which trains the network to increase the inter-class dispersion and reduce the intra-class dispersion of the classified samples, thereby improving classification accuracy. Experiments on the UCF-101 human action dataset show that the proposed network reaches a recognition accuracy of 92.4%, compared with 85.2% for the C3D network, which demonstrates the effectiveness of the proposed method.
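The joint loss described in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not give the exact form of the Fisher discriminant regularization term, so the version below is an assumption: it adds `lam * (intra - inter)` to the mean softmax cross-entropy, where `intra` is the summed squared distance of each feature to its class mean and `inter` is the size-weighted squared distance of each class mean to the global mean. The names `fisher_scatter`, `joint_loss`, and the weight `lam` are hypothetical and not taken from the paper.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fisher_scatter(features, labels):
    """Return (intra-class scatter, inter-class scatter) for a batch."""
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    global_mean = mean_vector(features)
    intra = 0.0
    inter = 0.0
    for members in by_class.values():
        mu = mean_vector(members)
        # Spread of samples around their own class mean.
        intra += sum(sq_dist(f, mu) for f in members)
        # Spread of class means around the global mean, weighted by class size.
        inter += len(members) * sq_dist(mu, global_mean)
    return intra, inter


def joint_loss(logits, features, labels, lam=0.01):
    """Assumed joint loss: mean cross-entropy plus a Fisher-style regularizer.

    The difference form lam * (intra - inter) is an illustrative choice;
    the paper's actual formulation may differ.
    """
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    intra, inter = fisher_scatter(features, labels)
    return ce + lam * (intra - inter)
```

Under this assumed form, minimizing the loss pulls features of the same class together and pushes class means apart, which matches the stated goal of reducing intra-class dispersion while increasing inter-class dispersion. A ratio such as `intra / inter` would be another plausible choice.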