Frame Attention Mechanism and Transfer Learning Network for Classification on Motion Video

Kechuan Liu, Gaohao Zhou, Xin Cheng
DOI: https://doi.org/10.1109/ipec54454.2022.9777334
2022-01-01
Abstract: Video content classification has a wide range of application scenarios. In situations such as public-place security and behavior prediction, a person's type of behavior is inferred from the consecutive frames of a video. Compared with image data, video data adds a time dimension, which inevitably increases the overall computational cost. A typical approach uses 3D convolution to extract feature maps in the time domain, but such methods exhibit significant performance loss due to frame downsampling. Inspired by how humans judge video content, this study computes features on single frames by transfer learning and then encodes the position of each feature in the resulting set. An attention mechanism is used to determine the keyframe locations. Finally, a multilayer perceptron performs the video content classification. The results show that, on the chosen dataset, our model outperforms the 3D convolutional model in discriminative accuracy, in the confusion matrix, and under downsampling conditions.
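The pipeline described in the abstract (per-frame features, position encoding, attention over frames, MLP classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the sinusoidal position encoding, the single query vector used for attention scoring, the layer sizes, and every name below (`positional_encoding`, `attention_pool`, `mlp_head`) are assumptions made for illustration, and random vectors stand in for features that a pretrained image backbone would produce.

```python
import math
import random


def positional_encoding(num_frames, dim):
    """Sinusoidal position codes (an assumed choice; the abstract does not name one)."""
    table = []
    for t in range(num_frames):
        row = []
        for i in range(dim):
            angle = t / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_feats, query):
    """Add position codes, score each frame against a query, return weighted sum and weights."""
    dim = len(query)
    codes = positional_encoding(len(frame_feats), dim)
    encoded = [[f + p for f, p in zip(feat, code)]
               for feat, code in zip(frame_feats, codes)]
    scores = [sum(q * x for q, x in zip(query, row)) / math.sqrt(dim)
              for row in encoded]
    weights = softmax(scores)
    pooled = [sum(w * row[i] for w, row in zip(weights, encoded))
              for i in range(dim)]
    return pooled, weights


def mlp_head(x, w1, w2):
    """Two-layer perceptron with ReLU, biases omitted for brevity."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


random.seed(0)
NUM_FRAMES, DIM, HIDDEN, NUM_CLASSES = 8, 16, 32, 4  # hypothetical sizes

# Stand-in for per-frame features from a pretrained (transfer-learned) backbone.
frame_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_FRAMES)]
# Randomly initialised parameters; in practice these would be learned.
query = [random.gauss(0, 1) for _ in range(DIM)]
w1 = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(HIDDEN)]
w2 = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_CLASSES)]

pooled, weights = attention_pool(frame_feats, query)
logits = mlp_head(pooled, w1, w2)
# The frame with the largest attention weight plays the role of the keyframe.
keyframe = max(range(NUM_FRAMES), key=lambda t: weights[t])
```

In this sketch the attention weights sum to one across frames, so the index with the largest weight can be read as the keyframe location, and the pooled vector is the only input the classifier sees. Because each frame is processed independently before pooling, dropping frames changes how many terms enter the weighted sum but not the shape of the classifier input.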