DA-ResNet: dual-stream ResNet with attention mechanism for classroom video summary

Yuxiang Wu,Xiaoyan Wang,Tianpan Chen,Yan Dou
DOI: https://doi.org/10.1007/s10044-024-01256-1
IF: 2.307
2024-03-15
Pattern Analysis and Applications
Abstract:It is important to generate both diverse and representative video summary for massive videos. In this paper, a convolution neural network based on dual-stream attention mechanism(DA-ResNet) is designed to obtain candidate summary sequences for classroom scenes. DA-ResNet constructs a dual stream input of image frame sequence and optical flow frame sequence to enhance the expression ability. The network also embeds the attention mechanism into ResNet. On the other hand, the final video summary is obtained by removing redundant frames with the improved hash clustering algorithm. In this process, preprocessing is performed first to reduce computational complexity. And then hash clustering is used to retain the frame with the highest entropy value in each class, removing other similar frames. To verify its effectiveness in classroom scenes, we also created ClassVideo, a real dataset consisting of 45 videos from the normal teaching environment of our school. The results of the experiments show the competitiveness of the proposed method DA-ResNet outperforms the existing methods by about 8% in terms of the F-measure. Besides, the visual results also demonstrate its ability to produce classroom video summaries that are very close to the human preferences.
computer science, artificial intelligence
What problem does this paper attempt to address?