Accurate Classroom Person Detection Based on Multi-Frame Feature Fusion with Attention

Guoan Cheng, Jiandong Ren, Yuping Li, Shengke Wang, Yougang Ding, Hao Wang
DOI: https://doi.org/10.1109/icivc58118.2023.10270326
2023-01-01
Abstract: In recent years, object detection technology has developed rapidly and is widely used in intelligent video monitoring. In classroom scenes in particular, incorporating object detection into video monitoring systems has important practical significance. However, surveillance videos suffer from object occlusion, motion blur, posture changes, and video defocus, which pose great challenges to object detection. In this paper, we therefore study human detection in classroom multi-camera surveillance video and propose a multi-frame feature fusion network based on the attention mechanism. The algorithm first adds dilated convolution layers to the ResNet50 backbone network to expand the receptive field and retain more feature information. Second, it applies a multi-frame feature fusion strategy based on the attention mechanism, which enhances the feature representation ability of the deep detector by fusing the feature information of adjacent frames, thereby improving the accuracy of human detection in classroom monitoring video. To verify the effectiveness of the proposed method, we compiled a large-scale classroom monitoring video dataset named OUC-ClassVideo. The proposed detection network, built on the attention mechanism and the multi-frame feature fusion strategy, achieves satisfactory performance on both the public ImageNet VID dataset and our self-collected OUC-ClassVideo dataset.
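The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how the attention weights are computed, so the cosine-similarity scoring, the softmax normalization, and the function names `effective_kernel_size`, `cosine_similarity`, and `fuse_frames` are all assumptions made for illustration, and features are reduced to plain vectors rather than convolutional feature maps.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Span covered along one axis by a dilated convolution kernel.

    A dilation of 1 is an ordinary convolution; larger dilations widen
    the receptive field without adding weights.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def fuse_frames(reference, neighbors):
    """Fuse a reference frame's features with those of adjacent frames.

    Assumption: each frame is scored by its cosine similarity to the
    reference, and the scores are turned into weights with a softmax.
    Returns the fused feature vector and the weights
    (reference frame first, then the neighbors in order).
    """
    frames = [reference] + list(neighbors)
    scores = [cosine_similarity(reference, f) for f in frames]

    # Numerically stable softmax over the similarity scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame features, channel by channel.
    fused = [
        sum(w * f[i] for w, f in zip(weights, frames))
        for i in range(len(reference))
    ]
    return fused, weights


if __name__ == "__main__":
    ref = [1.0, 0.0]
    adjacent = [[0.9, 0.1], [0.0, 1.0]]  # one similar frame, one dissimilar frame
    fused, weights = fuse_frames(ref, adjacent)
    print(effective_kernel_size(3, 2), fused, weights)
```

Under these assumptions, a neighboring frame whose features resemble the reference frame contributes more to the fused feature than a dissimilar one, for example a frame degraded by motion blur or occlusion. In a real detector the same weighting would be applied per spatial location on backbone feature maps.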