Weakly Supervised Fixated Object Detection in Traffic Videos Based on Driver’s Selective Attention Mechanism

Yi Shi, Long Qin, Shixuan Zhao, Kaifu Yang, Yuyong Cui, Hongmei Yan
DOI: https://doi.org/10.1109/tcsvt.2024.3421988
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Traffic scene perception has a significant impact on driving safety. Inexperienced or distracted drivers often do not allocate enough attention to the objects closely related to the driving task, which creates potential road hazards. In contrast, experienced drivers, guided by visual selective attention, pay close attention to the objects most relevant to the driving task and thereby drive more safely. However, apart from traffic saliency prediction, few existing works have integrated human drivers’ perception with computer models to detect the objects that attract the attention of experienced drivers in traffic videos. In this work, we aim to detect these objects, which we refer to as traffic fixated objects. To achieve this goal, we first build a new eye-tracking-based video fixated object detection dataset (ET-VFOD), which can serve as a benchmark for researchers interested in attention-inspired fixated object detection. We then propose a traffic video fixated object detection network named VFOD-Net. VFOD-Net decodes the information closely related to the driving task from the reference frames. This information is used as a top-down prior to modulate the model’s encoding of the current frame, thus improving detection performance. Considering the high cost of manual annotation, we also develop a weakly supervised traffic video fixated object detection pipeline. Experimental results on the ET-VFOD dataset show that the proposed weakly supervised method achieves detection performance close to that of the fully supervised model, which verifies its effectiveness. Our work combines bottom-up and top-down attention to detect the vital objects in traffic videos from the perspective of human drivers, showing potential applications in intelligent driving, such as driver monitoring and warning systems. The dataset and code are available at https://github.com/YiShi701/VFOD_Net.