TransConvNet: Perform perceptually relevant driver's visual attention predictions

Chuan Xu, Bo Jiang, Yan Su
DOI: https://doi.org/10.1016/j.compeleceng.2024.109104
IF: 4.152
2024-02-03
Computers & Electrical Engineering
Abstract: Drivers adeptly allocate their attention to critical areas and targets in a dynamically evolving driving environment, thereby ensuring the utmost safety. However, prevailing research primarily focuses on static perspectives or relies solely on the feature extraction capabilities of the Convolutional Neural Network (CNN). A CNN is inherently limited in capturing long-range contextual information, which restricts its ability to emulate human attention allocation in dynamic traffic scenarios. Considering this, we propose a novel driver's visual attention model that combines the transformer with a CNN to accurately forecast the driver's visual attention allocation. The proposed model adopts a dynamic standpoint, incorporating comprehensive long-range contextual encoding across both spatial and temporal dimensions. A feature pyramid network fuses multi-scale features, which better preserves the details and semantic information of each scale and strengthens the perceptual ability of the model. Experimental findings suggest that the proposed model enhances the precision of visual attention prediction and yields state-of-the-art performance on the DR(eye)VE dataset. Finally, the proposed model is applied to the TDV dataset in generalization experiments, which verify its adaptability.
engineering, electrical & electronic; computer science, interdisciplinary applications, hardware & architecture
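The abstract's point about long-range context can be illustrated with a minimal sketch of scaled dot-product self-attention, the core operation of a transformer. This is not the paper's architecture: the identity query/key/value projections, the function names, and the toy `frame_features` vectors are all assumptions made for illustration. The sketch only shows that every token (here, a hypothetical per-frame feature vector) can attend to every other token regardless of how far apart they are, which a convolution with a small receptive field cannot do in a single layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = tokens (no learned weights), so the
    attention pattern depends only on token similarity.
    Returns (contextual_tokens, attention_weights).
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all tokens: the context-aware representation.
        outputs.append([sum(w * value[d] for w, value in zip(row, tokens))
                        for d in range(dim)])
    return outputs, weights


# Hypothetical features for four consecutive frames; frames 0 and 3
# are identical even though they are the farthest apart in time.
frame_features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

contextual, attn = self_attention(frame_features)
```

In this toy input, frame 0 weights frame 3 exactly as strongly as it weights itself, and more strongly than its immediate neighbor frame 1, because attention depends on content similarity rather than distance. A real model would add learned projections, multiple heads, and positional encodings.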