Driver intention prediction based on multi-dimensional cross-modality information interaction

Mengfan Xue, Zengkui Xu, Shaohua Qiao, Jiannan Zheng, Tao Li, Yuerong Wang, Dongliang Peng
DOI: https://doi.org/10.1007/s00530-024-01282-3
IF: 3.9
2024-03-16
Multimedia Systems
Abstract: Driver intention prediction helps drivers perceive possible dangers as early as possible and has become one of the most important research topics in the field of self-driving in recent years. In this study, we propose a driver intention prediction method based on multi-dimensional cross-modality information interaction. First, an efficient video recognition network extracts channel-temporal features from the inside (driver) and outside (road) videos. Within it, a cross-modality channel-spatial weight mechanism enables information interaction between the two feature extraction networks, one per modality, and a contrastive learning module pushes the two networks to strengthen structural knowledge interaction. Then, the resulting representations of the inside and outside videos are fused by a ResLayer-based module to produce a preliminary prediction, which is corrected using GPS information to obtain the final decision. In addition, we train the entire network with a multi-task framework. We validate the proposed method on the public Brain4Cars dataset, and the results show that it achieves competitive accuracy while balancing performance and computation.
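The abstract does not give the form of the contrastive learning module, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE-style loss that pulls together the inside-video and outside-video embeddings of the same clip and pushes apart embeddings from different clips. The function names, the temperature value, and the use of InfoNCE itself are assumptions for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cosine_logits(inside, outside, temperature):
    """Pairwise cosine similarities between the two modalities, divided by temperature."""
    a = [l2_normalize(v) for v in inside]
    b = [l2_normalize(v) for v in outside]
    return [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(z - row_max) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def cross_modal_info_nce(inside, outside, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form): inside->outside and outside->inside.

    inside[i] and outside[i] are hypothetical embeddings of the same clip
    produced by the two feature extraction networks.
    """
    logits = cosine_logits(inside, outside, temperature)
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    inside_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    aligned_outside = [[1.0, 0.0], [0.0, 1.0]]
    swapped_outside = [[0.0, 1.0], [1.0, 0.0]]
    print(cross_modal_info_nce(inside_embeddings, aligned_outside))
    print(cross_modal_info_nce(inside_embeddings, swapped_outside))
```

In this toy setting the loss is close to zero when each inside embedding matches its paired outside embedding and grows large when the pairs are swapped, which is the behaviour a cross-modality contrastive objective is meant to reward.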
computer science, information systems, theory & methods
What problem does this paper attempt to address?