Pedestrian Facial Attention Detection Using Deep Fusion and Multi-modal Fusion Classifier

Jing Lian, Zhenghao Wang, Dongfang Yang, Wen Zheng, Linhui Li, Yibin Zhang
DOI: https://doi.org/10.1109/tcsvt.2024.3465438
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Pedestrian facial attention plays an essential role in autonomous driving scenarios where a vehicle has to handle complex interactions with pedestrians. Inferring whether pedestrians are making eye contact with the ego-vehicle makes it possible to deduce their intentions. However, traditional gaze estimation and eye detection algorithms have limitations in complex traffic scenes: distance to the pedestrian lowers image resolution, and occlusion removes visual features. To address these limitations, this study proposes a pedestrian facial attention detection framework. The framework adopts a deep feature fusion strategy to fuse visual features with semantic pose features at a deep level. It also introduces a multi-modal fusion classifier that discovers cross-modal spatial interactive representations from the feature maps, which improves the robustness of model generalization. The framework is evaluated on the public JAAD and LOOK datasets. The experimental results indicate that it performs better than existing methods.