Action-ViT: Pedestrian Intent Prediction in Traffic Scenes
Shengzhe Zhao, Haopeng Li, Qiuhong Ke, Liangchen Liu, Rui Zhang
DOI: https://doi.org/10.1109/lsp.2021.3134194
2022-01-01
IEEE Signal Processing Letters
Abstract: Pedestrian crossing intention prediction is crucial to traffic safety, yet it remains a challenging task in real traffic scenarios. Traditional methods infer a pedestrian's intention to cross by predicting future movements from previously observed trajectories. The performance of those methods is limited by insufficient features and sources of information. To address these limitations, we propose a ViT-based model that incorporates multi-modal data to predict pedestrian crossing intention. Specifically, the proposed model takes into account visual information, poses, bounding box coordinates and action annotations, and gradually fuses these features for the final prediction. In addition, different data processing methods are designed according to the characteristics of each modality, so that each type of data is fully used. Extensive ablation studies are conducted to show the effect of temporal modelling and feature fusion.
engineering, electrical & electronic