Pedestrian Trajectory Prediction Based on Multimodal Fusion of Motion Sensing Encoder Decoder Network

Zhiqiang Tang, Wenxia Xu, Baocheng Yu, Jian Huang, Xinxing Chen, Hanyan He
DOI: https://doi.org/10.1109/m2vip62491.2024.10746073
2024-01-01
Abstract: Traditional pedestrian trajectory prediction work is mostly framed in the bird's-eye view, which makes it ill-suited to practical applications such as robots. To address this, this paper proposes a motion-aware encoder-decoder network framework based on multimodal fusion from a first-person perspective. The framework integrates multi-source heterogeneous information, including visual images, geometric positions, robot motion, and pedestrian pose, and reflects the influence of pedestrian motion on trajectory prediction. The encoder has two branches, target estimation and action prediction, which perceive the pedestrian's future target position and the intention behind the pedestrian's action, respectively. The decoder adopts a Bi-LSTM architecture, which captures both forward and backward contextual information in the sequence. Finally, the three related tasks of target estimation, action prediction, and trajectory prediction are integrated into a joint training framework through a multi-task loss function. Experimental results show that the proposed method performs strongly on several standard metrics and verify the effectiveness of the multimodal fusion and motion-perception strategy.
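The joint training step described in the abstract can be illustrated as a weighted sum of three per-task losses. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the abstract does not specify the individual loss terms or their weights. The choices here are all hypothetical: mean squared error for the trajectory and the target position, cross-entropy for the action class, and the weights `w_traj`, `w_goal`, `w_action`. The function names are likewise illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position; coordinate frame is an assumption


def trajectory_loss(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Hypothetical trajectory term: mean squared Euclidean error over future steps."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum((px - gx) ** 2 + (py - gy) ** 2
               for (px, py), (gx, gy) in zip(pred, gt)) / len(pred)


def target_loss(pred_goal: Point, gt_goal: Point) -> float:
    """Hypothetical target-estimation term: squared error of the predicted end point."""
    return (pred_goal[0] - gt_goal[0]) ** 2 + (pred_goal[1] - gt_goal[1]) ** 2


def action_loss(logits: Sequence[float], label: int) -> float:
    """Hypothetical action-prediction term: cross-entropy of logits vs. true class."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(pred_traj: Sequence[Point], gt_traj: Sequence[Point],
                   pred_goal: Point, gt_goal: Point,
                   action_logits: Sequence[float], action_label: int,
                   w_traj: float = 1.0, w_goal: float = 1.0,
                   w_action: float = 1.0) -> float:
    """Weighted sum of the three task losses (weights are assumed hyperparameters)."""
    return (w_traj * trajectory_loss(pred_traj, gt_traj)
            + w_goal * target_loss(pred_goal, gt_goal)
            + w_action * action_loss(action_logits, action_label))


if __name__ == "__main__":
    total = multitask_loss(
        pred_traj=[(0.0, 0.0), (1.0, 1.0)], gt_traj=[(0.0, 0.0), (1.0, 2.0)],
        pred_goal=(1.0, 1.0), gt_goal=(1.0, 2.0),
        action_logits=[0.0, 0.0], action_label=0,
    )
    print(round(total, 4))  # 0.5 + 1.0 + ln 2 ≈ 2.1931
```

Summing the terms lets one backward pass update the shared encoder using signals from all three tasks. The relative weights control how much the target and action branches shape the trajectory representation, and would normally be tuned on validation data.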