Abstract: Pedestrian trajectory prediction from video is essential for many practical traffic applications. Most existing pedestrian trajectory prediction methods are based on fully connected long short-term memory (LSTM) networks and perform well on public datasets. However, these methods still have three defects: a) most of them rely on manual annotations to obtain information about the environment surrounding the subject pedestrian, which limits practical application; b) the interaction among pedestrians and obstacles in a scene has received little study, which leads to greater prediction error; c) traditional LSTM methods condition each prediction on the previous time step and ignore the correlation between the pedestrian's future states and distant past states, which produces unrealistic trajectories. To tackle these problems, first, in the data processing stage, we use an image semantic segmentation algorithm to obtain multi-category obstacle information and design an end-to-end "Siamese Position Extraction" model to obtain more accurate pedestrian interaction data. Second, we design an end-to-end fully convolutional LSTM encoder-decoder with an attention mechanism (FLEAM) to overcome the shortcomings of LSTM. Third, we compare FLEAM with several state-of-the-art LSTM-based prediction methods on multiple video sequences from the ETH, UCY and MOT20 datasets. The results show that our approach matches the prediction error of the best state-of-the-art method. Moreover, FLEAM has more potential for practical application because it does not rely on manually annotated data. We further validate the effectiveness of FLEAM by supplying it with manually annotated data, in which case it yields a much lower prediction error.
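To make defect (c) concrete, the following minimal sketch shows how an attention step can let a decoder weigh every past encoder state instead of only the most recent one. It is an illustration under stated assumptions and not the FLEAM architecture: the abstract does not specify the scoring function, so plain dot-product attention is assumed here, and the names `softmax`, `attend`, `query`, and `encoder_states` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention (an assumed scoring function, not taken from the paper).

    query          -- current decoder state, a list of floats
    encoder_states -- one hidden state per observed time step
    Returns (weights, context): one weight per past time step, and the
    weighted sum of the encoder states.
    """
    # Score each past state by its similarity to the current decoder state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector mixes all past states, including distant ones.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three observed time steps with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attend(query, encoder_states)
```

In this toy input the first and third time steps score equally against the query, so they receive equal weight even though one is the most distant state and the other the most recent. A decoder that looks only at the previous time step cannot reproduce that behaviour.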

Pedestrian Trajectory Prediction Based on Multimodal Fusion of Motion Sensing Encoder Decoder Network

Crossmodal Transformer Based Generative Framework for Pedestrian Trajectory Prediction

A multimodal stepwise-coordinating framework for pedestrian trajectory prediction

Multimodal Transformer Networks for Pedestrian Trajectory Prediction

Adaptive Multi-Pedestrian Tracking by Multi-Sensor: Track-to-Track Fusion Using Monocular 3D Detection and MMW Radar

Spatio-Temporal Interaction Aware and Trajectory Distribution Aware Graph Convolution Network for Pedestrian Multimodal Trajectory Prediction

Crossing-Road Pedestrian Trajectory Prediction Via Encoder-Decoder LSTM

Pedestrian Trajectory Prediction in Heterogeneous Traffic Using Pose Keypoints-Based Convolutional Encoder-Decoder Network

Multimodal Forward Generation Transformer Network for Inconspicuous Pedestrian Trajectory Prediction

Pedestrian Trajectory Prediction in Heterogeneous Traffic using Facial Keypoints-based Convolutional Encoder-decoder Network

Action-Aware Encoder-Decoder Network for Pedestrian Trajectory Prediction

Fully Convolutional Encoder-Decoder With an Attention Mechanism for Practical Pedestrian Trajectory Prediction

MSTCNN: multi-modal spatio-temporal convolutional neural network for pedestrian trajectory prediction

Multi-Relational Pedestrian Trajectory Prediction in Complex Scenes

Context-aware Pedestrian Trajectory Prediction with Multimodal Transformer

End-to-end Pedestrian Trajectory Prediction Via Efficient Multi-modal Predictors

Modeling social interaction and intention for pedestrian trajectory prediction

Pedestrian Motion Trajectory Prediction in Intelligent Driving from Far Shot First-Person Perspective Video

Pedestrian behavior prediction model with a convolutional LSTM encoder–decoder

Enhanced Multimodal Trajectory Prediction for Autonomous Vehicles Using Advanced Diffusion Model Techniques

CF-LSTM: Cascaded Feature-Based Long Short-Term Networks for Predicting Pedestrian Trajectory