Abstract: Forecasting the trajectories of pedestrians in shared urban traffic environments from non-invasive sensor modalities remains one of the challenging problems in the development of autonomous vehicles (AVs). In the literature, this problem is often tackled using recurrent neural networks (RNNs). Although RNNs are powerful at capturing the temporal dependencies in pedestrians' motion trajectories, they have been argued to struggle with longer sequences. Additionally, while incorporating contextual information (such as scene semantics and agent interactions) has been shown to be effective for robust trajectory prediction, it can also degrade the real-time performance of the prediction system. Thus, in this work, we introduce a framework based on transformer networks, which were recently demonstrated to be more efficient than RNNs and to outperform them in many sequence-based tasks. We rely on a fusion of sensor modalities, namely past positional information, agent interaction information and scene physical semantics information, as input to our framework in order not only to provide robust trajectory prediction of pedestrians, but also to achieve real-time performance for multi-pedestrian trajectory prediction. We evaluated our framework on three real-life datasets of pedestrians in shared urban traffic environments, and it outperformed the compared baseline approaches in both short-term and long-term prediction horizons. For the short-term prediction horizon, our approach achieved a lower average displacement error (ADE) and root-mean-squared error (RMSE) than the state-of-the-art (SOTA) approach by more than 11 cm and 23 cm, respectively. For the long-term prediction horizon, our approach achieved a lower ADE and final displacement error (FDE) than the SOTA approach by more than 62 cm and 165 cm, respectively. Additionally, our approach achieved superior real-time performance, requiring only 0.025 s per prediction (i.e., it can provide 40 individual trajectory predictions per second).
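For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how ADE, FDE and RMSE are commonly computed for a single predicted trajectory, together with the latency-to-throughput arithmetic. It is not the authors' evaluation code: the definitions used here (RMSE taken over per-step Euclidean distances, trajectories given as lists of (x, y) points in metres) are assumptions, and all function names and the toy data are hypothetical.

```python
import math

def displacement_errors(pred, truth):
    """Per-step Euclidean distances between predicted and true (x, y) points."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return [math.hypot(px - tx, py - ty)
            for (px, py), (tx, ty) in zip(pred, truth)]

def ade(pred, truth):
    """Average displacement error: mean distance over all predicted steps."""
    d = displacement_errors(pred, truth)
    return sum(d) / len(d)

def fde(pred, truth):
    """Final displacement error: distance at the last predicted step."""
    return displacement_errors(pred, truth)[-1]

def rmse(pred, truth):
    """Root-mean-squared error over per-step distances (assumed definition)."""
    d = displacement_errors(pred, truth)
    return math.sqrt(sum(x * x for x in d) / len(d))

def predictions_per_second(latency_s):
    """Throughput implied by a per-prediction latency given in seconds."""
    return 1.0 / latency_s

# Toy data (hypothetical): a pedestrian walking along the x-axis,
# with a prediction that drifts sideways over time.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]

print(f"ADE  = {ade(pred, truth):.3f} m")
print(f"FDE  = {fde(pred, truth):.3f} m")
print(f"RMSE = {rmse(pred, truth):.3f} m")
print(f"throughput at 0.025 s = {predictions_per_second(0.025):.0f} per second")
```

Because RMSE squares each per-step error before averaging, it weights large deviations more heavily than ADE does, and FDE only reflects the end point. This is one reason trajectory-prediction results are usually reported with more than one of these metrics.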

Trajectory Unified Transformer for Pedestrian Trajectory Prediction

Dynamic-learning Spatial-Temporal Transformer Network for Vehicular Trajectory Prediction at Urban Intersections

Crossmodal Transformer Based Generative Framework for Pedestrian Trajectory Prediction

Social-Transformer: Pedestrian Trajectory Prediction in Autonomous Driving Scenes

Multimodal Transformer Networks for Pedestrian Trajectory Prediction

Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network

Context-aware Pedestrian Trajectory Prediction with Multimodal Transformer

Learning Sparse Interaction Graphs of Partially Detected Pedestrians for Trajectory Prediction

Attention-aware Social Graph Transformer Networks for Stochastic Trajectory Prediction

Pedestrian Trajectory Prediction Using Dynamics-based Deep Learning

Joint Intention and Trajectory Prediction Based on Transformer

Enhancing Pedestrian Trajectory Prediction with Crowd Trip Information

Multi-Relational Pedestrian Trajectory Prediction in Complex Scenes

Pedestrian Trajectory Prediction for Real-Time Autonomous Systems via Context-Augmented Transformer Networks

Pedestrian Motion Prediction Using Transformer-based Behavior Clustering and Data-Driven Reachability Analysis

An improved GAN with transformers for pedestrian trajectory prediction models

Multimodal Forward Generation Transformer Network for Inconspicuous Pedestrian Trajectory Prediction

Pedestrian Trajectory Prediction using Context-Augmented Transformer Networks

Obstacle‐transformer: A Trajectory Prediction Network Based on Surrounding Trajectories

Hyper-STTN: Social Group-aware Spatial-Temporal Transformer Network for Human Trajectory Prediction with Hypergraph Reasoning