Abstract: As a safety-critical task, autonomous driving requires accurate predictions of road users' future trajectories for safe motion planning, particularly under challenging conditions. Yet many recent deep learning methods suffer from degraded performance on challenging scenarios, mainly because these scenarios appear less frequently in the training data. To address this long-tail issue, existing methods force challenging scenarios closer together in the feature space during training to trigger information sharing among them for more robust learning. These methods, however, rely primarily on motion patterns to characterize scenarios, omitting more informative contextual information such as interactions and scene layout. We argue that exploiting such information improves not only prediction accuracy but also the scene compliance of the generated trajectories. In this paper, we propose to incorporate richer training dynamics information into a prototypical contrastive learning framework. More specifically, we propose a two-stage process. First, we generate rich contextual features using a baseline encoder-decoder framework. These features are split into clusters based on the model's output errors, using the training dynamics information, and a prototype is computed within each cluster. Second, we retrain the model using the prototypes in a contrastive learning framework. We conduct empirical evaluations of our approach using two large-scale naturalistic datasets and show that our method achieves state-of-the-art performance by improving accuracy and scene compliance on the long-tail samples. Furthermore, we perform experiments on a subset of the clusters to highlight the additional benefit of our approach in reducing training bias.
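As a rough illustration of the first stage, the sketch below summarizes per-sample errors recorded over training epochs (here by the final-epoch error and the variance across epochs), bins samples into clusters, and averages the features in each cluster to form prototypes. The function names, the quantile-based binning rule, and the use of the final-epoch error alone for cluster assignment are assumptions made for illustration; the paper's actual clustering rule may differ.

```python
import numpy as np


def training_dynamics_stats(errors_per_epoch):
    """Summarize per-sample errors recorded over training.

    errors_per_epoch: array of shape (num_epochs, num_samples).
    Returns (final_error, error_variance), each of shape (num_samples,).
    """
    errors = np.asarray(errors_per_epoch, dtype=float)
    final_error = errors[-1]              # error at the last epoch
    error_variance = errors.var(axis=0)   # population variance across epochs
    return final_error, error_variance


def assign_clusters(final_error, num_clusters):
    """Bin samples by final-epoch error using quantiles (hypothetical rule).

    Cluster 0 holds the easiest samples, cluster num_clusters - 1 the hardest.
    """
    inner_quantiles = np.linspace(0.0, 1.0, num_clusters + 1)[1:-1]
    edges = np.quantile(final_error, inner_quantiles)
    return np.digitize(final_error, edges)


def compute_prototypes(features, cluster_ids, num_clusters):
    """Average the features in each cluster; assumes no cluster is empty."""
    return np.stack(
        [features[cluster_ids == k].mean(axis=0) for k in range(num_clusters)]
    )


# Toy data (made up): 2 epochs, 6 samples, 2-dimensional features.
final = np.array([0.1, 0.2, 1.0, 1.2, 5.0, 6.0])
errors = np.stack([final + 2.0, final])
features = np.array(
    [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 12.0], [20.0, 20.0], [22.0, 22.0]]
)

final_error, error_variance = training_dynamics_stats(errors)
cluster_ids = assign_clusters(final_error, num_clusters=3)
prototypes = compute_prototypes(features, cluster_ids, num_clusters=3)
```

In this toy setup the error variance is computed but not used for assignment; a fuller version might bin on both statistics.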

What problem does this paper attempt to address?

This paper addresses long-tail trajectory prediction in autonomous driving, a safety-critical task in which accurate predictions of road users' future trajectories are needed for safe motion planning. Current deep learning methods perform poorly on challenging scenarios, mainly because those scenarios occur less frequently in the training data, which produces a long-tail distribution. Existing methods address this by pulling challenging samples closer together in the feature space to promote information sharing. However, they rely primarily on motion patterns and overlook more informative contextual information such as interactions and scene layout.

The paper proposes TrACT (Training Dynamics Aware Contrastive Learning Framework), which clusters samples using training dynamics information (e.g., the final-epoch value of the model's output error and the variance of the error across all training epochs). First, a baseline encoder-decoder framework generates rich contextual features. These features are then assigned to clusters according to the model's output errors, and a prototype is computed for each cluster. Finally, the model is retrained in a prototypical contrastive learning framework to generate more robust trajectories. Experimental results show that TrACT achieves state-of-the-art performance on two large-scale naturalistic datasets, improving accuracy and scene compliance on long-tail samples in particular. The authors also use dataset maps to show a further benefit of the approach: reduced training bias.

In summary, the main contributions of the paper are:

1. The TrACT framework, which uses training dynamics information to group data samples into clusters of different difficulty levels and then trains with the prototypes of these clusters in a contrastive learning framework.
2. Extensive experiments on two benchmark datasets, demonstrating the predictive performance of TrACT in the most challenging scenarios.
3. A demonstration, using safety metrics, that the generated trajectories are more scene compliant.
4. A demonstration, using dataset maps, that the approach reduces training bias in long-tail scenarios.
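The retraining step described above can be sketched as a prototype-based contrastive loss. The version below is a common InfoNCE-style formulation, in which each sample's normalized feature is pulled toward its own cluster prototype and pushed away from the others; it is an assumption for illustration, not necessarily the loss used in TrACT. The function name and the `temperature` parameter are hypothetical, and a real training loop would use an automatic-differentiation framework rather than plain NumPy.

```python
import numpy as np


def prototype_contrastive_loss(features, prototypes, cluster_ids, temperature=0.1):
    """InfoNCE-style loss between sample features and cluster prototypes.

    features:    (num_samples, dim) feature vectors (non-zero rows).
    prototypes:  (num_clusters, dim) one prototype per cluster (non-zero rows).
    cluster_ids: (num_samples,) index of each sample's own cluster.
    """
    cluster_ids = np.asarray(cluster_ids)
    # L2-normalize so the dot product is a cosine similarity.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    logits = z @ p.T / temperature                   # (num_samples, num_clusters)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-probability of each sample's own prototype, averaged.
    return -log_probs[np.arange(len(z)), cluster_ids].mean()


# Toy check (made-up data): features aligned with their own prototype
# should incur a lower loss than features assigned to the wrong one.
protos = np.array([[1.0, 0.0], [0.0, 1.0]])
feats = np.array([[1.0, 0.1], [0.1, 1.0]])
loss_matched = prototype_contrastive_loss(feats, protos, [0, 1])
loss_mismatched = prototype_contrastive_loss(feats, protos, [1, 0])
```

A lower temperature sharpens the softmax, so the loss penalizes confusable prototypes more strongly.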

TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
