TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction

Junrui Zhang, Mozhgan Pourkeshavarz, Amir Rasouli
2024-04-30
Abstract: As a safety critical task, autonomous driving requires accurate predictions of road users' future trajectories for safe motion planning, particularly under challenging conditions. Yet, many recent deep learning methods suffer from a degraded performance on the challenging scenarios, mainly because these scenarios appear less frequently in the training data. To address such a long-tail issue, existing methods force challenging scenarios closer together in the feature space during training to trigger information sharing among them for more robust learning. These methods, however, primarily rely on the motion patterns to characterize scenarios, omitting more informative contextual information, such as interactions and scene layout. We argue that exploiting such information not only improves prediction accuracy but also scene compliance of the generated trajectories. In this paper, we propose to incorporate richer training dynamics information into a prototypical contrastive learning framework. More specifically, we propose a two-stage process. First, we generate rich contextual features using a baseline encoder-decoder framework. These features are split into clusters based on the model's output errors, using the training dynamics information, and a prototype is computed within each cluster. Second, we retrain the model using the prototypes in a contrastive learning framework. We conduct empirical evaluations of our approach using two large-scale naturalistic datasets and show that our method achieves state-of-the-art performance by improving accuracy and scene compliance on the long-tail samples. Furthermore, we perform experiments on a subset of the clusters to highlight the additional benefit of our approach in reducing training bias.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses long-tail trajectory prediction in autonomous driving, a safety-critical task in which accurate predictions of road users' future trajectories are needed for safe motion planning. Current deep learning methods perform poorly on challenging scenarios, mainly because these scenarios occur less frequently in the training data, which produces a long-tail distribution. To address this issue, existing methods pull challenging samples closer together in the feature space to promote information sharing among them. However, these methods rely primarily on motion patterns and overlook more informative contextual information, such as interactions and scene layout.
The paper proposes TrACT (Training Dynamics Aware Contrastive Learning Framework), which uses training dynamics information (e.g., the model's output error at the final epoch and the variance of that error across all training epochs) to cluster samples. First, a baseline encoder-decoder framework generates rich contextual features. These features are then assigned to clusters based on the model's output errors, and a prototype is computed for each cluster. Finally, the model is retrained with these prototypes in a prototypical contrastive learning framework to generate more robust trajectories.
Experimental results on two large-scale naturalistic datasets show that TrACT achieves state-of-the-art performance, in particular by improving accuracy and scene compliance on long-tail samples. The authors also show an additional benefit of using dataset maps: reduced training bias.
In summary, the main contributions of this paper are:
1. The TrACT framework, which uses training dynamics information to group data samples into clusters of different difficulty levels and then trains with the prototypes of these clusters in a contrastive learning framework.
2. Extensive experiments on two benchmark datasets, demonstrating the predictive performance of TrACT in the most challenging scenarios.
3. A demonstration, using safety metrics, that the generated trajectories are more scene compliant.
4. A demonstration that dataset maps help reduce training bias in long-tail scenarios.
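The two-stage idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes per-sample errors have been logged at every epoch of the baseline run, uses a small k-means on (final-epoch error, error variance) as a stand-in for the paper's clustering step, and uses an InfoNCE-style loss against cluster prototypes as a stand-in for the paper's contrastive objective. All function names, the temperature value, and the toy data are hypothetical.

```python
import numpy as np

def training_dynamics_stats(errors):
    """errors: (n_samples, n_epochs) prediction error logged at each epoch."""
    errors = np.asarray(errors, dtype=float)
    final_error = errors[:, -1]          # error at the last epoch
    error_variance = errors.var(axis=1)  # variance of the error across epochs
    return np.stack([final_error, error_variance], axis=1)

def cluster_by_dynamics(stats, k, n_iters=50):
    """Assumed clustering step: plain k-means on standardized statistics."""
    scaled = (stats - stats.mean(axis=0)) / (stats.std(axis=0) + 1e-8)
    # Deterministic seeding: samples spaced evenly from low to high final error.
    order = np.argsort(scaled[:, 0])
    centers = scaled[order[np.linspace(0, len(order) - 1, k).astype(int)]].copy()

    def assign(c):
        dists = np.linalg.norm(scaled[:, None, :] - c[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(centers)
        for c in range(k):
            if np.any(labels == c):  # leave the center of an empty cluster unchanged
                centers[c] = scaled[labels == c].mean(axis=0)
    return assign(centers)

def cluster_prototypes(features, labels, k):
    """Prototype of a cluster = mean feature vector of its members."""
    protos = np.zeros((k, features.shape[1]))
    for c in range(k):
        members = features[labels == c]
        if len(members):
            protos[c] = members.mean(axis=0)
    return protos

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """Assumed InfoNCE-style loss: pull each feature toward its own prototype."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    logits = f @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy data: three "easy" samples (low, stable error) and three "hard" ones.
errors = np.array([
    [0.5, 0.3, 0.2, 0.1],
    [0.6, 0.3, 0.2, 0.1],
    [0.5, 0.4, 0.2, 0.2],
    [3.0, 1.0, 2.5, 2.0],
    [2.8, 1.2, 2.6, 2.1],
    [3.1, 0.9, 2.4, 1.9],
])
# Stand-in for encoder features of the same six samples.
features = np.array([
    [1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
    [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0],
])

stats = training_dynamics_stats(errors)
labels = cluster_by_dynamics(stats, k=2)
prototypes = cluster_prototypes(features, labels, k=2)
loss = prototype_contrastive_loss(features, labels, prototypes)
print("cluster labels:", labels)
print("prototype contrastive loss:", loss)
```

In an actual retraining stage, a term like this would be computed on encoder features inside an autodiff framework and combined with the trajectory prediction loss. How the two are weighted is not stated in this summary.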