Progressive Pretext Task Learning for Human Trajectory Prediction

Xiaotong Lin, Tianming Liang, Jianhuang Lai, Jian-Fang Hu
2024-07-16
Abstract: Human trajectory prediction is a practical task of predicting the future positions of pedestrians on the road, which typically covers all temporal ranges from short-term to long-term within a trajectory. However, existing works attempt to address the entire trajectory prediction with a singular, uniform training paradigm, neglecting the distinction between short-term and long-term dynamics in human trajectories. To overcome this limitation, we introduce a novel Progressive Pretext Task learning (PPT) framework, which progressively enhances the model's capacity to capture short-term dynamics and long-term dependencies for the final entire trajectory prediction. Specifically, we carefully design three stages of training tasks in the PPT framework. In the first stage, the model learns to comprehend the short-term dynamics through a stepwise next-position prediction task. In the second stage, the model is further enhanced to understand long-term dependencies through a destination prediction task. In the final stage, the model addresses the entire future trajectory prediction task by taking full advantage of the knowledge from previous stages. To alleviate knowledge forgetting, we further apply cross-task knowledge distillation. Additionally, we design a Transformer-based trajectory predictor, which achieves highly efficient two-step reasoning by integrating a destination-driven prediction strategy and a group of learnable prompt embeddings. Extensive experiments on popular benchmarks demonstrate that our proposed approach achieves state-of-the-art performance with high efficiency. Code is available at https://github.com/iSEE-Laboratory/PPT.
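The three training stages differ mainly in the target the model is asked to regress. The plain-Python sketch below shows one way those targets could be derived from a single observed/future split, together with a stage-3 loss that adds a distillation term. All function names, the squared-error losses, the weight `lam`, the choice to take stage-1 targets only over the future horizon, and the assumption that the distilled quantity is the predicted destination are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def stage1_next_position_pairs(obs: List[Point], fut: List[Point]) -> List[Tuple[List[Point], Point]]:
    """Stage 1 (short-term, hypothetical): pair every prefix of the track with the
    position that immediately follows it. Assumption: targets span the future horizon only."""
    track = obs + fut
    return [(track[:t], track[t]) for t in range(len(obs), len(track))]


def stage2_destination_target(fut: List[Point]) -> Point:
    """Stage 2 (long-term): the only target is the final future position."""
    return fut[-1]


def stage3_trajectory_target(fut: List[Point]) -> List[Point]:
    """Stage 3 (entire trajectory): the target is the whole future sequence."""
    return list(fut)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def stage3_loss(pred: List[Point], fut: List[Point], teacher_dest: Point, lam: float = 0.5) -> float:
    """Mean squared error over the future steps, plus a hypothetical distillation term
    that pulls the student's predicted destination toward a frozen stage-2 teacher's."""
    traj = sum(sq_dist(p, g) for p, g in zip(pred, fut)) / len(fut)
    distill = sq_dist(pred[-1], teacher_dest)
    return traj + lam * distill
```

Under these assumptions stage 1 yields one training pair per future step, whereas stages 2 and 3 yield a single target per trajectory.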
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a limitation in pedestrian trajectory prediction: existing methods adopt a single, unified training paradigm for the entire trajectory, ignoring the distinction between short-term dynamics and long-term dependencies. To overcome this, the authors propose a novel Progressive Pretext Task learning (PPT) framework that progressively strengthens the model's ability to capture short-term dynamics and long-term dependencies over three stages, and then uses this knowledge to predict the entire future trajectory accurately. Specifically:

1. **Short-term Dynamics Capture**: In the first stage, the model learns the short-term dynamics of a trajectory through a stepwise next-position prediction task.
2. **Long-term Dependencies Capture**: In the second stage, the model's ability to capture long-term dependencies is enhanced by predicting the destination, with a diversity loss encouraging diverse pedestrian intentions.
3. **Complete Trajectory Prediction**: In the third stage, the model uses the knowledge from the first two stages to predict the complete future trajectory, and cross-task knowledge distillation is introduced to avoid forgetting previously learned knowledge.

Additionally, the authors design a Transformer-based trajectory predictor that achieves efficient prediction through a two-step reasoning strategy: it first predicts the destination and then generates the remaining trajectory points. Experimental results show that the framework achieves state-of-the-art performance on multiple benchmark datasets.
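The destination-driven two-step strategy and the destination diversity objective can be illustrated with a small stdlib-only Python sketch. It assumes the diversity loss takes the common best-of-K (variety) form, in which only the candidate destination closest to the ground truth is penalized; the paper's exact formulation may differ. Linear interpolation is a stand-in for the learned Transformer that fills in the intermediate points, and every name here is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def best_of_k_destination_loss(cands: List[Point], gt_dest: Point) -> float:
    """Assumed variety (best-of-K) loss: only the candidate closest to the ground truth
    is penalized, leaving the other K-1 guesses free to cover other intentions."""
    return min(dist(c, gt_dest) for c in cands)


def complete_trajectory(last_obs: Point, dest: Point, horizon: int) -> List[Point]:
    """Step two (stand-in): fill in `horizon` future points ending at `dest`.
    Linear interpolation replaces the learned Transformer decoder."""
    return [
        (last_obs[0] + (dest[0] - last_obs[0]) * t / horizon,
         last_obs[1] + (dest[1] - last_obs[1]) * t / horizon)
        for t in range(1, horizon + 1)
    ]


def two_step_predict(last_obs: Point, dest_cands: List[Point], horizon: int) -> List[List[Point]]:
    """Step one supplies K candidate destinations; step two expands each into a full future."""
    return [complete_trajectory(last_obs, d, horizon) for d in dest_cands]
```

In PPT the completion step is performed by the learned Transformer rather than by interpolation, so the predicted paths need not be straight lines.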