Abstract: To model the indeterminacy of human behaviors, stochastic trajectory prediction requires a sophisticated multi-modal distribution of future trajectories. Emerging diffusion models have revealed their tremendous representation capacities in numerous generation tasks, showing potential for stochastic trajectory prediction. However, expensive time consumption prevents diffusion models from real-time prediction, since a large number of denoising steps are required to ensure sufficient representation ability. To resolve the dilemma, we present LEapfrog Diffusion model (LED), a novel diffusion-based trajectory prediction model, which provides real-time, precise, and diverse predictions. The core of the proposed LED is to leverage a trainable leapfrog initializer to directly learn an expressive multi-modal distribution of future trajectories, which skips a large number of denoising steps, significantly accelerating inference. Moreover, the leapfrog initializer is trained to appropriately allocate correlated samples to provide a diversity of predicted future trajectories, significantly improving prediction performance. Extensive experiments on four real-world datasets (NBA, NFL, SDD, and ETH-UCY) show that LED consistently improves performance and achieves 23.7%/21.9% ADE/FDE improvement on NFL. The proposed LED also speeds up inference by 19.3/30.8/24.3/25.1 times compared to the standard diffusion model on NBA/NFL/SDD/ETH-UCY, satisfying real-time inference needs. Code is available at https://github.com/MediaBrain-SJTU/LED.
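As a rough illustration of the idea described in the abstract, the sketch below starts from a set of correlated initial samples and applies only a few refinement steps, rather than running a long denoising chain from pure noise. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the representation (one shared mean trajectory, a scalar spread `std`, and per-sample `offsets`), the centering of the offsets, the hypothetical `refine` callable standing in for a learned denoising step, and the step count of 5.

```python
# Toy sketch only: names, shapes, and the step count are illustrative assumptions.

def leapfrog_init(mean, std, offsets):
    """Build K initial samples as mean + std * (offset - average offset).

    mean:    list of T (x, y) points, the shared mean future trajectory
    std:     scalar spread shared by all K samples
    offsets: K lists of T (x, y) points, one raw offset trajectory per sample
    """
    k = len(offsets)
    samples = []
    for traj in offsets:
        sample = []
        for t, (mx, my) in enumerate(mean):
            # Centering couples the K samples: together they average to the mean.
            ax = sum(o[t][0] for o in offsets) / k
            ay = sum(o[t][1] for o in offsets) / k
            sample.append((mx + std * (traj[t][0] - ax),
                           my + std * (traj[t][1] - ay)))
        samples.append(sample)
    return samples


def predict(mean, std, offsets, refine, remaining_steps):
    """Initialize K samples jointly, then apply a small number of refinement steps."""
    samples = leapfrog_init(mean, std, offsets)
    for step in range(remaining_steps, 0, -1):
        samples = [refine(s, step) for s in samples]
    return samples


# Hypothetical inputs: 3 future time steps, K = 4 samples.
mean = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
offsets = [
    [(0.0, 1.0)] * 3,
    [(0.0, -1.0)] * 3,
    [(0.5, 0.0)] * 3,
    [(-0.5, 0.0)] * 3,
]


def identity_refine(trajectory, step):
    """Placeholder for a learned denoising step; a real model would adjust the points."""
    return trajectory


samples = predict(mean, 0.5, offsets, identity_refine, remaining_steps=5)
```

Because the samples are built jointly from one mean and one spread, they are not independent draws, which is one simple way to picture "correlated samples" spread over several plausible futures.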

What problem does this paper attempt to address?

The paper primarily addresses two key issues that arise when diffusion models are used for stochastic trajectory prediction in real-time applications:

1. **Real-time inference takes too long**: To preserve representational capacity and generate high-quality samples, standard diffusion models require a large number of denoising steps, which is computationally expensive. For example, on the NBA dataset, a diffusion model needs approximately 100 denoising steps to achieve decent prediction performance, which takes about 886 milliseconds per prediction, while a new frame of data arrives every 200 milliseconds.
2. **Independent and identically distributed samples may not capture enough modes in the underlying distribution**: When only a limited number of samples is drawn independently, a few sampled trajectories may miss important future possibilities. Without proper sample allocation, this significantly reduces prediction performance.

To address these issues, the authors propose the LEapfrog Diffusion model (LED), a novel denoising diffusion-based stochastic trajectory prediction model that significantly accelerates inference and adaptively allocates multiple correlated predictions to provide prediction diversity.

### Main Contributions

1. **Proposed a new LEapfrog Diffusion model (LED)**, a denoising diffusion-based stochastic trajectory prediction model. LED achieves accurate and diverse predictions with fast inference speed.
2. **Introduced a new trainable "leapfrog" initializer** that can directly model complex denoising distributions, accelerate inference, and adaptively allocate sample diversity to improve prediction performance.
3. **Conducted extensive experiments on four datasets**: the NBA dataset, the NFL football dataset, the Stanford Drone Dataset (SDD), and the ETH-UCY dataset.

The results show that the proposed method achieves state-of-the-art performance on all datasets compared to previous methods. Compared to standard diffusion models, it speeds up inference by roughly 19 to 31 times (19.3/30.8/24.3/25.1 times on NBA/NFL/SDD/ETH-UCY), meeting the needs of real-time prediction. Through these contributions, the paper demonstrates how to leverage the advantages of diffusion models while overcoming their limitations in real-time applications, particularly in complex human behavior prediction tasks.
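The latency figures quoted above (about 886 ms per prediction against a 200 ms frame interval on NBA) can be combined with the 19.3 times NBA speedup from the abstract into a quick budget check. This is back-of-the-envelope arithmetic, not a measurement: it assumes the reported speedup applies directly to the 886 ms figure, and the helper name `meets_frame_budget` is made up for this example.

```python
def meets_frame_budget(latency_ms: float, frame_interval_ms: float) -> bool:
    """Return True if one prediction finishes before the next frame arrives."""
    return latency_ms <= frame_interval_ms


baseline_ms = 886.0        # standard diffusion, ~100 denoising steps on NBA (from the text)
frame_interval_ms = 200.0  # a new frame arrives every 200 ms (from the text)
nba_speedup = 19.3         # reported speedup on NBA (from the abstract)

# Assumption: the speedup applies to the same end-to-end latency as baseline_ms.
accelerated_ms = baseline_ms / nba_speedup

# Number of frame intervals that elapse during one baseline prediction.
frames_elapsed = baseline_ms / frame_interval_ms

print(f"baseline:    {baseline_ms:.0f} ms, real-time: "
      f"{meets_frame_budget(baseline_ms, frame_interval_ms)}")
print(f"accelerated: {accelerated_ms:.1f} ms, real-time: "
      f"{meets_frame_budget(accelerated_ms, frame_interval_ms)}")
print(f"frame intervals per baseline prediction: {frames_elapsed:.2f}")
```

Under these assumptions the baseline spans more than four frame intervals per prediction, while the accelerated estimate of about 46 ms fits well inside a single 200 ms interval.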

Leapfrog Diffusion Model for Stochastic Trajectory Prediction

Enhanced Multimodal Trajectory Prediction for Autonomous Vehicles Using Advanced Diffusion Model Techniques

Motion Latent Diffusion for Stochastic Trajectory Prediction

DICE: Diverse Diffusion Model with Scoring for Trajectory Prediction

Intention-aware Denoising Diffusion Model for Trajectory Prediction

TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction

Predicting Long-Term Human Behaviors in Discrete Representations via Physics-Guided Diffusion

ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

EquiDiff: A Conditional Equivariant Diffusion Model For Trajectory Prediction

Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion

BCDiff: Bidirectional Consistent Diffusion for Instantaneous Trajectory Prediction

Predicting trajectory destinations based on diffusion model integrating spatiotemporal features and urban contexts

GDTS: Goal-Guided Diffusion Model with Tree Sampling for Multi-Modal Pedestrian Trajectory Prediction

Diffusion-Based Environment-Aware Trajectory Prediction

Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation

DiffWT: Diffusion-Based Pedestrian Trajectory Prediction with Time-frequency Wavelet Transform

Bayesian-Optimized One-Step Diffusion Model with Knowledge Distillation for Real-Time 3D Human Motion Prediction

DiffTraj: Generating GPS Trajectory with Diffusion Probabilistic Model

Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting

Constraint-Aware Diffusion Models for Trajectory Optimization