Abstract: Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions and road geometries. Inspired by recent advances in natural language processing (NLP) and computer vision (CV), self-supervised learning (SSL) has gained significant attention in the motion prediction community for learning rich and transferable scene representations. Nonetheless, existing pre-training methods for motion prediction have largely focused on specific model architectures and a single dataset, limiting their scalability and generalizability. To address these challenges, we propose SmartPretrain, a general and scalable SSL framework for motion prediction that is both model-agnostic and dataset-agnostic. Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms to effectively represent spatiotemporal evolution and interactions without imposing architectural constraints. Additionally, SmartPretrain employs a dataset-agnostic scenario sampling strategy that integrates multiple datasets, enhancing data volume, diversity, and robustness. Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits, and main metrics. For instance, SmartPretrain significantly reduces the MissRate of Forecast-MAE by 10.6%. These results highlight SmartPretrain's effectiveness as a unified, scalable solution for motion prediction, breaking free from the limitations of the small-data regime.
Code is available at https://github.com/youngzhou1999/SmartPretrain
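
The abstract describes combining contrastive and reconstructive SSL objectives. As a rough illustration only, the sketch below shows one common way such a combination can be written: an InfoNCE-style contrastive term on paired embeddings plus a mean-squared-error reconstruction term on trajectories. This is not the authors' implementation; the function names, the `temperature` value, the `weight` parameter, and the array shapes are all assumptions made for the example.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative assumption, not the paper's exact loss).

    Row i of `anchor` is treated as matching row i of `positive`;
    all other rows in the batch act as negatives.
    """
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # average over matching pairs

def reconstruction_loss(pred_traj, true_traj):
    """Mean squared error between reconstructed and ground-truth trajectories."""
    return float(np.mean((pred_traj - true_traj) ** 2))

def combined_ssl_loss(emb_a, emb_b, pred_traj, true_traj, weight=1.0):
    """Sum of the contrastive and reconstructive terms; `weight` is a hypothetical knob."""
    return info_nce(emb_a, emb_b) + weight * reconstruction_loss(pred_traj, true_traj)

# Usage with random placeholder data (shapes are assumptions):
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(8, 16))                   # 8 agents, 16-dim embeddings (view A)
emb_b = emb_a + 0.05 * rng.normal(size=(8, 16))    # slightly perturbed view B
true_traj = rng.normal(size=(8, 30, 2))            # 30 future (x, y) steps per agent
pred_traj = true_traj + 0.1 * rng.normal(size=(8, 30, 2))
loss = combined_ssl_loss(emb_a, emb_b, pred_traj, true_traj)
```

Because both terms act only on embeddings and trajectories, a loss of this shape places no constraint on the encoder that produces them, which is in the spirit of the model-agnostic claim above.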

SLAMP: Stochastic Latent Appearance and Motion Prediction

State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend

A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

Adaptive Hierarchical Motion-Focused Model for Video Prediction

Motion Graph Unleashed: A Novel Approach to Video Prediction

Probabilistic Future Prediction for Video Scene Understanding

Lightweight Stochastic Video Prediction via Hybrid Warping

Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction

Visual Representation Learning with Stochastic Frame Prediction

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

StretchBEV: Stretching Future Instance Prediction Spatially and Temporally

Active Patterns Perceived for Stochastic Video Prediction

Implicit Latent Variable Model for Scene-Consistent Motion Forecasting

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Human Motion Prediction Using Manifold-Aware Wasserstein GAN

Vehicle Motion Forecasting using Prior Information and Semantic-assisted Occupancy Grid Maps

Stochastic Multi-Person 3D Motion Forecasting

ASTM - an Attention Based Spatiotemporal Model for Video Prediction Using 3D Convolutional Neural Networks

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving

Fast Semantic-Aware Motion State Detection for Visual SLAM in Dynamic Environment