Abstract: The multi-modal and stochastic characteristics of human behavior make motion prediction, a task critical for autonomous driving, highly challenging. While deep learning approaches have demonstrated great potential in this area, how to connect multiple driving scenes (e.g., merging, roundabouts, intersections) with the design of deep learning models remains an open problem. Current learning-based methods typically use one unified model to predict trajectories in different scenarios, which may produce sub-optimal results for an individual scene. To address this issue, we propose the Multi-Scenes Network (MS-Net), a multi-path sparse model trained by an evolutionary process. MS-Net selectively activates a subset of its parameters during inference to produce the prediction results for each scene. In the training stage, motion prediction under different scenes is abstracted as a multi-task learning problem, and an evolutionary algorithm is designed to encourage the network to search for the optimal parameters for each scene while sharing common knowledge between scenes. Our experimental results show that, with substantially fewer parameters, MS-Net outperforms existing state-of-the-art methods on well-established pedestrian motion prediction datasets such as ETH and UCY, and ranks 2nd on the INTERACTION challenge.
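The abstract's central idea, one model in which each scene activates only its own subset of parameters, can be pictured with a toy sketch. Everything below is hypothetical: `Layer`, `MultiPathModel`, `scale`, the scene names, the scalar "layers", and the parameter counts are illustrative stand-ins, not MS-Net's actual architecture or code. Shared layers are stored once in a pool, and each scene owns an ordered path through that pool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Layer:
    """Hypothetical layer: a transform plus the number of parameters it owns."""
    name: str
    num_params: int
    fn: Callable[[Vector], Vector]


@dataclass
class MultiPathModel:
    """Toy multi-path sparse model: a shared layer pool plus one path per scene."""
    layers: Dict[str, Layer] = field(default_factory=dict)
    paths: Dict[str, List[str]] = field(default_factory=dict)

    def add_layer(self, layer: Layer) -> None:
        self.layers[layer.name] = layer

    def set_path(self, scene: str, layer_names: List[str]) -> None:
        unknown = [n for n in layer_names if n not in self.layers]
        if unknown:
            raise KeyError(f"unknown layers: {unknown}")
        self.paths[scene] = list(layer_names)

    def forward(self, scene: str, x: Vector) -> Vector:
        # Sparse activation: only the layers on this scene's path are executed.
        for name in self.paths[scene]:
            x = self.layers[name].fn(x)
        return x

    def active_params(self, scene: str) -> int:
        # Parameters touched when predicting for a single scene.
        return sum(self.layers[n].num_params for n in self.paths[scene])

    def total_params(self) -> int:
        # Each layer is stored once, however many scene paths share it.
        return sum(layer.num_params for layer in self.layers.values())


def scale(k: float) -> Callable[[Vector], Vector]:
    """Stand-in for a real layer: multiplies every input feature by k."""
    return lambda xs: [k * v for v in xs]


if __name__ == "__main__":
    model = MultiPathModel()
    model.add_layer(Layer("encoder", 1000, scale(1.0)))         # shared by both scenes
    model.add_layer(Layer("decoder", 500, scale(2.0)))          # used by "merging"
    model.add_layer(Layer("roundabout_head", 200, scale(3.0)))  # scene-specific
    model.set_path("merging", ["encoder", "decoder"])
    model.set_path("roundabout", ["encoder", "roundabout_head"])

    print(model.forward("merging", [1.0, 2.0]))     # [2.0, 4.0]
    print(model.forward("roundabout", [1.0, 2.0]))  # [3.0, 6.0]
    print(model.active_params("roundabout"))        # 1200
    print(model.total_params())                     # 1700
```

Two independent models for the same two scenes would hold 1500 + 1200 = 2700 parameters, while the shared pool holds 1700. The toy numbers only illustrate the bookkeeping of parameter sharing and sparse activation, not the savings reported for MS-Net.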

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper addresses motion prediction across multiple driving scenes, a task critical for autonomous driving. Existing deep learning methods typically use one unified model to predict trajectories in all scenarios (e.g., merging, roundabouts, intersections), which can lead to suboptimal results in any specific scene. Human behavior is multi-modal and stochastic, and these characteristics differ from one traffic scenario to another; a single shared model struggles to capture them, which limits the performance of autonomous driving systems.

To solve this problem, the authors propose MS-Net (Multi-Scenes Network), a multi-path sparse model trained with an evolutionary algorithm. During inference, MS-Net activates only the part of its parameters that belongs to the current traffic scene. In this way, MS-Net improves prediction accuracy while substantially reducing the number of model parameters, which enhances computational efficiency.

### Main Contributions

1. **Multi-Path Sparse Model**: MS-Net is a new multi-path sparse model designed for multi-scenario motion prediction in autonomous driving; it treats prediction under different scenes as a multi-task learning problem.
2. **Evolutionary Learning**: The multi-path sparse model is optimized with an evolutionary algorithm, which improves overall prediction performance and parameter efficiency.
3. **Adaptation to Scenario Complexity**: MS-Net can adjust the number of network layers for each scene, so its expressive capacity evolves with the complexity of the scenario.
4. **Experimental Validation**: Evaluations on multi-scene datasets show that MS-Net outperforms existing state-of-the-art methods on the ETH and UCY pedestrian benchmarks with substantially fewer parameters, and it ranked 2nd in the INTERACTION challenge.
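Contribution 3, adjusting the number of layers per scene, can be illustrated with a toy mutation operator on a single scene's path. This is a hedged sketch under assumptions not taken from the paper: a path is a list of layer names, weights are flat lists of floats, every inner layer maps a feature vector to one of the same width (so inner layers can be removed or duplicated freely), and `mutate_path`, `fresh_name`, and the operation names are hypothetical. "compress" drops an inner layer, "expand" duplicates an inner layer under a new scene-specific name, and "transfer" swaps a shared layer for a scene-specific copy that inherits its weights.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Weights = Dict[str, List[float]]  # hypothetical: layer name -> flat weight vector


def fresh_name(base: str, taken: Set[str]) -> str:
    """Return `base`, or `base_2`, `base_3`, ... if the name is already used."""
    name, k = base, 1
    while name in taken:
        k += 1
        name = f"{base}_{k}"
    return name


def mutate_path(
    path: List[str],
    weights: Weights,
    scene: str,
    rng: random.Random,
    op: Optional[str] = None,
    min_layers: int = 2,
) -> Tuple[str, List[str], Weights]:
    """Derive a child path for `scene` from a parent path.

    `weights` must contain every layer on `path`. Returns
    (operation, child_path, new_weights); the parent path and the shared
    weights are never modified, so other scenes are unaffected.
    """
    op = op or rng.choice(["compress", "expand", "transfer"])
    child = list(path)
    new_weights: Weights = {}
    taken = set(weights) | set(child)

    if op == "compress":
        # Remove one inner layer; the input and output layers are kept.
        # No-op if the path has no inner layer or is already at min_layers.
        if len(child) > max(min_layers, 2):
            del child[rng.randrange(1, len(child) - 1)]
    elif op == "expand":
        # Duplicate one inner layer; the copy starts from the original's
        # weights and can then be trained for this scene only.
        if len(child) > 2:
            i = rng.randrange(1, len(child) - 1)
            name = fresh_name(f"{scene}/expand", taken)
            new_weights[name] = list(weights[child[i]])
            child.insert(i + 1, name)
    elif op == "transfer":
        # Replace one layer with a trainable scene-specific copy that inherits its weights.
        idx = rng.randrange(len(child))
        name = fresh_name(f"{scene}/{child[idx]}", taken)
        new_weights[name] = list(weights[child[idx]])
        child[idx] = name
    else:
        raise ValueError(f"unknown operation: {op}")
    return op, child, new_weights


if __name__ == "__main__":
    weights = {"enc": [0.1, 0.2], "mid": [0.3, 0.4], "dec": [0.5, 0.6]}
    rng = random.Random(0)
    for operation in ["compress", "expand", "transfer"]:
        print(mutate_path(["enc", "mid", "dec"], weights, "roundabout", rng, op=operation))
```

Returning the new weights separately keeps shared knowledge untouched: a scene can grow, shrink, or specialize its own path without changing the layers other scenes still rely on.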
### Method Overview

1. **Meta-Model Initialization**: A meta-model is selected from existing unified motion prediction networks as the foundation.
2. **Knowledge Pool**: The knowledge pool stores shared knowledge and initially contains only the meta-model. During the evolutionary process, template models are selected from the pool to generate sub-models.
3. **Evolutionary Algorithm**: Four evolutionary mechanisms are designed:
   - **Model Evolution**: Dynamically compress or expand the model structure, adjusting the number of network layers to the complexity of the scenario.
   - **Knowledge Transfer**: Partially inherit the knowledge of existing models while adding new scene-specific knowledge.
   - **Scoring Function**: Rank sub-models by accuracy and by the number of parameters they add, favoring those that reach the highest accuracy with the fewest additional parameters.
   - **Hyperparameter Tuning**: Use a random walk to adjust the hyperparameters of sub-models, improving generalization.

Through these mechanisms, MS-Net substantially reduces the number of model parameters and enhances computational efficiency while maintaining high prediction accuracy.
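The knowledge pool, the scoring function, and random-walk hyperparameter tuning can be sketched together as one small evolutionary loop. This is an illustration under stated assumptions, not the paper's algorithm: the hyperparameter grids, the multiplicative penalty of 0.99 per 1% of added parameters in `score`, the `evolve` loop, and `toy_evaluate` (which replaces actual training and validation) are all hypothetical. Structural mutations are left out so the example stays focused on selection and tuning; each candidate differs from its template only in hyperparameters.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical discrete search grids; the paper's actual hyperparameters are not given here.
GRIDS: Dict[str, List[float]] = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}


@dataclass
class Candidate:
    hparams: Dict[str, float]     # values must lie on GRIDS
    new_params: int = 0           # parameters added on top of the template model
    error: float = float("inf")   # validation error, e.g. ADE; lower is better


def random_walk(hparams: Dict[str, float], rng: random.Random, p_move: float = 0.5) -> Dict[str, float]:
    """Move each hyperparameter at most one grid step away from the parent's value."""
    child = {}
    for key, value in hparams.items():
        grid = GRIDS[key]
        i = grid.index(value)
        if rng.random() < p_move:
            i = min(max(i + rng.choice([-1, 1]), 0), len(grid) - 1)  # clamp to the grid
        child[key] = grid[i]
    return child


def score(c: Candidate, base_params: int, penalty: float = 0.99) -> float:
    """Hypothetical score: rewards low error and multiplies by `penalty`
    for every 1% of the base model's size that the candidate adds."""
    return (1.0 / (1.0 + c.error)) * penalty ** (100.0 * c.new_params / base_params)


def evolve(
    pool: List[Candidate],
    evaluate: Callable[[Candidate], Tuple[float, int]],
    rng: random.Random,
    generations: int = 10,
    children_per_generation: int = 4,
    base_params: int = 1_000_000,
) -> Candidate:
    """Sample templates from the knowledge pool, mutate, evaluate, keep the best."""
    best = max(pool, key=lambda c: score(c, base_params))
    for _ in range(generations):
        for _ in range(children_per_generation):
            template = rng.choice(pool)
            child = Candidate(hparams=random_walk(template.hparams, rng))
            child.error, child.new_params = evaluate(child)
            if score(child, base_params) > score(best, base_params):
                best = child
                pool.append(child)  # improved models become templates for later children
    return best


def toy_evaluate(c: Candidate) -> Tuple[float, int]:
    """Stand-in for training + validation: lowest error at lr=1e-3, dropout=0.0."""
    error = abs(math.log10(c.hparams["lr"]) + 3.0) + c.hparams["dropout"]
    return error, 0


if __name__ == "__main__":
    seed = Candidate(hparams={"lr": 1e-4, "dropout": 0.3})
    seed.error, seed.new_params = toy_evaluate(seed)
    best = evolve([seed], toy_evaluate, random.Random(0))
    print(best.hparams, round(best.error, 3))
```

The multiplicative penalty folds the trade-off into one number: a candidate that adds parameters must lower its error enough to offset the penalty, which is one simple way to express "fewest additional parameters, highest accuracy".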

MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes

Multimodal Vehicle Trajectory Prediction Based on Graph Convolutional Networks

Multi-Relational Pedestrian Trajectory Prediction in Complex Scenes

Hierarchical Multi-Supervision Multi-Interaction Graph Attention Network for Multi-Camera Pedestrian Trajectory Prediction

MSN: Multi-Style Network for Trajectory Prediction

Real-Time Heterogeneous Road-Agents Trajectory Prediction Using Hierarchical Convolutional Networks and Multi-task Learning

Multi-PPTP: Multiple Probabilistic Pedestrian Trajectory Prediction in the Complex Junction Scene

MSTCNN: multi-modal spatio-temporal convolutional neural network for pedestrian trajectory prediction

Multiple Goals Network for Pedestrian Trajectory Prediction in Autonomous Driving

MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

A Unified Environmental Network for Pedestrian Trajectory Prediction

Multi-Stream Representation Learning for Pedestrian Trajectory Prediction

3D-Mbnet: Intention Based Multimodal Vehicle Trajectory Prediction with 3D Social Convolution

Collaborative Motion Prediction Via Neural Motion Message Passing

MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration

Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction

CSCNet: Contextual Semantic Consistency Network for Trajectory Prediction in Crowded Spaces

MSTP-Net: Multiscale Spatio-temporal Parallel Networks for Human Motion Prediction

Multimodal Transformer Networks for Pedestrian Trajectory Prediction

A Multi-Stage Goal-Driven Network for Pedestrian Trajectory Prediction

VisionNet: A Drivable-space-based Interactive Motion Prediction Network for Autonomous Driving