Abstract: In dynamic environments, autonomous vehicles require accurate decision-making and trajectory planning. To achieve this, they must understand their surroundings and predict the behavior and future trajectories of other traffic participants. In recent years, vectorization methods have dominated motion prediction because they can capture complex interactions in traffic scenes. However, existing work that uses vectorized scene encoding often overlooks important physical information about vehicles, such as speed and heading angle, and relies solely on displacement to represent the physical attributes of agents, which is insufficient for accurate trajectory prediction. Additionally, an agent's future trajectory can take several distinct forms, such as proceeding straight or turning left or right at an intersection, so the output of a trajectory prediction model should be multimodal. Existing work has used multiple regression heads to output future trajectories and their confidences, but the results have been suboptimal. To address these issues, we propose QINET, a method for accurate multimodal trajectory prediction for all agents in a scene. In the scene encoder, we enrich the agent features to better represent the physical information of agents in the scene; the resulting scene representation also possesses rotational and spatial invariance. In the decoder, we use cross-attention with a self-learned query matrix to induce the generation of multimodal future trajectories. Experimental results demonstrate that QINET achieves state-of-the-art performance on the Argoverse motion prediction benchmark and is capable of fast multimodal trajectory prediction for multiple agents.
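
To make the idea of enriched, invariant agent features concrete, the following is a minimal sketch of how per-step displacement, speed, and heading could be computed in an agent-centric frame. It is not QINET's actual encoding: the function names `agent_features` and `rigid_transform`, the four-value feature layout, and the 0.1 s sampling interval are all assumptions made for illustration.

```python
import math


def agent_features(track, dt=0.1):
    """Per-step (dx, dy, speed, heading) in an agent-centric frame.

    track: list of (x, y) world positions, oldest first, sampled every
    dt seconds (dt=0.1 is an assumed sampling interval).
    The frame's x-axis is aligned with the agent's most recent
    displacement, so the result does not depend on where the scene
    sits in the world or how it is rotated.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions")
    (x0, y0), (x1, y1) = track[-2], track[-1]
    # Reference heading from the last step (0.0 if the agent is stationary).
    ref_heading = math.atan2(y1 - y0, x1 - x0)
    cos_r, sin_r = math.cos(-ref_heading), math.sin(-ref_heading)

    feats = []
    for (ax, ay), (bx, by) in zip(track[:-1], track[1:]):
        dx, dy = bx - ax, by - ay
        # Rotate the world-frame displacement into the agent frame.
        ldx = dx * cos_r - dy * sin_r
        ldy = dx * sin_r + dy * cos_r
        speed = math.hypot(dx, dy) / dt
        heading = math.atan2(ldy, ldx)  # relative to the reference heading
        feats.append((ldx, ldy, speed, heading))
    return feats


def rigid_transform(track, angle, tx, ty):
    """Rotate a track by `angle` radians about the origin, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s + tx, x * s + y * c + ty) for x, y in track]


track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]
moved = rigid_transform(track, 0.7, 10.0, -4.0)
original_feats = agent_features(track)
moved_feats = agent_features(moved)
```

Under these assumptions, `original_feats` and `moved_feats` agree up to floating-point error, which is the invariance property the abstract refers to. Speed and heading are carried explicitly alongside displacement rather than left for the model to infer.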

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

VectorMapNet: End-to-end Vectorized HD Map Learning

Vectorized Representation Dreamer (VRD): Dreaming-Assisted Multi-Agent Motion-Forecasting

VAD: Vectorized Scene Representation for Efficient Autonomous Driving

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

VisionNet: A Drivable-space-based Interactive Motion Prediction Network for Autonomous Driving

SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations

Query-Informed Multi-Agent Motion Prediction

Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images

ProIn: Learning to Predict Trajectory Based on Progressive Interactions for Autonomous Driving

VGA: A Virtual-interaction-force Graph Attention Model for Agent Trajectory Prediction in Traffic Scenarios

CAR-Net: Clairvoyant Attentive Recurrent Network

Spatio-Temporal Context Graph Transformer Design for Map-Free Multi-Agent Trajectory Prediction

Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention

StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving

Hierarchical vector transformer vehicle trajectories prediction with diffusion convolutional neural networks

SVG-Net: An SVG-based Trajectory Prediction Model

VRR-Net: Learning Vehicle–Road Relationships for Vehicle Trajectory Prediction on Highways

Enhancing Vectorized Map Perception with Historical Rasterized Maps

Online Map Vectorization for Autonomous Driving: A Rasterization Perspective