Abstract: As autonomous driving systems mature, motion forecasting has received increasing attention as a critical requirement for planning. Of particular importance are interactive situations such as merges, unprotected turns, etc., where predicting individual object motion is not sufficient. Joint predictions of multiple objects are required for effective route planning. There has been a critical need for high-quality motion data that is rich in both interactions and annotation to develop motion planning models. In this work, we introduce the most diverse interactive motion dataset to our knowledge, and provide specific labels for interacting objects suitable for developing joint prediction models. With over 100,000 scenes, each 20 seconds long at 10 Hz, our new dataset contains more than 570 hours of unique data over 1750 km of roadways. It was collected by mining for interesting interactions between vehicles, pedestrians, and cyclists across six cities within the United States. We use a high-accuracy 3D auto-labeling system to generate high quality 3D bounding boxes for each road agent, and provide corresponding high definition 3D maps for each scene. Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models. Finally, we provide strong baseline models for individual-agent prediction and joint-prediction. We hope that this new large-scale interactive motion dataset will provide new opportunities for advancing motion forecasting models.
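The size figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that scenes do not overlap and that each lasts exactly 20 seconds; the variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
SCENE_SECONDS = 20
SAMPLE_RATE_HZ = 10
TOTAL_HOURS = 570

# Samples per scene (one more if both endpoints of the window are counted).
samples_per_scene = SCENE_SECONDS * SAMPLE_RATE_HZ

# Hours covered by exactly 100,000 scenes.
hours_for_100k = 100_000 * SCENE_SECONDS / 3600

# Number of 20-second scenes needed to reach 570 hours.
scenes_for_570h = TOTAL_HOURS * 3600 / SCENE_SECONDS

print(samples_per_scene, round(hours_for_100k, 1), scenes_for_570h)
```

Exactly 100,000 scenes would give roughly 556 hours, so "more than 570 hours" implies at least 102,600 scenes under these assumptions, which is consistent with "over 100,000 scenes".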

What problem does this paper attempt to address?

This paper addresses the problem of predicting the motion of multiple interacting objects in an autonomous driving system, especially in complex interactive scenarios such as lane changes and unprotected turns. Traditional single-object motion prediction models cannot handle these situations, so models capable of joint prediction are needed. Specifically, the paper focuses on the following aspects:

1. **Lack of high-quality interaction data**: Most existing datasets focus on representing a single object and pay less attention to large-scale interaction modeling. Developing effective joint prediction models requires a high-quality dataset that is rich in both interactions and annotations.
2. **Challenges of multi-object joint prediction**: In autonomous driving, especially where vehicles, pedestrians, and cyclists interact, predicting the movement of a single object is not enough. The trajectories of multiple interacting objects must be predicted simultaneously to enable more effective path planning.
3. **Insufficiency of evaluation metrics**: Existing evaluation metrics mainly target single-object prediction performance and do not adequately evaluate multi-object joint prediction. New metrics are therefore needed to measure the performance of joint prediction models comprehensively.

To solve these problems, the authors propose a large-scale interactive motion dataset named the **Waymo Open Motion Dataset** and make the following contributions:

- **Large-scale interaction dataset**: The dataset contains more than 100,000 scenes, each 20 seconds long, with a total duration of more than 570 hours. It covers 1,750 kilometers of roads and was collected from driving data in six cities in the United States. It contains rich interaction scenarios, such as vehicle-pedestrian and vehicle-vehicle interactions.
- **High-quality annotations**: A high-accuracy 3D auto-labeling system generates high-quality 3D bounding boxes, and corresponding high-definition 3D maps are provided for each scene.
- **New evaluation metrics**: A new set of evaluation metrics, including minimum average displacement error (minADE), minimum final displacement error (minFDE), overlap rate (OR), miss rate (MR), and mean average precision (mAP), comprehensively evaluates both single-object and multi-object joint prediction models.
- **Baseline models**: Strong baseline models are provided for single-object prediction and joint prediction to help researchers better understand and improve existing motion prediction models.

Through these contributions, the authors hope that this new large-scale interactive motion dataset will provide new opportunities and directions for the development of motion prediction models.
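To make the displacement-based metrics listed above more concrete, here is a minimal Python sketch of their commonly used textbook definitions. It is not the dataset's official evaluation code: the function names are hypothetical, and the fixed 2-meter miss threshold and the joint variant (averaging over agents within each joint future before taking the minimum) are simplifying assumptions.

```python
import math

def _ade(traj, gt):
    """Average point-wise Euclidean distance between one trajectory and the ground truth."""
    return sum(math.dist(p, g) for p, g in zip(traj, gt)) / len(gt)

def min_ade(candidates, gt):
    """minADE: the lowest average displacement error over K candidate trajectories."""
    return min(_ade(traj, gt) for traj in candidates)

def min_fde(candidates, gt):
    """minFDE: the lowest final-point displacement error over K candidate trajectories."""
    return min(math.dist(traj[-1], gt[-1]) for traj in candidates)

def is_miss(candidates, gt, threshold_m=2.0):
    """Simplified miss test: no candidate ends within threshold_m of the true endpoint.
    The fixed threshold is an assumption made for illustration."""
    return min_fde(candidates, gt) > threshold_m

def joint_min_ade(joint_candidates, gts):
    """Joint minADE (sketch): each of the K joint futures holds one trajectory per agent.
    The error is averaged over agents inside a joint future, then minimized over futures."""
    def joint_ade(joint):
        return sum(_ade(traj, gt) for traj, gt in zip(joint, gts)) / len(gts)
    return min(joint_ade(joint) for joint in joint_candidates)

# Toy usage: ground truth moves along the x-axis; candidates are lateral offsets of it.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [[(x, y + 1.0) for x, y in gt], [(x, y + 3.0) for x, y in gt]]
print(min_ade(candidates, gt), min_fde(candidates, gt), is_miss(candidates, gt))
```

The joint variant shows why single-object metrics are not enough: a joint future is scored as a whole, so a model cannot pick the best mode for each agent independently.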

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
