Abstract: The task of motion prediction is pivotal for autonomous driving systems, providing crucial data to choose a vehicle behavior strategy within its surroundings. Existing motion prediction techniques primarily focus on predicting the future trajectory of each agent in the scene individually, utilizing its past trajectory data. In this paper, we introduce an end-to-end neural network methodology designed to predict the future behaviors of all dynamic objects in the environment. This approach leverages the occupancy map and the scene's motion flow. We investigate various alternatives for constructing a deep encoder-decoder model called OFMPNet. This model uses a sequence of bird's-eye-view road images, occupancy grid, and prior motion flow as input data. The encoder of the model can incorporate transformer, attention-based, or convolutional units. The decoder considers the use of both convolutional modules and recurrent blocks. Additionally, we propose a novel time-weighted motion flow loss, whose application has shown a substantial decrease in end-point error. Our approach has achieved state-of-the-art results on the Waymo Occupancy and Flow Prediction benchmark, with a Soft IoU of 52.1% and an AUC of 76.75% on Flow-Grounded Occupancy.
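The abstract reports a Soft IoU score without defining it. As a rough illustration, the sketch below computes a soft intersection-over-union between a predicted probability grid and a ground-truth grid, assuming the commonly used form "sum of products divided by sum of predictions plus sum of truths minus sum of products". The function name `soft_iou` and the `eps` stabilizer are illustrative choices, not taken from the paper or from the benchmark's code.

```python
import numpy as np


def soft_iou(pred, truth, eps=1e-8):
    """Soft IoU between two occupancy grids with values in [0, 1].

    Assumed definition (not quoted from the paper):
        sum(pred * truth) / (sum(pred) + sum(truth) - sum(pred * truth))
    `eps` only guards against division by zero for two empty grids.
    """
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    intersection = np.sum(pred * truth)
    union = np.sum(pred) + np.sum(truth) - intersection
    return float(intersection / (union + eps))


# A 2x2 toy grid: the prediction spreads its mass over two cells,
# while the ground truth occupies only one of them.
predicted = [[0.5, 0.5], [0.0, 0.0]]
ground_truth = [[1.0, 0.0], [0.0, 0.0]]
score = soft_iou(predicted, ground_truth)
```

Unlike a thresholded IoU, this form rewards well-calibrated probabilities directly, so no binarization threshold needs to be chosen.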

What problem does this paper attempt to address?

This paper addresses the problem of predicting the future behavior of all dynamic objects in an urban environment: the future occupancy of currently visible and currently invisible vehicles, and the future flow of all vehicles. To do so, it introduces an end-to-end neural network method (OFMPNet) that uses occupancy maps and scene motion flow. This problem is crucial for autonomous driving systems because it provides key data for choosing vehicle behavior strategies.

### Main Tasks

1. **Future Occupancy Prediction of Currently Visible Vehicles**:
   - Given the historical information of all agents in the past \(T_h\) time steps, predict the occupancy grid of currently visible vehicles within the next \(N\) seconds.
   - Each occupancy grid is an \(m\times m\times1\) array with values in [0, 1], representing the probability that a currently visible vehicle occupies the grid cell.
2. **Future Occupancy Prediction of Currently Invisible Vehicles**:
   - Given the same historical information of all agents in the past \(T_h\) time steps, predict the occupancy grid of currently invisible vehicles within the next \(N\) seconds.
   - Each occupancy grid is likewise an \(m\times m\times1\) array with values in [0, 1], representing the probability that a currently invisible vehicle occupies the grid cell.
3. **Future Motion Flow Prediction of All Vehicles**:
   - Predict the motion flow of all vehicles (currently visible or invisible) within the next \(N\) seconds.
   - Each motion flow field is an \(m\times m\times2\) array of (dx, dy) values, representing the displacement of the vehicle part within the grid cell.

### Method Overview

To achieve the above tasks, the paper proposes three OFMPNet models with different architectures:

- **OFMPNet-Swin**: Combines Swin Transformer and LSTM units for feature extraction.
- **OFMPNet-ULSTM**: Replaces the residual convolutional layers in U-Net with LSTM blocks to capture flow features.
- **OFMPNet-R2AttU-T2**: A dual-recursive residual convolutional neural network built on the U-Net encoder-decoder architecture, with an added attention mechanism.

In addition, the paper introduces a new time-weighted motion flow loss function that reduces the end-point error, and it reports state-of-the-art results on the Waymo Occupancy and Flow Prediction benchmark.

### Main Contributions

1. Proposes a new deep encoder-decoder model, OFMPNet, for occupancy and flow prediction problems.
2. Introduces a time-weighted loss as part of the occupancy-flow loss in multi-task learning, improving the performance of the motion flow prediction task.
3. Conducts training, validation, and testing on the Waymo Open Motion dataset, with performance comparable to existing state-of-the-art methods.

Together, these contributions give a framework for predicting the future behavior of dynamic objects in complex urban environments, in support of autonomous driving systems.
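The flow output and the time-weighted flow loss described above can be sketched in a few lines. The summary does not give the loss formula, so everything specific here is an assumption made for illustration: the flow tensors have shape `(T, m, m, 2)`, the loss is the per-cell end-point error (L2 norm of the flow difference) averaged over occupied cells, and later time steps receive linearly larger weights. The function name `time_weighted_flow_loss` and the default weight ramp from 1 to 2 are hypothetical.

```python
import numpy as np


def time_weighted_flow_loss(pred_flow, true_flow, occupied, weights=None):
    """Time-weighted end-point error over a sequence of flow fields.

    pred_flow, true_flow: arrays of shape (T, m, m, 2) holding (dx, dy).
    occupied: array of shape (T, m, m, 1); 1 where ground-truth flow exists.
    weights: optional per-step weights of shape (T,). The default linear
        ramp from 1 to 2 is an assumption, not the paper's schedule.
    """
    pred_flow = np.asarray(pred_flow, dtype=np.float64)
    true_flow = np.asarray(true_flow, dtype=np.float64)
    mask = np.asarray(occupied, dtype=np.float64)[..., 0]  # (T, m, m)

    num_steps = pred_flow.shape[0]
    if weights is None:
        weights = np.linspace(1.0, 2.0, num_steps)
    weights = np.asarray(weights, dtype=np.float64)

    # End-point error per grid cell and time step: shape (T, m, m).
    epe = np.linalg.norm(pred_flow - true_flow, axis=-1)

    # Mean error over occupied cells, computed separately for each step.
    cells = np.maximum(mask.sum(axis=(1, 2)), 1.0)
    per_step = (epe * mask).sum(axis=(1, 2)) / cells

    # Weighted average over time, normalized by the total weight.
    return float((weights * per_step).sum() / weights.sum())


# Toy example: 3 future steps on a 4x4 grid, every cell occupied.
T, m = 3, 4
truth = np.zeros((T, m, m, 2))
occupied = np.ones((T, m, m, 1))

early_error = truth.copy()
early_error[0] += (3.0, 4.0)   # end-point error of 5 at the first step only
late_error = truth.copy()
late_error[-1] += (3.0, 4.0)   # the same error, at the last step only

loss_early = time_weighted_flow_loss(early_error, truth, occupied)
loss_late = time_weighted_flow_loss(late_error, truth, occupied)
```

With an increasing weight schedule, the same error costs more at the last step than at the first, which pushes training toward accuracy at longer horizons. Whether the paper weights later or earlier steps more heavily is not stated in this summary.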

OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment

Occupancy Flow Fields for Motion Forecasting in Autonomous Driving

Flow-guided Motion Prediction with Semantics and Dynamic Occupancy Grid Maps

StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

HGNET: A Hierarchical Feature Guided Network for Occupancy Flow Field Prediction

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation

End-to-End Interactive Prediction and Planning with Optical Flow Distillation for Autonomous Driving

Urban Flow Prediction with Spatial–temporal Neural ODEs

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling

Motion Perceiver: Real-Time Occupancy Forecasting for Embedded Systems

Efficient Baselines for Motion Prediction in Autonomous Driving

Predicting Future Spatiotemporal Occupancy Grids with Semantics for Autonomous Driving

Deep Learning of Spatiotemporal Patterns for Urban Mobility Prediction Using Big Data

Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction
