OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment

Youshaa Murhij, Dmitry Yudin
2024-04-03
Abstract: The task of motion prediction is pivotal for autonomous driving systems, providing crucial data for choosing a vehicle behavior strategy within its surroundings. Existing motion prediction techniques primarily focus on predicting the future trajectory of each agent in the scene individually, utilizing its past trajectory data. In this paper, we introduce an end-to-end neural network methodology designed to predict the future behaviors of all dynamic objects in the environment. This approach leverages the occupancy map and the scene's motion flow. We investigate various alternatives for constructing a deep encoder-decoder model called OFMPNet. This model uses a sequence of bird's-eye-view road images, occupancy grid, and prior motion flow as input data. The encoder of the model can incorporate transformer, attention-based, or convolutional units. The decoder considers the use of both convolutional modules and recurrent blocks. Additionally, we propose a novel time-weighted motion flow loss, whose application has shown a substantial decrease in end-point error. Our approach has achieved state-of-the-art results on the Waymo Occupancy and Flow Prediction benchmark, with a Soft IoU of 52.1% and an AUC of 76.75% on Flow-Grounded Occupancy.
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The problem this paper attempts to solve is predicting the future behavior of all dynamic objects in an urban environment: the future occupancy of currently visible vehicles, the future occupancy of currently invisible vehicles, and the future flow of all vehicles. To do so, it introduces an end-to-end neural network method (OFMPNet) that uses occupancy maps and scene motion flow. The problem matters for autonomous driving systems because these predictions provide key data for choosing a vehicle behavior strategy.

### Main Tasks

1. **Future Occupancy Prediction of Currently Visible Vehicles**:
   - Given the historical information of all agents over the past \(T_h\) time steps, predict the occupancy grid of currently visible vehicles within the next \(N\) seconds.
   - Each occupancy grid is an \(m\times m\times1\) array with values in [0, 1], representing the probability that a currently visible vehicle occupies the grid cell.
2. **Future Occupancy Prediction of Currently Invisible Vehicles**:
   - Given the same historical information over the past \(T_h\) time steps, predict the occupancy grid of currently invisible vehicles within the next \(N\) seconds.
   - Each occupancy grid is again an \(m\times m\times1\) array with values in [0, 1], representing the probability that a currently invisible vehicle occupies the grid cell.
3. **Future Motion Flow Prediction of All Vehicles**:
   - Predict the motion flow of all vehicles (currently visible or invisible) within the next \(N\) seconds.
   - Each motion flow field is an \(m\times m\times2\) array of (dx, dy) values, representing the displacement of the part of a vehicle that occupies the grid cell.

### Method Overview

To address these tasks, the paper proposes three OFMPNet models with different architectures:

- **OFMPNet-Swin**: Combines Swin Transformer and LSTM units for feature extraction.
- **OFMPNet-ULSTM**: Replaces the residual convolutional layers in U-Net with LSTM blocks to capture flow features.
- **OFMPNet-R2AttU-T2**: A dual-recursive residual convolutional neural network built on the U-Net encoder-decoder architecture, with an added attention mechanism.

The paper also introduces a new time-weighted motion flow loss function that reduces the end-point error, and it reports state-of-the-art results on the Waymo Occupancy and Flow Prediction benchmark.

### Main Contributions

1. Proposes a new deep encoder-decoder model, OFMPNet, for the occupancy and flow prediction problem.
2. Introduces a time-weighted loss as part of the occupancy-flow loss in multi-task learning, improving performance on the motion flow prediction task.
3. Conducts training, validation, and testing on the Waymo Open Motion dataset, with performance comparable to existing state-of-the-art methods.

Together, these contributions give a framework for predicting the future behavior of dynamic objects in complex urban environments, in support of autonomous driving systems.
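
To make the tensor shapes and the two quantities named above more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. `soft_iou` follows the usual soft intersection-over-union form for probability grids. `time_weighted_flow_loss` is one plausible reading of a "time-weighted" end-point error: the summary does not give the paper's formula, so the linear weight ramp, the masking by ground-truth occupancy, and the function and parameter names are all assumptions made for illustration.

```python
import numpy as np


def soft_iou(pred_occ, gt_occ, eps=1e-8):
    """Soft IoU between a predicted probability grid and a ground-truth grid.

    Both arrays share one shape, e.g. (T, m, m, 1), with values in [0, 1].
    """
    intersection = np.sum(pred_occ * gt_occ)
    union = np.sum(pred_occ) + np.sum(gt_occ) - intersection
    return float(intersection / (union + eps))


def time_weighted_flow_loss(pred_flow, gt_flow, gt_occ, weights=None):
    """Weighted mean end-point error (EPE) over T future waypoints.

    pred_flow, gt_flow: (T, m, m, 2) arrays of (dx, dy) per grid cell.
    gt_occ:             (T, m, m, 1) mask, so only occupied cells count.
    weights:            (T,) per-waypoint weights. ASSUMPTION: defaults to a
                        linear ramp from 1 to 2; the paper's actual schedule
                        is not stated in this summary.
    """
    num_steps = pred_flow.shape[0]
    if weights is None:
        weights = np.linspace(1.0, 2.0, num_steps)
    weights = np.asarray(weights, dtype=np.float64)

    # Per-cell Euclidean distance between predicted and true displacement.
    epe = np.linalg.norm(pred_flow - gt_flow, axis=-1, keepdims=True)

    # Average the error over occupied cells, separately for each waypoint.
    occupied = np.maximum(gt_occ.sum(axis=(1, 2, 3)), 1.0)
    per_step = (epe * gt_occ).sum(axis=(1, 2, 3)) / occupied

    # Combine the waypoints using the time weights.
    return float(np.sum(weights * per_step) / np.sum(weights))


if __name__ == "__main__":
    T, m = 8, 16  # hypothetical number of waypoints and grid size
    rng = np.random.default_rng(0)

    gt_occ = (rng.random((T, m, m, 1)) > 0.9).astype(np.float64)
    pred_occ = np.clip(0.8 * gt_occ + 0.1, 0.0, 1.0)

    gt_flow = rng.normal(size=(T, m, m, 2))
    pred_flow = gt_flow + 0.1 * rng.normal(size=(T, m, m, 2))

    print("soft IoU:", soft_iou(pred_occ, gt_occ))
    print("time-weighted EPE:", time_weighted_flow_loss(pred_flow, gt_flow, gt_occ))
```

Passing `weights` explicitly makes the schedule easy to vary. Uniform weights reduce the function to an ordinary mean end-point error over waypoints, which is a natural baseline when checking whether time weighting helps.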