Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

Yu Yang, Jianbiao Mei, Yukai Ma, Siliang Du, Wenqing Chen, Yijie Qian, Yuxiang Feng, Yong Liu
2024-10-12
Abstract: World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. Specifically, we first introduce a semantic and motion-conditional normalization in the memory module, which accumulates semantic and dynamic information from historical BEV embeddings. These BEV features are then conveyed to the world decoder for future occupancy and flow forecasting, considering both geometry and spatiotemporal modeling. Additionally, we propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation and facilitate a broader range of downstream applications. Furthermore, we explore integrating the generative capabilities of the 4D world model with end-to-end planning, enabling continuous forecasting of future states and the selection of optimal trajectories using an occupancy-based cost function. Extensive experiments on the nuScenes dataset demonstrate that our method can generate plausible and controllable 4D occupancy, opening new avenues for driving world generation and end-to-end planning.
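The abstract does not spell out how the semantic and motion-conditional normalization works. A common pattern for conditional normalization (as in AdaIN or SPADE) is to normalize features and then apply a scale and shift predicted from a condition vector. The sketch below shows that pattern on a single BEV cell, assuming the condition is a concatenation of semantic class probabilities and an ego displacement; the function names, the `(1 + gamma)` parameterization, and all inputs are hypothetical, not the paper's implementation.

```python
import math
from typing import List, Sequence

def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def project(weight: Sequence[Sequence[float]], bias: Sequence[float],
            cond: Sequence[float]) -> List[float]:
    """Affine map W @ cond + b, with one row of W per feature channel."""
    return [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(weight, bias)]

def conditional_norm(x, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Normalize x, then scale and shift it with parameters predicted from cond."""
    x_hat = layer_norm(x)
    gamma = project(w_gamma, b_gamma, cond)
    beta = project(w_beta, b_beta, cond)
    # (1 + gamma) reduces to plain normalization when the condition contributes nothing.
    return [(1.0 + g) * v + b for v, g, b in zip(x_hat, gamma, beta)]

# Hypothetical inputs: one BEV cell with 4 channels, conditioned on
# 3 semantic class probabilities plus a 2-D ego displacement (dx, dy).
feature = [1.0, 2.0, 3.0, 4.0]
cond = [0.7, 0.2, 0.1] + [0.5, 0.0]
w_zero = [[0.0] * len(cond) for _ in feature]
b_zero = [0.0] * len(feature)
b_shift = [1.0] * len(feature)

plain = conditional_norm(feature, cond, w_zero, b_zero, w_zero, b_zero)
shifted = conditional_norm(feature, cond, w_zero, b_zero, w_zero, b_shift)
```

In a trained model the projection weights would be learned, so different semantic and motion contexts produce different modulations of the same normalized feature.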
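The world decoder forecasts both occupancy and flow. To illustrate what per-instance flow encodes, the following sketch rolls occupied cells forward under a constant-velocity assumption. This is a generic illustration of occupancy flow, not the paper's decoder; the integer cell displacements and the `rollout` helper are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]   # (row, col) index in the BEV grid
Flow = Tuple[int, int]   # per-step displacement in cells (hypothetical integer units)

def rollout(occupied: Dict[Cell, int], flow: Dict[int, Flow], steps: int) -> Dict[Cell, int]:
    """Shift every occupied cell by its instance's flow, `steps` times (constant velocity).

    `occupied` maps cell -> instance id; instances without an entry in `flow` stay put.
    If two cells land on the same target, the later one in iteration order wins.
    """
    current = dict(occupied)
    for _ in range(steps):
        moved: Dict[Cell, int] = {}
        for (r, c), inst in current.items():
            dr, dc = flow.get(inst, (0, 0))
            moved[(r + dr, c + dc)] = inst
        current = moved
    return current

occupied = {(5, 5): 1, (5, 6): 1, (0, 0): 2}   # instance 1 spans two cells
flow = {1: (-1, 0)}                            # instance 1 moves one row up per step
future = rollout(occupied, flow, steps=2)
```

A learned model predicts flow per cell and per timestep rather than assuming constant velocity; the rollout only shows how occupancy and flow together describe motion.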
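The abstract lists velocity, steering angle, trajectory, and commands as action conditions. One simple way to accept any subset of them through one interface is to pack them into a fixed-length vector with a presence mask. The layout, command names, and waypoint count below are assumptions made for illustration; the paper's actual conditioning mechanism may differ.

```python
from typing import List, Optional, Sequence, Tuple

COMMANDS = ("turn_left", "turn_right", "go_straight")  # hypothetical command set
NUM_WAYPOINTS = 3                                       # hypothetical trajectory length

def encode_actions(
    velocity: Optional[Tuple[float, float]] = None,
    steering: Optional[float] = None,
    trajectory: Optional[Sequence[Tuple[float, float]]] = None,
    command: Optional[str] = None,
) -> List[float]:
    """Pack any subset of action conditions into one fixed-length vector.

    Layout: velocity(2) | steering(1) | trajectory(2 * NUM_WAYPOINTS) | command one-hot(3) | mask(4).
    Absent conditions are zero-filled and flagged 0 in the mask.
    """
    vec: List[float] = []
    mask: List[float] = []

    vec += list(velocity) if velocity is not None else [0.0, 0.0]
    mask.append(1.0 if velocity is not None else 0.0)

    vec.append(steering if steering is not None else 0.0)
    mask.append(1.0 if steering is not None else 0.0)

    if trajectory is not None:
        if len(trajectory) != NUM_WAYPOINTS:
            raise ValueError(f"expected {NUM_WAYPOINTS} waypoints")
        vec += [coord for point in trajectory for coord in point]
    else:
        vec += [0.0] * (2 * NUM_WAYPOINTS)
    mask.append(1.0 if trajectory is not None else 0.0)

    one_hot = [0.0] * len(COMMANDS)
    if command is not None:
        one_hot[COMMANDS.index(command)] = 1.0  # raises ValueError for unknown commands
    vec += one_hot
    mask.append(1.0 if command is not None else 0.0)

    return vec + mask

# Velocity plus a high-level command; steering and trajectory are left unset.
example = encode_actions(velocity=(5.0, 0.0), command="turn_left")
```

The mask lets a downstream network distinguish "condition absent" from "condition equal to zero", which matters for values such as steering angle where zero is meaningful.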
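Selecting a trajectory with an occupancy-based cost can be sketched as follows: score each candidate by the forecast occupancy probability at each of its waypoints, using the grid for the matching future timestep, and keep the cheapest. This shows only an occupancy term on a toy grid; the paper's cost function may weight or combine additional terms, and the grid, candidates, and helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]    # occupancy probability per BEV cell, indexed [row][col]
Waypoint = Tuple[int, int]  # (row, col) of the ego vehicle at one future timestep

def trajectory_cost(future_occ: Sequence[Grid], traj: Sequence[Waypoint]) -> float:
    """Sum forecast occupancy at each waypoint, using the grid for that timestep."""
    cost = 0.0
    for t, (r, c) in enumerate(traj):
        grid = future_occ[t]
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            return float("inf")  # leaving the forecast grid is treated as infeasible
        cost += grid[r][c]
    return cost

def select_trajectory(future_occ: Sequence[Grid],
                      candidates: Sequence[Sequence[Waypoint]]) -> Tuple[int, float]:
    """Return (index, cost) of the lowest-cost candidate trajectory."""
    costs = [trajectory_cost(future_occ, traj) for traj in candidates]
    best = min(range(len(costs)), key=lambda i: costs[i])
    return best, costs[best]

# Two forecast timesteps on a 3x3 grid; an obstacle occupies the middle column.
future_occ = [
    [[0.0, 0.9, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.9, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.0]],
]
straight = [(1, 1), (0, 1)]
veer_left = [(1, 0), (0, 0)]
best, cost = select_trajectory(future_occ, [straight, veer_left])
```

Because the world model forecasts occupancy for each future step, the cost can penalize cells that will be occupied when the ego vehicle arrives, not just cells occupied now.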
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses two problems in autonomous driving: predicting future states of the driving environment and end-to-end planning. Specifically, it proposes **Drive-OccWorld**, a vision-centric world model for 4D occupancy forecasting and planning. By combining historical observations with multiple action conditions, the model predicts future environmental states and performs continuous trajectory planning on top of those predictions. The paper's main objectives are:

1. **4D occupancy prediction**:
   - Predict the future occupancy and flow of objects in the environment, modeling both geometry and spatiotemporal dynamics.
   - Generate future occupancy and flow predictions from historical BEV (Bird's-Eye View) embeddings using a world decoder.
2. **Controllable generation**:
   - Inject flexible action conditions, such as velocity, steering angle, trajectory, and high-level commands, so that the model can generate controllable future states.
   - Integrate multiple action conditions through a unified conditional interface, improving the model's controllability and widening the range of downstream applications.
3. **End-to-end planning**:
   - Combine the generative ability of the world model with a planner to achieve continuous future prediction and trajectory selection.
   - Use an occupancy-based cost function to select the optimal trajectory and keep the vehicle navigating safely in complex environments.

### Main contributions

1. **Drive-OccWorld**:
   - A vision-centric world model for 4D occupancy and flow prediction.
   - An exploration of how the world model's forecasting ability can be integrated with end-to-end planning.
2. **Semantic and motion-conditional normalization module**:
   - A simple and efficient normalization module, conditioned on semantic and motion information, that improves both prediction and planning performance.
3. **Unified conditional interface**:
   - A single interface that integrates multiple action conditions, enhancing the model's controllability and supporting a wider range of downstream applications.

### Experimental results

1. **4D occupancy and flow prediction**:
   - On the nuScenes dataset, Drive-OccWorld improves the future-timestamp mIoU (mIoU_f) by 2.0% over existing methods on the dilated occupancy and flow prediction task.
   - On the fine-grained occupancy prediction task, Drive-OccWorld's mIoU at the current and future timestamps is 1.6% and 1.1% higher, respectively, than that of Cam4DOcc.
2. **End-to-end planning**:
   - Planning results are better when the ground-truth trajectory is used as the action condition than when the predicted trajectory is used.
   - Drive-OccWorld achieves strong results on the L2 distance and collision rate metrics, indicating its potential for safe navigation.

### Conclusion

With Drive-OccWorld, this paper addresses future state prediction and end-to-end planning for autonomous driving. The model performs well on occupancy and flow prediction, and its controllable generation and end-to-end planning capabilities contribute to safer and more robust autonomous driving.
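The occupancy results above are reported as mIoU. As a reference for what that metric computes, the sketch below takes flattened voxel labels, computes intersection-over-union per class, and averages. It skips classes absent from both prediction and ground truth, which is one common convention; how frames are aggregated into the current (mIoU_c) and future (mIoU_f) scores follows the benchmark's own protocol, which this sketch does not reproduce. Labels and class ids are hypothetical.

```python
from typing import List, Sequence

def mean_iou(pred: Sequence[int], gt: Sequence[int], classes: Sequence[int]) -> float:
    """Mean of per-class IoU over `classes`, skipping classes absent from both pred and gt."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must label the same voxels")
    ious: List[float] = []
    for k in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == k and g == k)
        union = sum(1 for p, g in zip(pred, gt) if p == k or g == k)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Flattened voxel labels: 0 = free space (excluded), 1 and 2 = hypothetical object classes.
pred = [0, 1, 1, 2, 2, 0]
gt   = [0, 1, 2, 2, 2, 1]
score = mean_iou(pred, gt, classes=[1, 2])
```

Here class 1 has IoU 1/3 and class 2 has IoU 2/3, so the mean is 0.5.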
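The planning results are reported as L2 distance and collision rate. The sketch below computes both on a toy trajectory: L2 as the mean Euclidean distance between planned and ground-truth waypoints, and collision rate as the fraction of waypoints that fall in an occupied BEV cell. It treats the ego vehicle as a point, whereas planning benchmarks typically check the full vehicle footprint and report values per time horizon; the cell size and occupied sets are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]   # (x, y) in metres, ego frame
Cell = Tuple[int, int]        # BEV grid index

def average_l2(plan: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean distance between planned and ground-truth waypoints."""
    return sum(math.dist(p, g) for p, g in zip(plan, gt)) / len(plan)

def collision_rate(plan: Sequence[Point], occupied: Sequence[Set[Cell]],
                   cell_size: float) -> float:
    """Fraction of waypoints whose BEV cell is occupied at that timestep.

    Treats the ego vehicle as a point; a footprint-based check would test
    every cell covered by the vehicle box instead.
    """
    hits = 0
    for t, (x, y) in enumerate(plan):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        hits += cell in occupied[t]
    return hits / len(plan)

plan = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
gt = [(0.0, 1.0), (0.0, 2.5), (0.0, 4.0)]
occupied = [set(), {(0, 4)}, set()]   # hypothetical occupied cells per timestep
l2 = average_l2(plan, gt)
rate = collision_rate(plan, occupied, cell_size=0.5)
```

The per-waypoint errors are 0, 0.5, and 1.0 m, giving an average L2 of 0.5 m, and one of the three waypoints lands in an occupied cell.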