OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall
2024-05-09
Abstract: A comprehensive understanding of 3D scenes is crucial in autonomous vehicles (AVs), and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction rely heavily on surround-view camera images, making them susceptible to changes in lighting and weather conditions. This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes and SemanticKITTI datasets, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The paper addresses the following problem: existing 3D semantic occupancy prediction models for autonomous vehicles (AVs) rely mainly on surround-view camera images, so their performance becomes unstable when lighting and weather conditions change. Specifically:

1. **Lighting and Weather Sensitivity**: The perception ability of surround-view cameras depends heavily on lighting conditions (such as night-time) and weather conditions (such as rain or heavy fog). These factors lead to inconsistent model performance across scenarios and create potential safety risks.
2. **Multi-sensor Fusion Requirement**: Improving the accuracy and robustness of 3D semantic occupancy prediction requires integrating information from other sensors, such as LiDAR and millimeter-wave radar.

To address these problems, the paper proposes a multi-sensor fusion framework named OccFusion. By integrating feature information from surround-view cameras, LiDAR, and millimeter-wave radar, OccFusion aims to improve the accuracy and robustness of 3D semantic occupancy prediction, so that autonomous vehicles can operate safely and reliably under varied environmental conditions.

### Main Contributions of OccFusion:

- **Multi-sensor Fusion Framework**: Proposes a multi-sensor fusion framework that integrates camera, LiDAR, and radar information for the 3D semantic occupancy prediction task.
- **Comparison with Existing Methods**: Demonstrates experimentally the advantages of multi-sensor fusion for 3D semantic occupancy prediction.
- **Ablation Study**: Conducts extensive ablation experiments to evaluate the performance gains of different sensor combinations under challenging lighting and weather conditions.
- **Perception Range Analysis**: Analyzes in detail how different perception ranges affect model performance, considering various sensor combinations and challenging scenarios.

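To make the fusion idea concrete, here is a minimal toy sketch of why combining sensors can stabilise per-voxel predictions when the camera is unreliable. It is not OccFusion's actual fusion module, which this summary does not describe. The function names (`fuse_scores`, `predict`), the class list, the scores, and the sensor weights are all hypothetical and chosen purely for illustration. The sketch averages per-class scores for each voxel across the sensors that observe it, then picks the highest-scoring class.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]   # (x, y, z) index into the occupancy grid
Scores = List[float]           # one score per semantic class

CLASSES = ["free", "car", "pedestrian"]  # toy label set, assumed for this example


def fuse_scores(per_sensor: Dict[str, Dict[Voxel, Scores]],
                weights: Dict[str, float]) -> Dict[Voxel, Scores]:
    """Weighted average of class scores per voxel, over the sensors that see it.

    Assumes every weight is positive and all score lists have the same length.
    """
    voxels = set().union(*(grid.keys() for grid in per_sensor.values()))
    fused: Dict[Voxel, Scores] = {}
    for voxel in voxels:
        acc = [0.0] * len(CLASSES)
        total_weight = 0.0
        for sensor, grid in per_sensor.items():
            if voxel not in grid:
                continue  # this sensor has no observation for the voxel
            w = weights[sensor]
            acc = [a + w * s for a, s in zip(acc, grid[voxel])]
            total_weight += w
        fused[voxel] = [a / total_weight for a in acc]
    return fused


def predict(fused: Dict[Voxel, Scores]) -> Dict[Voxel, int]:
    """Pick the index of the highest-scoring class for each voxel."""
    return {v: max(range(len(s)), key=s.__getitem__) for v, s in fused.items()}


# Toy night-time scene: the camera is unsure about voxel (0, 0, 0),
# while LiDAR and radar both indicate a car there.
per_sensor = {
    "camera": {(0, 0, 0): [0.5, 0.3, 0.2], (1, 0, 0): [0.1, 0.1, 0.8]},
    "lidar":  {(0, 0, 0): [0.1, 0.8, 0.1]},
    "radar":  {(0, 0, 0): [0.2, 0.7, 0.1]},
}
weights = {"camera": 0.2, "lidar": 0.5, "radar": 0.3}  # illustrative values only

fused = fuse_scores(per_sensor, weights)
labels = predict(fused)
print({v: CLASSES[i] for v, i in sorted(labels.items())})
```

In this toy case the camera alone would label voxel (0, 0, 0) as free, while the fused scores favour "car". A learned fusion module would replace the fixed weights with parameters trained from data.
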
### Conclusion:

By introducing multi-sensor fusion, OccFusion significantly improves the accuracy and robustness of 3D semantic occupancy prediction, performing especially well in challenging conditions such as night-time and rain. This improvement matters for the safety and reliability of autonomous vehicles.