Abstract: A comprehensive understanding of 3D scenes is crucial in autonomous vehicles (AVs), and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction rely heavily on surround-view camera images, which makes them susceptible to changes in lighting and weather conditions. This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes and SemanticKITTI datasets, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at
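The abstract says OccFusion integrates camera, lidar, and radar features but does not describe the fusion operator itself. As a purely illustrative sketch, assuming all three modalities have already been lifted into the same voxel grid with one feature vector per voxel, the simplest possible fusion is to concatenate the per-voxel features and apply a shared linear head that scores each semantic class. The helpers `fuse_voxel_features` and `predict_classes` and the toy weights are hypothetical; this is not OccFusion's actual fusion module.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_voxel_features(camera: Sequence[Vector],
                        lidar: Sequence[Vector],
                        radar: Sequence[Vector]) -> List[Vector]:
    """Concatenate per-voxel features from three modalities (hypothetical helper).

    Assumes every modality has already been lifted into the same voxel grid,
    so index i refers to the same voxel in all three inputs.
    """
    if not (len(camera) == len(lidar) == len(radar)):
        raise ValueError("all modalities must cover the same voxels")
    return [list(cam_f) + list(lid_f) + list(rad_f)
            for cam_f, lid_f, rad_f in zip(camera, lidar, radar)]


def predict_classes(fused: Sequence[Vector],
                    weights: Sequence[Vector],
                    bias: Sequence[float]) -> List[int]:
    """Shared linear head: one weight row per class, argmax over the logits."""
    if len(weights) != len(bias):
        raise ValueError("need one bias per class")
    labels = []
    for feat in fused:
        logits = []
        for row, b in zip(weights, bias):
            if len(row) != len(feat):
                raise ValueError("weight row length must match fused feature size")
            logits.append(sum(w * x for w, x in zip(row, feat)) + b)
        # Index of the highest-scoring class for this voxel.
        labels.append(max(range(len(logits)), key=logits.__getitem__))
    return labels


# Toy example: two voxels, one feature per modality, classes 0 = free, 1 = occupied.
cam, lid, rad = [[0.1], [0.9]], [[0.0], [1.0]], [[0.2], [0.5]]
fused = fuse_voxel_features(cam, lid, rad)   # [[0.1, 0.0, 0.2], [0.9, 1.0, 0.5]]
W = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]       # class 0 scores a constant, class 1 sums evidence
b = [0.5, 0.0]
print(predict_classes(fused, W, b))          # [0, 1]
```

Concatenation keeps every modality's evidence and lets the head learn how much to trust each one; learned alternatives such as attention or gating also exist, and which approach OccFusion uses is not stated in this summary.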

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: in autonomous vehicles (AVs), existing 3D semantic occupancy prediction models rely mainly on surround-view camera images, so their performance becomes unstable when lighting and weather conditions change. Specifically:

1. **Lighting and Weather Sensitivity**: The perception ability of surround-view cameras depends heavily on lighting (e.g., at night) and weather (e.g., rain or heavy fog). These factors make model performance inconsistent across scenarios and pose potential safety risks.
2. **Multi-sensor Fusion Requirement**: To improve the accuracy and robustness of 3D semantic occupancy prediction, information from other sensors, such as LiDAR and millimeter-wave radar, needs to be integrated.

To solve these problems, the paper proposes a multi-sensor fusion framework named OccFusion. By integrating features from surround-view cameras, LiDAR, and millimeter-wave radar, OccFusion aims to improve the accuracy and robustness of 3D semantic occupancy prediction, helping autonomous vehicles operate safely and reliably under varied environmental conditions.

### Main Contributions of OccFusion:

- **Multi-sensor Fusion Framework**: Proposes a multi-sensor fusion framework that integrates camera, LiDAR, and radar information for the 3D semantic occupancy prediction task.
- **Comparison with Existing Methods**: Verifies experimentally the advantages of multi-sensor fusion for 3D semantic occupancy prediction.
- **Ablation Study**: Conducts extensive ablation experiments to evaluate the performance gains of different sensor combinations under challenging lighting and weather conditions.
- **Perception Range Analysis**: Analyzes in detail how different perception ranges affect model performance, across various sensor combinations and challenging scenarios.
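The perception range analysis reports performance at different distances from the vehicle, but the summary does not say how ranges or metrics were defined. As an assumption, a range-restricted evaluation can be sketched as: keep only voxels whose centre lies within a given horizontal distance of the ego vehicle (placed at the origin), then compute the intersection-over-union (IoU) of the "occupied" label over those voxels. The helper `iou_within_range` and the toy data are hypothetical; a per-class mIoU would repeat this computation for each semantic class and average the results.

```python
import math
from typing import Sequence, Tuple


def iou_within_range(centers: Sequence[Tuple[float, float]],
                     pred: Sequence[bool],
                     gt: Sequence[bool],
                     max_range: float) -> float:
    """IoU of the 'occupied' label over voxels within max_range metres of the ego vehicle.

    Hypothetical helper: the ego vehicle sits at the origin and range is the
    horizontal (x, y) distance to each voxel centre.
    """
    if not (len(centers) == len(pred) == len(gt)):
        raise ValueError("centers, pred and gt must have the same length")
    inter = union = 0
    for (x, y), p, g in zip(centers, pred, gt):
        if math.hypot(x, y) > max_range:
            continue  # voxel lies outside the evaluated perception range
        inter += int(p and g)
        union += int(p or g)
    # NaN signals that no voxel in range was predicted or labelled occupied.
    return inter / union if union else float("nan")


# Toy data: one false positive close to the vehicle, correct predictions farther out.
centers = [(1.0, 0.0), (5.0, 0.0), (30.0, 0.0), (40.0, 10.0)]
pred = [True, True, True, True]
gt = [True, False, True, True]
for r in (20.0, 50.0):
    print(r, iou_within_range(centers, pred, gt, r))
# 20.0 0.5
# 50.0 0.75
```

Evaluating cumulative ranges (everything within r) rather than disjoint distance bands is one design choice among several; the summary does not indicate which one the paper uses.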
### Conclusion:

By introducing multi-sensor fusion, OccFusion improves the accuracy and robustness of 3D semantic occupancy prediction in the reported nuScenes and SemanticKITTI experiments, performing especially well in challenging conditions such as night and rain. This improvement is significant for the safety and reliability of autonomous vehicles.

OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction
