MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction

Zikun Xu, Jianqiang Wang, Shaobing Xu
2024-08-19
Abstract: LiDAR-based 3D occupancy prediction has evolved rapidly alongside the emergence of large datasets. Nevertheless, the potential of the existing diverse datasets remains underutilized because each is used in isolation. Models trained on a specific dataset often suffer considerable performance degradation when deployed to real-world scenarios or to datasets involving disparate LiDARs. This paper aims to develop a generalized model, called MergeOcc, that handles different LiDARs simultaneously by leveraging multiple datasets. The gaps among LiDAR datasets primarily manifest as geometric disparities and semantic inconsistencies. Thus, MergeOcc incorporates a novel model featuring a geometric realignment module and a semantic label mapping module to enable multi-dataset training (MDT). The effectiveness of MergeOcc is validated through experiments on two prominent datasets for autonomous vehicles: OpenOccupancy-nuScenes and SemanticKITTI. The results demonstrate enhanced robustness and remarkable performance across both types of LiDARs, outperforming several SOTA multi-modality methods. Notably, despite using an identical model architecture and hyper-parameter set, MergeOcc significantly surpasses the baseline due to its exposure to more diverse data. MergeOcc is considered the first cross-dataset 3D occupancy prediction pipeline that effectively bridges the domain gap for seamless deployment across heterogeneous platforms.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the domain gap between different LiDARs (Light Detection and Ranging) to improve the robustness and generalization ability of 3D occupancy prediction models on various LiDAR data. Specifically, existing LiDAR-based 3D occupancy prediction models are typically trained and tested on a specific dataset, which limits their performance in real-world applications. When these models are directly deployed on different LiDAR platforms or datasets, they often suffer significant performance degradation due to geometric differences and semantic inconsistencies between different LiDARs.

### Main Contributions

1. **MergeOcc Framework**: MergeOcc is the first comprehensive method capable of handling 3D dense perception tasks for different types of LiDARs simultaneously. The framework includes a geometric alignment module and a semantic label mapping module, which bridge the gap between different LiDARs, allowing the model to learn from a broader range of data, thereby improving its adaptability and performance in heterogeneous data domains.
2. **Experimental Validation**: Extensive experiments on two well-known occupancy prediction datasets (OpenOccupancy-nuScenes and SemanticKITTI) validate the effectiveness of MergeOcc. The results show that MergeOcc performs excellently on both types of LiDARs, with significant performance improvements compared to other state-of-the-art multimodal methods.
3. **Model Advantages**: Despite using the same set of hyperparameters and architecture, MergeOcc significantly outperforms baseline models, thanks to the superiority of the multi-dataset training (MDT) paradigm. This indicates that the current 3D occupancy datasets are insufficient in capacity, and the MDT paradigm holds great potential.

### Method Overview

1. **Geometric Alignment Module**: Ensures the model's adaptability to different LiDARs through point cloud range alignment and dataset-specific normalization operations.
   This module addresses the geometric differences between different LiDARs.
2. **Semantic Label Mapping Module**: Resolves semantic inconsistencies between different datasets by automatically constructing a unified label space. This module ensures consistency and accuracy in multi-dataset training.
3. **Multi-Dataset Training**: By merging multiple datasets and optimizing a unified loss function, the model can learn from a broader range of data, thereby enhancing its generalization ability.

### Experimental Results

- **Compared to D.M. Methods**: MergeOcc effectively bridges the domain gap between different LiDARs, enabling the model to leverage more extensive and diverse data. On the OpenOccupancy-nuScenes dataset, MergeOcc improves geometric IoU and semantic mIoU by 14.8% and 17.2%, respectively; on the SemanticKITTI dataset, it improves them by 33.4% and 12.5%, respectively.
- **Compared to Single-Domain Trained Baseline Models**: Despite using the same architecture and hyperparameter set, MergeOcc shows significant performance improvements due to exposure to a broader range of data. The in-domain improvements are Gnu: +9.3%, Snu: +7.1%; Gsk: +14.5%, Ssk: +9.8% (G and S denote geometric IoU and semantic mIoU; nu and sk denote nuScenes and SemanticKITTI).
- **Compared to SOTA Methods**: MergeOcc outperforms the state-of-the-art LiDAR method PointOcc by 6.1% in IoU and the state-of-the-art multimodal model M-CONet by 10.7% and 2.8% in IoU and mIoU, respectively, demonstrating its ability to extract insights from diverse data.

### Visualization Results

Figure 4 shows a visual comparison of the occupancy prediction results generated by the main methods. More visualization results can be found in Appendix A.8.
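For readers unfamiliar with the two metrics quoted in the experimental results, the following is a minimal sketch of how geometric IoU and semantic mIoU are commonly computed on voxel grids. The function names, the choice of `0` as the empty label, and the handling of absent classes are assumptions made for illustration; the benchmarks' official evaluation scripts may differ in these details.

```python
import numpy as np

def occupancy_iou(pred, gt, empty_label=0):
    """Geometric IoU: every non-empty voxel counts as occupied, class is ignored."""
    pred_occ = pred != empty_label
    gt_occ = gt != empty_label
    inter = np.logical_and(pred_occ, gt_occ).sum()
    union = np.logical_or(pred_occ, gt_occ).sum()
    return inter / union if union > 0 else float("nan")

def semantic_miou(pred, gt, num_classes, empty_label=0):
    """Semantic mIoU: mean of per-class IoU over non-empty classes.

    Assumption: classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class does not appear in either grid
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy example with flattened voxel labels (0 = empty).
pred = np.array([0, 1, 1, 2, 0, 2])
gt = np.array([0, 1, 2, 2, 1, 0])
print(occupancy_iou(pred, gt))                 # geometric IoU
print(semantic_miou(pred, gt, num_classes=3))  # semantic mIoU
```

Geometric IoU rewards getting occupancy right regardless of class, while mIoU also requires the class to match, which is why the two figures are reported separately.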
### Conclusion

MergeOcc successfully bridges the domain gap between different LiDARs through geometric alignment and semantic label mapping modules, enhancing the robustness and generalization ability of 3D occupancy prediction models on various LiDAR data. Experimental results show that MergeOcc performs excellently on multiple datasets, significantly outperforming existing baseline models and state-of-the-art methods.
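To make the two modules more concrete, here is a rough sketch of what range alignment, per-dataset normalization, and label mapping could look like as preprocessing steps. It is not the authors' implementation. The shared range, the label tables, and all function names are hypothetical. In particular, the summary says the unified label space is constructed automatically, whereas this sketch uses a hand-written lookup table, and "dataset-specific normalization" is interpreted here as standardizing input features with per-dataset statistics, although it could equally refer to dataset-specific normalization layers inside the network.

```python
import numpy as np

# Hypothetical shared (x, y, z) range in metres; illustrative values only.
SHARED_RANGE = np.array([[-50.0, 50.0], [-50.0, 50.0], [-5.0, 3.0]])

# Hypothetical hand-written maps from dataset-specific ids to unified ids.
LABEL_MAPS = {
    "nuscenes": {0: 0, 1: 1, 2: 2},
    "semantickitti": {0: 0, 1: 2, 2: 1, 3: 3},
}

def align_range(points, point_range=SHARED_RANGE):
    """Keep only points whose (x, y, z) fall inside the shared range."""
    xyz = points[:, :3]
    inside = (xyz >= point_range[:, 0]) & (xyz < point_range[:, 1])
    return points[np.all(inside, axis=1)]

def normalize_features(points, mean, std):
    """Standardize non-coordinate features (e.g. intensity) per dataset."""
    out = points.astype(np.float64, copy=True)
    out[:, 3:] = (out[:, 3:] - mean) / std
    return out

def map_labels(labels, dataset):
    """Translate dataset-specific label ids into the unified label space."""
    mapping = LABEL_MAPS[dataset]
    lut = np.zeros(max(mapping) + 1, dtype=np.int64)  # lookup table by source id
    for src, dst in mapping.items():
        lut[src] = dst
    return lut[labels]

# Usage: three points as (x, y, z, intensity); the second lies outside the range.
pts = np.array([[0.0, 0.0, 0.0, 10.0],
                [60.0, 0.0, 0.0, 20.0],
                [1.0, 1.0, 1.0, 30.0]])
aligned = normalize_features(align_range(pts), mean=20.0, std=10.0)
unified = map_labels(np.array([1, 2, 3, 0]), "semantickitti")
```

Once every dataset passes through the same range and label space, batches from both sources can be mixed and trained under a single loss, which is the multi-dataset training setup the summary describes.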