Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu
2024-07-13
Abstract: Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for the occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentation method (LaCRange). Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effect of RV projection of RGB images. Moreover, a context-based feature fusion module is introduced for robust and preservative sensor fusion. Finally, in order to address the limited resolution of RV and its insufficiency of 3D topology, a new point refinement scheme is devised for proper aggregation of features in 2D and augmentation of point features in 3D. We evaluated the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve the accuracy of Range-View (RV)-based 3D point cloud semantic segmentation while keeping it fast enough for real-time use. It targets three challenges:

1. **Distortion of projected RGB images**: Because 3D point clouds are sparse, projecting RGB images into the range view loses appearance information, which degrades segmentation. The paper proposes a Distortion-Compensating Knowledge Distillation (DCKD) strategy, in which a pre-trained teacher model guides the student model to reduce the information loss caused by RV projection.
2. **Multi-sensor information fusion**: To exploit the complementary strengths of camera and LiDAR, the paper designs a Context-based Feature Fusion (CFF) module that combines the two modalities more effectively and improves the robustness and accuracy of segmentation.
3. **Performance gap between projected points and the complete point cloud**: RV methods process only part of the input point cloud, so post-processing techniques such as k-Nearest Neighbors (kNN) leave a large performance gap when labels are predicted for every 3D point. The paper introduces a point refinement scheme with two modules, Semantic-Range-Remission Feature Aggregation (SR2FA) and 3D Neighborhood-Aware Feature Augmentation (3D-NAFA), to improve segmentation consistency between the projected points and the complete point cloud.

Together, these components aim to improve RV-based 3D point cloud semantic segmentation in real-time settings such as autonomous driving, achieving higher accuracy without sacrificing real-time performance.
The experimental results show that the proposed LaCRange method performs strongly on both SemanticKITTI and nuScenes, two large-scale autonomous driving datasets, and reaches state-of-the-art results on the nuScenes benchmark.
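To make the third challenge concrete, the following is a minimal sketch of the spherical projection commonly used by range-view methods. It is not the paper's implementation: the function name `project_to_range_view`, the tiny 4 x 8 image, and the ±15° vertical field of view are assumptions chosen for illustration. It shows how points along the same ray compete for one pixel, so only the closest one is seen by a 2D network.

```python
import math


def project_to_range_view(points, height=4, width=8,
                          fov_up_deg=15.0, fov_down_deg=-15.0):
    """Spherically project (x, y, z) points onto a height x width range image.

    Image size and field of view are illustrative assumptions.
    Returns (image, hidden):
      image  maps (row, col) -> index of the closest point in that pixel
      hidden lists indices of points that share a pixel with a closer point
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    best = {}    # (row, col) -> (range, point index)
    hidden = []
    for idx, (x, y, z) in enumerate(points):
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # the sensor origin has no defined direction
        yaw = math.atan2(y, x)
        pitch = math.asin(max(-1.0, min(1.0, z / rng)))
        # Normalise both angles to [0, 1], then scale to pixel coordinates.
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        pixel = (row, col)
        if pixel not in best:
            best[pixel] = (rng, idx)
        elif rng < best[pixel][0]:
            hidden.append(best[pixel][1])  # previous occupant is now occluded
            best[pixel] = (rng, idx)
        else:
            hidden.append(idx)
    image = {pixel: idx for pixel, (_, idx) in best.items()}
    return image, sorted(hidden)


# Two points along the same ray plus one point off to the side.
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
image, hidden = project_to_range_view(points)
print("pixels used:", len(image), "points without a pixel:", hidden)
```

The points returned in `hidden` never reach the 2D network, so their labels have to come from a later step. A kNN vote over nearby projected points is the conventional choice, and the point refinement scheme summarized above is the paper's alternative.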