Li Li, Hubert P. H. Shum, Toby P. Breckon
Abstract: 3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity, leading to poor isometric invariance and suboptimal segmentation. To tackle this challenge, our work introduces Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. Our RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize inherent LiDAR isotropic radiation and semantic categorization for enhanced local representation and computational efficiency, while incorporating a 4D distance metric that integrates geometric and surface material reflectivity for improved semantic segmentation. To effectively embed high-dimensional RAPiD features, we propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings. Additionally, we propose RAPiD-Seg which incorporates a channel-wise attention fusion and two effective RAPiD-Seg variants, further optimizing the embedding for enhanced performance and generalization. Our method outperforms contemporary LiDAR segmentation work in terms of mIoU on SemanticKITTI (76.1) and nuScenes (83.6) datasets.
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve
This paper targets a robustness gap in 3D LiDAR point cloud segmentation: existing methods degrade under rigid transformations (such as rotation and translation), changes in viewpoint, variations in point density, and occlusions. The cause is that these methods rely mainly on raw point coordinates and intensity, which leads to poor isometric invariance and suboptimal segmentation results. To address this, the authors introduce a new feature representation, the Range-Aware Pointwise Distance Distribution (RAPiD) feature, and propose the corresponding RAPiD-Seg architecture.
### Main Contributions
1. **Proposed a new Range-Aware Pointwise Distance Distribution (RAPiD) feature**: This feature captures local geometric structures within specific regions of interest (ROI), exhibiting invariance to rigid transformations and adaptability to changes in point cloud sparsity.
2. **Designed a new embedding method**: A class-aware double-nested autoencoder (RAPiDAE) compresses the high-dimensional RAPiD features into manageable voxel-wise embeddings, balancing efficiency and fidelity.
3. **Proposed a novel open-source network architecture RAPiD-Seg**: This architecture achieves state-of-the-art performance on the SemanticKITTI and nuScenes datasets (mIoU of 76.1 and 83.6, respectively) through a modular LiDAR segmentation approach.
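The invariance claim in the first contribution can be checked on a toy example. The following is a minimal sketch, not the authors' implementation: it assumes each LiDAR return is an `(x, y, z, reflectivity)` tuple, uses a plain Euclidean norm over those four components as a stand-in for the 4D distance metric mentioned in the abstract, and describes each point by the sorted distances to its `k` nearest neighbours. The weight `alpha`, the function names, and the brute-force neighbour search are all illustrative assumptions; the range-aware ROI selection from the paper is not modelled.

```python
import math

def distance_4d(p, q, alpha=1.0):
    """Distance between two returns given as (x, y, z, reflectivity).

    alpha is a hypothetical weight that scales the reflectivity difference
    relative to the metric coordinates; it is not taken from the paper.
    """
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    dr = alpha * (p[3] - q[3])
    return math.sqrt(dx * dx + dy * dy + dz * dz + dr * dr)

def distance_distribution(points, k, alpha=1.0):
    """For every point, return the sorted distances to its k nearest neighbours."""
    features = []
    for i, p in enumerate(points):
        dists = sorted(
            distance_4d(p, q, alpha) for j, q in enumerate(points) if j != i
        )
        features.append(dists[:k])
    return features

def rigid_transform(points, yaw, translation):
    """Rotate about the z-axis by yaw (radians), then translate.

    Reflectivity is a surface property, so it is left unchanged.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        (c * x - s * y + tx, s * x + c * y + ty, z + tz, r)
        for x, y, z, r in points
    ]

# Toy cloud: five returns with made-up coordinates and reflectivity values.
cloud = [
    (0.0, 0.0, 0.0, 0.10),
    (1.0, 0.0, 0.0, 0.12),
    (0.0, 2.0, 0.5, 0.80),
    (3.0, 1.0, 0.2, 0.75),
    (2.0, 2.0, 1.0, 0.30),
]

before = distance_distribution(cloud, k=3)
moved = rigid_transform(cloud, yaw=0.7, translation=(5.0, -2.0, 1.5))
after = distance_distribution(moved, k=3)

# Largest change in any feature entry caused by the rigid transformation.
max_diff = max(
    abs(a - b) for fa, fb in zip(before, after) for a, b in zip(fa, fb)
)
print(max_diff < 1e-9)
```

Because the descriptor is built only from pairwise distances, rotating and translating the cloud leaves it unchanged up to floating-point error, whereas the raw coordinates of every point change.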
### Key Technologies of the Solution
- **RAPiD Feature**: For each point, the method computes the distribution of distances to the other points within that point's ROI. The distances use a 4D metric that combines geometric differences with differences in surface material reflectivity, which improves semantic segmentation accuracy.
- **Double-Nested Autoencoder**: A class-aware objective function guides the embedding of the high-dimensional features, improving both computational efficiency and the fidelity of the feature representation.
- **Channel Attention Fusion Mechanism**: Effectively fuses different LiDAR point attributes through a channel attention mechanism, emphasizing information-rich features and suppressing irrelevant features.
- **Two RAPiD-Seg Variants**: R-RAPiD-Seg and C-RAPiD-Seg use R-RAPiD and C-RAPiD features, respectively. The former is intended for fast 3D segmentation, while the latter further improves performance by using pre-trained autoencoder (AE) and backbone networks.
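The channel attention idea in the list above can be illustrated with a generic squeeze-and-excitation style gate. This is a sketch under stated assumptions rather than the RAPiD-Seg fusion module, whose exact design is not given in this summary: the single linear layer, the `weights` and `bias` values, and the function names are hypothetical, and in a real network the parameters would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(features, weights, bias):
    """Reweight the channels of a (num_points x num_channels) feature list.

    features: one row per point (or voxel), each with C channel values.
    weights:  C x C matrix of hypothetical, hand-picked parameters.
    bias:     length-C list of hypothetical, hand-picked parameters.
    Returns the gated features and the per-channel gates.
    """
    num_channels = len(features[0])
    # Squeeze: average each channel over all points.
    means = [
        sum(row[c] for row in features) / len(features)
        for c in range(num_channels)
    ]
    # Excite: a linear layer followed by a sigmoid yields a gate in (0, 1).
    gates = [
        sigmoid(sum(weights[c][j] * means[j] for j in range(num_channels)) + bias[c])
        for c in range(num_channels)
    ]
    # Scale: multiply every channel by its gate.
    fused = [[row[c] * gates[c] for c in range(num_channels)] for row in features]
    return fused, gates

# Two points with three channels each (made-up values).
features = [
    [1.0, 0.5, 2.0],
    [3.0, 0.5, 2.0],
]
# Hand-picked parameters: boost channel 0, leave channel 1 neutral,
# suppress channel 2.
weights = [
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -4.0],
]
bias = [0.0, 0.0, 0.0]

fused, gates = channel_attention(features, weights, bias)
print([round(g, 3) for g in gates])
```

With these parameters the first channel passes through almost unchanged, the second is halved, and the third is driven close to zero, which is the "emphasize informative channels, suppress irrelevant ones" behaviour in miniature.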
### Experimental Results
The authors conducted extensive experiments on the SemanticKITTI and nuScenes datasets, reporting that RAPiD-Seg outperforms existing state-of-the-art methods on multiple metrics, most notably mIoU (76.1 on SemanticKITTI and 83.6 on nuScenes).
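For reference, mIoU (mean intersection over union) averages the per-class ratio TP / (TP + FP + FN). The sketch below computes it on a tiny hand-made label list; it is illustrative only. The benchmark evaluations accumulate counts over the whole test set, exclude unlabeled points, and report the result as a percentage, none of which is modelled here, and the function name is an assumption.

```python
def mean_iou(predictions, labels, num_classes):
    """Mean per-class IoU over flat lists of predicted and true class ids."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(predictions, labels) if p == c and g == c)
        fp = sum(1 for p, g in zip(predictions, labels) if p == c and g != c)
        fn = sum(1 for p, g in zip(predictions, labels) if p != c and g == c)
        union = tp + fp + fn
        if union == 0:
            # Class appears in neither predictions nor labels: skip it.
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0

labels      = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]

# Class 0: TP=1, FP=1, FN=1 -> 1/3
# Class 1: TP=2, FP=1, FN=0 -> 2/3
# Class 2: TP=1, FP=0, FN=1 -> 1/2
score = mean_iou(predictions, labels, num_classes=3)
print(round(score, 3))  # 0.5
```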
### Conclusion
By introducing the RAPiD feature and the RAPiD-Seg architecture, this paper addresses robustness issues in 3D LiDAR point cloud segmentation, particularly under rigid transformations, changes in viewpoint, point density variations, and occlusions. These contributions support outdoor scene understanding in applications such as autonomous driving.