Small but Mighty: Enhancing 3D Point Clouds Semantic Segmentation with U-Next Framework

Ziyin Zeng, Qingyong Hu, Zhong Xie, Jian Zhou, Yongyang Xu
2023-04-03
Abstract: We study the problem of semantic segmentation of large-scale 3D point clouds. In recent years, significant research efforts have been directed toward local feature aggregation, improved loss functions and sampling strategies. However, the fundamental framework of point cloud semantic segmentation has been largely overlooked, with most existing approaches relying on the U-Net architecture by default. In this paper, we propose U-Next, a small but mighty framework designed for point cloud semantic segmentation. The key to this framework is to learn multi-scale hierarchical representations from semantically similar feature maps. Specifically, we build U-Next by stacking multiple U-Net $L^1$ codecs in a nested and densely arranged manner to minimize the semantic gap, while simultaneously fusing the feature maps across scales to effectively recover fine-grained details. We also devise a multi-level deep supervision mechanism to further smooth gradient propagation and facilitate network optimization. Extensive experiments on three large-scale benchmarks, S3DIS, Toronto3D, and SensatUrban, demonstrate the superiority and effectiveness of the proposed U-Next architecture. Our U-Next architecture shows consistent and visible performance improvements across different tasks and baseline models, indicating its great potential to serve as a general framework for future research.
Computer Vision and Pattern Recognition
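The nested topology described in the abstract can be illustrated with a small NumPy sketch. It reads "stacking multiple U-Net $L^1$ codecs (one down-sampling and one up-sampling step) in a nested and densely arranged manner" as a UNet++-style grid of nodes `X[(i, j)]`, where `i` is the resolution level and `j` the column, and it averages a loss over every full-resolution output as a stand-in for multi-level deep supervision. This is an illustrative assumption, not the authors' implementation: random sub-sampling, nearest-neighbour up-sampling, untrained random linear layers as nodes, and the helper names `nested_forward`, `node` and `deep_supervision_loss` are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of a nested, densely connected U-Net L^1 stack;
# not the U-Next reference code. All weights are random and untrained.
rng = np.random.default_rng(0)


def subsample(n_points, ratio=4):
    """Random sub-sampling: indices of roughly 1/ratio of the points."""
    return rng.choice(n_points, size=max(1, n_points // ratio), replace=False)


def nearest_upsample(feat_coarse, xyz_coarse, xyz_fine):
    """Give each fine point the feature of its nearest coarse point."""
    d2 = ((xyz_fine[:, None, :] - xyz_coarse[None, :, :]) ** 2).sum(axis=-1)
    return feat_coarse[d2.argmin(axis=1)]


def node(feat, out_dim):
    """Hypothetical node: random linear projection followed by ReLU."""
    w = rng.standard_normal((feat.shape[1], out_dim)) / np.sqrt(feat.shape[1])
    return np.maximum(feat @ w, 0.0)


def nested_forward(xyz, feat, depth=3, dim=16):
    """Return a dict X[(i, j)]: i = resolution level, j = column."""
    levels = [xyz]
    X = {(0, 0): node(feat, dim)}
    # Column 0 is a plain encoder: sub-sample, then transform.
    for i in range(1, depth + 1):
        idx = subsample(len(levels[-1]))
        levels.append(levels[-1][idx])
        X[(i, 0)] = node(X[(i - 1, 0)][idx], dim)
    # Column j >= 1: each node closes a U-Net L^1 block by up-sampling the
    # node one level down in column j - 1 and fusing it with all earlier
    # nodes on its own level (dense skip connections).
    for j in range(1, depth + 1):
        for i in range(depth - j + 1):
            up = nearest_upsample(X[(i + 1, j - 1)], levels[i + 1], levels[i])
            skips = [X[(i, k)] for k in range(j)]
            X[(i, j)] = node(np.concatenate(skips + [up], axis=1), dim)
    return X


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()


def deep_supervision_loss(X, labels, depth, num_classes):
    """Average the losses of one (random, hypothetical) head per output X[(0, j)]."""
    losses = []
    for j in range(1, depth + 1):
        head = rng.standard_normal((X[(0, j)].shape[1], num_classes))
        losses.append(cross_entropy(X[(0, j)] @ head, labels))
    return float(np.mean(losses)), losses


# Toy usage: 256 points with xyz coordinates, 6-D input features, 5 classes.
xyz = rng.random((256, 3))
feat = rng.random((256, 6))
labels = rng.integers(0, 5, size=256)
X = nested_forward(xyz, feat, depth=3, dim=16)
loss, per_head = deep_supervision_loss(X, labels, depth=3, num_classes=5)
print(len(X), X[(0, 3)].shape, len(per_head))  # 10 (256, 16) 3
```

In this reading, every node `X[(i, j)]` with `j >= 1` is the decoder end of one U-Net $L^1$ block, and the dense skips mean it fuses features that have already been partially decoded rather than raw encoder output, which is the intuition behind narrowing the semantic gap. In a trainable version the classifier heads would be learned parameters, and the averaged loss would send gradients into every column at once.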
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the semantic segmentation of large-scale 3D point clouds. The authors note that although recent research has focused heavily on local feature aggregation, improved loss functions and sampling strategies, the underlying framework of 3D point cloud semantic segmentation has been largely ignored: most existing methods simply adopt the U-Net architecture by default. U-Net, however, has several inherent limitations when applied to 3D point clouds:

1. **Semantic gap**: The skip connections in U-Net combine encoder and decoder feature maps that are not semantically similar, so the fusion is less effective.
2. **Information loss**: Because 3D point clouds are unordered and irregular, a large amount of information is easily lost during down-sampling and up-sampling.
3. **Difficult optimization**: Feature maps at different levels differ substantially, so naively aggregating multi-scale features can make the network harder to optimize.

To address these problems, the authors propose U-Next, a framework that stacks multiple basic U-Net $L^1$ sub-networks and introduces a multi-level deep supervision mechanism to minimize the semantic gap and effectively recover fine-grained details.

### Main contributions

1. **In-depth analysis of the U-Net architecture**: The authors analyze the evolution of U-Net and its variants for 3D point clouds and identify the U-Net $L^1$ sub-network as a suitable building block for fine-grained point cloud segmentation.
2. **The U-Next framework**: Stacking multiple basic U-Net $L^1$ sub-networks and adding multi-level deep supervision yields a general and effective segmentation architecture.
3. **Experimental verification**: Extensive experiments on three large-scale benchmark datasets (S3DIS, Toronto3D and SensatUrban) verify the effectiveness and generality of the U-Next architecture.

### Experimental results

- **S3DIS dataset**: Applying the U-Next framework to RandLA-Net improves overall accuracy (OA) and mean intersection-over-union (mIoU) by 1.5 and 3.2 percentage points, respectively, reaching 89.5% and 73.2%.
- **Toronto3D and SensatUrban datasets**: U-Next likewise brings clear performance improvements on both datasets.

### Conclusion

By improving the U-Net architecture, U-Next addresses several key problems in 3D point cloud semantic segmentation and proves broadly applicable and effective across different tasks and baseline models.
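The S3DIS results are reported as OA and mIoU. The sketch below shows the usual way these metrics are computed from a per-point confusion matrix: OA is the trace divided by the total number of points, and each class IoU is TP / (TP + FP + FN), averaged over classes to give mIoU. This is the standard definition rather than code from the paper, and the toy labels are invented purely for illustration.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)
    return cm


def overall_accuracy(cm):
    """OA: correctly labelled points divided by all points."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes that occur."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = np.divide(tp, union, out=np.full(len(tp), np.nan), where=union > 0)
    return np.nanmean(iou)


# Toy labels for 8 points and 3 classes (made up, not from the paper).
gt = np.array([0, 0, 0, 1, 1, 2, 2, 2])
pred = np.array([0, 0, 1, 1, 1, 2, 2, 0])
cm = confusion_matrix(gt, pred, num_classes=3)
oa, miou = overall_accuracy(cm), mean_iou(cm)
print(f"OA = {oa:.3f}, mIoU = {miou:.3f}")  # OA = 0.750, mIoU = 0.611
```

Classes that appear in neither the ground truth nor the predictions have an empty union; this sketch skips them via `np.nanmean` instead of counting them as an IoU of zero.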