Abstract: We study the problem of semantic segmentation of large-scale 3D point clouds. In recent years, significant research efforts have been directed toward local feature aggregation, improved loss functions and sampling strategies. However, the fundamental framework of point cloud semantic segmentation has been largely overlooked, with most existing approaches relying on the U-Net architecture by default. In this paper, we propose U-Next, a small but mighty framework designed for point cloud semantic segmentation. The key to this framework is to learn multi-scale hierarchical representations from semantically similar feature maps. Specifically, we build our U-Next by stacking multiple U-Net $L^1$ codecs in a nested and densely arranged manner to minimize the semantic gap, while simultaneously fusing the feature maps across scales to effectively recover the fine-grained details. We also devise a multi-level deep supervision mechanism to further smooth gradient propagation and facilitate network optimization. Extensive experiments conducted on three large-scale benchmarks, S3DIS, Toronto3D, and SensatUrban, demonstrate the superiority and effectiveness of the proposed U-Next architecture. U-Next shows consistent and visible performance improvements across different tasks and baseline models, indicating its great potential to serve as a general framework for future research.
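
The multi-level deep supervision mentioned in the abstract can be illustrated with a minimal sketch. It assumes several decoder heads that each emit per-point class logits, and it combines their cross-entropy losses with equal weights. The function names, the number of heads, and the equal weighting are all hypothetical choices for illustration; the abstract does not specify how the supervised outputs are weighted or combined.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over points. logits: (N, C) floats, labels: (N,) ints."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.shape[0]), labels].mean()

def deep_supervision_loss(head_logits, labels, weights=None):
    """Weighted sum of per-head losses.

    head_logits: list of (N, C) arrays, one per supervised output (hypothetical).
    weights: per-head weights; equal weighting is an assumption, not from the paper.
    """
    if weights is None:
        weights = [1.0 / len(head_logits)] * len(head_logits)
    return sum(w * softmax_cross_entropy(l, labels)
               for w, l in zip(weights, head_logits))

# Illustrative usage with random data: 3 heads, 1024 points, 13 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 13, size=1024)
heads = [rng.normal(size=(1024, 13)) for _ in range(3)]
loss = deep_supervision_loss(heads, labels)
```

Because every head is supervised against the same labels, each one receives a direct gradient signal, which is the intuition behind the "smoother gradient propagation" claim.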

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve the problem of semantic segmentation of large-scale 3D point clouds. Specifically, the authors point out that although much research effort in recent years has gone into local feature aggregation, improved loss functions and sampling strategies, the basic framework of 3D point-cloud semantic segmentation has been widely ignored, and most existing methods rely on the U-Net architecture by default. However, the U-Net architecture has some inherent limitations when dealing with 3D point clouds, such as:

1. **Semantic gap**: Although the skip connections in the U-Net architecture can combine the feature maps of the encoder and the decoder, these feature maps are not semantically similar, resulting in a poor combination effect.
2. **Information loss**: The disorder and irregularity of 3D point clouds make it easy to lose a large amount of information during down-sampling and up-sampling.
3. **Difficult optimization**: Due to the large differences between feature maps at different levels, simply aggregating multi-scale features may increase the difficulty of optimization.

To solve these problems, the authors propose a new framework named U-Next, which stacks multiple basic U-Net $L^1$ sub-networks and introduces a multi-level deep supervision mechanism to minimize the semantic gap and effectively restore fine-grained details.

### Main contributions

1. **In-depth analysis of the U-Net architecture**: The authors conduct an in-depth analysis of the evolution of U-Net and its variants in the field of 3D point clouds, and identify the U-Net $L^1$ sub-network as a suitable component for fine-grained point-cloud segmentation.
2. **Propose the U-Next framework**: By stacking multiple basic U-Net $L^1$ sub-networks and introducing multi-level deep supervision, a general and effective segmentation architecture is constructed.
3. **Experimental verification**: Extensive experiments are carried out on three large-scale benchmark datasets (S3DIS, Toronto3D and SensatUrban) to verify the effectiveness and generality of the U-Next architecture.

### Experimental results

- **S3DIS dataset**: On the S3DIS dataset, RandLA-Net using the U-Next framework improves by 1.5% and 3.2% in overall accuracy (OA) and mean intersection-over-union (mIoU) respectively, reaching 89.5% and 73.2%.
- **Toronto3D dataset**: On the Toronto3D dataset, the U-Next framework also shows a significant performance improvement.
- **SensatUrban dataset**: On the SensatUrban dataset, the U-Next framework also achieves a significant performance improvement.

### Conclusion

By improving the U-Net architecture, the U-Next framework effectively addresses some key problems in 3D point-cloud semantic segmentation, demonstrating its wide applicability and effectiveness on different tasks and baseline models.
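
For readers unfamiliar with the two metrics reported in the experimental results, the following minimal sketch computes overall accuracy (OA) and mean intersection-over-union (mIoU) from a confusion matrix. The function names and the toy labels are illustrative assumptions, and skipping classes that never occur is one common convention rather than something stated in the paper.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    idx = gt * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def overall_accuracy(cm):
    """Fraction of points whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # assumption: skip classes absent from both pred and gt
    return (tp[valid] / union[valid]).mean()

# Toy example with 6 points and 3 classes (made-up labels, not paper data).
gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(pred, gt, num_classes=3)
print(overall_accuracy(cm))  # 4 of 6 points correct
print(mean_iou(cm))          # per-class IoUs are 1/3, 2/3 and 1/2
```

OA is dominated by frequent classes, whereas mIoU weights every class equally, which is why the two numbers can move by different amounts for the same model change.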

Small but Mighty: Enhancing 3D Point Clouds Semantic Segmentation with U-Next Framework

Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

PointMS: Semantic Segmentation for Point Cloud Based on Multi-scale Directional Convolution

3D Semantic Segmentation Using Deep Learning for Large-Scale Indoor Point Cloud

Associate Semantic-Instance Segmentation of 3D Point Clouds Based on Local Feature Extraction

A Multi-scale Network for Semantic Segmentation of 3D Point Clouds

3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

Semantic Segmentation of Point Cloud Scene via Multi-Scale Feature Aggregation and Adaptive Fusion

PointNest: Learning Deep Multiscale Nested Feature Propagation for Semantic Segmentation of 3-D Point Clouds

Exploring Dual Representations in Large-Scale Point Clouds: A Simple Weakly Supervised Semantic Segmentation Framework

Dilated Nearest-Neighbor Encoding for 3D Semantic Segmentation of Point Clouds

Context-Aware Network for Semantic Segmentation Toward Large-Scale Point Clouds in Urban Environments

A Large-Scale Point Cloud Semantic Segmentation Network Via Local Dual Features and Global Correlations

Large-scale point cloud semantic segmentation via local perception and global descriptor vector

Multi-Scale Superpoint Network for 3D Point Cloud Semantic Segmentation

Semantic segmentation of large-scale point clouds based on dilated nearest neighbors graph

NeiEA-NET: Semantic segmentation of large-scale point cloud scene via neighbor enhancement and aggregation

Mining Local Geometric Structure for Large-Scale 3D Point Clouds Semantic Segmentation

TempNet: Online Semantic Segmentation on Large-scale Point Cloud Series

Boosting Lidar 3D Object Detection with Point Cloud Semantic Segmentation

SEGCloud: Semantic Segmentation of 3D Point Clouds