Abstract: 3D point clouds are rich in geometric structure, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it remains unclear how best to fuse and process cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and perform both 2D and 3D semantic segmentation. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, which limits the network's flexibility. Meanwhile, such a dual-task setting may easily distract the network and lead to over-fitting in the 3D segmentation task. Because of this inflexibility, fused features can only pass through a decoder network, and the insufficient depth hurts model performance. To alleviate these drawbacks, we argue in this paper that, despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into 3D space and aligning them with 3D deep semantic features could lead to better feature fusion. On the one hand, the unidirectional projection forces our model to focus more on the core task, i.e., 3D segmentation; on the other hand, moving from bidirectional to unidirectional projection enables deeper cross-domain semantic alignment and offers the flexibility to fuse better and more complicated features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.
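
To make the idea of unidirectional (2D to 3D only) projection concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a pinhole camera with one intrinsic matrix shared by all views, known world-to-camera poses, nearest-pixel lookup, and simple averaging across views; it also ignores occlusion. The function name `project_views_to_points` and all parameter names are hypothetical.

```python
import numpy as np

def project_views_to_points(points, feature_maps, intrinsics, world_to_cams):
    """Average per-pixel 2D features onto 3D points (2D -> 3D only).

    points:        (N, 3) world coordinates
    feature_maps:  list of (H, W, C) arrays, one per view
    intrinsics:    (3, 3) pinhole matrix shared by all views (assumption)
    world_to_cams: list of (4, 4) world-to-camera transforms
    Returns the (N, C) averaged features and the (N,) per-point view counts.
    """
    n = points.shape[0]
    c = feature_maps[0].shape[2]
    summed = np.zeros((n, c), dtype=np.float64)
    counts = np.zeros(n, dtype=np.int64)
    homo = np.hstack([points, np.ones((n, 1))])  # homogeneous coords, (N, 4)

    for fmap, w2c in zip(feature_maps, world_to_cams):
        h, w, _ = fmap.shape
        cam = (homo @ w2c.T)[:, :3]          # points in the camera frame
        z = cam[:, 2]
        in_front = z > 1e-6                  # discard points behind the camera
        safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negatives
        pix = cam @ intrinsics.T             # (fx*x + cx*z, fy*y + cy*z, z)
        u = np.round(pix[:, 0] / safe_z).astype(np.int64)
        v = np.round(pix[:, 1] / safe_z).astype(np.int64)
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # Nearest-pixel gather; no occlusion test in this sketch.
        summed[visible] += fmap[v[visible], u[visible]]
        counts[visible] += 1

    # Points seen by no view keep an all-zero feature vector.
    fused = summed / np.maximum(counts, 1)[:, None]
    return fused, counts
```

In a pipeline of the kind the abstract describes, the returned per-point 2D features would then be combined with 3D features inside the 3D network; how that fusion is done is not specified here, and a practical version would also need a depth-based visibility check.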

Weakly supervised point cloud semantic segmentation with the fusion of heterogeneous network features

Weakly Supervised Semantic Segmentation of Point Cloud Scenes Using Boundary-based Feature Aggregation

Weakly Supervised Point Cloud Segmentation Via Deep Morphological Semantic Information Embedding

PointMS: Semantic Segmentation for Point Cloud Based on Multi-scale Directional Convolution

Superpoint-guided Semi-supervised Semantic Segmentation of 3D Point Clouds

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

A Multi-phase Camera-LiDAR Fusion Network for 3D Semantic Segmentation with Weak Supervision

Associate Semantic-Instance Segmentation of 3D Point Clouds Based on Local Feature Extraction

Dense Supervision Propagation for Weakly Supervised Semantic Segmentation on 3D Point Clouds

SIESEF-FusionNet: Spatial Inter-correlation Enhancement and Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic Segmentation

Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud

Weakly-Supervised Point Cloud Semantic Segmentation Based on Dilated Region

Robust 3D Semantic Segmentation Method Based on Multi-Modal Collaborative Learning

SSPC-Net: Semi-supervised Semantic 3D Point Cloud Segmentation Network

MFFNet: Multimodal Feature Fusion Network for Point Cloud Semantic Segmentation

Weakly Supervised 3D Segmentation Via Receptive-Driven Pseudo Label Consistency and Structural Consistency

MLFNet: Point Cloud Semantic Segmentation Convolution Network Based on Multi-Scale Feature Fusion

Distribution Guidance Network for Weakly Supervised Point Cloud Semantic Segmentation

Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation

Dual fusion network for semantic segmentation of point clouds

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation