Abstract: In this paper, we study the task of single-view image-guided point cloud completion. Existing methods have achieved promising results by fusing image information into the point cloud explicitly or implicitly. However, given that the image carries global shape information and the partial point cloud carries rich local details, we believe that both modalities need to be given equal attention when performing modality fusion. To this end, we propose a novel dual-channel modality fusion network for image-guided point cloud completion (named DMF-Net), which works in a coarse-to-fine manner. In the first stage, DMF-Net takes a partial point cloud and the corresponding image as input to recover a coarse point cloud. In the second stage, the coarse point cloud is upsampled twice with a shape-aware upsampling transformer to obtain a dense and complete point cloud. Extensive quantitative and qualitative experimental results show that DMF-Net outperforms state-of-the-art unimodal and multimodal point cloud completion methods on the ShapeNet-ViPC dataset.
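The abstract argues that image and point cloud features should receive equal attention during fusion, but it does not specify the fusion operator. The following is therefore only a minimal NumPy sketch under an assumed design: symmetric scaled dot-product cross-attention, in which point tokens attend to image tokens and image tokens attend to point tokens, after which both channels are pooled and concatenated. The function names (`dual_channel_fusion`, `cross_attention`), the shared 64-dimensional feature width, the token counts, and the random weights are all hypothetical stand-ins, not DMF-Net's actual layers.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    # Scaled dot-product attention: every query token attends to all tokens of `context`.
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def dual_channel_fusion(point_feats, image_feats, rng):
    # Hypothetical symmetric fusion: each modality acts once as query and once as context.
    d = point_feats.shape[1]
    assert image_feats.shape[1] == d, "this sketch assumes a shared feature width"
    w = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(6)]  # stand-ins for learned weights
    # Channel 1: point tokens query image tokens (the image guides the point cloud).
    p2i = cross_attention(point_feats, image_feats, w[0], w[1], w[2])
    # Channel 2: image tokens query point tokens (the point cloud informs the image features).
    i2p = cross_attention(image_feats, point_feats, w[3], w[4], w[5])
    # Residual update, max-pool each channel to a global vector, and concatenate both equally.
    p_global = (point_feats + p2i).max(axis=0)
    i_global = (image_feats + i2p).max(axis=0)
    return np.concatenate([p_global, i_global])


rng = np.random.default_rng(0)
points = rng.standard_normal((256, 64))  # 256 point tokens with 64-dim features (hypothetical)
pixels = rng.standard_normal((196, 64))  # 14x14 image patch tokens with 64-dim features (hypothetical)
fused = dual_channel_fusion(points, pixels, rng)
print(fused.shape)  # (128,)
```

The point the sketch tries to illustrate is that neither modality is used only as a query or only as context: each gets its own attention channel, and the two pooled vectors contribute equally to the fused representation.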

What problem does this paper attempt to address?

### The Problem Addressed by the Paper

This paper addresses point cloud completion guided by a single-view image. Existing methods have achieved promising results by integrating image information into point clouds, either explicitly or implicitly. However, the image contains global shape information while the partial point cloud contains rich local details, so the authors argue that both modalities should be given equal attention during modality fusion. To this end, they propose a novel Dual-Channel Modality Fusion Network (DMF-Net) for single-view image-guided point cloud completion.

### Specific Problem Description

1. **Quality Issues of Point Cloud Data**:
   - Point clouds are often incomplete and sparse because of self-occlusion, occlusion between objects, uneven lighting, and low scanning resolution.
   - Such low-quality point clouds make 3D shapes difficult to understand, which degrades many downstream tasks such as point cloud detection, reconstruction, and upsampling.
2. **Limitations of Existing Methods**:
   - Most existing learning-based methods adopt an encoder-decoder architecture: the encoder extracts a global latent vector from the partial input, and the decoder generates the complete point cloud from this global latent representation.
   - However, the global information extracted from an incomplete point cloud may be ambiguous and misleading.
   - Although some multimodal completion networks improve performance by introducing single-view images, their modality fusion is dominated by the point cloud modality and makes little use of the image modality.

### Solution

To overcome these issues, the authors propose DMF-Net, which has the following features:

1. **Dual-Channel Modality Fusion**:
   - In the encoding stage, DMF-Net uses a dual-channel modality fusion strategy that symmetrically captures complementary information from the image and the point cloud.
   - In this way, the image modality not only guides the point cloud modality but also complements it, ensuring that both modalities carry equal importance during fusion.
2. **Shape-Aware Upsampling Transformer**:
   - In the second stage, DMF-Net uses a shape-aware upsampling transformer to upsample the coarse point cloud twice, producing a dense and complete point cloud.
   - The shape-aware upsampling transformer captures local geometric details by encouraging self-attention among points within local neighborhoods.
3. **Two-Stage Generation**:
   - In the first stage, DMF-Net adopts an encoder-decoder architecture to generate a sparse but complete (coarse) point cloud.
   - In the second stage, the coarse point cloud is upsampled to generate a dense and complete point cloud.

### Experimental Results

Extensive quantitative and qualitative experiments show that DMF-Net outperforms existing unimodal and multimodal point cloud completion methods on the ShapeNet-ViPC dataset.

### Conclusion

With DMF-Net, the paper addresses insufficient modality fusion and the lack of local details in single-view image-guided point cloud completion, significantly improving the quality of the completed point clouds.
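The shape-aware upsampling transformer is described only at a high level: self-attention among local neighborhood points, applied over two upsampling steps. A minimal NumPy sketch of that idea follows, under stated assumptions: attention is restricted to each point's k nearest neighbors, and each upsampling step splits every point into `ratio` children displaced by offsets predicted from the refined features. The helper names (`knn_indices`, `local_self_attention`, `upsample_step`), k = 16, the 2x ratio per step, the 512-point coarse input, and the random weights are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def knn_indices(xyz, k):
    # Brute-force pairwise squared distances; row i lists its k nearest points (itself first).
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def local_self_attention(feats, xyz, k, w_q, w_k, w_v):
    # Each point attends only to its k nearest neighbors, so the update reflects local geometry.
    idx = knn_indices(xyz, k)                      # (N, k)
    q = feats @ w_q                                # (N, C)
    keys = (feats @ w_k)[idx]                      # (N, k, C)
    values = (feats @ w_v)[idx]                    # (N, k, C)
    scores = np.einsum("nc,nkc->nk", q, keys) / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)    # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return feats + np.einsum("nk,nkc->nc", attn, values)  # residual connection


def upsample_step(xyz, feats, ratio, k, rng):
    # Hypothetical upsampling step: refine features with local attention, then split each
    # point into `ratio` children displaced by small offsets predicted from its feature.
    n, c = feats.shape
    w_q, w_k, w_v = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    w_off = rng.standard_normal((c, 3 * ratio)) * 0.01  # random stand-in for a learned offset head
    feats = local_self_attention(feats, xyz, k, w_q, w_k, w_v)
    offsets = (feats @ w_off).reshape(n, ratio, 3)
    new_xyz = (xyz[:, None, :] + offsets).reshape(n * ratio, 3)
    new_feats = np.repeat(feats, ratio, axis=0)   # each child inherits its parent's feature
    return new_xyz, new_feats


rng = np.random.default_rng(0)
coarse = rng.standard_normal((512, 3))   # hypothetical coarse output of stage one
feats = rng.standard_normal((512, 32))   # hypothetical per-point features
xyz, f = coarse, feats
for _ in range(2):                       # "upsampled twice", with an assumed 2x ratio per step
    xyz, f = upsample_step(xyz, f, ratio=2, k=16, rng=rng)
print(xyz.shape)  # (2048, 3)
```

Restricting attention to the k-nearest neighborhood keeps the attention cost at N·k scores per step instead of N², and ties each refined feature to its local geometry. The brute-force distance matrix used here for the neighbor search is still quadratic and would normally be replaced by a spatial index.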

DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer

Dual-scale Point Cloud Completion Network Based on High-Frequency Feature Fusion

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

Point Cloud Completion Cascade Optimization Network Based on Feature Fusion

Adaptive Recurrent Forward Network for Dense Point Cloud Completion

Attention-based Multi-modal Fusion Network for Semantic Scene Completion

Image-Guided Point Cloud Shape Completion Based on CTA Mechanism

CDPNet: Cross-Modal Dual Phases Network for Point Cloud Completion

DuInNet: Dual-Modality Feature Interaction for Point Cloud Completion

MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching

FuNet: Multi-Feature Fusion for Point Cloud Completion Network

MLFT-Net: Point Cloud Completion Using MultiLevel Feature Transformer

DMFF: dual-way multimodal feature fusion for 3D object detection

Distinguishing and Matching-Aware Unsupervised Point Cloud Completion

CSDN: Cross-Modal Shape-Transfer Dual-Refinement Network for Point Cloud Completion

Stage-Aware Interaction Network for Point Cloud Completion

N-DPC: Dense 3D Point Cloud Completion Based on Improved Multi-Stage Network

Point Cloud Completion Via Self-projected View Augmentation and Implicit Field Constraint

Point Cloud Completion Via Skeleton-Detail Transformer

Research on Multi-modal Point Cloud Completion Task

Multi-feature fusion point cloud completion network