Abstract: Robust local cross-domain feature descriptors of 2D images and 3D point clouds play an important role in 2D and 3D vision applications, e.g. augmented reality (AR) and robot navigation. Essentially, such descriptors have the potential to establish a spatial relationship between 2D space and 3D space. However, it is challenging for handcrafted or traditional deep learning-based methods to represent invariant cross-domain feature descriptors between 2D images and 3D point clouds. Specifically, mainstream point cloud deep learning networks are used to extract the global structure information of a scene. Moreover, because of the difference in dimensionality, there is a large gap between the features of a 2D image and those of a 3D structure. In this paper, based on a dataset of 2D image patches and 3D point cloud volumes, a novel network, 2D3D-MVPNet, is proposed to jointly learn robust local cross-domain feature descriptors between 2D images and 3D point clouds. 2D3D-MVPNet contains a point cloud branch and an image branch, which are optimized with a triplet loss and a second-order similarity regularization. For the point cloud branch, first, a novel point cloud feature descriptor extractor, named the image-based point cloud encoder, is introduced to learn a local 3D feature descriptor consistent with the local 2D feature descriptor, so that the local 3D feature descriptors contain both geometry and colour texture information. Second, because the projected images of a point cloud arrive in random order, a symmetric function is introduced to combine the features of the point cloud projections. Experiments show that the local cross-domain feature descriptors of 2D images and 3D point clouds learned by 2D3D-MVPNet achieve strong 2D-to-3D retrieval performance. In addition, several 3D point cloud registration results demonstrate the effectiveness of the image-based point cloud encoder.
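
The two ideas the abstract names for the point cloud branch, a symmetric function over projected views and a triplet loss, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which symmetric function is used, so element-wise max pooling stands in for it; the function names, feature values, and margin are hypothetical; and the second-order similarity regularization is omitted.

```python
import math
import random


def max_pool_views(view_features):
    """Element-wise max over per-view feature vectors.

    Max is a symmetric function, so the result does not depend on the
    order in which the views are supplied. (Assumed choice; the abstract
    does not name the symmetric function.)
    """
    return [max(column) for column in zip(*view_features)]


def l2_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the matching pair should be closer than
    the non-matching pair by at least `margin` (hypothetical value)."""
    return max(0.0, l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin)


# Hypothetical features for three projected views of one point cloud volume.
views = [[0.2, 0.9, 0.1], [0.7, 0.3, 0.4], [0.5, 0.6, 0.8]]
pooled = max_pool_views(views)

# Shuffling the views leaves the pooled descriptor unchanged.
shuffled = views[:]
random.shuffle(shuffled)
assert max_pool_views(shuffled) == pooled

# Hypothetical 2D image descriptor (anchor), matched against the pooled
# 3D descriptor (positive) and an unrelated descriptor (negative).
image_descriptor = [0.7, 0.9, 0.8]
non_matching = [0.0, 0.0, 0.0]
loss = triplet_loss(image_descriptor, pooled, non_matching)
```

In this toy case the image descriptor coincides with the pooled 3D descriptor and lies farther than the margin from the non-matching one, so the loss is zero; a real training setup would compute such losses over batches of learned descriptors.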

Multi-View PointNet for 3D Scene Understanding

Multi-View Vision Fusion Network: Can 2D Pre-Trained Model Boost 3D Point Cloud Data-Scarce Learning?

MVP-Net: Multiple View Pointwise Semantic Segmentation of Large-Scale Point Clouds

MPVNN: Multi-resolution Point-Voxel Non-parametric Network for 3D Point Cloud Processing

PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition

Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

Multi Voxel-Point Neurons Convolution (MVPConv) for Fast and Accurate 3D Deep Learning

MFFNet: Multimodal Feature Fusion Network for Point Cloud Semantic Segmentation

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

Multi Point-Voxel Convolution (MPVConv) for Deep Learning on Point Clouds

MVX-Net: Multimodal VoxelNet for 3D Object Detection

MANet: Multimodal Attention Network based Point-View fusion for 3D Shape Recognition

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

MVG-Net: LiDAR Point Cloud Semantic Segmentation Network Integrating Multi-View Images

Visibility-Aware Point-Based Multi-View Stereo Network

Multi-feature Fusion VoteNet for 3D Object Detection

2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds

End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

Multimodal Virtual Point 3D Detection

VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion