Abstract: Robust local cross-domain feature descriptors of 2D images and 3D point clouds play an important role in 2D and 3D vision applications, e.g. augmented reality (AR) and robot navigation. Such descriptors make it possible to establish a spatial relationship between 2D space and 3D space. However, it is challenging for handcrafted or traditional deep learning-based methods to represent cross-domain feature descriptors that are invariant between 2D images and 3D point clouds. Specifically, mainstream point cloud deep learning networks are used to extract the global structure information of the scene. Because of the difference in dimensionality, there is a large gap between the features of 2D images and those of 3D structures, which makes the two hard to reconcile. In this paper, a novel network, 2D3D-MVPNet, is proposed to jointly learn robust local cross-domain feature descriptors between 2D images and 3D point clouds, based on a dataset of 2D image patches and 3D point cloud volumes. 2D3D-MVPNet contains a point cloud branch and an image branch, which are optimized with a triplet loss and a second-order similarity regularization. Specifically, for the point cloud branch, first, a novel point cloud feature descriptor extractor, named the image-based point cloud encoder, is introduced to learn a local 3D feature descriptor consistent with the local 2D feature descriptor, so that the local 3D feature descriptors contain both geometry and colour texture information. Second, because the projected images of a point cloud arrive in arbitrary order, a symmetric function is introduced to combine the features of the point cloud projections. Experiments show that the local cross-domain feature descriptors of 2D images and 3D point clouds learned by 2D3D-MVPNet achieve strong 2D-to-3D retrieval performance. In addition, several 3D point cloud registration results demonstrate the effectiveness of the image-based point cloud encoder.
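
The two mechanisms named in the abstract, a symmetric function over projection features and a triplet loss with second-order similarity regularization, can be illustrated with a minimal NumPy sketch. The abstract does not say which symmetric function or which exact loss formulation is used, so everything here is an assumption: element-wise max pooling stands in for the symmetric function, the triplet loss uses the hardest in-batch negative with a margin of 1.0, and the regularizer follows one common form of second-order similarity. All function names are hypothetical.

```python
import numpy as np

def pool_views(view_feats):
    """Combine per-view features of shape (V, D) into one (D,) descriptor.

    Element-wise max is a symmetric function: the result does not depend
    on the order of the V projected views. (Assumed choice, not confirmed
    by the abstract.)
    """
    return view_feats.max(axis=0)

def pairwise_dist(a, b, eps=1e-12):
    """Euclidean distances between rows of a (N, D) and rows of b (M, D)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1) + eps)

def triplet_loss(img_desc, pcd_desc, margin=1.0):
    """Triplet loss where row i of img_desc matches row i of pcd_desc.

    The negative for each pair is the hardest (closest) non-matching
    descriptor in the batch. Margin and mining strategy are assumptions.
    """
    d = pairwise_dist(img_desc, pcd_desc)        # (N, N) cross-domain distances
    pos = np.diag(d)                             # distances of matching pairs
    masked = d + np.eye(len(d)) * 1e6            # exclude positives from mining
    neg = np.minimum(masked.min(axis=1), masked.min(axis=0))
    return np.maximum(0.0, margin + pos - neg).mean()

def second_order_reg(img_desc, pcd_desc):
    """Second-order similarity regularizer (one common form, assumed).

    Penalizes differences between the intra-domain distance structure of
    the image descriptors and that of the point cloud descriptors.
    """
    d_img = pairwise_dist(img_desc, img_desc)
    d_pcd = pairwise_dist(pcd_desc, pcd_desc)
    sq = (d_img - d_pcd) ** 2
    np.fill_diagonal(sq, 0.0)                    # ignore self-distances
    return np.sqrt(sq.sum(axis=1)).mean()

def total_loss(img_desc, pcd_desc, margin=1.0):
    """Sum of the triplet term and the second-order regularizer."""
    return triplet_loss(img_desc, pcd_desc, margin) + second_order_reg(img_desc, pcd_desc)
```

In this sketch, a point cloud descriptor would be obtained by running an image encoder on each projected view and passing the stacked outputs through `pool_views`; shuffling the views leaves the descriptor unchanged, which is the property the symmetric function is meant to provide.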

2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds
