Abstract: Existing learning-based point feature descriptors are usually task-agnostic: they aim to describe individual 3D point clouds as accurately as possible. The matching task, however, requires describing corresponding points consistently across different 3D point clouds. Overly accurate features can therefore be counterproductive, because unpredictable noise, partiality, deformation, etc., in the local geometry make the feature representations of corresponding points inconsistent. In this paper, we propose to learn a robust task-specific feature descriptor that consistently describes correct point correspondences under interference. Built from an Encoder and a Dynamic Fusion module, our method, EDFNet, develops from two aspects. First, we augment the matchability of correspondences by utilizing their repetitive local structure. To this end, a special encoder is designed to exploit the two input point clouds jointly for each point descriptor. It not only captures the local geometry of each point in the current point cloud by convolution, but also exploits the repetitive structure from the paired point cloud by a Transformer. Second, we propose a dynamic fusion module to jointly use features of different scales. A single-scale feature faces an inevitable trade-off between robustness and discriminativeness. Specifically, the small-scale feature is robust, since little interference exists in its small receptive field, but it is not sufficiently discriminative, as there are many repetitive local structures within a point cloud; the resultant descriptors thus lead to many incorrect matches. In contrast, the large-scale feature is more discriminative because it integrates more neighborhood information, but it is more easily disturbed, since there is much more interference in the large receptive field. Compared with the conventional fusion strategy that handles multiple scale features equally, we analyze their consistency to identify the clean ones and assign larger aggregation weights to them during fusion. A robust and discriminative feature descriptor is then obtained by focusing on the clean features across scales. Extensive evaluations validate that EDFNet learns a task-specific descriptor, which achieves state-of-the-art or comparable performance for robust matching of 3D point clouds.
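
The idea of weighting scales by their mutual consistency can be illustrated with a minimal sketch. This is not the EDFNet implementation: the abstract does not specify how consistency is measured or how weights are computed, so the cosine-similarity score, the softmax weighting, the `temperature` parameter, and the function names `cosine` and `fuse_scales` below are all assumptions chosen for illustration. The sketch fuses the per-scale features of a single point, giving larger weights to scales that agree with the others.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, 1e-12)


def fuse_scales(scale_feats, temperature=0.1):
    """Fuse per-scale features of one point with consistency-based weights.

    scale_feats: list of S feature vectors (one per scale, same length).
    temperature: hypothetical softmax temperature (an assumption, not
                 taken from the paper); smaller values sharpen the weights.
    Returns (fused_feature, weights).
    """
    num_scales = len(scale_feats)

    # Consistency score of a scale: mean cosine similarity to all other
    # scales. A scale disturbed by interference tends to disagree with the
    # rest and therefore receives a lower score.
    scores = []
    for i in range(num_scales):
        others = [
            cosine(scale_feats[i], scale_feats[j])
            for j in range(num_scales)
            if j != i
        ]
        scores.append(sum(others) / max(len(others), 1))

    # Numerically stable softmax over the scores gives aggregation weights.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the scale features, channel by channel.
    dim = len(scale_feats[0])
    fused = [
        sum(w * feat[k] for w, feat in zip(weights, scale_feats))
        for k in range(dim)
    ]
    return fused, weights


# Toy usage: two scales agree, the third is "disturbed".
feats = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fused, weights = fuse_scales(feats)
```

In this toy input the third scale disagrees with the other two, so it receives the smallest weight. Equal-weight fusion, by contrast, would let it contribute a full third of the result.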

Learning General Descriptors for Image Matching with Regression Feedback

Learning Enriched Feature Descriptor for Image Matching and Visual Measurement

Learning General Feature Descriptor for Visual Measurement with Hierarchical View Consistency

Learning Local Feature Descriptors Through Ranking Losses Improved by Variance Shrinkage

Towards Self-Similarity Consistency and Feature Discrimination for Unsupervised Domain Adaptation

Learning Local Event-based Descriptor for Patch-based Stereo Matching

DualRC: A Dual-Resolution Learning Framework with Neighbourhood Consensus for Visual Correspondences

A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images

Local Image Descriptors with Statistical Losses

Kernelized Subspace Pooling for Deep Local Descriptors

OD-Net: Orthogonal descriptor network for multiview image keypoint matching

Deep learning feature representation for image matching under large viewpoint and viewing direction change

Dense correspondence through descriptor matching

Learning a Task-Specific Descriptor for Robust Matching of 3D Point Clouds

CDbin: Compact Discriminative Binary Descriptor Learned With Efficient Neural Network

Deep Unsupervised Binary Descriptor Learning Through Locality Consistency and Self Distinctiveness

Digging Into Self-Supervised Learning of Feature Descriptors

Deep modality independent descriptor learning for optical and SAR image patch matching

Improving the generalization of network based relative pose regression: dimension reduction as a regularizer

Online Invariance Selection for Local Feature Descriptors

Attention Guided Invariance Selection for Local Feature Descriptors