Abstract: Existing learning-based point feature descriptors are usually task-agnostic: they aim to describe each individual 3D point cloud as accurately as possible. The matching task, however, requires describing corresponding points consistently across different 3D point clouds. Overly accurate features can therefore be counterproductive, because unpredictable noise, partiality, deformation, etc., in the local geometry make the feature representations of corresponding points inconsistent. In this paper, we propose to learn a robust task-specific feature descriptor that consistently describes correct point correspondences under interference. Built from an Encoder and a Dynamic Fusion module, our method, EDFNet, develops from two aspects. First, we augment the matchability of correspondences by utilizing their repetitive local structure. To this end, a special encoder is designed to exploit the two input point clouds jointly for each point descriptor. It not only captures the local geometry of each point in the current point cloud by convolution, but also exploits the repetitive structure from the paired point cloud by Transformer. Second, we propose a dynamic fusion module to jointly use features of different scales. A single-scale feature faces an inevitable trade-off between robustness and discriminativeness. Specifically, the small-scale feature is robust since little interference exists in its small receptive field, but it is not sufficiently discriminative because a point cloud contains many repetitive local structures, so the resulting descriptors lead to many incorrect matches. In contrast, the large-scale feature is more discriminative because it integrates more neighborhood information, but it is more easily disturbed since its large receptive field contains much more interference. Compared with the conventional fusion strategy that handles multiple scale features equally, we analyze their consistency to identify the clean ones and assign them larger aggregation weights during fusion. A robust and discriminative feature descriptor is then obtained by focusing on the clean features across scales. Extensive evaluations validate that EDFNet learns a task-specific descriptor, which achieves state-of-the-art or comparable performance for robust matching of 3D point clouds.
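The consistency-driven fusion idea can be illustrated with a minimal sketch. This is not the EDFNet implementation: the abstract does not say how consistency is measured, so the sketch assumes that each scale's feature for a point has the same dimension, that consistency is the mean cosine similarity to the other scales, and that weights come from a softmax with a hypothetical `temperature` parameter. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def consistency_weights(scale_feats, temperature=0.1):
    """Softmax weights from each scale's mean agreement with the other scales.

    Assumption: agreement is cosine similarity; the source does not specify
    the measure. Requires at least two scales.
    """
    n = len(scale_feats)
    scores = []
    for i in range(n):
        others = [cosine(scale_feats[i], scale_feats[j])
                  for j in range(n) if j != i]
        scores.append(sum(others) / len(others))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(scale_feats, temperature=0.1):
    """Weighted sum of per-scale features using the consistency weights."""
    weights = consistency_weights(scale_feats, temperature)
    dim = len(scale_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, scale_feats))
            for d in range(dim)]


# Toy per-point features at three scales; the third is a disturbed scale
# that disagrees with the other two.
feats = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
weights = consistency_weights(feats)
fused = fuse(feats)
```

In this toy case the two mutually consistent scales receive nearly all of the weight, while the disturbed scale is suppressed; an equal-weight fusion would instead let it contribute one third of the fused descriptor.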

D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors Via Agent-based Transformers

D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer

HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration

D2Former: Dual-Domain Transformer for Change Detection in VHR Remote Sensing Images

MatchDet: A Collaborative Framework for Image Matching and Object Detection

ContextMatcher: Detector-Free Feature Matching with Cross-Modality Context

2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds

Learning Enriched Feature Descriptor for Image Matching and Visual Measurement

Dynamic Keypoint Detection Network for Image Matching

P2-Net - Joint Description and Detection of Local Features for Pixel and Point Matching.

Deep Descriptor Transforming for Image Co-Localization

Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming

A Concurrent Multiscale Detector for End-to-End Image Matching

Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints

Contextdesc: Local Descriptor Augmentation With Cross-Modality Context

Learning a Task-Specific Descriptor for Robust Matching of 3D Point Clouds

Hierarchical Context Embedding for Region-based Object Detection.

Valid assessment of writing and access to academic discourse.

HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

Learning Geometric Feature Embedding with Transformers for Image Matching