Learning Enriched Feature Descriptor for Image Matching and Visual Measurement

Yakun Ju, Junyu Dong, Sen Wang, Y. Rao, Feng Gao, H. Fan
DOI: https://doi.org/10.1109/TIM.2023.3249237
IF: 5.6
IEEE Transactions on Instrumentation and Measurement
Abstract: Recent feature descriptor research has witnessed tremendous progress with the development of deep neural networks. However, most existing descriptors focus solely on learning strong discriminativeness with deep-invariant features, neglecting their representation ability and the rich hierarchical clues hidden in images, which could further establish high-quality matches via implicit hierarchical comparisons in a rich representative descriptor space. In this article, we consider both the discriminative and representation ability of feature descriptors to enrich the descriptor space with a novel representative learning framework. On the one hand, we introduce the histogram of oriented gradients (HOG) as a prior term to guide our descriptor to learn a powerful representation and robustness in a self-supervised manner. On the other hand, we present an adaptive triplet loss (ATL), which penalizes the triplet loss (TL) according to the descriptor matching distances in order to encourage our descriptor to learn strong discriminativeness. Moreover, to fully use the information encapsulated in images and boost the representation ability, we propose a novel HIerarchical Feature Transformer Network (HIFT), which derives dense descriptions from the semantic and cross-scale-enhanced hierarchical features in a local-to-global manner. Extensive experiments on popular feature matching and visual localization benchmarks show that HIFT achieves highly competitive performance compared with state-of-the-art methods. Applications to the visual measurement tasks of visual 3-D reconstruction and ego-motion estimation also demonstrate the high generalization ability of our method. Our model is available at https://github.com/Ray2OUC/HIFT.
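Illustrative sketch: the abstract states that ATL penalizes the triplet loss according to descriptor matching distances but does not give the formula. The snippet below shows the standard triplet margin loss and, next to it, one hypothetical distance-dependent weighting, only to make the idea concrete. The function names, the weighting term, and the margin value are assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptors given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, margin + d(a, p) - d(a, n))."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def weighted_triplet_loss(anchor, positive, negative, margin=1.0, eps=1e-8):
    """Hypothetical distance-weighted variant (NOT the paper's ATL formula).

    The weight lies in [1, 2) and grows as the matching distance d(a, p)
    becomes large relative to the non-matching distance d(a, n), so harder
    triplets are penalized more strongly.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    base = max(0.0, margin + d_pos - d_neg)
    weight = 1.0 + d_pos / (d_pos + d_neg + eps)  # assumed weighting scheme
    return weight * base


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    near = [0.0, 1.0]  # distance 1 from the anchor
    far = [3.0, 4.0]   # distance 5 from the anchor

    # Easy triplet: the match is near and the non-match is far.
    print(triplet_loss(anchor, near, far))
    # Hard triplet: the match is far and the non-match is near.
    print(triplet_loss(anchor, far, near))
    print(weighted_triplet_loss(anchor, far, near))
```

In this sketch an easy triplet contributes nothing to either loss, while a hard triplet is scaled up by the weight, which is the general behaviour the abstract attributes to a distance-aware penalty.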
Computer Science