Learning General Descriptors for Image Matching with Regression Feedback

Yuan Rao, Yakun Ju, Cong Li, Eric Rigall, Jian Yang, Hao Fan, Junyu Dong
DOI: https://doi.org/10.1109/tcsvt.2023.3267279
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recent advances in feature descriptors for image matching place more emphasis on encoding invariances (e.g., illumination invariance) to improve the descriptors’ discriminative power. However, from the viewpoint of information entropy, more invariance implies greater certainty and less informativeness in a descriptor. Consequently, descriptors that encode too many invariances usually generalize poorly to unknown image changes, because they lack the informativeness needed to cover the large uncertainty in unseen scenes. This limits the application scenarios of learned descriptors. In this paper, we propose to alleviate this issue from the perspective of informativeness, and we design a hierarchical consistent constraint by introducing regression feedback in a self-supervised manner. Combined with the hardest-within-batch matching constraint, this forms a novel dual supervision framework that encourages the descriptor to learn an informative representation while maintaining good discriminative power. Moreover, to fully mine the context information hidden in the image and in turn boost informativeness, we present AANet, a descriptor network that efficiently predicts dense descriptions through Attentional Aggregation of multi-level features. Experiments on challenging feature matching benchmarks (the HPatches and RDNIM datasets) and on visual localization with the Aachen Day-night dataset show that our method outperforms recent state-of-the-art descriptors while remaining efficient. Applying the method to visual 3D reconstruction in various scenarios also demonstrates its high generalization ability.
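The abstract names a hardest-within-batch matching constraint but does not give its formula. As a rough illustration only, the sketch below assumes the common triplet-margin form of that idea: for each matching descriptor pair in a batch, the closest non-matching descriptor serves as the negative. The function names, the margin of 1.0, and the use of L2-normalized descriptors are assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a descriptor to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_in_batch_loss(anchors, positives, margin=1.0):
    """Mean triplet margin loss with hardest-within-batch negatives.

    anchors[i] and positives[i] are assumed to describe the same physical
    point seen in two images; every other index in the batch is treated
    as a non-match. The margin value is an illustrative assumption.
    """
    if len(anchors) != len(positives) or len(anchors) < 2:
        raise ValueError("need two equally sized batches of at least 2 descriptors")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    total = 0.0
    for i in range(n):
        pos = euclidean(a[i], p[i])
        # Hardest negative: the closest non-matching descriptor,
        # searched in both directions (anchor->positives, positive->anchors).
        neg = min(
            min(euclidean(a[i], p[j]), euclidean(a[j], p[i]))
            for j in range(n)
            if j != i
        )
        total += max(0.0, margin + pos - neg)
    return total / n


# Correctly matched, well-separated descriptors incur no loss.
matched = hardest_in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[1.0, 0.0], [0.0, 1.0]])

# Swapped correspondences are penalized heavily.
swapped = hardest_in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[0.0, 1.0], [1.0, 0.0]])
```

A loss of this kind only rewards discriminative power. In the framework described above it would be one of two supervision signals, the other being the regression-feedback constraint, whose form the abstract does not specify.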