OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
2024-05-22
Abstract: The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relatively. Code and model can be found at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limited generalization of learnable image matchers. Existing methods score well on conventional benchmarks and perform strongly in specific visual domains (such as outdoor and indoor scenes), but their accuracy usually drops significantly on data from other domains (for example, object-centric or aerial images). This restricts their usefulness in real-world applications. To address this, the paper proposes OmniGlue, the first learnable image matcher designed with generalization as a core principle. OmniGlue uses the broad knowledge of a vision foundation model to guide the feature matching process, which improves generalization to domains not seen during training. It also introduces a keypoint position-guided attention mechanism that disentangles spatial and appearance information, yielding better matching descriptors. In experiments on a suite of 7 datasets spanning scene-level, object-centric, and aerial images, OmniGlue achieves a relative gain of 20.9% on unseen domains over a directly comparable reference model, and it outperforms the recent LightGlue method by 9.5% in relative terms.
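To make the idea of disentangling spatial and appearance information more concrete, here is a minimal NumPy sketch of one plausible reading of position-guided attention. Keypoint positions influence only the attention weights (through queries and keys), while the values and the residual path carry appearance descriptors alone. This is an illustrative assumption, not the authors' implementation: the function name `position_guided_attention`, the single-head layout, the projection matrices `w_q`, `w_k`, `w_v`, and the additive positional encoding `pos_enc` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_guided_attention(desc, pos_enc, w_q, w_k, w_v):
    """Single-head self-attention sketch (hypothetical, for illustration).

    desc:    (N, D) appearance descriptors, one row per keypoint.
    pos_enc: (N, D) positional encodings of the keypoint coordinates.
    w_q, w_k, w_v: (D, D) projection matrices.
    Returns the updated descriptors (N, D) and the attention weights (N, N).
    """
    # Assumption: position is added only when forming queries and keys,
    # so it guides *where* each keypoint attends.
    guided = desc + pos_enc
    q = guided @ w_q
    k = guided @ w_k

    # Values are built from appearance only, so positional information
    # is never mixed into the propagated descriptors.
    v = desc @ w_v

    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)

    # Residual update of the appearance descriptors.
    return desc + weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_keypoints, dim = 5, 8
    desc = rng.normal(size=(n_keypoints, dim))
    pos_enc = rng.normal(size=(n_keypoints, dim))
    w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
    updated, weights = position_guided_attention(desc, pos_enc, w_q, w_k, w_v)
    print(updated.shape, weights.shape)
```

In this sketch, changing `pos_enc` can alter the output only by changing the attention weights. The descriptors that get aggregated are purely appearance-based, which is the sense in which the two kinds of information stay separated.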