OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo

2024-05-22

Abstract: The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relatively. Code and model can be found at

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the limited generalization of learnable image matchers. Existing methods score well on conventional benchmarks, but they perform well mainly in specific visual domains (such as outdoor and indoor scenes), and their performance usually drops significantly on data from other domains (for example, object-centric or aerial images). This restricts their potential in real-world applications. To address this, the paper proposes OmniGlue, the first learnable image matcher designed with generalization as a core principle. OmniGlue uses the broad knowledge of a vision foundation model to guide the feature matching process, which improves generalization to domains not seen during training. It also introduces a keypoint position-guided attention mechanism that disentangles spatial and appearance information, yielding better matching descriptors. In experiments on 7 datasets spanning scene-level, object-centric and aerial images, OmniGlue achieves a relative gain of 20.9% on unseen domains over a directly comparable reference model and outperforms the recent LightGlue method by 9.5% in relative terms.
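To make the idea of keeping spatial and appearance information apart more concrete, here is a minimal sketch of one plausible reading of position-guided attention: keypoint position encodings influence the attention weights (through queries and keys) but are never mixed into the values, so the output descriptors are built from appearance alone. This is an illustrative assumption based on the summary above, not the authors' implementation; the function name `position_guided_attention`, the single-head layout, and the plain-list matrices are all hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an n x d matrix by a d x m matrix (lists of lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_guided_attention(desc, pos_enc, w_q, w_k, w_v):
    """Single-head self-attention in which positions steer the weights only.

    desc:    n x d appearance descriptors, one row per keypoint
    pos_enc: n x d position encodings for the same keypoints (assumed given)
    w_q, w_k, w_v: d x d projection matrices (hypothetical parameters)
    """
    # Queries and keys see appearance plus position.
    guided = [[d + p for d, p in zip(d_row, p_row)]
              for d_row, p_row in zip(desc, pos_enc)]
    q = matmul(guided, w_q)
    k = matmul(guided, w_k)
    # Values see appearance only, so position does not enter the output.
    v = matmul(desc, w_v)

    scale = math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale
                  for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    descriptors = [[1.0, 0.0], [0.0, 1.0]]
    positions = [[3.0, 0.0], [0.0, 3.0]]
    print(position_guided_attention(descriptors, positions,
                                    identity, identity, identity))
```

Under this assumption, changing the position encodings changes how keypoints attend to each other, while each output row remains a weighted average of appearance-only values.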

LightGlue: Local Feature Matching at Light Speed

RGM: A Robust Generalizable Matching Model

Omni-IML: Towards Unified Image Manipulation Localization

GIM: Learning Generalizable Image Matcher From Internet Videos

Adaptive Assignment for Geometry Aware Local Feature Matching

Video object matching across multiple non-overlapping camera views based on multi-feature fusion and incremental learning

Deep learning feature representation for image matching under large viewpoint and viewing direction change

RGM: A Robust Generalist Matching Model

A Hypergraph Matching Framework for Refining Multi-source Feature Correspondences

Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence

Learning Geometric Feature Embedding with Transformers for Image Matching

Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation

GeoGlue: feature matching with self-supervised geometric priors for high-resolution UAV images

Robust feature matching using guided local outlier factor

Joint Graph Learning and Matching for Semantic Feature Correspondence

Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning

ContextMatcher: Detector-Free Feature Matching with Cross-Modality Context

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals