Abstract: Feature matching is a crucial technique in computer vision. A unified perspective for this task is to treat it as a searching problem, aiming at an efficient search strategy to narrow the search space to point matches between images. One of the key aspects of search strategy is the search space, which in current approaches is not carefully defined, resulting in limited matching accuracy. This paper, thus, pays attention to the search space and proposes to set the initial search space for point matching as the matched image areas containing prominent semantic, named semantic area matches. This search space favors point matching by salient features and alleviates the accuracy limitation in recent Transformer-based matching methods. To achieve this search space, we introduce a hierarchical feature matching framework: Area to Point Matching (A2PM), to first find semantic area matches between images and later perform point matching on area matches. We further propose Semantic and Geometry Area Matching (SGAM) method to realize this framework, which utilizes semantic prior and geometry consistency to establish accurate area matches between images. By integrating SGAM with off-the-shelf state-of-the-art matchers, our method, adopting the A2PM framework, achieves encouraging precision improvements in massive point matching and pose estimation experiments.

What problem does this paper attempt to address?

The paper aims to improve matching accuracy in feature matching, a core task in computer vision. Existing feature matching methods struggle to match accurately under extreme viewpoints, illumination changes, repetitive patterns, and motion blur. The paper attributes these difficulties mainly to an ill-defined search space and a correspondingly poorly designed search strategy. It therefore proposes a new search space, semantic area matches, and exploits it through a hierarchical feature matching framework, Area to Point Matching (A2PM), to improve the accuracy and robustness of point matching.

### Main Contributions

1. **Defined a New Search Space**: Proposed semantic area matches as a new search space for feature matching. Narrowing the search range to areas containing prominent semantics improves the accuracy of point matching.
2. **Proposed the SGAM Method**: Combined semantic priors and geometric consistency in the Semantic and Geometry Area Matching (SGAM) method to achieve accurate area and point matching.
3. **Significantly Improved Matching Performance**: Combining SGAM with existing detector-free matchers yielded improvements of up to 29.16% in point matching and up to 11.04% in pose estimation across a wide range of experiments.

### Method Overview

- **Semantic Area Matching (SAM)**:
  - **Semantic Object Area (SOA)**: Detect semantic object areas by identifying connected components in the image and fitting bounding boxes to them. To reduce redundancy, areas that are spatially close and share the same semantics are merged.
  - **Semantic Intersection Area (SIA)**: Detect areas containing multiple semantics with a sliding window, then refine the positions of these areas to capture stable features of large objects.
  - **Matching Process**: Use semantic-surrounding descriptors with Hamming distance for SOA matching, and semantic-scale descriptors with L2 distance for SIA matching.
- **Geometric Area Matching (GAM)**:
  - **Predictor (GP)**: Select the true matches among doubtful candidate areas using geometric consistency.
  - **Rejector (GR)**: Compute a geometric-consistency threshold to filter out incorrect and low-quality area matches.
  - **Global Matching Collection Module (GMC)**: Collect global correspondences in low-semantic scenes to make the distribution of matches more uniform.

### Formula Explanation

- **Geometric Consistency**:
  - **Sampson Distance**:
    \[ d_{i,i}=\sum_{m = 1}^{M}\frac{(p_m^T F_i q_m)^2}{(F_i q_m)_1^2+(F_i q_m)_2^2+(F_i^T p_m)_1^2+(F_i^T p_m)_2^2} \]
  - **Cross-Region Geometric Consistency**:
    \[ d_{i,j}=D(F_i, P_j)\to0 \]
  - **Geometric Consistency of Region-Matching Sets**:
    \[ GA_{i,\pi(i)}=\frac{1}{N}\sum_{j = 1}^{N}d_{i,j} \]

With these methods and formulas, the paper addresses the ill-defined search space and limited matching accuracy of existing feature matching methods, offering a new approach to feature matching in computer vision.
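
To make the geometric-consistency formulas concrete, here is a minimal NumPy sketch of the Sampson distance and of the set-level average GA. The function names, the `(M, 3)` homogeneous-coordinate array layout, and the toy fundamental matrix are assumptions made for illustration; this is not the paper's implementation.

```python
import numpy as np


def sampson_distance(F, p, q):
    """Summed Sampson distance of correspondences (p_m, q_m) under F.

    F: (3, 3) fundamental matrix (assumed convention: p_m^T F q_m = 0).
    p, q: (M, 3) arrays of homogeneous image points.
    """
    Fq = q @ F.T   # row m holds F q_m
    Ftp = p @ F    # row m holds F^T p_m
    num = np.sum(p * Fq, axis=1) ** 2  # (p_m^T F q_m)^2
    den = Fq[:, 0] ** 2 + Fq[:, 1] ** 2 + Ftp[:, 0] ** 2 + Ftp[:, 1] ** 2
    return float(np.sum(num / den))


def geometric_consistency(F_i, point_sets):
    """Mean of d_{i,j} = D(F_i, P_j) over N point sets P_j = (p, q)."""
    d = [sampson_distance(F_i, p, q) for p, q in point_sets]
    return float(np.mean(d))


# Toy example (assumed): pure horizontal translation, so matching points
# share the same y coordinate and the epipolar constraint is y_p == y_q.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
q = np.array([[0.0, 0.0, 1.0],
              [1.0, 2.0, 1.0],
              [3.0, -1.0, 1.0]])
p_good = q + np.array([0.5, 0.0, 0.0])  # consistent with F
p_bad = q + np.array([0.0, 2.0, 0.0])   # violates the constraint by 2 px

d_good = sampson_distance(F, p_good, q)  # 0.0
d_bad = sampson_distance(F, p_bad, q)    # 3 points * (2^2 / 2) = 6.0
ga = geometric_consistency(F, [(p_good, q), (p_bad, q)])  # (0 + 6) / 2 = 3.0
```

A small distance means the point set agrees with the epipolar geometry described by `F`, which is the sense in which d_{i,j} is expected to approach 0 for correct area matches. The sketch does not guard against a zero denominator, which a practical implementation would need to handle.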

Searching from Area to Point: A Hierarchical Framework for Semantic-Geometric Combined Feature Matching
