Abstract: Recovering camera pose from two-view images is a critical problem in photogrammetry and computer vision. In complex scenarios, point correspondences constructed by an off-the-shelf feature matcher such as SIFT are corrupted by heavy outliers. In this case, traditional methods based on sampling consensus or on motion/geometric coherence struggle because their underlying assumptions no longer hold. To this end, we propose a deep technique that better extracts the underlying geometric information from a high-dimensional feature space for two-view geometry estimation. Unlike existing deep methods that use distribution-based normalization or explicitly aggregate neighboring correspondences, we propose a graph attention operation with a multi-head mechanism, termed GANet, to latently capture fine-grained contextual/geometric relations among these corrupted correspondences. This encourages our network to learn informative representations that ensure high graph similarity, thus focusing more on inliers and suppressing outliers. On this basis, our network can more easily infer the inliers best suited to recovering camera pose. Moreover, we observe that the calculation of graph similarity for each node is supported by only part of the node features. In this regard, we further propose a lightweight implementation of graph attention, namely Sparse GANet, which learns a sparse attention map based on a block-wise operation and Sinkhorn normalization. This sparse strategy largely reduces memory and computational requirements while maintaining performance. Extensive experiments on pose estimation, outlier rejection, and image registration on different challenging datasets, together with combinational tests with different descriptor matchers and robust estimators, demonstrate the superiority and strong generalization of our method over the state of the art.
In particular, we achieve improvements of at least 1.5% and 0.6% in mAP@5° on YFCC and SUN3D data for pose estimation, respectively. Our Sparse GANet reduces the model size to only 0.28 MB and the time cost to 16 ms, significantly better than SuperGlue, which requires 12.02 MB and 68 ms. (Source code is available at https://github.com/StaRainJ/Code-of-GANet.)
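For readers unfamiliar with the Sinkhorn normalization mentioned in the abstract, the following is a minimal sketch of the generic technique: alternately rescaling the rows and columns of a positive score matrix until it is approximately doubly stochastic. It is an illustration only and is not taken from the GANet source code; the function name `sinkhorn_normalize`, the parameters `n_iters` and `eps`, and the 4×4 random input are assumptions, and the block-wise sparsification described in the abstract is not modeled here.

```python
import numpy as np

def sinkhorn_normalize(scores, n_iters=100, eps=1e-9):
    """Generic Sinkhorn normalization of a square score matrix.

    Hypothetical helper for illustration; not the GANet implementation.
    """
    # Exponentiate to obtain strictly positive entries; subtracting the
    # maximum first keeps the exponentials numerically stable.
    attn = np.exp(scores - scores.max())
    for _ in range(n_iters):
        # Rescale each row to sum to (approximately) one.
        attn = attn / (attn.sum(axis=1, keepdims=True) + eps)
        # Rescale each column to sum to (approximately) one.
        attn = attn / (attn.sum(axis=0, keepdims=True) + eps)
    return attn

# Toy input: random attention scores among four correspondences (assumed data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
attn = sinkhorn_normalize(scores)
```

After enough iterations both the row sums and the column sums of `attn` are close to one, which is the property that makes the result usable as a balanced attention map.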

Learning to Match Features with Discriminative Sparse Graph Neural Network

ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching

Improving Sparse Graph Attention for Feature Matching by Informative Keypoints Exploration

Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network

Learning to Match Features with Seeded Graph Matching Network

Learning Bipartite Graph Matching for Robust Visual Localization

Deep Graphical Feature Learning for the Feature Matching Problem

Multi-scale Matching Networks for Semantic Correspondence

GLMNet: Graph Learning-Matching Networks for Feature Matching

Feature Matching via Graph Clustering with Local Affine Consensus

Learning for mismatch removal via graph attention networks

A Novel Neural Network for Remote Sensing Image Matching

MSGA-Net: Progressive Feature Matching via Multi-layer Sparse Graph Attention

Combinatorial Learning of Robust Deep Graph Matching: an Embedding Based Approach

Joint Graph Learning and Matching for Semantic Feature Correspondence

Learn to Cluster Faces with Better Subgraphs

Learning Combinatorial Embedding Networks for Deep Graph Matching

Learning and Memory of Spatial Relationship by a Neural Network with Sparse Features

Shared Coupling-bridge for Weakly Supervised Local Feature Learning

Elastic Net Hypergraph Learning for Image Clustering and Semi-supervised Classification

LSV-ANet: Deep Learning on Local Structure Visualization for Feature Matching