Abstract: Local feature matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively addressed by Transformer-based methods. However, these methods integrate long-range context information among keypoints only within a fixed receptive field. This prevents the network from reconciling the importance of features with different receptive fields to achieve complete image perception, and so limits matching accuracy. In addition, these methods use a conventional handcrafted encoding approach to integrate the positional information of keypoints into the visual descriptors, which limits the network's ability to extract reliable positional encoding information. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that adaptively reconciles features with multiple receptive fields and uses parallel networks to realize reliable positional encoding. Specifically, FMRT introduces a dedicated Reconciliatory Transformer (RecFormer) consisting of three parts. The first is a Global Perception Attention Layer (GPAL), which extracts visual descriptors with different receptive fields and integrates global context information at various scales. The second is a Perception Weight Layer (PWL), which adaptively measures the importance of the various receptive fields. The third is a Local Perception Feed-forward Network (LPFFN), which extracts deep, aggregated, multi-scale local feature representations. Extensive experiments demonstrate that FMRT performs strongly on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.
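The abstract describes PWL only at a high level. As a hedged illustration of the general idea of adaptively weighting features from several receptive fields, the sketch below computes one score per branch from a mean-pooled descriptor and a hypothetical learned vector `score_vec`. It then normalizes the scores with a softmax and returns the weighted sum of the branches. This is a generic softmax-gated fusion written in plain Python, not FMRT's actual PWL. The names `reconcile`, `mean_pool`, and `score_vec` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Features = List[Vector]  # N descriptors, each of dimension D


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(features: Features) -> Vector:
    # Average the N descriptors into one D-dimensional summary vector.
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def reconcile(branches: List[Features], score_vec: Vector) -> Tuple[Features, List[float]]:
    """Fuse K feature sets (one per receptive field) with adaptive weights.

    score_vec is a hypothetical learned projection. In a trained network it
    would be a parameter; here it is simply passed in.
    """
    # One scalar score per branch: dot(score_vec, pooled descriptor).
    scores = [sum(w * x for w, x in zip(score_vec, mean_pool(b))) for b in branches]
    weights = softmax(scores)  # weights are positive and sum to 1
    n, dim = len(branches[0]), len(branches[0][0])
    # Per-descriptor weighted sum across branches.
    fused = [
        [sum(w * b[i][d] for w, b in zip(weights, branches)) for d in range(dim)]
        for i in range(n)
    ]
    return fused, weights


# Usage: two toy branches standing in for a narrow and a wide receptive field.
small = [[1.0, 0.0], [0.0, 1.0]]
large = [[2.0, 2.0], [2.0, 2.0]]
fused, weights = reconcile([small, large], score_vec=[1.0, 1.0])
```

A softmax keeps the branch weights positive and normalized, so the fused descriptor stays on the same scale as the inputs regardless of how many receptive fields are combined.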

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

LoFTR: Detector-Free Local Feature Matching with Transformers

Semi-Dense Feature Matching with Transformers and Its Applications in Multiple-View Geometry

PA-LoFTR: Local Feature Matching with 3D Position-Aware Transformer

LoFLAT: Local Feature Matching using Focused Linear Attention Transformer

ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

UAV image matching from handcrafted to deep local features

Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

Are Semi-Dense Detector-Free Methods Good at Matching Local Features?

Efficient Covisibility-based Image Matching for Large-Scale SfM

FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer

Adaptive Spot-Guided Transformer for Consistent Local Feature Matching

ParaFormer: Parallel Attention Transformer for Efficient Feature Matching

HomoMatcher: Dense Feature Matching Results with Semi-Dense Efficiency by Homography Estimation

XFeat: Accelerated Features for Lightweight Image Matching

Efficient Linear Attention for Fast and Accurate Keypoint Matching

Guide Local Feature Matching by Overlap Estimation

LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching

LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

Face recognition via fast dense correspondence