Abstract: Local feature matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively addressed by Transformer-based methods. However, these methods integrate long-range context information among keypoints using only a fixed receptive field, which prevents the network from reconciling the importance of features with different receptive fields to achieve complete image perception, and thus limits matching accuracy. In addition, these methods rely on a conventional handcrafted encoding to integrate the positional information of keypoints into the visual descriptors, which limits the network's ability to extract reliable positional encoding information. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that adaptively reconciles features with multiple receptive fields and uses parallel networks to achieve reliable positional encoding. Specifically, FMRT introduces a dedicated Reconciliatory Transformer (RecFormer) that consists of a Global Perception Attention Layer (GPAL) to extract visual descriptors with different receptive fields and integrate global context information at various scales, a Perception Weight Layer (PWL) to adaptively measure the importance of the various receptive fields, and a Local Perception Feed-forward Network (LPFFN) to extract deep aggregated multi-scale local feature representations. Extensive experiments demonstrate that FMRT yields extraordinary performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.
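
As a rough illustration of what adaptively weighting features from several receptive fields could look like, the following minimal sketch fuses the descriptors of a single keypoint with softmax-normalized weights. It is not the PWL defined in the paper: the names `reconcile` and `score_weights`, the dot-product scoring, and the softmax normalization are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconcile(branch_features, score_weights):
    """Fuse descriptors of one keypoint computed with different receptive fields.

    branch_features: list of K descriptors, each a list of D floats
                     (one descriptor per receptive field).
    score_weights:   list of D floats; a hypothetical stand-in for learned
                     parameters that score each branch.
    Returns (fused descriptor, per-branch weights).
    """
    # One scalar score per branch: dot product with the scoring vector.
    scores = [
        sum(w * x for w, x in zip(score_weights, feat))
        for feat in branch_features
    ]
    # Normalize the scores so the branch weights sum to one.
    weights = softmax(scores)
    dim = len(branch_features[0])
    # Weighted sum of the branch descriptors, dimension by dimension.
    fused = [
        sum(weights[k] * branch_features[k][d] for k in range(len(branch_features)))
        for d in range(dim)
    ]
    return fused, weights


# Three hypothetical branches (e.g., small, medium, large receptive field).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = reconcile(features, [0.0, 0.0])
```

With an all-zero scoring vector every branch receives the same weight, so the fused descriptor is the plain average of the branches; a non-zero scoring vector shifts the weight toward the branches it scores higher.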

PA-LoFTR: Local Feature Matching with 3D Position-Aware Transformer

LoFTR: Detector-Free Local Feature Matching with Transformers

Semi-Dense Feature Matching with Transformers and Its Applications in Multiple-View Geometry

UAV image matching from handcrafted to deep local features

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

LoFLAT: Local Feature Matching using Focused Linear Attention Transformer

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Geo-Localization with Transformer-Based 2D-3D Match Network

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer

ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

Local Deep Feature Learning Framework for 3D Shape

Adaptive Spot-Guided Transformer for Consistent Local Feature Matching

LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching

Transformer-Based Local Feature Matching for Multimodal Image Registration

TP3M: Transformer-based Pseudo 3D Image Matching with Reference Image

LMAFormer: Local Motion Aware Transformer for Small Moving Infrared Target Detection

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D shape recognition

TransLO: A Window-Based Masked Point Transformer Framework for Large-Scale LiDAR Odometry

2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds

Learning Geometric Feature Embedding with Transformers for Image Matching