MambaLF: an Efficient Local Feature Extraction and Matching with State Space Model
Houqin Bian, Qifei Chen, Haolin Zhang, Lunming Qin, Liang Xue, Haoyang Cui, Xi Wang
DOI: https://doi.org/10.21203/rs.3.rs-5345658/v1
2024-01-01
Abstract: Local feature extraction and matching have attracted increasing attention because of their wide application, especially in real-time automated systems. However, existing image matching methods struggle to balance a global receptive field against efficient computation, which limits their practical use. Recently, the State Space Model (SSM) has shown great potential for modeling long-range dependencies with linear complexity. This paper therefore proposes a local feature extraction and matching method based on the SSM, which aims to balance global information extraction against model complexity. First, a Local and Global Information Fusion (LGIF) block is developed to integrate local and global information and to reduce model parameters through parallel SSM. Second, a backbone based on Euclidean group E(2) equivariant steerable convolution (E2Conv) is designed to improve the model's robustness to geometric transformations. Finally, a self-supervised learning framework is constructed that optimizes the network's local feature detection and description by combining four loss functions: keypoint localization loss, keypoint confidence score loss, descriptor triplet loss, and keypoint correspondence loss. Experimental results on the public benchmark datasets HPatches and RDNIM demonstrate that the proposed method has a significant advantage over existing methods in homography estimation tasks. Notably, our method outperforms the end-to-end dense matching method LoFTR by 6.11% under the 1-pixel error threshold on the HPatches dataset, while using fewer parameters and less average matching time.
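To make the linear-complexity claim concrete, the following is a minimal sketch of a generic discrete-time diagonal state space recurrence, h_t = a * h_(t-1) + b * x_t and y_t = sum(c * h_t), run over a 1-D sequence. It is not the LGIF block or the authors' implementation: the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions chosen for illustration, whereas Mamba-style models make these parameters input-dependent. The point is only that one pass over the sequence costs time proportional to its length.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal discrete-time linear SSM over a 1-D input sequence.

    Hypothetical illustration (not the paper's code):
        h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
        y_t = sum(c * h_t)
    Cost is O(len(x) * len(a)), i.e. linear in the sequence length.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same state dimension")

    h = [0.0] * len(a)  # hidden state, initialised to zero
    y: List[float] = []
    for x_t in x:
        # State update: each state channel decays by a_i and absorbs the input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Readout: project the hidden state to a scalar output.
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


if __name__ == "__main__":
    # Impulse response of a one-dimensional state with decay 0.5.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

Because the hidden state carries information forward from every earlier position, each output can depend on the whole preceding sequence without the quadratic pairwise comparison that attention requires.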