AbHE: All Attention-Based Homography Estimation

Mingxiao Huo, Zhihao Zhang, Xinyang Ren, Xianqiang Yang, Chao Ye
DOI: https://doi.org/10.1109/tim.2024.3374320
IF: 5.6
2024-03-27
IEEE Transactions on Instrumentation and Measurement
Abstract: Homography estimation is a fundamental task in computer vision that involves obtaining the transformation between multiview images for image alignment. Although convolutional neural networks (CNNs) have shown state-of-the-art performance in this task, few works have explored the use of transformer-based models, which have demonstrated superiority in high-level vision tasks. In this article, we propose a strong baseline model for homography estimation that combines a Swin transformer feature representation for global features and a CNN feature representation for local features. Additionally, we introduce a cross-nonlocal layer to coarsely search for matched features within the feature maps. In the homography regression stage, we adopt an attention layer to drop weakly correlated feature points from the channels of the correlation volume. Our experiments show that our method outperforms state-of-the-art methods in eight-degree-of-freedom (DOF) homography estimation. The code is available at https://github.com/mingxiaohuo/ABHE.
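To make the "eight-DOF" target concrete, the following is a minimal sketch of how a 3x3 homography with eight free parameters can be recovered from four point correspondences by solving an 8x8 linear system. It is not taken from the paper or its repository: the abstract does not say how the eight DOF are parameterized, and the function names (`solve_linear`, `homography_from_4pts`, `warp_point`) and the normalization h33 = 1 are assumptions made for illustration only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not mutated.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4pts(src, dst):
    """Return the 3x3 homography H (assuming h33 = 1) with H * src[i] ~ dst[i]."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)  # the eight unknowns h11..h32
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(H, pt):
    """Apply H to a 2D point and divide by the projective coordinate."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Each correspondence contributes two equations, so four correspondences (no three collinear) determine the eight unknowns. A learned estimator such as the one described above would replace the hand-supplied correspondences with quantities regressed from image features.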
engineering, electrical & electronic; instruments & instrumentation