PT-Net: Pyramid Transformer Network for Feature Matching Learning

Zhepeng Gong, Guobao Xiao, Ziwei Shi, Shiping Wang, Riqing Chen
DOI: https://doi.org/10.1109/tim.2024.3369132
IF: 5.6
2024-03-09
IEEE Transactions on Instrumentation and Measurement
Abstract: In this article, we propose a novel pyramid transformer network (PT-Net) for feature matching problems. Recent studies have used the dense motion field to transform unordered correspondences into ordered motion vectors and have used convolutional neural networks (CNNs) to extract deep features. However, the limited receptive field of CNNs restricts the ability of the network to capture global information within the motion field. To tackle this limitation, we devise a pyramid transformer (PT) block to enhance the model's ability to extract both local and global information from the motion field, which fuses multiscale motion field information by constructing a pyramid-structured motion field. Furthermore, to alleviate the high memory demands of spatial attention in the transformer, we introduce dilated sparse attention (DSA), a novel attention block that reduces the computational difficulty of multihead self-attention (MHSA) through regular interval sampling and deconvolution operations and focuses on the essential regions to establish long-range dependencies between the correct motion vectors. The proposed PT-Net is effective in inferring the probabilities of correspondences belonging to either inliers or outliers, while simultaneously estimating the essential matrix. Extensive experiments demonstrate that PT-Net outperforms state-of-the-art methods for outlier removal tasks and camera pose estimation on different datasets, including YFCC100M and SUN3D. The code is available at https://github.com/gongzhepeng/PT-Net.
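To make the memory argument for regular interval sampling concrete, the following is a minimal counting sketch, not the authors' DSA implementation. It assumes a hypothetical 16 x 16 motion-field grid and a sampling stride of 2 (neither value comes from the paper), and the function names are illustrative. It only counts how many tokens and pairwise attention scores a single head would need with and without sampling; the deconvolution step that restores the original resolution is not modeled.

```python
import math


def attention_cost(height, width, stride=1):
    """Return (tokens, score_entries) for one self-attention head over a
    height x width grid that is sampled every `stride` cells.

    Hypothetical helper for illustration; not part of the PT-Net code base.
    """
    rows = math.ceil(height / stride)
    cols = math.ceil(width / stride)
    tokens = rows * cols
    # Self-attention compares every token with every token.
    return tokens, tokens * tokens


def sampled_positions(height, width, stride):
    """List the (row, col) grid cells kept by regular interval sampling."""
    return [(r, c)
            for r in range(0, height, stride)
            for c in range(0, width, stride)]


# Assumed grid size and stride, chosen only to make the arithmetic easy.
dense_tokens, dense_entries = attention_cost(16, 16)
sparse_tokens, sparse_entries = attention_cost(16, 16, stride=2)

print(dense_tokens, dense_entries)      # 256 65536
print(sparse_tokens, sparse_entries)    # 64 4096
print(dense_entries // sparse_entries)  # 16
```

Under these assumptions, sampling every second cell in each direction keeps a quarter of the tokens, so the score matrix shrinks by a factor of 16. In general the reduction in score entries grows with the fourth power of the stride.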
engineering, electrical & electronic; instruments & instrumentation