Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch Matching

Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Xiangyu Yue
2024-12-15
Abstract: Recently, cross-spectral image patch matching based on feature relation learning has attracted extensive attention. However, performance bottleneck problems have gradually emerged in existing methods. To address this challenge, we make the first attempt to explore a stable and efficient bridge between descriptor learning and metric learning, and construct a knowledge-guided learning network (KGL-Net), which achieves amazing performance improvements while abandoning complex network structures. Specifically, we find that there is feature extraction consistency between metric learning based on feature difference learning and descriptor learning based on Euclidean distance. This provides the foundation for bridge building. To ensure the stability and efficiency of the constructed bridge, on the one hand, we conduct an in-depth exploration of 20 combined network architectures. On the other hand, a feature-guided loss is constructed to achieve mutual guidance of features. In addition, unlike existing methods, we consider that the feature mapping ability of the metric branch should receive more attention. Therefore, a hard negative sample mining for metric learning (HNSM-M) strategy is constructed. To the best of our knowledge, this is the first time that hard negative sample mining for metric networks has been implemented and brings significant performance gains. Extensive experimental results show that our KGL-Net achieves SOTA performance in three different cross-spectral image patch matching scenarios. Our code is available at https://github.com/YuChuang1205/KGL-Net.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve the performance bottleneck in cross-spectral image patch matching. Existing methods show diminishing performance gains when matching patches captured in different spectral bands, because they must handle illumination changes, geometric changes, and pixel-level nonlinear differences between cross-spectral patches. This makes a high-performance cross-spectral matching method hard to build. To address it, the authors build a stable and efficient bridge between descriptor learning and metric learning and construct a knowledge-guided learning network (KGL-Net). KGL-Net achieves its performance gains through the following components:

1. **Feature extraction consistency**: The authors find that metric learning based on feature difference learning and descriptor learning based on Euclidean distance are consistent in how they extract features. This consistency is the basis for building a bridge between the two.
2. **Exploration of combined network architectures**: To make the bridge stable and efficient, the authors explore 20 combined network architectures and select the C3 architecture. In C3, the metric network uses a pseudo-siamese structure, the descriptor network uses a siamese structure, and the lower-level layers share parameters for more effective feature extraction.
3. **Hard negative sample mining for metric learning (HNSM-M)**: The authors propose a hard negative sample mining strategy for the metric branch. The network receives only positive sample pairs as input, and negative pairs are formed within the batch at the hard negative positions located by the descriptor network. This improves the discriminative ability of the metric branch.
4. **Feature-guided loss**: To ensure that the hard negative positions obtained by the descriptor network are strictly meaningful for the metric network, the authors construct a feature-guided loss that makes the high-level feature maps of the two branches guide each other's learning.

Through these components, KGL-Net avoids complex network structures and still achieves state-of-the-art performance in three cross-spectral image patch matching scenarios. On the VIS-NIR dataset, the FPR95 of KGL-Net is 36.5% lower than that of the recent FIL-Net, with 30.1% fewer parameters.

### Formula summary

- **Feature extraction consistency**:
  \[ f_m(V_p, N_p) = \phi_m\big(f(V_p), f(N_p)\big) \]
  \[ f_d(V_p, N_p) = \phi_d\big(f(V_p), f(N_p)\big) \]
- **Feature difference learning and Euclidean distance**:
  \[ S_{\text{out}} = m\big(\phi_m'(f_V) - \phi_m'(f_N)\big) \]
  \[ \text{dist}_{\text{out}} = \big\| \phi_d'(f_V') - \phi_d'(f_N') \big\|^2 \]
- **Feature distance matrix** (descriptor of the i-th patch in one spectrum versus the j-th patch in the other):
  \[ M_{ij} = \big\| d_i^V - d_j^N \big\|^2 \]
- **Feature-guided loss**:
  \[ L_{fg}^{v} = \frac{1}{N}\sum_{i=1}^{N} \big\| f_{V,i}' - f_{V,i} \big\|^2 \]
  \[ L_{fg}^{n} = \frac{1}{N}\sum_{i=1}^{N} \big\| f_{N,i}' - f_{N,i} \big\|^2 \]
- **Total loss**:
  \[ L = L_d + L_m + \alpha L_{fg}^{v} + \beta L_{fg}^{n} \]

These formulas summarize how KGL-Net addresses the performance bottleneck in cross-spectral image patch matching through feature extraction consistency, hard negative sample mining, and the feature-guided loss.
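To make the C3 layout and the S_out and dist_out formulas concrete, here is a toy forward pass. It is a sketch under stated assumptions, not the authors' implementation: plain Python lists stand in for convolutional feature maps, each stage (the shared low-level layer, the descriptor head, and the two metric heads) is a single dense ReLU layer with random placeholder weights, and the decision function m is assumed to be a linear layer followed by a sigmoid. The names `ToyC3`, `dense_relu`, and `init_weights` are hypothetical.

```python
import math
import random


def dense_relu(weights, x):
    """Dense layer (one weight row per output unit) followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def init_weights(rows, cols, rng):
    """Uniform random weights in [-1, 1]; placeholders for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyC3:
    """Hypothetical toy model mirroring the C3 layout (not the KGL-Net code)."""

    def __init__(self, in_dim=8, hidden=6, out=4, seed=0):
        rng = random.Random(seed)
        # Low-level layer shared by both branches and both spectra.
        self.shared = init_weights(hidden, in_dim, rng)
        # Descriptor head: one weight set for both spectra (siamese).
        self.desc = init_weights(out, hidden, rng)
        # Metric heads: a separate weight set per spectrum (pseudo-siamese).
        self.metric_v = init_weights(out, hidden, rng)
        self.metric_n = init_weights(out, hidden, rng)
        # Assumed decision function m: linear weights, sigmoid applied in forward().
        self.m = [rng.uniform(-1.0, 1.0) for _ in range(out)]

    def forward(self, v, n):
        f_v, f_n = dense_relu(self.shared, v), dense_relu(self.shared, n)
        # Descriptor branch: squared Euclidean distance (dist_out).
        d_v, d_n = dense_relu(self.desc, f_v), dense_relu(self.desc, f_n)
        dist_out = sum((a - b) ** 2 for a, b in zip(d_v, d_n))
        # Metric branch: feature difference passed through m (S_out).
        g_v, g_n = dense_relu(self.metric_v, f_v), dense_relu(self.metric_n, f_n)
        diff = [a - b for a, b in zip(g_v, g_n)]
        logit = sum(w * d for w, d in zip(self.m, diff))
        s_out = 1.0 / (1.0 + math.exp(-logit))
        return s_out, dist_out


# Usage: one visible patch and one cross-spectral patch, flattened to 8 values.
model = ToyC3()
vis_patch = [0.1 * k for k in range(8)]
nir_patch = [0.1 * (8 - k) for k in range(8)]
s_out, dist_out = model.forward(vis_patch, nir_patch)
```

Because the descriptor head is siamese, feeding the same patch twice gives a distance of exactly zero, while the pseudo-siamese metric heads still process the two inputs with different weights.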
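The distance matrix M_ij and the hard negative positions it exposes can be sketched in a few lines. This is a simplified, hypothetical illustration: it takes only the row-wise hardest negative (HardNet-style mining typically also checks columns), and `build_metric_pairs` is an assumed reading of how HNSM-M reuses the descriptor-mined positions to form negative pairs for the metric branch, not code from the KGL-Net repository.

```python
def distance_matrix(desc_v, desc_n):
    """M[i][j] = squared Euclidean distance between d_i^V and d_j^N."""
    return [[sum((a - b) ** 2 for a, b in zip(dv, dn)) for dn in desc_n]
            for dv in desc_v]


def hardest_negatives(M):
    """For each row i, the column j != i with the smallest distance.

    The diagonal M[i][i] holds the matching (positive) pair, so it is skipped.
    """
    return [min((d, j) for j, d in enumerate(row) if j != i)[1]
            for i, row in enumerate(M)]


def build_metric_pairs(feat_v, feat_n, hard_idx):
    """Hypothetical HNSM-M step: reuse descriptor-mined positions for the metric branch.

    Returns (visible feature, cross-spectral feature, label) triples where
    label 1 marks the input positive pairs and 0 the mined hard negatives.
    """
    positives = [(feat_v[i], feat_n[i], 1) for i in range(len(feat_v))]
    negatives = [(feat_v[i], feat_n[j], 0) for i, j in enumerate(hard_idx)]
    return positives + negatives


# Usage: a batch of three positive pairs (index i in each list is pair i).
desc_v = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
desc_n = [[0.1, 0.0], [0.9, 0.1], [0.6, 0.6]]
M = distance_matrix(desc_v, desc_n)
hard_idx = hardest_negatives(M)
pairs = build_metric_pairs(["v0", "v1", "v2"], ["n0", "n1", "n2"], hard_idx)
```

Strings stand in for metric-branch feature maps here so the pairing is easy to read; in a real model they would be the high-level features of each patch.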
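The feature-guided loss is a mean squared distance between the matching high-level features of the descriptor and metric branches, and the total loss adds it to the descriptor and metric losses. A minimal sketch, assuming each feature map is flattened to a list; the values of alpha and beta, the placeholder numbers for L_d and L_m, and the function names are hypothetical.

```python
def feature_guided_loss(desc_feats, metric_feats):
    """L_fg = (1/N) * sum_i ||f'_i - f_i||^2 over the N samples of a batch.

    desc_feats[i] is f'_i (descriptor branch) and metric_feats[i] is f_i
    (metric branch), each a high-level feature map flattened to a list.
    """
    if len(desc_feats) != len(metric_feats) or not desc_feats:
        raise ValueError("need two non-empty batches of equal size")
    total = sum(sum((a - b) ** 2 for a, b in zip(fp, f))
                for fp, f in zip(desc_feats, metric_feats))
    return total / len(desc_feats)


def total_loss(l_d, l_m, l_fg_v, l_fg_n, alpha=1.0, beta=1.0):
    """L = L_d + L_m + alpha * L_fg^v + beta * L_fg^n (alpha, beta are placeholders)."""
    return l_d + l_m + alpha * l_fg_v + beta * l_fg_n


# Usage with a batch of two samples per spectrum.
l_fg_v = feature_guided_loss([[1.0, 2.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
l_fg_n = feature_guided_loss([[0.5, 0.5], [1.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]])
loss = total_loss(l_d=0.5, l_m=0.3, l_fg_v=l_fg_v, l_fg_n=l_fg_n, alpha=0.1, beta=0.1)
```

In a differentiable framework, leaving gradients flowing into both arguments is what lets the two branches guide each other rather than one simply copying the other.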
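FPR95, the metric used for the VIS-NIR comparison, is the false positive rate at the decision threshold where 95% of matching pairs are accepted. A minimal sketch, assuming distances where smaller means more similar (metric-branch similarity scores would need to be negated first); the function name `fpr95` is hypothetical.

```python
def fpr95(distances, labels):
    """False positive rate at the threshold that accepts 95% of matching pairs.

    distances: smaller means more similar; labels: 1 = matching, 0 = non-matching.
    """
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one matching and one non-matching pair")
    # Smallest threshold that accepts at least 95% of the positives
    # (integer ceiling of 0.95 * len(pos), avoiding float rounding).
    k = (95 * len(pos) + 99) // 100
    threshold = pos[k - 1]
    false_pos = sum(1 for d in neg if d <= threshold)
    return false_pos / len(neg)


# Usage: 20 matching pairs and 4 non-matching pairs.
dists = list(range(1, 21)) + [5, 15, 25, 30]
labels = [1] * 20 + [0] * 4
print(fpr95(dists, labels))  # 0.5
```

Here the threshold lands at distance 19, which accepts 19 of the 20 matching pairs and 2 of the 4 non-matching pairs, so lower FPR95 means fewer non-matching pairs slip through at that operating point.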