Integrating Multi-subspace Joint Learning with Multi-level Guidance for Cross-Modal Retrieval of Remote Sensing Images

Yaxiong Chen, Jirui Huang, Shengwu Xiong, Xiaoqiang Lu
DOI: https://doi.org/10.1109/tgrs.2024.3369042
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In recent years, with the continuous advancement of remote sensing (RS) technology and text processing techniques, RS images and associated textual data have become increasingly abundant. Combining RS images with their corresponding textual data enables integrated analysis and retrieval, which has significant practical value across multiple application domains, including geographic information systems (GIS), environmental monitoring, and agricultural management. RS images contain multiple targets at multiple scales, and the textual descriptions of these targets are not fully utilized, which reduces retrieval accuracy. Previous methods have struggled to balance intermodality information interaction with intramodality feature fusion, and they have paid little attention to the consistency of distributions within modalities. To address this, this article proposes a symmetric multilevel guidance network (SMLGN) for cross-modal retrieval in RS. SMLGN first introduces local-global fusion guidance within each modality and fine-grained bidirectional guidance between modalities, which together enable it to learn a common semantic space. Furthermore, to address the distribution differences between modalities within this common semantic space, we design an adversarial joint learning framework and a multiobjective loss function that optimize SMLGN and make the data distributions consistent. The experimental results demonstrate that SMLGN performs well on cross-modal retrieval between RS images and textual data. It effectively integrates information from both modalities, improving the accuracy and reliability of retrieval.
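The abstract names an adversarial joint learning framework and a multiobjective loss but does not give their formulas, so the sketch below is only a generic illustration and not SMLGN's actual objective. It assumes that image and text encoders already produce embeddings in a shared space, with row i of each batch forming a matched pair. It then combines two swapped-in, commonly used terms. The first is a bidirectional hinge-based triplet ranking loss with hardest negatives, which pulls matched pairs together. The second is a logistic modality discriminator whose loss the encoders try to maximize, so that image and text embeddings become hard to tell apart. All function names, the margin of 0.2, and the weight `lam=0.1` are hypothetical. The code only evaluates loss values; actual training would alternate encoder and discriminator updates or use a gradient-reversal layer.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def bidirectional_ranking_loss(img, txt, margin=0.2):
    """Hinge triplet loss with hardest negatives in both retrieval directions.

    Assumption (illustrative only): row i of `img` matches row i of `txt`.
    """
    sim = l2_normalize(img) @ l2_normalize(txt).T  # (N, N) cosine similarities
    pos = np.diag(sim)                              # scores of matched pairs
    off_diag = ~np.eye(sim.shape[0], dtype=bool)    # mask out the matched pairs
    masked = np.where(off_diag, sim, -np.inf)
    neg_i2t = masked.max(axis=1)  # hardest wrong caption for each image
    neg_t2i = masked.max(axis=0)  # hardest wrong image for each caption
    loss_i2t = np.maximum(0.0, margin + neg_i2t - pos)
    loss_t2i = np.maximum(0.0, margin + neg_t2i - pos)
    return float(np.mean(loss_i2t + loss_t2i))


def modality_discriminator_loss(img, txt, w, b):
    """Binary cross-entropy of a hypothetical linear modality classifier.

    Label 1 = image embedding, label 0 = text embedding.
    """
    feats = np.vstack([l2_normalize(img), l2_normalize(txt)])
    labels = np.concatenate([np.ones(len(img)), np.zeros(len(txt))])
    logits = feats @ w + b
    # Numerically stable BCE with logits: log(1 + e^z) - y * z
    bce = np.logaddexp(0.0, logits) - labels * logits
    return float(np.mean(bce))


def encoder_objective(img, txt, w, b, margin=0.2, lam=0.1):
    """Hypothetical combined objective for the encoders: minimize the ranking
    loss while maximizing the discriminator's loss (modality confusion)."""
    ranking = bidirectional_ranking_loss(img, txt, margin)
    adversarial = modality_discriminator_loss(img, txt, w, b)
    return ranking - lam * adversarial


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(4, 8))
    txt = img + 0.05 * rng.normal(size=(4, 8))  # toy "captions" near their images
    w, b = np.zeros(8), 0.0
    print(bidirectional_ranking_loss(img, txt))
    print(modality_discriminator_loss(img, txt, w, b))  # log(2) ~ 0.6931 with zero weights
    print(encoder_objective(img, txt, w, b))
```

With a zero-weight discriminator, the adversarial term equals log 2, the loss of a classifier that cannot separate the two modalities. This is the state the encoders are pushed toward. Subtracting it in the encoder objective is one simple way to express a "consistent distribution" goal. The paper may weight or structure its objectives differently.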
What problem does this paper attempt to address?