A transformer-based cross-modal image-text retrieval method using feature decoupling and reconstruction

Huan Zhang, Yingzhi Sun, Yu Liao, SiYuan Xu, Rui Yang, Shuang Wang, Biao Hou, Licheng Jiao
DOI: https://doi.org/10.1109/IGARSS46834.2022.9883242
2022-01-01
Abstract: With the growing use of remote sensing technology, cross-modal retrieval of remote sensing images (CMRRS) has attracted increasing attention. Existing methods often map the features of different modalities entirely into a shared space without decoupling modal-invariant information from modal-heterogeneous information, which introduces redundant information into the feature mapping and usually leads to sub-optimal retrieval performance. This paper proposes a Transformer-based CMRRS method using feature decoupling and reconstruction (TBFDR) to solve this problem. TBFDR achieves state-of-the-art performance on the remote sensing image-text retrieval task on the Sydney-Captions dataset.