Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning

Zhong Ji,Changxu Meng,Yan Zhang,Haoran Wang,Yanwei Pang,Jungong Han
DOI: https://doi.org/10.1145/3664647.3681270
2024-01-01
Abstract:Mountains of researches center around the Remote Sensing Image-Text Retrieval (RSITR), aiming at retrieving the corresponding targets based on the given query. Among them, the transfer of Foundation Models (FMs), such as CLIP, to remote sensing domain shows promising results. However, existing FM-based approaches neglect the negative impact of weakly correlated sample pairs and the key distinctions among remote sensing texts, leading to biased and superficial exploration of sample pairs. To address these challenges, we propose a novel Eliminate Before Align strategy with Keyword Explicit Reasoning framework (EBAKER) for RSITR. Specifically, we devise an innovative Eliminate Before Align (EBA) strategy to filter out the weakly correlated sample pairs to mitigate their deviations from optimal embedding space during alignment. Moreover, we introduce a Keyword Explicit Reasoning (KER) module to facilitate the positive role of subtle key concept differences. Without bells and whistles, our method achieves a one-step transformation from FM to RSITR task, obviating the necessity for extra pretraining on remote sensing data. Extensive experiments on three popular benchmark datasets validate that our proposed EBAKER method outperform the state-of-the-art methods with fewer training data. Our source code will be released soon.
What problem does this paper attempt to address?