Domain Alignment with Large Vision-language Models for Cross-domain Remote Sensing Image Retrieval

Yan Chen, Guocan Cai, Fufang Li, Yangtao Wang, Xin Tan, Xiaocui Li
DOI: https://doi.org/10.1145/3627673.3679612
2024-01-01
Abstract: Cross-domain remote sensing image retrieval has been a research hotspot in the past few years. Most existing methods combine semantic learning with domain adaptation on a well-labeled source domain and an unlabeled target domain. However, they face two serious challenges. (1) They cannot handle practical scenarios in which the source domain lacks sufficient label supervision. (2) They suffer severe performance degradation when the data distributions of the source and target domains become highly inconsistent. To address these challenges, we propose Domain Alignment with Large Vision-language models for cross-domain remote sensing image retrieval (termed DALV). First, we design a dual-modality prototype guided pseudo-labeling mechanism, which leverages a pre-trained large vision-language model (i.e., CLIP) to assign pseudo-labels to all unlabeled source domain and target domain images. Second, we compute confidence scores for these pseudo-labels to distinguish their reliability. Next, we devise a loss reweighting strategy, which incorporates the confidence scores as weights in the contrastive loss to mitigate the impact of noisy pseudo-labels. Finally, low-rank adaptation fine-tuning is used to update our model and achieve domain alignment, yielding class-discriminative features. Extensive experiments on 12 cross-domain remote sensing image retrieval tasks show that the proposed DALV outperforms state-of-the-art approaches. The source code is available at https://github.com/ptyy01/DALV.
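The pseudo-labeling, confidence scoring, and loss reweighting steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). The function names, the temperatures, the use of the maximum softmax probability as the confidence score, and the product of the two confidences as the pair weight are all assumptions made for illustration. The abstract does not say how the two modalities are combined into prototypes, so the sketch takes a single set of class prototypes, and small hand-written vectors stand in for CLIP embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def pseudo_label(image_feats, prototypes, temperature=0.01):
    """Hypothetical helper: label each image with its nearest class prototype.

    Returns (labels, confidence). Confidence is the maximum softmax
    probability over classes, which is an assumption of this sketch.
    """
    sims = l2_normalize(image_feats) @ l2_normalize(prototypes).T  # (N, C)
    logits = sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)

def weighted_contrastive_loss(feats, labels, conf, temperature=0.1):
    """Hypothetical confidence-weighted supervised contrastive loss.

    Each positive pair (i, j) sharing a pseudo-label is weighted by
    conf[i] * conf[j], so unreliable pseudo-labels contribute less.
    """
    z = l2_normalize(feats)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Mask self-similarity with a large negative value before the softmax.
    logits = np.where(not_self, z @ z.T / temperature, -1e9)
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pair_w = conf[:, None] * conf[None, :] * positives
    total_w = pair_w.sum()
    if total_w == 0:
        return 0.0  # no usable positive pairs
    return float(-(pair_w * np.where(positives, log_prob, 0.0)).sum() / total_w)

# Toy stand-ins for image embeddings and class prototypes (not real CLIP output).
image_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])

labels, conf = pseudo_label(image_feats, prototypes)
loss = weighted_contrastive_loss(image_feats, labels, conf)
```

Weighting pairs rather than individual samples is only one possible design. A per-anchor weight would also fit the abstract's description of using confidence scores as weights in the contrastive loss.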