ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning
Sijun Dong, Libo Wang, Bo Du, Xiaoliang Meng
DOI: https://doi.org/10.1016/j.isprsjprs.2024.01.004
IF: 12.7
2024-02-01
ISPRS Journal of Photogrammetry and Remote Sensing
Abstract: Remote sensing change detection (RSCD), which aims to identify surface changes from bitemporal images, is significant for many applications, such as environmental protection and disaster monitoring. In the last decade, driven by the wave of artificial intelligence, many deep learning-based change detection methods have emerged and achieved significant breakthroughs. However, these methods pay more attention to visual representation learning while ignoring the potential of multimodal data. Recently, the foundation vision-language model CLIP has provided a new paradigm for multimodal AI, demonstrating impressive performance on downstream tasks. Following this trend, we introduce ChangeCLIP, a novel framework tailored for RSCD that leverages robust semantic information from image-text pairs. Specifically, we reconstruct the original CLIP to extract bitemporal features and propose a novel differential features compensation module to capture the detailed semantic changes between them. In addition, we propose a vision-language-driven decoder that combines the results of image-text encoding with the visual features of the decoding stage, thereby enhancing the image semantics. The proposed ChangeCLIP achieves state-of-the-art IoU on five well-known change detection datasets: LEVIR-CD (85.20%), LEVIR-CD+ (75.63%), WHUCD (90.15%), CDD (95.87%) and SYSU-CD (71.41%). The code and pretrained models of ChangeCLIP will be publicly available at https://github.com/dyzy41/ChangeCLIP.
imaging science & photographic technology; remote sensing; geography, physical; geosciences, multidisciplinary
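
The abstract reports results as IoU (intersection over union) of the predicted change map against the reference change map. As a minimal sketch of that metric for a single pair of binary masks, the following may help; the function name `change_iou`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made here for illustration. The abstract does not state how the scores are aggregated over a dataset, so this is not necessarily the paper's exact evaluation protocol.

```python
from typing import Sequence


def change_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """IoU of the 'change' class for two binary masks of identical shape.

    Nonzero entries are treated as 'changed' pixels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels marked as changed in both masks
    union = 0         # pixels marked as changed in at least one mask
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            union += int(p or t)

    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 2 overlapping changed pixels, 4 in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(change_iou(pred, target))  # 0.5
```

Benchmark scores are often computed by accumulating the intersection and union counts over all test images before dividing, rather than averaging per-image values; which of these applies here would need to be checked against the released code.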