ESAM-CD: Fine-Tuned EfficientSAM Network With LoRA for Weakly Supervised Remote Sensing Image Change Detection

Mengmeng Wang, Liang Zhou, Kaiyue Zhang, Xinghua Li, Ming Hao, Yuanxin Ye
DOI: https://doi.org/10.1109/tgrs.2024.3470808
IF: 8.2
2024-10-19
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Change detection (CD) has become an attractive research topic in remote sensing in recent years. Despite significant advances driven by deep learning (DL) techniques, most current methods rely on fully supervised strategies. These methods require a large number of pixel-level labels, which are time-consuming and labor-intensive to collect. To address this, we propose a weakly supervised CD method, termed EfficientSAM (ESAM)-CD, which leverages multiscale class activation map (CAM) fusion and a fine-tuned EfficientSAM image encoder. First, we construct a classification model that uses image-level labels with a deep supervision strategy to generate high-quality multiscale CAMs. Subsequently, a multiscale CAM fusion module is proposed to refine the boundaries of change targets by harnessing information from various scales. Then, we use EfficientSAM, which has powerful generalization capabilities, as the backbone and fine-tune it with a low-rank adaptation (LoRA) strategy to establish a CD network. This network takes the bitemporal images and the generated pseudolabels as input. In addition, to overcome the reliance of EfficientSAM's decoder on prompts, we propose a prompt-free decoder based on general convolutional layers to predict change maps. Finally, we validate the effectiveness of the proposed ESAM-CD on two publicly available CD datasets (i.e., WHU-CD and LEVIR-CD). Comprehensive experiments demonstrate that our method outperforms other weakly supervised CD methods on both datasets.
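The abstract states that the EfficientSAM backbone is fine-tuned with LoRA but gives no implementation details. The following is a minimal sketch of the generic LoRA idea only, not the authors' code: a frozen weight matrix W is augmented with a trainable low-rank update, so the effective weight is W + (alpha / rank) * B A. The class name `LoRALinear`, the `rank` and `alpha` values, and the toy dimensions are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pretrained weight, kept frozen
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts with small random values and B starts at zero,
        # so the layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameter_count(self):
        """Only A and B would be updated during fine-tuning."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy usage: an 8x8 identity matrix stands in for a pretrained projection weight.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(identity, rank=2, alpha=4)
output = layer.forward([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

In this toy case the adapter holds 32 trainable values against 64 in the frozen matrix; the saving grows with layer size because the adapter scales with rank * (d_in + d_out) rather than d_in * d_out.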
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics