A Multi-Task Network and Two Large Scale Datasets for Change Detection and Captioning in Remote Sensing Images

Jingye Shi, Mengge Zhang, Yuewu Hou, Ruicong Zhi, Jiqiang Liu
DOI: https://doi.org/10.1109/tgrs.2024.3485740
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Remote Sensing Change Detection (RSCD) identifies pixel-level change regions between images, while Remote Sensing Change Captioning (RSCC) describes the nature and properties of these changes in natural language. Previous work has studied the two tasks separately, which yields only one kind of interpretation output and limits the application scenarios. Moreover, the complementarity between change regions and deep semantic information can further improve the performance and robustness of a model. We therefore attempt to solve the RSCD and RSCC tasks simultaneously (i.e., RS-CDC) under a multi-task framework, a CNN-Transformer based Multi-Task Network (CTMTNet). Specifically, we design a Multi-Attention Feature Enhancement Module (MAFEM) and a Feature Fusion Block (FFB) to enhance the local information and location perception of features from bi-temporal images. The MAFEM weights the channel and spatial dimensions separately to capture local information more accurately and enhance location perception. The FFB fuses the bi-temporal features and uses multi-level residual connections to ensure that change information is not lost during feature transfer. Finally, two decoders output the Change Maps (CMs) and the change captions, respectively. During training, CTMTNet uses an improved multi-task loss function to balance the two tasks. To explore the RS-CDC task, we construct two large-scale datasets, LEVIR-CDC and WHU-CDC. We benchmark existing state-of-the-art change detection and change captioning methods on these two datasets and on the recently released LEVIR-MCI dataset, and the results show that the proposed CTMTNet significantly outperforms the compared methods.
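The abstract does not specify the internal layers of MAFEM. As a rough illustration of what weighting the channel and spatial dimensions separately can mean, the sketch below applies a parameter-free, CBAM-style channel-then-spatial reweighting to a small C×H×W feature map stored as nested lists. This is an assumption for illustration only, not the authors' module: the function names are hypothetical, and a real attention block would pass the pooled statistics through learned layers (for example an MLP for channel weights and a convolution for spatial weights) instead of simply adding them.

```python
import math
from typing import List

# A feature map stored as fmap[channel][row][col] (illustrative layout).
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_weights(fmap: FeatureMap) -> List[float]:
    # One weight per channel from its global average and global max.
    # Assumption: a CBAM-style block would feed these into a learned MLP; here they are summed.
    weights = []
    for ch in fmap:
        values = [v for row in ch for v in row]
        weights.append(sigmoid(sum(values) / len(values) + max(values)))
    return weights


def spatial_weights(fmap: FeatureMap) -> List[List[float]]:
    # One weight per pixel from the mean and max taken across channels.
    # Assumption: a CBAM-style block would apply a learned convolution; here they are summed.
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    weights = []
    for i in range(h):
        row = []
        for j in range(w):
            column = [fmap[k][i][j] for k in range(c)]
            row.append(sigmoid(sum(column) / c + max(column)))
        weights.append(row)
    return weights


def channel_then_spatial(fmap: FeatureMap) -> FeatureMap:
    # Step 1: rescale every channel by its own channel weight.
    cw = channel_weights(fmap)
    refined = [[[v * cw[k] for v in row] for row in ch] for k, ch in enumerate(fmap)]
    # Step 2: rescale every pixel (shared across channels) by its spatial weight.
    sw = spatial_weights(refined)
    return [[[v * sw[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 map: one active channel and one silent channel.
    fmap = [[[0.0, 1.0], [2.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(channel_weights(fmap))
    print(channel_then_spatial(fmap))
```

Because every weight lies strictly between 0 and 1, this kind of reweighting can only attenuate responses; channels and pixels with larger activations are attenuated less, which is the basic intuition behind using attention to emphasize informative regions.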
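The abstract likewise does not say how the improved multi-task loss balances the two tasks. One common baseline for combining a change-detection loss with a captioning loss is homoscedastic-uncertainty weighting (Kendall, Gal and Cipolla, 2018), shown here in its widely used simplified form, total = Σ exp(-s_i)·L_i + s_i, with one learnable log-variance s_i per task. This is a swapped-in technique, not CTMTNet's actual loss; the function names and the example loss values are hypothetical.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float], log_vars: Sequence[float]) -> float:
    # total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is learnable.
    # exp(-s_i) weights task i; the "+ s_i" term stops s_i from growing without bound.
    if len(task_losses) != len(log_vars):
        raise ValueError("expected one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def optimal_log_var(task_loss: float) -> float:
    # Setting d/ds [exp(-s) * L + s] = -exp(-s) * L + 1 to zero gives s = log(L).
    # Requires L > 0.
    return math.log(task_loss)


if __name__ == "__main__":
    # Hypothetical per-batch losses: change detection (CD) and change captioning (CC).
    l_cd, l_cc = 0.4, 3.2
    print(uncertainty_weighted_loss([l_cd, l_cc], [0.0, 0.0]))  # equal weights
    s_cd, s_cc = optimal_log_var(l_cd), optimal_log_var(l_cc)
    # At the optimum each task's effective weight exp(-s) equals 1 / L.
    print(math.exp(-s_cd), math.exp(-s_cc))
```

For a fixed task loss L, the term exp(-s)·L + s is convex in s and minimized at s = log L, where the weighted loss equals 1. During joint training this drives a task with a large raw loss toward a smaller effective weight, so neither task dominates the gradient.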
What problem does this paper attempt to address?