Detection Assisted Change Captioning for Remote Sensing Image

Xiliang Li, Bin Sun, Shutao Li
DOI: https://doi.org/10.1109/igarss53475.2024.10640971
2024-01-01
Abstract: Remote sensing image change captioning is a crucial image interpretation technique that automatically generates natural-language captions describing the differences between multi-temporal remote sensing images. Previous attention-based methods struggle to generate accurate captions because they cannot precisely locate the crucial visual change areas. To address this challenge, this paper introduces a novel method that leverages explicit visual change information to enhance change description. Specifically, the proposed model comprises three key components: 1) the change-visual enhancement module uses the change image, which contains object-level visual information, to enhance the multi-temporal images at the image level; 2) the multi-temporal feature fusion module captures accurate visual change features through a carefully designed fusion at the feature level; 3) the caption generation module feeds the visual change features into a transformer-based generator to produce the desired captions of the multi-temporal remote sensing images. Experimental results on the LEVIR-CC dataset demonstrate that the method achieves state-of-the-art performance.
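The data flow through the first two components can be illustrated with a minimal NumPy sketch. The abstract does not specify how the enhancement or the fusion is computed, so everything below is an assumption made for illustration: the function names, the multiplicative weighting with `alpha`, the patch-averaging stand-in for a feature extractor, and the concatenate-plus-difference fusion are all hypothetical and are not the authors' implementation. The transformer-based caption generator is omitted; the fused tokens are what such a generator would receive.

```python
import numpy as np


def enhance_with_change_image(img_t1, img_t2, change_img, alpha=0.5):
    """Image-level enhancement (assumed form, not from the paper):
    scale up pixels that the change image marks as changed."""
    weight = 1.0 + alpha * change_img[..., None]  # (H, W, 1), broadcasts over channels
    return img_t1 * weight, img_t2 * weight


def extract_features(img, patch=4):
    """Hypothetical stand-in for a visual backbone:
    average-pool non-overlapping patches into a sequence of tokens."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    x = img[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    return x.mean(axis=(1, 3)).reshape(gh * gw, c)  # (tokens, C)


def fuse_features(f1, f2):
    """Feature-level fusion (assumed form, not from the paper):
    concatenate both temporal features with their difference."""
    return np.concatenate([f1, f2, f2 - f1], axis=-1)  # (tokens, 3C)


# Toy inputs: two 16x16 RGB images and a binary change image.
rng = np.random.default_rng(0)
t1 = rng.random((16, 16, 3))
t2 = rng.random((16, 16, 3))
change = np.zeros((16, 16))
change[4:8, 4:8] = 1.0  # pretend a detector flagged this region

e1, e2 = enhance_with_change_image(t1, t2, change)
tokens = fuse_features(extract_features(e1), extract_features(e2))
# `tokens` would be the input to a caption generator (omitted here).
```

With 16x16 inputs and 4x4 patches, each image yields 16 tokens, and the fused representation has three times the channel count of a single image.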