Denoising-Based Multiscale Feature Fusion for Remote Sensing Image Captioning.

Wei Huang, Qi Wang, Xuelong Li
DOI: https://doi.org/10.1109/lgrs.2020.2980933
IF: 5.343
2021-01-01
IEEE Geoscience and Remote Sensing Letters
Abstract: Benefiting from deep learning technology, generating captions for remote sensing images has become achievable, and great progress has been made in this field in recent years. However, the large scale variation of remote sensing images, which can lead to errors or omissions in feature extraction, still limits further improvement of caption quality. To address this problem, we propose a denoising-based multiscale feature fusion (DMSFF) mechanism for remote sensing image captioning in this letter. The proposed DMSFF mechanism aggregates multiscale features with a denoising operation at the visual feature extraction stage. It helps the encoder-decoder framework, which is widely used in image captioning, obtain a denoised multiscale feature representation. In experiments, we apply the proposed DMSFF to the encoder-decoder framework and perform comparative experiments on two public remote sensing image captioning data sets, UC Merced (UCM)-captions and Sydney-captions. The experimental results demonstrate the effectiveness of our method.
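The abstract describes the idea only at a high level, so the following is a minimal illustrative sketch of "aggregate features at several scales, with a denoising step", not the authors' DMSFF implementation. Everything in it is an assumption made for illustration: the helper names (`avg_pool`, `soft_threshold`, `fuse_multiscale`), the choice of average pooling to produce scales, the use of soft-thresholding as a stand-in denoiser, and the fusion by concatenating per-scale global means.

```python
from typing import List, Sequence

Grid = List[List[float]]


def avg_pool(grid: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (assumes both dimensions are divisible by k)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, cols, k)
        ]
        for r in range(0, rows, k)
    ]


def soft_threshold(grid: Grid, tau: float) -> Grid:
    """Hypothetical stand-in denoiser: shrink every value toward zero by tau."""

    def shrink(v: float) -> float:
        if v > tau:
            return v - tau
        if v < -tau:
            return v + tau
        return 0.0  # small activations are treated as noise

    return [[shrink(v) for v in row] for row in grid]


def global_mean(grid: Grid) -> float:
    """Collapse a feature map to a single descriptor value."""
    flat = [v for row in grid for v in row]
    return sum(flat) / len(flat)


def fuse_multiscale(grid: Grid, scales: Sequence[int] = (1, 2, 4), tau: float = 0.1) -> List[float]:
    """Pool at each scale, denoise, then concatenate one descriptor per scale."""
    return [global_mean(soft_threshold(avg_pool(grid, k), tau)) for k in scales]


if __name__ == "__main__":
    feature_map = [[1.0] * 4 for _ in range(4)]  # toy 4 x 4 feature map
    print(fuse_multiscale(feature_map))
```

In an encoder-decoder captioning pipeline, a fused vector of this kind would play the role of the visual representation handed to the decoder; a real system would operate on learned convolutional feature maps and a learned denoising operation rather than these toy functions.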