Remote Sensing Image Captioning with Multi-Scale Feature and Small Target Attention

Kangda Cheng, Zhilu Wu, Haiyan Jin, Xiaobao Li
DOI: https://doi.org/10.1109/igarss53475.2024.10642778
2024-01-01
Abstract: Remote sensing images contain many targets of varying scales and are often of low resolution, which poses significant challenges for remote sensing image captioning. To fully extract and exploit image features, this paper proposes a multi-scale feature extraction network that strengthens feature representations by integrating information from different scales, enabling more accurate identification and description of targets. In addition, we design a Small Target Attention module to further increase the network’s sensitivity to densely distributed, small-sized targets. Extensive experiments on three publicly available datasets demonstrate that the proposed method outperforms the compared methods in capturing key information. Moreover, it performs better on remote sensing images with low resolution and small-sized targets.
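The abstract describes the architecture only at a high level, so the following is a hypothetical illustration of the two general ideas, not the authors' network: their layers, scales, and attention formula are not given here. The sketch fuses a single-channel feature map viewed at several scales (average pooling, then nearest-neighbour upsampling back to full resolution, then averaging), and reweights the result with a simple local-contrast spatial attention. The function names, the scales (1, 2, 4), and the sigmoid-of-contrast weighting are all assumptions chosen for clarity.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def avg_pool(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (assumes both sides divisible by k)."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(w // k)
        ]
        for i in range(h // k)
    ]


def upsample_nearest(x: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[x[i // k][j // k] for j in range(len(x[0]) * k)] for i in range(len(x) * k)]


def multi_scale_fuse(x: Grid, scales: Tuple[int, ...] = (1, 2, 4)) -> Grid:
    """Hypothetical fusion: average the map seen at several scales, at full resolution."""
    h, w = len(x), len(x[0])
    maps = [upsample_nearest(avg_pool(x, s), s) for s in scales]
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)] for i in range(h)]


def small_target_attention(x: Grid, context_scale: int = 4) -> Grid:
    """Hypothetical spatial attention: weight each cell by how much it stands out
    from its coarse-scale neighbourhood (local contrast), as small targets tend to."""
    context = upsample_nearest(avg_pool(x, context_scale), context_scale)
    h, w = len(x), len(x[0])
    weights = [
        [1.0 / (1.0 + math.exp(-(x[i][j] - context[i][j]))) for j in range(w)]
        for i in range(h)
    ]
    # Residual form: keep the original features and add the attended copy.
    return [[x[i][j] * (1.0 + weights[i][j]) for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    # A 4 x 4 map with one bright cell standing in for a small target.
    feat = [[0.0] * 4 for _ in range(4)]
    feat[1][2] = 8.0
    fused = multi_scale_fuse(feat)
    print(round(fused[1][2], 3))  # 3.5
    out = small_target_attention(fused)
    print(out[1][2] / out[0][0] > fused[1][2] / fused[0][0])  # True
```

Because average pooling preserves a map's total, every scale (and therefore the fused map) keeps the same sum while spreading coarse context to neighbouring cells; the contrast-based weights then raise isolated peaks relative to the background, which is one plausible way to make features more sensitive to small targets.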