Incorporating object counts into remote sensing image captioning

Zihao Ni, Zhaoyun Zong, Peng Ren
a College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, People's Republic of China
b National Key Laboratory of Deep Oil and Gas, China University of Petroleum (East China), Qingdao, People's Republic of China
DOI: https://doi.org/10.1080/17538947.2024.2392847
IF: 4.606
2024-08-23
International Journal of Digital Earth
Abstract: Existing methods for remote sensing image captioning tend to describe an image in generic language that lacks specific information about object counts. To address this limitation, we propose a novel framework that generates captions containing object count information. The framework comprises three modules: object counting, preliminary captioning, and numeral editing. The object counting module identifies the objects in a remote sensing image and determines their counts. The preliminary captioning module generates a caption that may lack object count information. The numeral editing module incorporates the object counts into that caption, yielding a more precise description. Evaluations on three remote sensing image datasets show that the framework outperforms existing methods. The framework is a significant step toward more precise and informative remote sensing image captioning.
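The abstract does not say how the numeral editing module works internally, so the following is only a minimal rule-based sketch of the idea: counts (assumed to come from an object counting module) are merged into a preliminary caption by replacing vague quantifiers. The names `VAGUE_QUANTIFIERS`, `count_to_text`, and `edit_numerals` are hypothetical, the pluralization is deliberately naive, and nothing here should be read as the authors' implementation.

```python
import re

# Hypothetical list of vague quantifiers; the paper's vocabulary is not given.
VAGUE_QUANTIFIERS = ("some", "many", "several", "a few", "lots of")
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}


def count_to_text(n: int) -> str:
    """Spell out small counts; fall back to digits for larger ones."""
    return NUMBER_WORDS.get(n, str(n))


def edit_numerals(caption: str, counts: dict[str, int]) -> str:
    """Replace '<vague quantifier> <noun>(s)' with '<count> <noun>(s)'.

    `counts` maps a singular object name to its count, standing in for
    the output of an object counting module (an assumption).
    """
    quantifiers = "|".join(re.escape(q) for q in VAGUE_QUANTIFIERS)
    for noun, n in counts.items():
        pattern = re.compile(
            rf"\b(?:{quantifiers})\s+{re.escape(noun)}s?\b", re.IGNORECASE
        )
        # Naive pluralization: append "s" unless the count is exactly one.
        phrase = f"{count_to_text(n)} {noun if n == 1 else noun + 's'}"

        def repl(match: re.Match, phrase: str = phrase) -> str:
            # Keep sentence-initial capitalization of the replaced phrase.
            if match.group(0)[0].isupper():
                return phrase[0].upper() + phrase[1:]
            return phrase

        caption = pattern.sub(repl, caption)
    return caption


preliminary = "Many planes are parked near several buildings."
print(edit_numerals(preliminary, {"plane": 4, "building": 2}))
```

A learned editing module could handle captions with no quantifier at all, or irregular plurals, which this pattern-based stand-in cannot.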
geography, physical; remote sensing