Can A Machine Generate Humanlike Language Descriptions for A Remote Sensing Image?

Zhenwei Shi, Zhengxia Zou
DOI: https://doi.org/10.1109/tgrs.2017.2677464
IF: 8.2
2017-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: This paper investigates an intriguing question in the remote sensing field: "Can a machine generate humanlike language descriptions for a remote sensing image?" The automatic description of a remote sensing image (namely, remote sensing image captioning) is an important but rarely studied task for artificial intelligence. The task is particularly challenging because the description must not only capture ground elements at different scales but also express their attributes and how these elements interact with each other. Despite these difficulties, we propose a remote sensing image captioning framework that leverages recent advances in deep learning and fully convolutional networks. Experimental results on a set of high-resolution optical images, including Google Earth images and GaoFen-2 satellite images, demonstrate that the proposed method generates robust and comprehensive sentence descriptions at a desirable speed.
What problem does this paper attempt to address?