Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network

Yansheng Li,Ruixian Chen,Yongjun Zhang,Mi Zhang,Ling Chen
DOI: https://doi.org/10.3390/rs12234003
IF: 5
2020-12-07
Remote Sensing
Abstract:As one of the fundamental tasks in remote sensing (RS) image understanding, multi-label remote sensing image scene classification (MLRSSC) is attracting increasing research interest. Human beings can easily perform MLRSSC by examining the visual elements contained in the scene and the spatio-topological relationships of these visual elements. However, most of existing methods are limited by only perceiving visual elements but disregarding the spatio-topological relationships of visual elements. With this consideration, this paper proposes a novel deep learning-based MLRSSC framework by combining convolutional neural network (CNN) and graph neural network (GNN), which is termed the MLRSSC-CNN-GNN. Specifically, the CNN is employed to learn the perception ability of visual elements in the scene and generate the high-level appearance features. Based on the trained CNN, one scene graph for each scene is further constructed, where nodes of the graph are represented by superpixel regions of the scene. To fully mine the spatio-topological relationships of the scene graph, the multi-layer-integration graph attention network (GAT) model is proposed to address MLRSSC, where the GAT is one of the latest developments in GNN. Extensive experiments on two public MLRSSC datasets show that the proposed MLRSSC-CNN-GNN can obtain superior performance compared with the state-of-the-art methods.
environmental sciences,imaging science & photographic technology,remote sensing,geosciences, multidisciplinary
What problem does this paper attempt to address?
The problem that this paper attempts to solve is multi - label remote sensing image scene classification (MLRSSC). Specifically, the paper focuses on how to effectively extract discriminative semantic representations to distinguish multiple categories, especially in remote sensing images. Compared with single - label remote sensing image scene classification, MLRSSC is a more realistic task, aiming to predict multiple semantic labels to describe a remote sensing image scene. Since there are often complex inter - relationships among multiple categories, it is very challenging to effectively extract these relationships. Existing methods can usually only perceive visual elements in the image, while ignoring the spatial topological relationships among these elements. To solve this problem, the paper proposes a new framework that combines convolutional neural network (CNN) and graph neural network (GNN), namely MLRSSC - CNN - GNN, for mining the appearance features of visual elements in the image scene and their spatial topological relationships, thereby improving the performance of multi - label classification.