Remote sensing scene classification based on high-order graph convolutional network

Yue Gao, Jun Shi, Jun Li, Ruoyu Wang
DOI: https://doi.org/10.1080/22797254.2020.1868273
IF: 4
2021-01-04
European Journal of Remote Sensing
Abstract: Remote sensing scene classification has gained increasing interest in remote sensing image understanding and feature representation is the crucial factor for classification methods. Convolutional Neural Network (CNN) generally uses hierarchical deep structure to automatically learn the feature representation from the whole images and thus has been widely applied in scene classification. However, it may fail to consider the discriminative components within the image during the learning process. Moreover, the potential relationships of scene semantics are likely to be ignored. In this paper, we present a novel remote sensing scene classification method based on high-order graph convolutional network (H-GCN). Our method uses the attention mechanism to focus on the key components inside the image during CNN feature learning. More importantly, high-order graph convolutional network is applied to investigate the class dependencies. The graph structure is built where each node is described by the mean of attentive CNN features from each semantic class. The semantic class dependencies are propagated with mixing neighbor information of nodes at different orders and thus the more informative representation of nodes can be gained. The node representations of H-GCN and attention CNN features are finally integrated as the discriminative feature representation for scene classification. Experimental results on benchmark datasets demonstrate the feasibility and effectiveness of our method for remote sensing scene classification.
remote sensing
What problem does this paper attempt to address?
The paper addresses the problem of feature representation in remote sensing scene classification. Specifically:

1. **Limitations of traditional CNN methods**: Although convolutional neural networks (CNNs) are good at automatically learning high-level features from images, they have two main weaknesses in remote sensing scene classification:
   - **Difficulty in distinguishing key components**: Complex remote sensing scene images may contain similar components (for example, both forests and parks may contain trees), which makes it difficult for CNN-based methods to distinguish these key components.
   - **Ignoring semantic relationships**: Traditional CNN methods usually ignore the potential relationships between scene semantics, which may lead to poor classification results.
2. **Necessity of introducing the attention mechanism**: The attention mechanism can strengthen the focus on key regions of the image to some extent. However, when the regions of interest in different scene categories have similar features, attention alone may not be enough to separate those categories. For example, "dense residential areas", "medium-density residential areas" and "schools" may share similar objects of interest (such as houses), so attention features alone are not enough to classify them accurately.
3. **Exploring the correlation of semantic categories**: To further improve classification performance, the authors argue that the correlation between different scene semantic categories should be explored. In theory, semantically similar scene categories should have more similar feature representations, and vice versa. The authors therefore aim to improve the feature representation by modeling the correlation between these semantic categories.

To solve the above problems, the authors propose a new remote sensing scene classification method based on the high-order graph convolutional network (H-GCN). This method combines the attention mechanism and CNN features, aiming to:

- **Focus on key components**: Focus on the key components in the image through the attention mechanism, thereby enhancing the discriminative ability of CNN features.
- **Model semantic correlations**: Build a graph structure to model the dependencies between different semantic categories, and use the high-order graph convolutional network to propagate these relationships, generating more informative node feature representations.

Finally, this method improves the performance of remote sensing scene classification by integrating the attention-CNN features and H-GCN node representations. Experimental results show that this method outperforms existing methods on multiple public datasets.

### Formula summary

1. **Global average pooling in the attention mechanism**:
   \[ z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i,j) \]
   where \(z_c\) is the global average value of the \(c\)-th channel.
2. **Channel-dependency modeling**:
   \[ s = \sigma(f_c(\delta(f_c(z)))) \]
   where \(f_c(\cdot)\) represents a fully-connected layer, and \(\delta(\cdot)\) and \(\sigma(\cdot)\) are the ReLU and Sigmoid activation functions respectively.
3. **Weighted feature map**:
   \[ \bar{u}_c(i,j) = s_c \cdot u_c(i,j) \]
4. **Definition of the adjacency matrix in the graph structure**:
   \[ A_{ij} = \begin{cases} 1, & \text{if } X_i \in \text{knn}(X_j) \text{ or } X_j \in \text{knn}(X_i) \\ 0, & \text{otherwise} \end{cases} \]
5. **Message-passing rule of the high-order graph convolutional network**:
   \[ H^{(l+1)} = \bigoplus_{j \in P} \sigma(\tilde{A}^j H^{(l)} W_l^{(j)}) \]
   where \(\tilde{A}^j\) represents the \(j\)-th power of the adjacency matrix \(\tilde{A}\).
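
The five formulas above can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation. Several details are not given in the summary and are assumptions here: the function names, Euclidean distance for the kNN search, the self-loop plus symmetric normalisation used to obtain \(\tilde{A}\), reading \(\bigoplus\) as column-wise concatenation over a set of powers \(P\), and ReLU as the default activation in formula 5. The random inputs in the usage lines stand in for real CNN feature maps and class-mean node features.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(u, w1, w2):
    """Formulas 1-3 for one feature map u of shape (C, H, W).
    w1 (C, C_r) and w2 (C_r, C) play the role of the two fully connected
    layers; biases are omitted for brevity (an assumption)."""
    z = u.mean(axis=(1, 2))              # formula 1: global average per channel
    s = sigmoid(relu(z @ w1) @ w2)       # formula 2: FC -> ReLU -> FC -> Sigmoid
    return s[:, None, None] * u, s       # formula 3: rescale channel c by s_c

def knn_adjacency(x, k):
    """Formula 4: A_ij = 1 if X_i is in knn(X_j) or X_j is in knn(X_i).
    x has shape (N, D); Euclidean distance is an assumption."""
    n = len(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a node is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(a, a.T)            # the "or" makes the matrix symmetric

def normalize_adjacency(a):
    """Assumed normalisation: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(len(a))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def high_order_layer(a_tilde, h, weights, activation=relu):
    """Formula 5: weights maps each power j in P to its matrix W^(j).
    The per-power outputs are concatenated column-wise (assumed meaning
    of the circled-plus operator)."""
    outputs = []
    for j, w in sorted(weights.items()):
        a_pow = np.linalg.matrix_power(a_tilde, j)   # j-th power of A~
        outputs.append(activation(a_pow @ h @ w))
    return np.concatenate(outputs, axis=1)

# Usage with random stand-in data.
rng = np.random.default_rng(0)
u = rng.normal(size=(8, 4, 4))                       # one 8-channel feature map
u_bar, s = channel_attention(u, rng.normal(size=(8, 2)), rng.normal(size=(2, 8)))

x = rng.normal(size=(5, 8))                          # 5 class nodes, 8-dim features
a_tilde = normalize_adjacency(knn_adjacency(x, k=2))
weights = {j: rng.normal(size=(8, 3)) for j in (0, 1, 2)}   # P = {0, 1, 2}
h_next = high_order_layer(a_tilde, x, weights)
print(h_next.shape)                                  # (5, 9)
```

Each power \(j\) contributes its own block of output columns, so a node's new representation mixes information from neighbours at several hop distances within a single layer.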