Efficient Deep Learning of Non-local Features for Hyperspectral Image Classification

Yu Shen, Sijie Zhu, Chen Chen, Qian Du, Liang Xiao, Jianyu Chen, Delu Pan
DOI: https://doi.org/10.1109/TGRS.2020.3014286
2020-08-03
Abstract: Deep learning based methods, such as Convolution Neural Network (CNN), have demonstrated their efficiency in hyperspectral image (HSI) classification. These methods can automatically learn spectral-spatial discriminative features within local patches. However, for each pixel in an HSI, it is not only related to its nearby pixels but also has connections to pixels far away from itself. Therefore, to incorporate the long-range contextual information, a deep fully convolutional network (FCN) with an efficient non-local module, named ENL-FCN, is proposed for HSI classification. In the proposed framework, a deep FCN considers an entire HSI as input and extracts spectral-spatial information in a local receptive field. The efficient non-local module is embedded in the network as a learning unit to capture the long-range contextual information. Different from the traditional non-local neural networks, the long-range contextual information is extracted in a specially designed criss-cross path for computation efficiency. Furthermore, by using a recurrent operation, each pixel's response is aggregated from all pixels of HSI. The benefits of our proposed ENL-FCN are threefold: 1) the long-range contextual information is incorporated effectively, 2) the efficient module can be freely embedded in a deep neural network in a plug-and-play fashion, and 3) it has much fewer learning parameters and requires less computational resources. The experiments conducted on three popular HSI datasets demonstrate that the proposed method achieves state-of-the-art classification performance with lower computational cost in comparison with several leading deep neural networks for HSI.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses how to model long-range contextual information efficiently in hyperspectral image (HSI) classification. Existing deep-learning methods, such as convolutional neural networks (CNNs), automatically extract spectral-spatial discriminative features from local regions, but the information they extract is limited to a small, fixed neighborhood and does not make full use of non-local information. Each pixel in an HSI is related not only to its neighboring pixels but also to distant ones; for example, the same ground-object type may appear at different positions in the scene. To capture this long-range context, the paper proposes a fully convolutional network with an efficient non-local module (ENL-FCN) that improves both the performance and the efficiency of HSI classification.

### Main contributions

1. **End-to-end trainable deep-learning framework**: By combining a fully convolutional network (FCN) with an efficient non-local module, the framework extracts local and non-local information simultaneously.
2. **Efficient non-local module**: Compared with traditional non-local modules, this module is more efficient in computation and memory usage, and multiple modules can be stacked to further improve performance.
3. **Significantly reduced computational resources**: Compared with the original non-local module, the efficient non-local module requires more than three times less memory and one hundred times fewer learnable parameters. Experimental results show that the proposed ENL-FCN achieves state-of-the-art classification performance on multiple HSI datasets at lower computational cost.

### Method overview

1. **Fully convolutional network (FCN)**: As the backbone network, the FCN extracts local spectral-spatial information from the entire HSI.
2. **Efficient non-local module**: This module computes the relationships between pixels along a criss-cross path (the row and column passing through each pixel), thereby capturing long-range contextual information efficiently. Specific operations include:
   - Using 1×1 convolution kernels to generate the feature maps \( Q \) and \( K \).
   - Calculating the non-local correlation between pixels along the criss-cross path to generate an attention map \( A \).
   - Applying the attention map to the feature map \( V \) to generate a new feature map \( E' \).
3. **Recurrent operation**: The module is applied recurrently, so that the response of each pixel is aggregated from all pixels in the entire HSI rather than only from its own row and column.
4. **Loss function**: The network is trained with the cross-entropy loss, and a training mask ensures that only labeled samples contribute to the loss.

### Experimental results

The paper reports experiments on three widely used HSI datasets: Indian Pines (IP), Pavia University (PU), and Kennedy Space Center (KSC). The results show that the proposed ENL-FCN achieves state-of-the-art classification performance on these datasets at lower computational cost.

### Conclusion

By introducing an efficient non-local module, the paper addresses the modeling of long-range contextual information in HSI classification and improves both classification performance and computational efficiency.
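To make the efficient non-local module from the method overview more concrete, here is a minimal pure-Python sketch of criss-cross attention and of why a second (recurrent) pass lets every pixel draw on the whole image. It is an illustration under stated assumptions, not the authors' implementation: the function names (`enl_pass`, `criss_cross_path`, `count_reached`), the toy 3 × 4 single-channel input, and the weights `Wq`, `Wk`, `Wv` are all hypothetical, and the sketch omits any residual connection or normalization the actual network may use.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conv1x1(X, weight):
    """1x1 convolution as a per-pixel linear map; weight is C_out x C_in."""
    return [[[dot(w_row, pixel) for w_row in weight] for pixel in row] for row in X]

def criss_cross_path(i, j, H, W):
    """Pixels sharing a row or column with (i, j): H + W - 1 positions."""
    return [(i, c) for c in range(W)] + [(r, j) for r in range(H) if r != i]

def enl_pass(X, Wq, Wk, Wv):
    """One criss-cross attention pass over an H x W x C feature map."""
    H, W = len(X), len(X[0])
    Q, K, V = conv1x1(X, Wq), conv1x1(X, Wk), conv1x1(X, Wv)
    out = []
    for i in range(H):
        out_row = []
        for j in range(W):
            path = criss_cross_path(i, j, H, W)
            # Attention weights A for pixel (i, j), restricted to its path.
            A = softmax([dot(Q[i][j], K[r][c]) for r, c in path])
            # Weighted sum of V along the path, channel by channel.
            out_row.append([
                sum(a * V[r][c][d] for a, (r, c) in zip(A, path))
                for d in range(len(V[i][j]))
            ])
        out.append(out_row)
    return out

def count_reached(X):
    """Number of pixels whose response has any nonzero channel."""
    return sum(1 for row in X for pixel in row if any(v != 0.0 for v in pixel))

# Toy input (assumed): 3 x 4 map, one channel, a single "source" pixel at (0, 0).
H, W = 3, 4
X = [[[0.0] for _ in range(W)] for _ in range(H)]
X[0][0] = [1.0]
Wq, Wk, Wv = [[0.5]], [[0.5]], [[1.0]]  # hypothetical weights, shared across passes

once = enl_pass(X, Wq, Wk, Wv)      # first pass: row and column of the source
twice = enl_pass(once, Wq, Wk, Wv)  # recurrent second pass: the full map
print(count_reached(once), count_reached(twice))
```

On this toy input the script prints `6 12`. After one pass the source pixel influences only its own row and column (H + W - 1 = 6 pixels), and after the second pass it influences all 12. Each pixel attends to H + W - 1 positions per pass instead of the H × W positions a full non-local block would compare, which is where the savings in computation and memory come from.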