GLR-CNN: CNN-based Framework with Global Latent Relationship Embedding for High-resolution Remote Sensing Image Scene Classification
Li Liu, Yuebin Wang, Junhuan Peng, Liqiang Zhang
DOI: https://doi.org/10.1109/tgrs.2024.3434452
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: High-resolution remote sensing image (HRSI) scene classification is challenging because complex backgrounds and variable scene scales lead to low intraclass similarity and high interclass similarity. Convolutional neural networks (CNNs), the leading methods for HRSI scene classification, offer excellent performance. However, traditional CNNs require fixed-size inputs, which is a limitation when dealing with HRSIs that cover large image domains and can degrade classification performance. To overcome these problems, we propose a CNN-based model named GLR-CNN in this article. First, to make adequate use of the information in large-scale scenes, VGG16 is used to extract deep representative features and is fine-tuned on target HRSIs of any size. Furthermore, a multilayer feature fusion block based on a channel-spatial attention algorithm is integrated into the CNN to capture more discriminative features from arbitrary-size images. Finally, to enhance the consistency between image features and their similarities, a global latent relationship is used to measure the similarities among image features; it is then embedded into the fully connected layers (FCLs) to construct a latent relationship constraint. The model is optimized with a joint objective function that combines the latent relationship constraint and a cross-entropy loss with label smoothing. Extensive experiments on three HRSI datasets show overall accuracy improvements of 3.93%, 6.5%, and 2.2% over the fine-tuned VGG16 model, demonstrating the effectiveness of the GLR-CNN method.
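The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration only: the cosine-similarity graph, the graph-smoothness form of the penalty, the weight `lam`, the smoothing factor `eps`, and all function names are hypothetical and may differ from the authors' formulation.

```python
import math

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy of one sample against a label-smoothed target."""
    k = len(logits)
    m = max(logits)  # subtract the max for a numerically stable log-softmax
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    # Smoothed target: eps spread uniformly over k classes, rest on the true class.
    q = [eps / k + (1.0 - eps if i == target else 0.0) for i in range(k)]
    return -sum(qi * lp for qi, lp in zip(q, log_probs))

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def relationship_penalty(features, embeddings):
    """ASSUMED form of a latent relationship constraint: samples whose
    features are similar are pushed to have close FC-layer embeddings."""
    n = len(features)
    total = 0.0
    for i in range(n):
        for j in range(n):
            s = max(0.0, cosine(features[i], features[j]))  # similarity weight
            d2 = sum((p - q) ** 2 for p, q in zip(embeddings[i], embeddings[j]))
            total += s * d2
    return total / (n * n)

def joint_loss(logits_batch, targets, features, embeddings, eps=0.1, lam=0.5):
    """Mean smoothed cross-entropy plus lam times the relationship penalty."""
    ce = sum(
        smoothed_cross_entropy(l, t, eps) for l, t in zip(logits_batch, targets)
    ) / len(targets)
    return ce + lam * relationship_penalty(features, embeddings)
```

In this sketch, `features` stands in for the fused CNN features and `embeddings` for the fully connected layer outputs. In practice both would come from the network and the loss would be computed with an autodiff framework rather than plain Python lists.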