SwinHCST: a deep learning network architecture for scene classification of remote sensing images based on improved CNN and Transformer

Jiayin Song, Yiming Fan, Wenlong Song, Hongwei Zhou, Liusong Yang, Qiqi Huang, Zhuoyuan Jiang, Chuangqi Wang, Ting Liao
DOI: https://doi.org/10.1080/01431161.2023.2285739
IF: 3.531
2023-12-07
International Journal of Remote Sensing
Abstract: Remote sensing image scene classification is a fundamental task in the intelligent interpretation of remote sensing images. Although Transformers possess a powerful attention mechanism, they require lengthy training procedures to achieve good performance levels. To address this issue, this paper proposes a novel deep learning network model, named SwinHCST, that combines a CNN and the Swin Transformer. Firstly, the model uses a Weighted Normalized CNN to quickly extract low-level features of the image. Secondly, the Receptive Field Block module facilitates multi-scale information fusion. Thirdly, the Information Fusion Transformer further extracts the deep-level features of the image. Furthermore, this paper designs a plug-and-play Cross Spatial Information Fusion Block, which encodes dimensional information and extracts global information to enhance information exchange. The scene classification experiments show that the proposed model outperforms other methods on the three selected datasets and can achieve excellent performance without requiring large amounts of data and training. Specifically, the classification accuracy of the proposed method on the three datasets is 93.76%, 93.60%, and 98.10%, which is 1.7% to 3.71% higher than ResNet50 and 3.7% to 5.7% higher than Swin Transformer.
imaging science & photographic technology, remote sensing