CCTNet: CNN and Cross-Shaped Transformer Hybrid Network for Remote Sensing Image Semantic Segmentation

Honglin Wu, Zhaobin Zeng, Peng Huang, Xinyu Yu, Min Zhang
DOI: https://doi.org/10.1109/jstars.2024.3487003
IF: 4.715
2024-01-01
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Abstract: Deep learning methods have achieved great success in remote sensing image segmentation in recent years, but building a lightweight segmentation model that comprehensively extracts both local and global features remains challenging. In this paper, we propose a CNN and cross-shaped transformer hybrid network (CCTNet) for semantic segmentation of high-resolution remote sensing images. The model follows an encoder-decoder structure: it employs ResNet18 as the encoder to extract hierarchical feature information, and it builds a transformer decoder on efficient cross-shaped self-attention (CSSA) to model local and global feature information while keeping the network lightweight. The transformer block also introduces a mixed-scale convolutional feedforward network (MCFFN) to further enhance multi-scale information extraction. In addition, a simplified and efficient feature aggregation module (FAM) gradually aggregates local and global information at different stages. Extensive comparison experiments on the ISPRS Vaihingen and Potsdam datasets show that our method outperforms state-of-the-art lightweight methods.
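The abstract does not spell out how CSSA is computed, so the following is only a rough sketch of why a cross-shaped attention pattern can be cheaper than global self-attention. It assumes the simplest cross-shaped layout, in which each position attends to its own row and its own column (stripe width 1); the function names `cross_shaped_neighbors` and `attention_pair_counts` are hypothetical and are not taken from the paper.

```python
def cross_shaped_neighbors(row, col, height, width):
    """Positions a query at (row, col) attends to under the assumed
    cross-shaped pattern: every cell in its row plus every cell in its column."""
    same_row = {(row, c) for c in range(width)}
    same_col = {(r, col) for r in range(height)}
    # The query's own cell is in both sets; the union counts it once.
    return same_row | same_col


def attention_pair_counts(height, width):
    """Return (global_pairs, cross_pairs): the number of query-key pairs
    for full global attention versus the assumed cross-shaped pattern
    on a height x width feature map."""
    tokens = height * width
    global_pairs = tokens * tokens  # every token attends to every token
    cross_pairs = sum(
        len(cross_shaped_neighbors(r, c, height, width))
        for r in range(height)
        for c in range(width)
    )
    return global_pairs, cross_pairs


if __name__ == "__main__":
    global_pairs, cross_pairs = attention_pair_counts(32, 32)
    print(global_pairs, cross_pairs)
```

Under this assumption, each query sees H + W - 1 keys instead of H × W, so the pair count grows as HW(H + W - 1) rather than (HW)². The actual CSSA block in CCTNet may use wider stripes or a different arrangement.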