RCCT-ASPPNet: Dual-Encoder Remote Image Segmentation Based on Transformer and ASPP

Yazhou Li,Zhiyou Cheng,Chuanjian Wang,Jinling Zhao,Linsheng Huang
DOI: https://doi.org/10.3390/rs15020379
IF: 5
2023-01-08
Remote Sensing
Abstract:Remote image semantic segmentation technology is one of the core research elements in the field of computer vision and has a wide range of applications in production life. Most remote image semantic segmentation methods are based on CNN. Recently, Transformer provided a view of long-distance dependencies in images. In this paper, we propose RCCT-ASPPNet, which includes the dual-encoder structure of Residual Multiscale Channel Cross-Fusion with Transformer (RCCT) and Atrous Spatial Pyramid Pooling (ASPP). RCCT uses Transformer to cross fuse global multiscale semantic information; the residual structure is then used to connect the inputs and outputs. ASPP based on CNN extracts contextual information of high-level semantics from different perspectives and uses Convolutional Block Attention Module (CBAM) to extract spatial and channel information, which will further improve the model segmentation ability. The experimental results show that the mIoU of our method is 94.14% and 61.30% on the datasets Farmland and AeroScapes, respectively, and that the mPA is 97.12% and 84.36%, respectively, both outperforming DeepLabV3+ and UCTransNet.
environmental sciences,imaging science & photographic technology,remote sensing,geosciences, multidisciplinary
What problem does this paper attempt to address?