Hybrid transformer-CNN networks using superpixel segmentation for remote sensing building change detection

Shike Liang, Zhen Hua, Jinjiang Li
DOI: https://doi.org/10.1080/01431161.2023.2208711
IF: 3.531
2023-05-10
International Journal of Remote Sensing
Abstract: Convolution in a convolutional neural network (CNN) uses a filter (kernel) with shared parameters to extract features by computing a weighted sum of the centre pixel and its adjacent pixels. A Transformer divides the input image into patches, adds position encodings, and then learns global semantic information and models long-range dependencies through a self-attention mechanism. CNNs are good at extracting local features but have difficulty capturing global cues, whereas the Transformer captures long-range context through self-attention but, relative to a CNN, ignores local feature details to a certain extent. We believe that CNNs and Transformers are complementary and will yield better results when fused. Therefore, in this work, we propose a hybrid Transformer-CNN network, based on the fusion of CNN and Transformer branches, for remote sensing change detection. In the CNN branch, we use the classical U-Net architecture to learn local semantic features. In the Transformer branch, we use Transformer-based progressive sampling to focus the model's attention on objects of interest and to avoid corrupting object structure. We then propose an adaptive feature merging module that fully fuses the CNN and Transformer features to enhance the feature representation. In addition, we introduce a differentiable superpixel branch that exploits a superpixel segmentation algorithm to identify object boundaries accurately, preserve boundary information, and reduce noise in pixel-level features. A feature refinement module supplements the superpixel branch features with the fused, enhanced features. Our experiments demonstrate the superiority of our model over other state-of-the-art methods.
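The two generic operations contrasted at the start of the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' network: the function names are hypothetical, the attention uses identity query/key/value projections instead of learned ones, and position encodings are omitted for brevity.

```python
import math

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding); each output
    value is a weighted sum of a pixel and its neighbours."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def patchify(image, p):
    """Split the image into non-overlapping p x p patches, each flattened
    into one token vector (row-major order)."""
    return [[image[i + a][j + b] for a in range(p) for b in range(p)]
            for i in range(0, len(image) - p + 1, p)
            for j in range(0, len(image[0]) - p + 1, p)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention. Assumption: Q, K and V
    are the tokens themselves (identity projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)  # every token attends to every token
        out.append([sum(w * t[c] for w, t in zip(weights, tokens))
                    for c in range(d)])
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
cross = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]

local = conv2d_valid(image, cross)    # [[30, 35], [50, 55]]
tokens = patchify(image, 2)           # 4 tokens of length 4
mixed = self_attention(tokens)        # each output mixes all 4 patches
```

Each convolution output depends only on a 3 x 3 neighbourhood, while each attention output is a weighted average over every patch in the image, which is the local-versus-global contrast the abstract draws.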
imaging science & photographic technology, remote sensing