Abstract: Existing optical remote sensing image change detection (CD) methods aim to learn an appropriate discriminative decision by analyzing the feature information of bitemporal images acquired at the same location. However, the complex scenes in high-resolution (HR) remote sensing images lead to unsatisfactory results, especially for irregular and occluded objects. Although recent self-attention-driven change detection models built on CNNs achieve promising results, their computational and parameter costs become prohibitive for HR images. In this paper, we use a transformer structure in place of self-attention to learn stronger feature representations per image. In addition, current vision transformer models tokenize images only into single-dimensional tokens and therefore fail to build multi-scale long-range interactions among features. Here, we propose a hybrid multi-scale transformer module for HR remote sensing image change detection, which fully models representation attention at hybrid scales of each image via a fine-grained self-attention mechanism. The key idea of the hybrid transformer structure is to establish heterogeneous semantic tokens containing multiple receptive fields, thus simultaneously preserving large-object and fine-grained features. To build relationships between features without embedding token sequences from the Siamese tokenizer, we also introduce a hybrid difference transformer decoder (HDTD) layer to further strengthen the multi-scale global dependencies of high-level features. Compared to capturing single-stream tokens, our HDTD layer focuses directly on representing differential features without an exponential increase in computational cost. Finally, we propose a cascade feature decoder (CFD) that aggregates upsampled features of different dimensions by establishing difference skip-connections. To evaluate the effectiveness of the proposed method, we conduct experiments on two HR remote sensing CD datasets. Compared to state-of-the-art methods, our Hybrid-TransCD achieves superior performance on both datasets (i.e., LEVIR-CD and SYSU-CD), with improvements of 0.75% and 1.98%, respectively.
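The idea of tokens with multiple receptive fields, attended over jointly on differential features, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`pool_tokens`, `hybrid_tokens`, `difference_map`, `self_attention`), the average-pooling tokenizer, the patch sizes, the absolute-difference operator, and the identity query/key/value projections are all assumptions chosen to keep the example short and dependency-free.

```python
import math

def pool_tokens(feature_map, patch):
    """Average-pool an H x W grid of C-dim vectors into one token per patch x patch window."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    tokens = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            acc = [0.0] * c
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    for k in range(c):
                        acc[k] += feature_map[i][j][k]
            tokens.append([v / (patch * patch) for v in acc])
    return tokens

def hybrid_tokens(feature_map, patch_sizes=(1, 2, 4)):
    """Concatenate tokens from several receptive fields into one sequence (hypothetical scales)."""
    tokens = []
    for p in patch_sizes:
        tokens.extend(pool_tokens(feature_map, p))
    return tokens

def difference_map(feat_t1, feat_t2):
    """Element-wise absolute difference of two bitemporal feature maps (assumed operator)."""
    return [[[abs(a - b) for a, b in zip(p1, p2)]
             for p1, p2 in zip(r1, r2)]
            for r1, r2 in zip(feat_t1, feat_t2)]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[k] for wt, v in zip(weights, tokens)) / total
                    for k in range(d)])
    return out

# Toy 4 x 4 feature maps with 2 channels for the two acquisition dates.
feat_t1 = [[[float(i + j), 1.0] for j in range(4)] for i in range(4)]
feat_t2 = [[[float(i + j), 0.0] for j in range(4)] for i in range(4)]

tokens = hybrid_tokens(difference_map(feat_t1, feat_t2))
attended = self_attention(tokens)
print(len(tokens))  # 16 + 4 + 1 = 21 tokens across the three scales
```

Because fine and coarse tokens share one sequence, every attention step mixes information across scales, which is the property the abstract attributes to the hybrid tokens. A real model would use learned projections, multiple heads, and convolutional tokenizers rather than plain averaging.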

Lightweight Structure-Aware Transformer Network for Remote Sensing Image Change Detection

Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection

TransUNetCD: A Hybrid Transformer Network for Change Detection in Optical Remote-Sensing Images

Relating CNN-Transformer Fusion Network for Remote Sensing Change Detection

CSTSUNet: A Cross Swin Transformer-Based Siamese U-Shape Network for Change Detection in Remote Sensing Images

A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images

Bidirectional-enhanced transformer network with channel weighting feature fusion for remote sensing image change detection

An attention-based multiscale transformer network for remote sensing image change detection

Relating CNN-Transformer Fusion Network for Change Detection

EATDer: Edge-Assisted Adaptive Transformer Detector for Remote Sensing Change Detection

Remote Sensing Image Change Detection Transformer Network Based on Dual-Feature Mixed Attention

A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images

Remote Sensing Image Change Detection Based on Deep Multi-Scale Multi-Attention Siamese Transformer Network

A Semi-Supervised Pyramid Cross-Temporal Attention Transformer for Change Detection in High-Resolution Remote Sensing Images

TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images

Semantic-aware transformer with feature integration for remote sensing change detection

Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via Token Aggregation

MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images

A Transformer-Based Network for Change Detection in Remote Sensing Using Multiscale Difference-Enhancement

VisionTwinNet: Gated Clarity Enhancement Paired With Light-Robust CD Transformers