Abstract: Existing change detection (CD) methods for optical remote sensing images aim to learn an appropriate discriminative decision by analyzing the feature information of bitemporal images acquired at the same location. However, the complex scenes in high-resolution (HR) remote sensing images lead to unsatisfactory results, especially for irregular and occluded objects. Although recent self-attention-driven CD models built on CNNs achieve promising results, their computational and parameter costs become a prohibitive obstacle for HR images. In this paper, we use a transformer structure in place of self-attention to learn stronger feature representations for each image. In addition, concurrent vision transformer models only tokenize single-dimensional image tokens and thus fail to build multi-scale long-range interactions among features. Here, we propose a hybrid multi-scale transformer module for HR remote sensing image change detection, which fully models attention representations at hybrid scales of each image via a fine-grained self-attention mechanism. The key idea of the hybrid transformer structure is to establish heterogeneous semantic tokens containing multiple receptive fields, thereby preserving both large-object and fine-grained features. To build relationships between features without embedding token sequences from the Siamese tokenizer, we also introduce a hybrid difference transformer decoder (HDTD) layer that further strengthens the multi-scale global dependencies of high-level features. Compared with capturing single-stream tokens, our HDTD layer focuses directly on representing differential features without an exponential increase in computational cost. Finally, we propose a cascade feature decoder (CFD) that aggregates upsampled features of different dimensions by establishing difference skip-connections. To evaluate the effectiveness of the proposed method, we conduct experiments on two HR remote sensing CD datasets. Compared to state-of-the-art methods, our Hybrid-TransCD achieves superior performance on both datasets (LEVIR-CD and SYSU-CD), with improvements of 0.75% and 1.98%, respectively.
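The idea of tokens with multiple receptive fields can be illustrated with a toy NumPy sketch. This is not the Hybrid-TransCD implementation: the average-pooling tokenizer, the patch sizes, the absolute difference between bitemporal tokens, and the identity query/key/value projections are all assumptions chosen to keep the example small, and the function names are hypothetical.

```python
import numpy as np

def pool_tokens(feat, patch):
    """Average-pool an (H, W, C) feature map into (H//patch * W//patch, C) tokens."""
    h, w, c = feat.shape
    gh, gw = h // patch, w // patch
    # Crop to a multiple of the patch size, then average inside each patch.
    blocks = feat[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return blocks.mean(axis=(1, 3)).reshape(gh * gw, c)

def hybrid_tokens(feat, patches=(2, 4, 8)):
    """Concatenate tokens from several patch sizes (assumed scales) into one sequence."""
    return np.concatenate([pool_tokens(feat, p) for p in patches], axis=0)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights

# Random stand-ins for the bitemporal feature maps of one image pair.
rng = np.random.default_rng(0)
feat_t1 = rng.standard_normal((16, 16, 8))
feat_t2 = rng.standard_normal((16, 16, 8))

tokens_t1 = hybrid_tokens(feat_t1)  # 64 + 16 + 4 = 84 tokens of width 8
tokens_t2 = hybrid_tokens(feat_t2)

# Assumption: differential features taken as the absolute token difference.
diff_tokens = np.abs(tokens_t1 - tokens_t2)
attended, weights = self_attention(diff_tokens)
```

Because tokens from all three patch sizes share one sequence, each attention row mixes fine and coarse tokens, which is the multi-scale interaction the abstract describes in general terms. The sequence length grows only with the sum of the per-scale token counts.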

Lighter and Robust: A Rotation-Invariant Transformer for VHR Image Change Detection

A VHR Bi-Temporal Remote-Sensing Image Change Detection Network Based on Swin Transformer

CDFormer: A Hyperspectral Image Change Detection Method Based on Transformer Encoders

Bidirectional-enhanced transformer network with channel weighting feature fusion for remote sensing image change detection

Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection

Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via Token Aggregation

Interaction in Transformer for Change Detection in VHR Remote Sensing Images

A Position-Temporal Awareness Transformer for Remote Sensing Change Detection

Remote Sensing Image-Change Detection with Pre-Generation of Depthwise-Separable Change-Salient Maps

Cross attention is all you need: relational remote sensing change detection with transformer

Remote Sensing Image Change Detection With Transformers

A Temporal-Reliable Method for Change Detection in High-Resolution Bi-Temporal Remote Sensing Images

Fusion-Former: Fusion Features Across Transformer and Convolution for Building Change Detection

MFI-CD: a lightweight siamese network with multidimensional feature interaction for change detection

Remote Sensing Image Change Detection Transformer Network Based on Dual-Feature Mixed Attention

DAHT-Net: Deformable Attention-Guided Hierarchical Transformer Network Based on Remote Sensing Image Change Detection

DiFormer: A Difference Transformer Network for Remote Sensing Change Detection

EfficientCD: A New Strategy For Change Detection Based With Bi-temporal Layers Exchanged

UCDFormer: Unsupervised Change Detection Using a Transformer-Driven Image Translation

D2Former: Dual-Domain Transformer for Change Detection in VHR Remote Sensing Images