Abstract: In the change detection (CD) task, the UNet architecture has achieved superior results. However, because of the inherent locality of convolution operations, UNet is inadequate at learning global context and long-range spatial relations. Transformers can capture long-range feature dependencies, but their lack of low-level detail may limit localization capability. Therefore, this article proposes TransUNetCD, an end-to-end encoder-decoder hybrid transformer model for CD that combines the advantages of transformers and UNet. The encoder tokenizes image patches from the convolutional neural network (CNN) feature map and encodes them to extract rich global context information. The decoder upsamples the encoded features, connects them with higher-resolution multiscale features through skip connections to learn local-global semantic features, and restores the full spatial resolution of the feature map to achieve precise localization. The proposed model addresses two problems of the UNet framework: redundant information is generated when extracting low-level features, and the relationships between feature layers cannot be fully modeled and the optimal feature difference representation cannot be obtained. On this basis, we introduce a difference enhancement module to generate a difference feature map containing rich change information. By weighting each pixel and selectively aggregating features, the module improves the effectiveness of the network and the accuracy of the extracted change features. Results on multiple datasets demonstrate that, compared with state-of-the-art methods, TransUNetCD further reduces false alarms and missed detections and delineates the edges of changed areas more accurately. The model achieves the highest score on every metric among the compared baseline models and shows robust generalization ability.
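Two steps named in the abstract, tokenizing a CNN feature map into patches and weighting a bi-temporal difference map per pixel, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `tokenize_feature_map` and `difference_enhance`, the patch size, and the sigmoid gate are assumptions chosen only to make the idea concrete, and the abstract does not specify how the difference enhancement module computes its weights.

```python
import numpy as np


def tokenize_feature_map(feat: np.ndarray, patch: int = 1) -> np.ndarray:
    """Flatten a (C, H, W) feature map into (num_tokens, patch*patch*C) tokens.

    Hypothetical helper: each non-overlapping patch becomes one token that a
    transformer encoder could consume.
    """
    c, h, w = feat.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # Split H and W into a grid of patches, then move the grid axes to the front.
    x = feat.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 2, 4, 0)  # (gh, gw, patch, patch, C)
    return x.reshape(gh * gw, patch * patch * c)


def difference_enhance(feat_t1: np.ndarray, feat_t2: np.ndarray) -> np.ndarray:
    """Weight the absolute bi-temporal feature difference with a per-pixel gate.

    Assumed gate (not from the paper): a sigmoid of the channel-mean
    difference, centred on its spatial mean, so pixels that changed more
    than average are emphasized and the rest are suppressed.
    """
    diff = np.abs(feat_t1 - feat_t2)            # (C, H, W)
    score = diff.mean(axis=0, keepdims=True)    # (1, H, W)
    gate = 1.0 / (1.0 + np.exp(-(score - score.mean())))
    return diff * gate                          # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1 = rng.standard_normal((4, 8, 8))
    f2 = f1.copy()
    f2[:, 3, 5] += 2.0  # simulate a change at a single pixel

    tokens = tokenize_feature_map(f1, patch=2)
    enhanced = difference_enhance(f1, f2)
    print(tokens.shape, enhanced.shape)
```

In a full model the gate would be learned rather than fixed, and the tokens would pass through positional embedding and self-attention layers before decoding; the sketch only shows the data layout and the per-pixel weighting.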

MixCDNet: A Lightweight Change Detection Network Mixing Features Across CNN and Transformer

TransUNetCD: A Hybrid Transformer Network for Change Detection in Optical Remote-Sensing Images

ConvTransNet: A CNN–Transformer Network for Change Detection With Multiscale Global–Local Representations

IFTSDNet: An Interact-Feature Transformer Network With Spatial Detail Enhancement Module for Change Detection

MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images

Relating CNN-Transformer Fusion Network for Change Detection

Remote-Sensing Image Change Detection Based on Adjacent-Level Feature Fusion and Dense Skip Connections

MSFF-CDNet: A Multiscale Feature Fusion Change Detection Network for Bi-Temporal High-Resolution Remote Sensing Image

Mixed-former: multi-fusion remote sensing change detection

Remote Sensing Image Change Detection Transformer Network Based on Dual-Feature Mixed Attention

ICIF-Net: Intra-scale Cross-interaction and Inter-scale Feature Fusion Network For Bi-temporal Remote Sensing Images Change Detection

CD-CTFM: A Lightweight CNN-Transformer Network for Remote Sensing Cloud Detection Fusing Multiscale Features

VisionTwinNet: Gated Clarity Enhancement Paired With Light-Robust CD Transformers

Multiscale Fusion CNN-Transformer Network for High-Resolution Remote Sensing Image Change Detection

MVAFG: Multiview Fusion and Advanced Feature Guidance Change Detection Network for Remote Sensing Images

Multiscale Alignment and Progressive Feature Fusion Network for High Resolution Remote Sensing Images Change Detection

CLHF-Net: A Channel-Level Hierarchical Feature Fusion Network for Remote Sensing Image Change Detection

Asymmetric Cross-Attention Hierarchical Network Based on CNN and Transformer for Bitemporal Remote Sensing Images Change Detection

Ultralightweight Spatial–Spectral Feature Cooperation Network for Change Detection in Remote Sensing Images

3M-CDNet-V2: An Efficient Medium-Weight Neural Network for Remote Sensing Image Change Detection