Abstract: Most current deep learning-based change detection approaches achieve convincing results by adding attention mechanisms to traditional convolutional networks. However, because their receptive field is limited, convolution-based methods cannot fully model global context or capture long-range dependencies, and are therefore weak at discriminating pseudo changes. Transformers offer efficient global spatio-temporal modelling, which benefits the feature representation of changes of interest. However, their lack of detailed information may cause transformers to locate the boundaries of changed regions inaccurately. Therefore, this article proposes CTCANet, a hybrid CNN-transformer architecture that combines the strengths of convolutional networks, transformers, and attention mechanisms for change detection in high-resolution bi-temporal remote sensing images. To obtain high-level feature representations that reveal changes of interest, CTCANet uses a tokenizer to embed the features that a convolutional network extracts from each image into a sequence of tokens, and a transformer module to model global spatio-temporal context in token space. The optimal approach to fusing bi-temporal information is also explored. The reconstructed features, which carry deep abstract information, are then fed to a cascaded decoder, where they are aggregated through skip connections with features containing shallow fine-grained information. This aggregation enables the model to preserve the completeness of changed regions and to locate small targets accurately. Moreover, integrating the convolutional block attention module (CBAM) smooths the semantic gaps between heterogeneous features and accentuates relevant changes in both the channel and spatial domains, further improving the results. Experimental results on two publicly accessible datasets, LEVIR-CD and SYSU-CD, show that CTCANet outperforms several recent state-of-the-art methods.
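
The tokenization step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how CTCANet's tokenizer works, so the code below assumes one common design: each token is a spatial-attention-weighted average of the pixel features. The names `tokenize`, `bitemporal_tokens`, and `token_weights` are hypothetical, and plain concatenation of the two token sets is only one possible fusion choice, not necessarily the one the paper finds optimal.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def tokenize(features, token_weights):
    """Pool N pixel features (each of length C) into L tokens of length C.

    features:      N x C nested list, a flattened CNN feature map (assumed layout)
    token_weights: L x C nested list, a hypothetical learned projection
    """
    channels = len(features[0])
    tokens = []
    for w in token_weights:
        # One logit per pixel: dot product of the pixel feature with w.
        logits = [sum(f_c * w_c for f_c, w_c in zip(f, w)) for f in features]
        # Spatial attention: the weights over all pixels sum to 1.
        attn = softmax(logits)
        # Token = attention-weighted average of the pixel features.
        token = [sum(a * f[c] for a, f in zip(attn, features))
                 for c in range(channels)]
        tokens.append(token)
    return tokens


def bitemporal_tokens(features_t1, features_t2, token_weights):
    """Tokenize both dates with shared weights and concatenate the token sets.

    Concatenation is an assumption made for illustration only.
    """
    return tokenize(features_t1, token_weights) + tokenize(features_t2, token_weights)
```

Under these assumptions, the concatenated sequence of 2L tokens is what a transformer encoder would attend over, so that every token from one date can interact with every token from the other.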

MCANet: Hierarchical cross-fusion lightweight transformer based on multi-ConvHead attention for object detection

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

Multi-Dimension Compression of Feed-Forward Network in Vision Transformers

CNN-transformer mixed model for object detection

GhostFormer: Efficiently amalgamated CNN-transformer architecture for object detection

MFCANet: Multiscale Feature Context Aggregation Network for Oriented Object Detection in Remote-Sensing Images

Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

HA-Transformer: Harmonious aggregation from local to global for object detection

Combining transformer global and local feature extraction for object detection

MAFormer: A transformer network with multi-scale attention fusion for visual recognition

CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction

Multi-scale Cross-Modal Transformer Network for RGB-D Object Detection

Cross-Modality Fusion Transformer for Multispectral Object Detection

A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images

TCNet: Multiscale Fusion of Transformer and CNN for Semantic Segmentation of Remote Sensing Images

A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images

Lightweight Vision Transformer with Cross Feature Attention

Two Cases of Sinusitis Induced by Immune Checkpoint Inhibition.

Unifying Global-Local Representations in Salient Object Detection with Transformer

MuTrans: Multiple Transformers for Fusing Feature Pyramid on 2D and 3D Object Detection