Abstract: Existing visual change detectors usually adopt CNNs or Transformers for feature representation learning and focus on learning effective representations of the changed regions between images. Although enhancing the features of the changed regions yields good performance, these methods remain limited, mainly because they neglect to mine the unchanged background context. One main challenge in change detection is obtaining consistent representations for two images that differ in factors such as spatial variation and sunlight intensity. In this work, we demonstrate that carefully mining the common background information provides an important cue for learning consistent representations of the two images, which in turn facilitates visual change detection. Based on this observation, we propose a novel Visual change Transformer (VcT) model for the visual change detection problem. Specifically, a shared backbone network is first used to extract the feature maps for the given image pair. Then, each pixel of the feature map is regarded as a graph node, and a graph neural network is used to model the structured information for coarse change map prediction. Top-K reliable tokens are mined from the map and refined using a clustering algorithm. These reliable tokens are then enhanced, first through self/cross-attention schemes and then by interacting with the original features via an anchor-primary attention learning module. Finally, a prediction head produces a more accurate change map. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed VcT model.
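
The token-mining step in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that a "reliable" token is a pixel with a low predicted change probability in the coarse map, and it omits the clustering refinement. The function name `mine_reliable_tokens` and the toy inputs are hypothetical.

```python
import heapq


def mine_reliable_tokens(change_prob, features, k):
    """Return indices and feature vectors of the k pixels least likely to have changed.

    change_prob: list of floats in [0, 1], one per pixel (flattened coarse change map).
    features:    list of feature vectors, one per pixel, in the same order.
    Assumption: "reliable" means lowest predicted change probability.
    """
    if len(change_prob) != len(features):
        raise ValueError("change_prob and features must have the same length")
    k = min(k, len(change_prob))
    # Keep the k pixel indices with the smallest change probability.
    idx = heapq.nsmallest(k, range(len(change_prob)), key=lambda i: change_prob[i])
    return idx, [features[i] for i in idx]


# Toy 2x3 coarse change map, flattened row by row (illustrative values only).
coarse_map = [0.90, 0.05, 0.40, 0.10, 0.80, 0.02]
pixel_features = [[float(i), float(i) * 2.0] for i in range(6)]

indices, tokens = mine_reliable_tokens(coarse_map, pixel_features, k=3)
print(indices, tokens)
```

In a full pipeline the selected feature vectors would then be passed to a clustering step and the attention modules; here they are only printed.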

What problem does this paper attempt to address?

This paper aims to solve two main problems in remote sensing image change detection:

1. **Differences in the temporal, spatial, spectral, and radiometric resolutions of remote sensing systems**: These differences make it difficult to compare and analyze images. Different sensors may capture the same area at different times and resolutions, producing significant differences between images that can mask real changes or introduce false change signals.
2. **The influence of environmental factors**: Factors such as light intensity, atmosphere, and soil moisture can degrade images. They may cause the same object to show different spectral characteristics at different times, which makes change detection harder.

To address these challenges, the paper proposes a new visual change detection framework, the Visual change Transformer (VcT). The framework improves change detection accuracy by mining invariant background information. Its main contributions are as follows:

- **Introducing a reliable background token selection module (Reliable Token Mining, RTM)**: A graph neural network (GNN) and the K-means clustering algorithm are used to select reliable background tokens from the input images. These tokens represent the invariant regions of the images, which helps suppress irrelevant background changes and makes the final results more reliable.
- **Proposing self-attention and cross-attention modules**: Self-attention captures the relationships within a single image, and cross-attention captures the relationships between the two images. These modules strengthen the correlation between local and global features and improve change detection accuracy.
- **Designing an Anchor-Primary Attention module**: This module further enhances the feature representation by fusing the selected tokens with the original feature maps, improving detection performance.

In conclusion, by combining invariant background information with multi-stage attention mechanisms, the paper addresses these key problems in remote sensing image change detection and improves the accuracy and reliability of the results.
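
The difference between the self-attention and cross-attention described above can be sketched in a few lines. This is a minimal illustration of standard scaled dot-product attention, not the VcT modules themselves: it assumes identity query/key/value projections (no learned weights), a single head, and hypothetical toy tokens named `tokens_t1` and `tokens_t2` for the two images.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections (an assumption)."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical tokens from the two images (2-dimensional for readability).
tokens_t1 = [[1.0, 0.0], [0.0, 1.0]]
tokens_t2 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Self-attention: queries, keys, and values all come from the same image.
self_out = attention(tokens_t1, tokens_t1, tokens_t1)
# Cross-attention: queries from image 1 attend to keys and values from image 2.
cross_out = attention(tokens_t1, tokens_t2, tokens_t2)
print(self_out)
print(cross_out)
```

The only difference between the two calls is where the keys and values come from, which is why the same function covers both the within-image and between-image cases.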

VcT: Visual change Transformer for Remote Sensing Image Change Detection

TransUNetCD: A Hybrid Transformer Network for Change Detection in Optical Remote-Sensing Images

TCIANet: Transformer-Based Context Information Aggregation Network for Remote Sensing Image Change Detection

A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection

ChangeViT: Unleashing Plain Vision Transformers for Change Detection

TChange: A Hybrid Transformer-CNN Change Detection Network

DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection

A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images

A Divided Spatial and Temporal Context Network for Remote Sensing Change Detection

A Semi-Supervised Pyramid Cross-Temporal Attention Transformer for Change Detection in High-Resolution Remote Sensing Images

Relation Changes Matter: Cross-Temporal Difference Transformer for Change Detection in Remote Sensing Images

Changes-Aware Transformer: Learning Generalized Changes Representation

MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images

MVAFG: Multiview Fusion and Advanced Feature Guidance Change Detection Network for Remote Sensing Images

A CBAM Based Multiscale Transformer Fusion Approach for Remote Sensing Image Change Detection

Detail Enhanced Change Detection in VHR Images Using a Self-Supervised Multiscale Hybrid Network

Remote Sensing Image Change Detection With Transformers

MFATNet: Multi-Scale Feature Aggregation via Transformer for Remote Sensing Image Change Detection