EATDer: Edge-Assisted Adaptive Transformer Detector for Remote Sensing Change Detection

Jingjing Ma, Junyi Duan, Xu Tang, Xiangrong Zhang, Licheng Jiao
DOI: https://doi.org/10.1109/tgrs.2023.3344083
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Change detection (CD) is one of the important research topics in remote sensing (RS) image processing. Recently, convolutional neural networks (CNNs) have dominated the RSCD community. Many successful CNN-based models have been proposed, and they achieved cracking performance. Nevertheless, influenced by the limited receptive field, the CNN-based models are not good at capturing long-distance context dependencies within RS images, negatively impacting their performance. With the appearance of the visual transformer, the above problems have been mitigated. However, the high time costs of the transformer-based models limit their applicability. In addition, previous CD networks (whether CNN-based or transformer-based) do not pay attention to the edges of changed areas, reducing the quality of change maps. To overcome the shortcomings discussed above, we propose a new CD method named edge-assisted adaptive transformer detector (EATDer). EATDer consists of a Siamese encoder and an edge-aware decoder. Each branch in the Siamese encoder encloses three self-adaption vision transformer (SAVT) blocks, which aim to capture the local and global information within RS images. Also, two branches are connected by full-range fusion modules (FRFMs), which focus on mining the temporal clues among bi-temporal RS images and pointing out the changed/unchanged messages. The edge-aware decoder first integrates the multiscale features obtained by the encoder using a restoring block. Then, it enhances the combined features by a refining block. Finally, based on the refined features, both the change and edge detection results can be produced. Along with a joint loss function, we can get high-quality change maps in which the changed areas are correct and have clear and smooth edges. The usefulness of our EATDer is validated by extensive experiments conducted on three popular RSCD datasets.
Our source codes are available at https://github.com/TangXu-Group/Remote-Sensing-Image-Change-Detection/tree/main/EATDer
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
What problem does this paper attempt to address?
This paper addresses three key problems in remote sensing image change detection (RSCD):

1. **Capturing long-distance context dependencies**: Because of their limited receptive fields, convolutional neural networks (CNNs) have difficulty capturing long-distance context dependencies in remote sensing images, which hurts their performance. To address this, the authors build on the vision transformer, which captures global information more effectively.
2. **High time cost**: Transformer-based models alleviate the problem above, but their high time cost limits their applicability. The authors therefore design a self-adaption multi-head self-attention mechanism (SAMSA) to reduce the computational complexity.
3. **Ignoring edge information**: Previous CD networks (whether CNN-based or transformer-based) do not pay sufficient attention to the edges of changed areas, which lowers the quality of the change maps. The authors propose an edge-aware decoder that detects the edges of changed areas while generating the change map, so that the changed areas have clear and smooth edges.

In summary, the paper aims to overcome these shortcomings of existing RSCD methods with a new method, the edge-assisted adaptive transformer detector (EATDer). EATDer combines a Siamese encoder with an edge-aware decoder. It captures both local and global information, highlights changed and unchanged areas, and improves the quality of the change maps.
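The edge-aware idea in point 3 can be illustrated with a small sketch. The summary does not say how edge labels or the joint loss are defined, so everything below is an assumption made for illustration: edge labels are taken to be changed pixels that touch an unchanged 4-neighbour, and the loss is a weighted sum of two binary cross-entropy terms. The names `edge_from_mask`, `bce`, `joint_loss`, and `edge_weight` are hypothetical and are not taken from the EATDer code.

```python
import math

def edge_from_mask(mask):
    """Assumed edge definition: a changed pixel with at least one unchanged 4-neighbour."""
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j] != 1:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                # Positions outside the image are not counted as unchanged neighbours.
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] == 0:
                    edge[i][j] = 1
                    break
    return edge

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count

def joint_loss(change_pred, edge_pred, change_gt, edge_weight=1.0):
    """Hypothetical joint loss: change BCE plus weighted edge BCE."""
    edge_gt = edge_from_mask(change_gt)
    return bce(change_pred, change_gt) + edge_weight * bce(edge_pred, edge_gt)
```

Supervising the edge map alongside the change map penalises blurry boundaries directly, which is the motivation the summary gives for the edge-aware decoder. The actual loss terms and weighting used in the paper may differ.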
### Formula summary

- **Output of the SAVT block**:

  \[ P_{\text{new}}^1=\text{MLP}(\text{LN}(P_{\text{SAMSA}}^1)) + P_{\text{SAMSA}}^1 \]

  where

  \[ P_{\text{SAMSA}}^1=\text{SAMSA}(\text{LN}(P_1')) + P_1' \]

- **Attention computation inside SAMSA**:

  \[ \text{MSA}(Q_1, K_a^1, V_a^1)=\text{concat}(\text{head}_1,\ldots,\text{head}_n)W_O \]

  where

  \[ \text{head}_j = \text{Attention}(Q_1W_Q^j, K_a^1W_K^j, V_a^1W_V^j) \]

  \[ \text{Attention}(Q, K, V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \]

- **Time complexity comparison**:

  \[ T(\text{MSA}) = 4\frac{hw}{s^2}c'^2+ 2\left(\frac{hw}{s^2}\right)^2 c' \]

  \[ T(\text{SAMSA}) = 2\left(\frac{hw}{s^2}+\sqrt{\frac{hw}{s^2}}\right)c'^2+ 2\left(\frac{hw}{s^2}\right)^{3/2} c' \]

Writing \(N = hw/s^2\) for the number of tokens, the comparison shows where the saving comes from: the attention term falls from order \(N^2\) in standard MSA to order \(N^{3/2}\) in SAMSA, and the projection term is roughly halved.
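To get a feel for the size of the gap, the two complexity expressions can be evaluated directly. The sketch below only transcribes the formulas above. The function names `msa_cost` and `samsa_cost` and the example values of `h`, `w`, `s`, and `c` (standing in for \(c'\)) are hypothetical choices for illustration, not settings reported in the paper.

```python
import math

def msa_cost(h, w, s, c):
    """T(MSA) = 4*N*c^2 + 2*N^2*c, with N = h*w / s^2 tokens."""
    n = (h * w) / (s * s)
    return 4 * n * c**2 + 2 * n**2 * c

def samsa_cost(h, w, s, c):
    """T(SAMSA) = 2*(N + sqrt(N))*c^2 + 2*N^(3/2)*c, with N = h*w / s^2 tokens."""
    n = (h * w) / (s * s)
    return 2 * (n + math.sqrt(n)) * c**2 + 2 * n**1.5 * c

if __name__ == "__main__":
    # Hypothetical setting: 256x256 input, s = 4, c' = 64.
    h, w, s, c = 256, 256, 4, 64
    msa, samsa = msa_cost(h, w, s, c), samsa_cost(h, w, s, c)
    print(f"MSA:   {msa:.3e}")
    print(f"SAMSA: {samsa:.3e}")
    print(f"ratio: {msa / samsa:.1f}")
```

For this hypothetical setting the standard MSA count is roughly 33 times the SAMSA count. The ratio grows with the token count, since the dominant terms scale as \(N^2\) and \(N^{3/2}\) respectively.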