MixCDNet: A Lightweight Change Detection Network Mixing Features Across CNN and Transformer

Linlin Wang, Junping Zhang, Lorenzo Bruzzone
DOI: https://doi.org/10.1109/tgrs.2024.3438228
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Convolutional neural networks (CNNs) have performed well in change detection (CD) tasks due to their superior learning and automatic feature extraction capabilities. However, they suffer from a limited receptive field and weak modeling of long-range dependencies. Vision transformers (ViTs) excel in modeling long-range contexts and have recently been introduced in CD. Some works have combined CNNs and transformers to obtain local-global information. However, these works do not fully consider the guidance and interactions from both local features (LFs) and global features (GFs). More importantly, most of them involve a very large number of parameters and a high computational cost. To address these issues, in this article, we propose a lightweight CD network that mixes features across CNN and transformer (MixCDNet). We use EfficientNet as the backbone and design a novel mixing features block (MFB). First, we employ hierarchical feature extraction blocks, where local feature blocks (LFBs) and global feature blocks (GFBs) are utilized for extracting information at different spatial resolutions. Second, we propose to exploit bidirectional interactions across the LFB and GFB branches to provide complementary clues while capturing LFs and GFs. Moreover, a skip-connection and fusion separable self-attention layer (SFSSL) is designed to obtain GFs with low complexity. Comprehensive experiments are conducted on three high-resolution remote sensing (HRRS) image CD datasets: LEVIR-CD, WHU-CD, and CDD. The results show the effectiveness of the proposed MixCDNet in improving on the performance of existing CD methods with fewer parameters (0.32 M) and lower computational cost (1.59 GFLOPs).
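The abstract does not spell out how the SFSSL is built, so the following is only a minimal sketch of generic separable self-attention (the linear-complexity form popularized by MobileViTv2), not the authors' layer. It illustrates why such a layer can capture global context at low cost: each token gets a single scalar score, the scores pool the keys into one context vector, and that vector rescales every value, so no token-by-token attention map is ever formed. The function names, weight names (`w_i`, `w_k`, `w_v`, `w_o`), and sizes are assumptions made for illustration; the skip-connection and fusion parts named in the abstract are not modeled.

```python
import math
import random


def linear(x, w):
    """Multiply a (k x d_in) token matrix by a (d_in x d_out) weight matrix."""
    return [[sum(xi * wi for xi, wi in zip(row, col)) for col in zip(*w)] for row in x]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Generic separable self-attention over k tokens of width d.

    Hypothetical illustration, not the paper's SFSSL. The cost grows
    linearly with k because no k x k attention map is computed.
    """
    # One scalar score per token, normalized over the k tokens.
    scores = softmax([row[0] for row in linear(x, w_i)])
    keys = linear(x, w_k)
    d = len(keys[0])
    # Context vector: score-weighted sum of the keys (length d).
    context = [sum(s * key[j] for s, key in zip(scores, keys)) for j in range(d)]
    # Values pass through ReLU, then are scaled element-wise by the context.
    values = [[max(0.0, v) for v in row] for row in linear(x, w_v)]
    mixed = [[v * c for v, c in zip(row, context)] for row in values]
    return linear(mixed, w_o)


def random_matrix(rows, cols, rng):
    """Random weights or tokens in [-1, 1], used only for this demo."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
k, d = 6, 4  # assumed sizes: 6 tokens, 4 channels
x = random_matrix(k, d, rng)
out = separable_self_attention(
    x,
    random_matrix(d, 1, rng),  # w_i: projects each token to one score
    random_matrix(d, d, rng),  # w_k
    random_matrix(d, d, rng),  # w_v
    random_matrix(d, d, rng),  # w_o
)
print(len(out), len(out[0]))  # output keeps the input shape: k tokens of width d
```

In this sketch, doubling the number of tokens roughly doubles the work, whereas standard self-attention would roughly quadruple it; that is the general sense in which a separable layer obtains global features "with low complexity".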