SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection

Shuai Yuan, Hanlin Qin, Xiang Yan, Naveed Akhtar, Ajmal Mian
DOI: https://doi.org/10.1109/TGRS.2024.3383649
2024-04-30
Abstract: Infrared small target detection (IRSTD) has recently benefitted greatly from U-shaped neural models. However, largely overlooking effective global information modeling, existing techniques struggle when the target is highly similar to the background. We present a Spatial-channel Cross Transformer Network (SCTransNet) that leverages spatial-channel cross transformer blocks (SCTBs) on top of long-range skip connections to address this challenge. In the proposed SCTBs, the outputs of all encoders interact through a cross transformer to generate mixed features, which are redistributed to all decoders to effectively reinforce semantic differences between the target and clutter at full scales. Specifically, each SCTB contains two key elements: (a) spatial-embedded single-head channel-cross attention (SSCA) for exchanging local spatial features and full-level global channel information to eliminate ambiguity among the encoders and facilitate high-level semantic associations of the images, and (b) a complementary feed-forward network (CFN) for enhancing feature discriminability via a multi-scale strategy and cross-spatial-channel information interaction to promote beneficial information transfer. Our SCTransNet effectively encodes the semantic differences between targets and backgrounds to boost its internal representation for detecting small infrared targets accurately. Extensive experiments on three public datasets, NUDT-SIRST, NUAA-SIRST, and IRSTD-1k, demonstrate that the proposed SCTransNet outperforms existing IRSTD methods. Our code will be made public at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper proposes a new approach to Infrared Small Target Detection (IRSTD). Its main points are:

1. **Limitations of Existing Technologies**: Methods based on U-shaped neural networks have made significant progress in infrared small target detection, but they often neglect effective modeling of global information. As a result, performance degrades when the target closely resembles the background.
2. **Proposed New Method**: The authors propose the "Spatial-channel Cross Transformer Network" (SCTransNet). It places Spatial-channel Cross Transformer Blocks (SCTBs) on the long-range skip connections so that features from different encoder levels interact before being passed to the decoders, which improves detection performance.
3. **Core Components**: Each SCTB comprises two parts:
   - **Spatial-embedded Single-head Channel-cross Attention (SSCA)**: exchanges local spatial features and full-level global channel information.
   - **Complementary Feed-forward Network (CFN)**: enhances feature discriminability through a multi-scale strategy and cross-spatial-channel information interaction.
4. **Experimental Validation**: Extensive experiments on three public datasets (NUDT-SIRST, NUAA-SIRST, and IRSTD-1k) demonstrate that the proposed SCTransNet outperforms existing infrared small target detection methods.

In summary, the paper targets the difficulty of separating targets from similar-looking backgrounds in infrared small target detection, and addresses it with a spatial-channel cross transformer structure.
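To make the idea of channel-cross attention across encoder levels more concrete, the following is a minimal, dependency-free sketch of generic single-head channel-wise cross attention. It is not the authors' SSCA implementation: the function name `channel_cross_attention`, the absence of learned projections and spatial embedding, the assumption that every level has already been flattened to the same number of spatial tokens, and the `1/sqrt(N)` scaling are all illustrative assumptions. Each level's channels act as queries, and the channels of all levels concatenated together act as keys and values, so every output channel is a weighted mixture of channels from every level.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def channel_cross_attention(level_feats):
    """Hypothetical helper: mix channels across encoder levels.

    level_feats: list of per-level feature maps, each shaped (C_i, N),
    where N (the number of spatial tokens) is assumed equal for all levels.
    Returns one mixed feature map per level with the same shape as its input.
    """
    # Keys/values: the channels of all levels stacked together, (C_sum, N).
    kv = [channel for feat in level_feats for channel in feat]
    scale = 1.0 / math.sqrt(len(kv[0]))  # assumed scaling, not from the paper
    kv_t = transpose(kv)  # (N, C_sum)

    mixed = []
    for queries in level_feats:
        # Channel-to-channel similarity: (C_i, N) x (N, C_sum) -> (C_i, C_sum).
        scores = matmul(queries, kv_t)
        attn = [softmax([s * scale for s in row]) for row in scores]
        # Each output channel is a convex combination of all channels.
        mixed.append(matmul(attn, kv))  # (C_i, N)
    return mixed


# Toy input: two levels with 2 and 1 channels, 4 spatial tokens each.
feats = [
    [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 3.0]],
    [[0.5, 0.5, 0.5, 0.5]],
]
mixed = channel_cross_attention(feats)
print([(len(m), len(m[0])) for m in mixed])
```

Because the attention map has size C_i × C_sum rather than N × N, its cost depends on the number of channels rather than on image resolution, which is one common motivation for attending over channels instead of spatial positions.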